Re: gpucc breaks cuda 7.0.28/7_CUDALibraries/simpleCUFFT


Jingyue Wu via cfe-dev
Would you mind uploading your simpleCUFFT.cu code? The problem looks related to device code generation, since the build succeeded.

On Tue, Apr 5, 2016 at 1:56 PM, Peter Steinbach <[hidden email]> wrote:
Hi guys,

First of all, please accept my apologies for contacting you by mail. I was a bit lost as to which mailing list to choose from those pointed to by
http://llvm.org/docs/CompileCudaWithLLVM.html
and the subsequent
http://llvm.org/docs/#mailing-lists
Feel free to deflect this request to the relevant mailing list or bug tracker.

In any case, I am very interested in using GPUCC instead of NVCC for a multitude of reasons (C++1X, compilation speed, ...). I started to "port" my favorite samples from the NVIDIA SDK.
With clang 3.8, samples-7.0.28/7_CUDALibraries/simpleCUFFT compiled with clang produces an error at runtime. Here is what I see with a K20c:

$ clang++ --cuda-path=/sw/apps/cuda/7.0.28   -I../../common/inc  -m64    --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_35 -o simpleCUFFT.o -c simpleCUFFT.cu
$ clang++ --cuda-path=/sw/apps/cuda/7.0.28    -L/sw/apps/cuda/7.0.28/lib64 -lcudart -ldl -lrt -pthread  -m64      -o simpleCUFFT.llvm simpleCUFFT.o  -lcufft
$ ./simpleCUFFT.llvm
[simpleCUFFT] is starting...
GPU Device 0: "Tesla K20c" with compute capability 3.5

Transforming signal cufftExecC2C
Launching ComplexPointwiseMulAndScale<<< >>>
simpleCUFFT.cu(132) : getLastCudaError() CUDA error : Kernel execution failed [ ComplexPointwiseMulAndScale ] : (8) invalid device function.

The same source code works just fine with nvcc 7.0.
Any help would be appreciated.

Best,
Peter

PS. From random comments, I had the feeling that you are looking at the SHOC benchmarks with gpucc. If so, please comment on:
https://github.com/vetter/shoc/issues/48
I don't wanna do work that is either pointless (support for textures) or was already done. ;)
--
Peter Steinbach, Dr. rer. nat.
HPC Developer, Scientific Computing Facility

Max Planck Institute of Molecular Cell Biology and Genetics
Pfotenhauerstr. 108
01307 Dresden
Germany


phone +49 351 210 2882
fax   +49 351 210 1689
www.mpi-cbg.de


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: gpucc breaks cuda 7.0.28/7_CUDALibraries/simpleCUFFT

Peter Steinbach via cfe-dev
Hi Jingyue,

As I said, it's 7_CUDALibraries/simpleCUFFT from the NVIDIA SDK. I'll
attach a tarball with the code and the LLVM makefile, named Makefile.llvm.

Just unpack the tarball and run the added Makefile:
$ tar xf clang-simpleCUFFT.tgz
$ make -C clang-simpleCUFFT/7_CUDALibraries/simpleCUFFT/ -f Makefile.llvm
$ ./clang-simpleCUFFT/7_CUDALibraries/simpleCUFFT/simpleCUFFT.llvm
...
simpleCUFFT.cu(132) : getLastCudaError() CUDA error : Kernel execution
failed [ ComplexPointwiseMulAndScale ] : (8) invalid device function.

The problem is reproducible on both Fermi and Kepler (I am using the
CUDA 7.0 runtime libraries).

Thanks!
Peter

On 04/05/2016 11:12 PM, Jingyue Wu wrote:
> Would you mind uploading your simpleCUFFT.cu code? It looks related to device
> code generation because building was successful.
>


Attachments: clang-simpleCUFFT.tgz (972K), smime.p7s (6K)

Re: gpucc breaks cuda 7.0.28/7_CUDALibraries/simpleCUFFT

Artem Belevich via cfe-dev
Peter,

I can't reproduce the problem with a recent clang. I tried compiling the same sample code with CUDA 7.0 and 7.5, using:
clang version 3.9.0 (trunk 268962) (llvm/trunk 268980)

Could you tell me which clang version you used? If you still see the problem, please file a clang bug at llvm.org/bugs.
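
For anyone answering the version question above, here is a minimal sketch of pulling the version number out of `clang++ --version` output. `parse_clang_version` is a hypothetical helper name, not part of clang, and the sketch assumes the output contains a `clang version X.Y.Z` line like the one quoted in this thread; some builds may format it differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of clang): reads `clang++ --version` text on
# stdin and prints the first version number that follows "clang version".
parse_clang_version() {
    sed -n 's/.*clang version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example using the version line quoted in this thread.
echo 'clang version 3.9.0 (trunk 268962) (llvm/trunk 268980)' | parse_clang_version
# prints 3.9.0
```

With a compiler on the PATH, the same helper could be fed directly: `clang++ --version | parse_clang_version`.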

--Artem


% clang++ -I../../common/inc --cuda-gpu-arch=sm_35 simpleCUFFT.cu -L/usr/local/cuda-7.5/lib64 -lcufft -lcudart -o simpleCUFFT-clang
% LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64 ./simpleCUFFT-clang
[simpleCUFFT] is starting...
GPU Device 0: "Tesla K40c" with compute capability 3.5

Transforming signal cufftExecC2C
Launching ComplexPointwiseMulAndScale<<< >>>
Transforming signal back cufftExecC2C
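
The two invocation styles in this thread (separate compile and link steps versus one combined command) rely on the same handful of flags. As a rough sketch, the script below only prints a compile command and a link command assembled from those flags, so it can be inspected without a CUDA installation. `build_commands`, `CUDA_PATH`, and `GPU_ARCH` are illustrative names, the default path is the one from the command above, and nothing here is claimed to fix the `invalid device function` error.

```shell
#!/bin/sh
# Dry run: print (do not execute) a compile and a link command for simpleCUFFT.
# Only flags that already appear in this thread are used.
# CUDA_PATH and GPU_ARCH are illustrative variables; adjust for your system.
CUDA_PATH="${CUDA_PATH:-/usr/local/cuda-7.5}"
GPU_ARCH="${GPU_ARCH:-sm_35}"

# build_commands CUDA_PATH GPU_ARCH
# Hypothetical helper: echoes one compile line and one link line.
build_commands() {
    # Device and host code are compiled into one object file.
    echo "clang++ --cuda-path=$1 -I../../common/inc --cuda-gpu-arch=$2 -c simpleCUFFT.cu -o simpleCUFFT.o"
    # The link step lists libraries after the object that uses them.
    echo "clang++ simpleCUFFT.o -o simpleCUFFT.llvm -L$1/lib64 -lcufft -lcudart -ldl -lrt -pthread"
}

build_commands "$CUDA_PATH" "$GPU_ARCH"
```

Piping the output to `sh` would run the printed commands. Passing the architecture once also avoids the repeated `--cuda-gpu-arch=sm_35` seen in the original report.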


On Tue, Apr 5, 2016 at 2:12 PM, Jingyue Wu <[hidden email]> wrote:
Would you mind uploading your simpleCUFFT.cu code? It looks related to device code generation because building was successful.


Re: gpucc breaks cuda 7.0.28/7_CUDALibraries/simpleCUFFT

Peter Steinbach via cfe-dev
Hi Artem,

The described "bug" is gone with llvm/clang trunk. Is there any news on
texture memory support in CUDA clang yet?

Thanks a bunch -
P

On 17.05.2016 00:28, Artem Belevich wrote:

> Peter,
>
> I can't reproduce the problem with recent clang. I've tried compiling
> the same sample code with cuda-7.0 and 7.5.
> clang version 3.9.0 (trunk 268962) (llvm/trunk 268980)
>
> Could you tell me what was clang version you used? If you still see the
> problem, please file a clang bug on llvm.org/bugs.
>
> --Artem
>
>
> % clang++ -I../../common/inc --cuda-gpu-arch=sm_35 simpleCUFFT.cu
> -L/usr/local/cuda-7.5/lib64 -lcufft -lcudart -o simpleCUFFT-clang
> % LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64 ./simpleCUFFT-clang
> [simpleCUFFT] is starting...
> GPU Device 0: "Tesla K40c" with compute capability 3.5
>
> Transforming signal cufftExecC2C
> Launching ComplexPointwiseMulAndScale<<< >>>
> Transforming signal back cufftExecC2C


Re: gpucc breaks cuda 7.0.28/7_CUDALibraries/simpleCUFFT

Artem Belevich via cfe-dev
Peter,

I'm glad to hear that it works now.

As for texture lookups, I don't have any good news -- they are still unsupported.

--Artem


On Tue, May 17, 2016 at 1:19 PM, Peter Steinbach <[hidden email]> wrote:
Hi Artem,

The described "bug" is gone with llvm/clang trunk. Is there any news on texture memory support in CUDA clang yet?

Thanks a bunch -
P

--
--Artem Belevich
