OpenMP offloaded target region executed in both host and target-device

Tom Stellard via cfe-dev
Hi,

I'm working on a project which requires OpenMP offloading to Nvidia GPUs using Clang. 

System specification

OS - Ubuntu 16.04 LTS
Clang version - 4.00
Processor - Intel(R) Core(TM) i7-4700MQ CPU
CUDA version - 9.0
Nvidia GPU - GeForce 740M (compute capability 3.5, sm_35)

The problem is that when I execute a sample program to test OpenMP offloading to Nvidia GPUs, part of the target region appears to run on the GPU, and then the same target region starts executing on the host.

Please find the sample program attached. It is a small C program that multiplies two matrices.
I believe the target region is being executed on both the host and the target device because of the abnormal output from the print function inside the target region. (My processor has 4 cores, each capable of running 2 hardware threads.)

An image of the command-line output is attached as well.

The program was compiled with:
 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda


I cannot figure out whether the runtime believes that the GPU execution did not complete successfully and therefore executes the target region on the host again.
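One way to check where a target region actually ran is to ask the OpenMP runtime from inside the region. The following is only a minimal sketch, not taken from the attached program: the function name `target_region_ran_on_host` is hypothetical, while `omp_is_initial_device()` is the standard OpenMP routine that returns true when the calling code runs on the host.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if the target region below executed on
   the host, 0 if it executed on an offload device. */
int target_region_ran_on_host(void)
{
    int on_host = 1;  /* stays 1 when the pragma is ignored (no OpenMP) */

    /* The scalar flag is mapped explicitly so the value written inside
       the region is copied back to the host. */
    #pragma omp target map(tofrom: on_host)
    {
#ifdef _OPENMP
        on_host = omp_is_initial_device() ? 1 : 0;
#endif
    }

    assert(on_host == 0 || on_host == 1);  /* sanity check on the flag */
    return on_host;
}
```

If this returns 1 in a build that uses -fopenmp-targets=nvptx64-nvidia-cuda, the region fell back to the host. Runtimes that implement OpenMP 5.0 or later also honor the OMP_TARGET_OFFLOAD=MANDATORY environment variable, which makes a failed offload abort instead of falling back; that may not be available with the Clang version listed above.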

Thank you!
--
Piyumi Rameshka

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Attachments: 2mm.c (1K), command-line-output.jpg (318K)

Re: OpenMP offloaded target region executed in both host and target-device

Tom Stellard via cfe-dev

It seems to me that your program crashes on the GPU and then tries to execute the same code on the CPU, though this behavior seems wrong to me.

The problem is in your code. When you try to map the A, B and E arrays, you're doing it the wrong way. Instead of mapping the arrays, you map only the pointers to these arrays and do not allocate memory for them on the GPU.
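To illustrate the point about mapping, here is a minimal sketch of a matrix multiply whose map clauses use array sections. The attached 2mm.c is not reproduced in this thread, so everything except the array names A, B and E is an assumption: the function name `matmul_offload`, the `double` element type, and the flat row-major n x n layout are all hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch: E = A * B for n x n row-major matrices.
   The array sections A[0:n*n], B[0:n*n] and E[0:n*n] tell the runtime
   how many elements to allocate and copy on the device. Naming only the
   pointer (e.g. map(to: A)) would not map the pointed-to data. */
void matmul_offload(const double *A, const double *B, double *E, int n)
{
    assert(n > 0);

    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: A[0:n*n], B[0:n*n]) map(from: E[0:n*n])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            E[i * n + j] = sum;
        }
    }
}
```

A caller would allocate the three buffers on the host (for example with malloc), fill A and B, and call `matmul_offload(A, B, E, n)`. When compiled without OpenMP support the pragma is ignored and the loops run sequentially on the host, which makes the result easy to compare against an offloaded run.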

-------------
Best regards,
Alexey Bataev
09.04.2018 8:21, Piyumi Lakshani via cfe-dev wrote:

Re: OpenMP offloaded target region executed in both host and target-device

Tom Stellard via cfe-dev
Hi,

Thank you, Alexey, for pointing that out. I was able to successfully offload the program to the GPU after correcting that mistake.



On 9 April 2018 at 20:11, Alexey Bataev <[hidden email]> wrote:

Regards,
--
Piyumi Rameshka
