[RFC] Delayed target-specific diagnostic when compiling for the devices.

classic Classic list List threaded Threaded
47 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev

Hi, many of the OpenMP users experience troubles when they try to compile real-world OpenMP applications, which use offloading constructs.

Problem Description
===============
For example, let’s look at the abstract code:
```
void target_unused() {
  int a;
  __asm__("constraints"
          : "=a"(a)
          :         // None
  );
}

void target_used() {
  int a;
  __asm__("constraints"
          : "=a"(a)
          :         // None
  );
}

void foo() {
  target_unused();
#pragma omp target
  target_used();
}
```
Assume, we going to compile this code on X86_64 host to run on the NVidia NVPTX64 device. When we compile this code for the host, everything is good. But when we compile the same code for the NVPTX64 target, we get the next error messages:
```
11:13: error: invalid output constraint '=a' in asm
20:13: error: invalid output constraint '=a' in asm
```
But, actually, we should see only one error message, the second one, for the function `target_used()`, which is actually used in the target region. The second function, `target_unused()` is used only on the host and we should no produce error message for this function when we compile the code for the device.

The main problem with those functions is that they are not marked explicitly as the device functions, just like it is required in CUDA. In OpenMP, it is not required to mark them explicitly as the device-only or both device-host function. They can be marked implicitly, depending of the fact that they are used in target-based constructs (probably, indirectly, through chain of calls) or not.

That’s why we need to postpone some of the target-related diagnostics till the CodeGen phase.

Possible solution.
==============
1. Move target-specific checks to the CodeGen.

The best solutions of all, does not require significant redesign of the existing code base, just requires some copy-paste of the diagnostics and the associated logic.

2. Add special delayed diagnostics class, associate it with the declarations somehow and check it for each declaration during the CodeGen.

Requires redesign of the diagnostic subsystem + redesign of the codegen subsystem, at least.

3. Introduce special expression, statement and declaration nodes.

This nodes serve only as the containers for the target-specific error messages, generated in Sema. During the codegen phase, if the codegen run into one of such constructs, it just emits the diagnostic, stored in these nodes.

Requires additional expression/statement/declarations nodes + looks like a hack.

-- 
-------------
Best regards,
Alexey Bataev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
Ping!

-------------
Best regards,
Alexey Bataev

09.01.2019 10:02, Alexey.Bataev пишет:

>
> Hi, many of the OpenMP users experience troubles when they try to
> compile real-world OpenMP applications, which use offloading constructs.
>
> Problem Description
> ===============
> For example, let’s look at the abstract code:
> ```
> void target_unused() {
>   int a;
>   __asm__("constraints"
>           : "=a"(a)
>           :         // None
>   );
> }
>
> void target_used() {
>   int a;
>   __asm__("constraints"
>           : "=a"(a)
>           :         // None
>   );
> }
>
> void foo() {
>   target_unused();
> #pragma omp target
>   target_used();
> }
> ```
> Assume, we going to compile this code on X86_64 host to run on the
> NVidia NVPTX64 device. When we compile this code for the host,
> everything is good. But when we compile the same code for the NVPTX64
> target, we get the next error messages:
> ```
> 11:13: error: invalid output constraint '=a' in asm
> 20:13: error: invalid output constraint '=a' in asm
> ```
> But, actually, we should see only one error message, the second one,
> for the function `target_used()`, which is actually used in the target
> region. The second function, `target_unused()` is used only on the
> host and we should no produce error message for this function when we
> compile the code for the device.
>
> The main problem with those functions is that they are not marked
> explicitly as the device functions, just like it is required in CUDA.
> In OpenMP, it is not required to mark them explicitly as the
> device-only or both device-host function. They can be marked
> implicitly, depending of the fact that they are used in target-based
> constructs (probably, indirectly, through chain of calls) or not.
>
> That’s why we need to postpone some of the target-related diagnostics
> till the CodeGen phase.
>
> Possible solution.
> ==============
> 1. Move target-specific checks to the CodeGen.
>
> The best solutions of all, does not require significant redesign of
> the existing code base, just requires some copy-paste of the
> diagnostics and the associated logic.
>
> 2. Add special delayed diagnostics class, associate it with the
> declarations somehow and check it for each declaration during the CodeGen.
>
> Requires redesign of the diagnostic subsystem + redesign of the
> codegen subsystem, at least.
>
> 3. Introduce special expression, statement and declaration nodes.
>
> This nodes serve only as the containers for the target-specific error
> messages, generated in Sema. During the codegen phase, if the codegen
> run into one of such constructs, it just emits the diagnostic, stored
> in these nodes.
>
> Requires additional expression/statement/declarations nodes + looks
> like a hack.
>
> --
> -------------
> Best regards,
> Alexey Bataev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
I have some questions about this... I'll write them up later today.

 -Hal

On 1/14/19 8:25 AM, Alexey Bataev wrote:

> Ping!
>
> -------------
> Best regards,
> Alexey Bataev
>
> 09.01.2019 10:02, Alexey.Bataev пишет:
>> Hi, many of the OpenMP users experience troubles when they try to
>> compile real-world OpenMP applications, which use offloading constructs.
>>
>> Problem Description
>> ===============
>> For example, let’s look at the abstract code:
>> ```
>> void target_unused() {
>>   int a;
>>   __asm__("constraints"
>>           : "=a"(a)
>>           :         // None
>>   );
>> }
>>
>> void target_used() {
>>   int a;
>>   __asm__("constraints"
>>           : "=a"(a)
>>           :         // None
>>   );
>> }
>>
>> void foo() {
>>   target_unused();
>> #pragma omp target
>>   target_used();
>> }
>> ```
>> Assume, we going to compile this code on X86_64 host to run on the
>> NVidia NVPTX64 device. When we compile this code for the host,
>> everything is good. But when we compile the same code for the NVPTX64
>> target, we get the next error messages:
>> ```
>> 11:13: error: invalid output constraint '=a' in asm
>> 20:13: error: invalid output constraint '=a' in asm
>> ```
>> But, actually, we should see only one error message, the second one,
>> for the function `target_used()`, which is actually used in the target
>> region. The second function, `target_unused()` is used only on the
>> host and we should no produce error message for this function when we
>> compile the code for the device.
>>
>> The main problem with those functions is that they are not marked
>> explicitly as the device functions, just like it is required in CUDA.
>> In OpenMP, it is not required to mark them explicitly as the
>> device-only or both device-host function. They can be marked
>> implicitly, depending of the fact that they are used in target-based
>> constructs (probably, indirectly, through chain of calls) or not.
>>
>> That’s why we need to postpone some of the target-related diagnostics
>> till the CodeGen phase.
>>
>> Possible solution.
>> ==============
>> 1. Move target-specific checks to the CodeGen.
>>
>> The best solutions of all, does not require significant redesign of
>> the existing code base, just requires some copy-paste of the
>> diagnostics and the associated logic.
>>
>> 2. Add special delayed diagnostics class, associate it with the
>> declarations somehow and check it for each declaration during the CodeGen.
>>
>> Requires redesign of the diagnostic subsystem + redesign of the
>> codegen subsystem, at least.
>>
>> 3. Introduce special expression, statement and declaration nodes.
>>
>> This nodes serve only as the containers for the target-specific error
>> messages, generated in Sema. During the codegen phase, if the codegen
>> run into one of such constructs, it just emits the diagnostic, stored
>> in these nodes.
>>
>> Requires additional expression/statement/declarations nodes + looks
>> like a hack.
>>
>> --
>> -------------
>> Best regards,
>> Alexey Bataev

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
In reply to this post by David Blaikie via cfe-dev
On 1/9/19 9:02 AM, Alexey Bataev wrote:

>
> Hi, many of the OpenMP users experience troubles when they try to
> compile real-world OpenMP applications, which use offloading constructs.
>
> Problem Description
> ===============
> For example, let’s look at the abstract code:
> ```
> void target_unused() {
>   int a;
>   __asm__("constraints"
>           : "=a"(a)
>           :         // None
>   );
> }
>
> void target_used() {
>   int a;
>   __asm__("constraints"
>           : "=a"(a)
>           :         // None
>   );
> }
>
> void foo() {
>   target_unused();
> #pragma omp target
>   target_used();
> }
> ```
> Assume, we going to compile this code on X86_64 host to run on the
> NVidia NVPTX64 device. When we compile this code for the host,
> everything is good. But when we compile the same code for the NVPTX64
> target, we get the next error messages:
> ```
> 11:13: error: invalid output constraint '=a' in asm
> 20:13: error: invalid output constraint '=a' in asm
> ```
> But, actually, we should see only one error message, the second one,
> for the function `target_used()`, which is actually used in the target
> region. The second function, `target_unused()` is used only on the
> host and we should no produce error message for this function when we
> compile the code for the device.
>
> The main problem with those functions is that they are not marked
> explicitly as the device functions, just like it is required in CUDA.
> In OpenMP, it is not required to mark them explicitly as the
> device-only or both device-host function. They can be marked
> implicitly, depending of the fact that they are used in target-based
> constructs (probably, indirectly, through chain of calls) or not.
>

Do you mean that the implicit declare-target feature, which marks
functions to be generated for the target based on transitive usage, is
not implemented in the frontend? My understanding is that this feature
was designed to be implementable in the frontend. In that case, the
frontend can validate the inline assembly as it does now based on the
current code-generation target.

Thanks again,

Hal


>
> That’s why we need to postpone some of the target-related diagnostics
> till the CodeGen phase.
>
> Possible solution.
> ==============
> 1. Move target-specific checks to the CodeGen.
>
> The best solutions of all, does not require significant redesign of
> the existing code base, just requires some copy-paste of the
> diagnostics and the associated logic.
>
> 2. Add special delayed diagnostics class, associate it with the
> declarations somehow and check it for each declaration during the CodeGen.
>
> Requires redesign of the diagnostic subsystem + redesign of the
> codegen subsystem, at least.
>
> 3. Introduce special expression, statement and declaration nodes.
>
> This nodes serve only as the containers for the target-specific error
> messages, generated in Sema. During the codegen phase, if the codegen
> run into one of such constructs, it just emits the diagnostic, stored
> in these nodes.
>
> Requires additional expression/statement/declarations nodes + looks
> like a hack.
>
> --
> -------------
> Best regards,
> Alexey Bataev

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
Yes, it is implemented in the frontend. No, it is not implemented in the Sema analysis, it is implemented in Codegen part of the frontend. And it is almost impossible to implement it in Sema. It requires recursive analysis of the functions used (not only called directly, but also indirectly). The better place to implement it is the CodeGen of the frontend.

Best regards,
Alexey Bataev

> 14 янв. 2019 г., в 19:16, Finkel, Hal J. <[hidden email]> написал(а):
>
>> On 1/9/19 9:02 AM, Alexey Bataev wrote:
>>
>> Hi, many of the OpenMP users experience troubles when they try to
>> compile real-world OpenMP applications, which use offloading constructs.
>>
>> Problem Description
>> ===============
>> For example, let’s look at the abstract code:
>> ```
>> void target_unused() {
>>   int a;
>>   __asm__("constraints"
>>           : "=a"(a)
>>           :         // None
>>   );
>> }
>>
>> void target_used() {
>>   int a;
>>   __asm__("constraints"
>>           : "=a"(a)
>>           :         // None
>>   );
>> }
>>
>> void foo() {
>>   target_unused();
>> #pragma omp target
>>   target_used();
>> }
>> ```
>> Assume, we going to compile this code on X86_64 host to run on the
>> NVidia NVPTX64 device. When we compile this code for the host,
>> everything is good. But when we compile the same code for the NVPTX64
>> target, we get the next error messages:
>> ```
>> 11:13: error: invalid output constraint '=a' in asm
>> 20:13: error: invalid output constraint '=a' in asm
>> ```
>> But, actually, we should see only one error message, the second one,
>> for the function `target_used()`, which is actually used in the target
>> region. The second function, `target_unused()` is used only on the
>> host and we should no produce error message for this function when we
>> compile the code for the device.
>>
>> The main problem with those functions is that they are not marked
>> explicitly as the device functions, just like it is required in CUDA.
>> In OpenMP, it is not required to mark them explicitly as the
>> device-only or both device-host function. They can be marked
>> implicitly, depending of the fact that they are used in target-based
>> constructs (probably, indirectly, through chain of calls) or not.
>>
>
> Do you mean that the implicit declare-target feature, which marks
> functions to be generated for the target based on transitive usage, is
> not implemented in the frontend? My understanding is that this feature
> was designed to be implementable in the frontend. In that case, the
> frontend can validate the inline assembly as it does now based on the
> current code-generation target.
>
> Thanks again,
>
> Hal
>
>
>>
>> That’s why we need to postpone some of the target-related diagnostics
>> till the CodeGen phase.
>>
>> Possible solution.
>> ==============
>> 1. Move target-specific checks to the CodeGen.
>>
>> The best solutions of all, does not require significant redesign of
>> the existing code base, just requires some copy-paste of the
>> diagnostics and the associated logic.
>>
>> 2. Add special delayed diagnostics class, associate it with the
>> declarations somehow and check it for each declaration during the CodeGen.
>>
>> Requires redesign of the diagnostic subsystem + redesign of the
>> codegen subsystem, at least.
>>
>> 3. Introduce special expression, statement and declaration nodes.
>>
>> This nodes serve only as the containers for the target-specific error
>> messages, generated in Sema. During the codegen phase, if the codegen
>> run into one of such constructs, it just emits the diagnostic, stored
>> in these nodes.
>>
>> Requires additional expression/statement/declarations nodes + looks
>> like a hack.
>>
>> --
>> -------------
>> Best regards,
>> Alexey Bataev
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
On 1/14/19 6:22 PM, Alexey Bataev wrote:
> Yes, it is implemented in the frontend. No, it is not implemented in the Sema analysis, it is implemented in Codegen part of the frontend. And it is almost impossible to implement it in Sema. It requires recursive analysis of the functions used (not only called directly, but also indirectly). The better place to implement it is the CodeGen of the frontend.

Your proposal makes sense to me. And, I agree, just moving the logic to
CodeGen seems like the most-straightforward solution. I don't see any
conceptual problem with inline assembly being checked a bit later in the
pipeline (it is, after all, quite target specific).

 -Hal

>
> Best regards,
> Alexey Bataev
>
>> 14 янв. 2019 г., в 19:16, Finkel, Hal J. <[hidden email]> написал(а):
>>
>>> On 1/9/19 9:02 AM, Alexey Bataev wrote:
>>>
>>> Hi, many of the OpenMP users experience troubles when they try to
>>> compile real-world OpenMP applications, which use offloading constructs.
>>>
>>> Problem Description
>>> ===============
>>> For example, let’s look at the abstract code:
>>> ```
>>> void target_unused() {
>>>   int a;
>>>   __asm__("constraints"
>>>           : "=a"(a)
>>>           :         // None
>>>   );
>>> }
>>>
>>> void target_used() {
>>>   int a;
>>>   __asm__("constraints"
>>>           : "=a"(a)
>>>           :         // None
>>>   );
>>> }
>>>
>>> void foo() {
>>>   target_unused();
>>> #pragma omp target
>>>   target_used();
>>> }
>>> ```
>>> Assume, we going to compile this code on X86_64 host to run on the
>>> NVidia NVPTX64 device. When we compile this code for the host,
>>> everything is good. But when we compile the same code for the NVPTX64
>>> target, we get the next error messages:
>>> ```
>>> 11:13: error: invalid output constraint '=a' in asm
>>> 20:13: error: invalid output constraint '=a' in asm
>>> ```
>>> But, actually, we should see only one error message, the second one,
>>> for the function `target_used()`, which is actually used in the target
>>> region. The second function, `target_unused()` is used only on the
>>> host and we should no produce error message for this function when we
>>> compile the code for the device.
>>>
>>> The main problem with those functions is that they are not marked
>>> explicitly as the device functions, just like it is required in CUDA.
>>> In OpenMP, it is not required to mark them explicitly as the
>>> device-only or both device-host function. They can be marked
>>> implicitly, depending of the fact that they are used in target-based
>>> constructs (probably, indirectly, through chain of calls) or not.
>>>
>> Do you mean that the implicit declare-target feature, which marks
>> functions to be generated for the target based on transitive usage, is
>> not implemented in the frontend? My understanding is that this feature
>> was designed to be implementable in the frontend. In that case, the
>> frontend can validate the inline assembly as it does now based on the
>> current code-generation target.
>>
>> Thanks again,
>>
>> Hal
>>
>>
>>> That’s why we need to postpone some of the target-related diagnostics
>>> till the CodeGen phase.
>>>
>>> Possible solution.
>>> ==============
>>> 1. Move target-specific checks to the CodeGen.
>>>
>>> The best solutions of all, does not require significant redesign of
>>> the existing code base, just requires some copy-paste of the
>>> diagnostics and the associated logic.
>>>
>>> 2. Add special delayed diagnostics class, associate it with the
>>> declarations somehow and check it for each declaration during the CodeGen.
>>>
>>> Requires redesign of the diagnostic subsystem + redesign of the
>>> codegen subsystem, at least.
>>>
>>> 3. Introduce special expression, statement and declaration nodes.
>>>
>>> This nodes serve only as the containers for the target-specific error
>>> messages, generated in Sema. During the codegen phase, if the codegen
>>> run into one of such constructs, it just emits the diagnostic, stored
>>> in these nodes.
>>>
>>> Requires additional expression/statement/declarations nodes + looks
>>> like a hack.
>>>
>>> --
>>> -------------
>>> Best regards,
>>> Alexey Bataev
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev


On 14 Jan 2019, at 23:23, Finkel, Hal J. via cfe-dev wrote:

> On 1/14/19 6:22 PM, Alexey Bataev wrote:
>> Yes, it is implemented in the frontend. No, it is not implemented in
>> the Sema analysis, it is implemented in Codegen part of the frontend.
>> And it is almost impossible to implement it in Sema. It requires
>> recursive analysis of the functions used (not only called directly,
>> but also indirectly). The better place to implement it is the CodeGen
>> of the frontend.
>
> Your proposal makes sense to me. And, I agree, just moving the logic
> to
> CodeGen seems like the most-straightforward solution. I don't see any
> conceptual problem with inline assembly being checked a bit later in
> the
> pipeline (it is, after all, quite target specific).

We generally prefer to diagnose things in Sema rather than IRGen, and
one
reason is that we want these diagnostics to show up in clients like
IDEs.

Obviously, ASM constraint validation is a pretty minor diagnostic, but I
don't see what naturally limits this to just that.  In fact, hasn't this
come up before?

If this is really just specific to ASM constraint validation, I think we
can find a way to delay the diagnosis of those to IRGen conditionally.
Otherwise, I think you need to find a way to suppress diagnostics from
functions that you don't care about.

John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
Didn't we already go through this for CUDA? It has all the same issues. I see some comments about Sema::CUDADeferredDiags. I would look at that and try to generalize it.

On Tue, Jan 15, 2019 at 1:58 PM John McCall via cfe-dev <[hidden email]> wrote:


On 14 Jan 2019, at 23:23, Finkel, Hal J. via cfe-dev wrote:

> On 1/14/19 6:22 PM, Alexey Bataev wrote:
>> Yes, it is implemented in the frontend. No, it is not implemented in
>> the Sema analysis, it is implemented in Codegen part of the frontend.
>> And it is almost impossible to implement it in Sema. It requires
>> recursive analysis of the functions used (not only called directly,
>> but also indirectly). The better place to implement it is the CodeGen
>> of the frontend.
>
> Your proposal makes sense to me. And, I agree, just moving the logic
> to
> CodeGen seems like the most-straightforward solution. I don't see any
> conceptual problem with inline assembly being checked a bit later in
> the
> pipeline (it is, after all, quite target specific).

We generally prefer to diagnose things in Sema rather than IRGen, and
one
reason is that we want these diagnostics to show up in clients like
IDEs.

Obviously, ASM constraint validation is a pretty minor diagnostic, but I
don't see what naturally limits this to just that.  In fact, hasn't this
come up before?

If this is really just specific to ASM constraint validation, I think we
can find a way to delay the diagnosis of those to IRGen conditionally.
Otherwise, I think you need to find a way to suppress diagnostics from
functions that you don't care about.

John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
This is not only for asm, we need to delay all target-specific diagnostics.
I'm not saying that we need to move the host diagnostic, only the diagnostic for the device compilation.
As for Cuda, it is a little but different. In Cuda the programmer must explicitly mark the device functions,  while in OpenMP it must be done implicitly. Thus, we cannot reuse the solution used for Cuda.

Best regards,
Alexey Bataev

15 янв. 2019 г., в 17:07, Reid Kleckner <[hidden email]> написал(а):

Didn't we already go through this for CUDA? It has all the same issues. I see some comments about Sema::CUDADeferredDiags. I would look at that and try to generalize it.

On Tue, Jan 15, 2019 at 1:58 PM John McCall via cfe-dev <[hidden email]> wrote:


On 14 Jan 2019, at 23:23, Finkel, Hal J. via cfe-dev wrote:

> On 1/14/19 6:22 PM, Alexey Bataev wrote:
>> Yes, it is implemented in the frontend. No, it is not implemented in
>> the Sema analysis, it is implemented in Codegen part of the frontend.
>> And it is almost impossible to implement it in Sema. It requires
>> recursive analysis of the functions used (not only called directly,
>> but also indirectly). The better place to implement it is the CodeGen
>> of the frontend.
>
> Your proposal makes sense to me. And, I agree, just moving the logic
> to
> CodeGen seems like the most-straightforward solution. I don't see any
> conceptual problem with inline assembly being checked a bit later in
> the
> pipeline (it is, after all, quite target specific).

We generally prefer to diagnose things in Sema rather than IRGen, and
one
reason is that we want these diagnostics to show up in clients like
IDEs.

Obviously, ASM constraint validation is a pretty minor diagnostic, but I
don't see what naturally limits this to just that.  In fact, hasn't this
come up before?

If this is really just specific to ASM constraint validation, I think we
can find a way to delay the diagnosis of those to IRGen conditionally.
Otherwise, I think you need to find a way to suppress diagnostics from
functions that you don't care about.

John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
> As for Cuda, it is a little but different. In Cuda the programmer must explicitly mark the device functions,  while in OpenMP it must be done implicitly. Thus, we cannot reuse the solution used for Cuda.

Do you mean that because the OpenMP programmer does not explicitly mark device functions, the general approach taken to these sorts of errors in CUDA is inapplicable?

On Tue, Jan 15, 2019 at 2:20 PM Alexey Bataev <[hidden email]> wrote:
This is not only for asm, we need to delay all target-specific diagnostics.
I'm not saying that we need to move the host diagnostic, only the diagnostic for the device compilation.
As for Cuda, it is a little but different. In Cuda the programmer must explicitly mark the device functions,  while in OpenMP it must be done implicitly. Thus, we cannot reuse the solution used for Cuda.

Best regards,
Alexey Bataev

15 янв. 2019 г., в 17:07, Reid Kleckner <[hidden email]> написал(а):

Didn't we already go through this for CUDA? It has all the same issues. I see some comments about Sema::CUDADeferredDiags. I would look at that and try to generalize it.

On Tue, Jan 15, 2019 at 1:58 PM John McCall via cfe-dev <[hidden email]> wrote:


On 14 Jan 2019, at 23:23, Finkel, Hal J. via cfe-dev wrote:

> On 1/14/19 6:22 PM, Alexey Bataev wrote:
>> Yes, it is implemented in the frontend. No, it is not implemented in
>> the Sema analysis, it is implemented in Codegen part of the frontend.
>> And it is almost impossible to implement it in Sema. It requires
>> recursive analysis of the functions used (not only called directly,
>> but also indirectly). The better place to implement it is the CodeGen
>> of the frontend.
>
> Your proposal makes sense to me. And, I agree, just moving the logic
> to
> CodeGen seems like the most-straightforward solution. I don't see any
> conceptual problem with inline assembly being checked a bit later in
> the
> pipeline (it is, after all, quite target specific).

We generally prefer to diagnose things in Sema rather than IRGen, and
one
reason is that we want these diagnostics to show up in clients like
IDEs.

Obviously, ASM constraint validation is a pretty minor diagnostic, but I
don't see what naturally limits this to just that.  In fact, hasn't this
come up before?

If this is really just specific to ASM constraint validation, I think we
can find a way to delay the diagnosis of those to IRGen conditionally.
Otherwise, I think you need to find a way to suppress diagnostics from
functions that you don't care about.

John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
Yes, right. OpenMP does not require explicit marking of the device function. It supports implicit target functions.

Best regards,
Alexey Bataev

15 янв. 2019 г., в 17:24, Justin Lebar <[hidden email]> написал(а):

> As for Cuda, it is a little but different. In Cuda the programmer must explicitly mark the device functions,  while in OpenMP it must be done implicitly. Thus, we cannot reuse the solution used for Cuda.

Do you mean that because the OpenMP programmer does not explicitly mark device functions, the general approach taken to these sorts of errors in CUDA is inapplicable?

On Tue, Jan 15, 2019 at 2:20 PM Alexey Bataev <[hidden email]> wrote:
This is not only for asm, we need to delay all target-specific diagnostics.
I'm not saying that we need to move the host diagnostic, only the diagnostic for the device compilation.
As for Cuda, it is a little but different. In Cuda the programmer must explicitly mark the device functions,  while in OpenMP it must be done implicitly. Thus, we cannot reuse the solution used for Cuda.

Best regards,
Alexey Bataev

15 янв. 2019 г., в 17:07, Reid Kleckner <[hidden email]> написал(а):

Didn't we already go through this for CUDA? It has all the same issues. I see some comments about Sema::CUDADeferredDiags. I would look at that and try to generalize it.

On Tue, Jan 15, 2019 at 1:58 PM John McCall via cfe-dev <[hidden email]> wrote:


On 14 Jan 2019, at 23:23, Finkel, Hal J. via cfe-dev wrote:

> On 1/14/19 6:22 PM, Alexey Bataev wrote:
>> Yes, it is implemented in the frontend. No, it is not implemented in
>> the Sema analysis, it is implemented in Codegen part of the frontend.
>> And it is almost impossible to implement it in Sema. It requires
>> recursive analysis of the functions used (not only called directly,
>> but also indirectly). The better place to implement it is the CodeGen
>> of the frontend.
>
> Your proposal makes sense to me. And, I agree, just moving the logic
> to
> CodeGen seems like the most-straightforward solution. I don't see any
> conceptual problem with inline assembly being checked a bit later in
> the
> pipeline (it is, after all, quite target specific).

We generally prefer to diagnose things in Sema rather than IRGen, and
one
reason is that we want these diagnostics to show up in clients like
IDEs.

Obviously, ASM constraint validation is a pretty minor diagnostic, but I
don't see what naturally limits this to just that.  In fact, hasn't this
come up before?

If this is really just specific to ASM constraint validation, I think we
can find a way to delay the diagnosis of those to IRGen conditionally.
Otherwise, I think you need to find a way to suppress diagnostics from
functions that you don't care about.

John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
>> Do you mean that because the OpenMP programmer does not explicitly mark device functions, the general approach taken to these sorts of errors in CUDA is inapplicable?
> Yes, right. OpenMP does not require explicit marking of the device function. It supports implicit target functions.

Would you be willing to elaborate on how the fact that OpenMP has implicit device functions means that the general approach taken by CUDA to this problem is inapplicable to OpenMP?

(For example, I'd have naively thought that CUDA __host__ __device__ functions are very similar to an OpenMP implicit-device function, in basically all respects other than the fact that the CUDA functions have an explicit attribute.)

On Tue, Jan 15, 2019 at 2:26 PM Alexey Bataev <[hidden email]> wrote:
Yes, right. OpenMP does not require explicit marking of the device function. It supports implicit target functions.

Best regards,
Alexey Bataev

15 янв. 2019 г., в 17:24, Justin Lebar <[hidden email]> написал(а):

> As for Cuda, it is a little but different. In Cuda the programmer must explicitly mark the device functions,  while in OpenMP it must be done implicitly. Thus, we cannot reuse the solution used for Cuda.

Do you mean that because the OpenMP programmer does not explicitly mark device functions, the general approach taken to these sorts of errors in CUDA is inapplicable?

On Tue, Jan 15, 2019 at 2:20 PM Alexey Bataev <[hidden email]> wrote:
This is not only for asm, we need to delay all target-specific diagnostics.
I'm not saying that we need to move the host diagnostic, only the diagnostic for the device compilation.
As for Cuda, it is a little but different. In Cuda the programmer must explicitly mark the device functions,  while in OpenMP it must be done implicitly. Thus, we cannot reuse the solution used for Cuda.

Best regards,
Alexey Bataev

15 янв. 2019 г., в 17:07, Reid Kleckner <[hidden email]> написал(а):

Didn't we already go through this for CUDA? It has all the same issues. I see some comments about Sema::CUDADeferredDiags. I would look at that and try to generalize it.

On Tue, Jan 15, 2019 at 1:58 PM John McCall via cfe-dev <[hidden email]> wrote:


On 14 Jan 2019, at 23:23, Finkel, Hal J. via cfe-dev wrote:

> On 1/14/19 6:22 PM, Alexey Bataev wrote:
>> Yes, it is implemented in the frontend. No, it is not implemented in
>> the Sema analysis, it is implemented in Codegen part of the frontend.
>> And it is almost impossible to implement it in Sema. It requires
>> recursive analysis of the functions used (not only called directly,
>> but also indirectly). The better place to implement it is the CodeGen
>> of the frontend.
>
> Your proposal makes sense to me. And, I agree, just moving the logic
> to
> CodeGen seems like the most-straightforward solution. I don't see any
> conceptual problem with inline assembly being checked a bit later in
> the
> pipeline (it is, after all, quite target specific).

We generally prefer to diagnose things in Sema rather than IRGen, and
one
reason is that we want these diagnostics to show up in clients like
IDEs.

Obviously, ASM constraint validation is a pretty minor diagnostic, but I
don't see what naturally limits this to just that.  In fact, hasn't this
come up before?

If this is really just specific to ASM constraint validation, I think we
can find a way to delay the diagnosis of those to IRGen conditionally.
Otherwise, I think you need to find a way to suppress diagnostics from
functions that you don't care about.

John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
In reply to this post by David Blaikie via cfe-dev
On 15 Jan 2019, at 17:20, Alexey Bataev wrote:
> This is not only for asm, we need to delay all target-specific
> diagnostics.
> I'm not saying that we need to move the host diagnostic, only the
> diagnostic for the device compilation.
> As for Cuda, it is a little but different. In Cuda the programmer must
> explicitly mark the device functions,  while in OpenMP it must be done
> implicitly. Thus, we cannot reuse the solution used for Cuda.

All it means is that you can't just use the solution used for CUDA "off
the shelf".  The basic idea of associating diagnostics with the current
function and then emitting those diagnostics later when you realize that
you have to emit that function is still completely applicable.

John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
__host__ __device__ functions are still device functions and it means that they must be emitted when you compile for the device. You know, that the user marked those functions as the device functions. In OpenMP, you cannot say before the codegen phase whether the function is used on the device or not. We should not emit all the functions available, only those, which are used (implicitly or explicitly, directly or indirectly) in the target regions.

Best regards,
Alexey Bataev

>> 15 янв. 2019 г., в 17:34, John McCall <[hidden email]> написал(а):
>>
>> On 15 Jan 2019, at 17:20, Alexey Bataev wrote:
>> This is not only for asm, we need to delay all target-specific diagnostics.
>> I'm not saying that we need to move the host diagnostic, only the diagnostic for the device compilation.
>> As for Cuda, it is a little but different. In Cuda the programmer must explicitly mark the device functions,  while in OpenMP it must be done implicitly. Thus, we cannot reuse the solution used for Cuda.
>
> All it means is that you can't just use the solution used for CUDA "off the shelf".  The basic idea of associating diagnostics with the current function and then emitting those diagnostics later when you realize that you have to emit that function is still completely applicable.
>
> John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
On 15 Jan 2019, at 17:58, Alexey Bataev wrote:
> __host__ __device__ functions are still device functions and it means
> that they must be emitted when you compile for the device. You know,
> that the user marked those functions as the device functions. In
> OpenMP, you cannot say before the codegen phase whether the function
> is used on the device or not. We should not emit all the functions
> available, only those, which are used (implicitly or explicitly,
> directly or indirectly) in the target regions.

I don't see why you couldn't do that analysis in Sema.

John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
In reply to this post by David Blaikie via cfe-dev
> __host__ __device__ functions are still device functions and it means that they must be emitted when you compile for the device

That is not the case for templated or inline __host__ __device__ functions.  They explicitly are not emitted for host/device unless they are called from a host/device context.  CUDA code relies heavily on this fact.  As a result, you are allowed to do "host-only" things from a __host__ __device__ function so long as it's not codegen'ed for device.  Similarly, you can do "device-only" things from a __host__ __device__ function so long as it's not codegen'ed for host.

The notion of "deferred diagnostics" in clang's CUDA support is explicitly there to handle the case when we do not know whether or not a __host__ __device__ function must be emitted for host or device and so we don't know whether or not to raise an error when you do a "wrong-side" thing (i.e. you're compiling for device and you did a host-only thing, or you're compiling for host and you did a device-only thing).

On Tue, Jan 15, 2019 at 2:58 PM Alexey Bataev <[hidden email]> wrote:
__host__ __device__ functions are still device functions and it means that they must be emitted when you compile for the device. You know, that the user marked those functions as the device functions. In OpenMP, you cannot say before the codegen phase whether the function is used on the device or not. We should not emit all the functions available, only those, which are used (implicitly or explicitly, directly or indirectly) in the target regions.

Best regards,
Alexey Bataev

>> 15 янв. 2019 г., в 17:34, John McCall <[hidden email]> написал(а):
>>
>> On 15 Jan 2019, at 17:20, Alexey Bataev wrote:
>> This is not only for asm, we need to delay all target-specific diagnostics.
>> I'm not saying that we need to move the host diagnostic, only the diagnostic for the device compilation.
>> As for Cuda, it is a little but different. In Cuda the programmer must explicitly mark the device functions,  while in OpenMP it must be done implicitly. Thus, we cannot reuse the solution used for Cuda.
>
> All it means is that you can't just use the solution used for CUDA "off the shelf".  The basic idea of associating diagnostics with the current function and then emitting those diagnostics later when you realize that you have to emit that function is still completely applicable.
>
> John.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
In reply to this post by David Blaikie via cfe-dev
Because currently this kind of the analysis is implemented in Codegen and Codegen decides, which function should be emitted and which is not.
To implement it in Sema, we'll need to reimpelement almost everything from the codegen, because we will need to analyse all the statements in all functions. It significantly increases the compilation time.

Best regards,
Alexey Bataev

> 15 янв. 2019 г., в 18:01, John McCall <[hidden email]> написал(а):
>
>> On 15 Jan 2019, at 17:58, Alexey Bataev wrote:
>> __host__ __device__ functions are still device functions and it means that they must be emitted when you compile for the device. You know, that the user marked those functions as the device functions. In OpenMP, you cannot say before the codegen phase whether the function is used on the device or not. We should not emit all the functions available, only those, which are used (implicitly or explicitly, directly or indirectly) in the target regions.
>
> I don't see why you couldn't do that analysis in Sema.
>
> John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
Could you reference such functions in the initializer? Could you call them indirectly? Could you take their addresses? Or they just can be directly called from other functions?

Best regards,
Alexey Bataev

> 15 янв. 2019 г., в 18:16, Alexey Bataev <[hidden email]> написал(а):
>
> Because currently this kind of the analysis is implemented in Codegen and Codegen decides, which function should be emitted and which is not.
> To implement it in Sema, we'll need to reimpelement almost everything from the codegen, because we will need to analyse all the statements in all functions. It significantly increases the compilation time.
>
> Best regards,
> Alexey Bataev
>
>>> 15 янв. 2019 г., в 18:01, John McCall <[hidden email]> написал(а):
>>>
>>> On 15 Jan 2019, at 17:58, Alexey Bataev wrote:
>>> __host__ __device__ functions are still device functions and it means that they must be emitted when you compile for the device. You know, that the user marked those functions as the device functions. In OpenMP, you cannot say before the codegen phase whether the function is used on the device or not. We should not emit all the functions available, only those, which are used (implicitly or explicitly, directly or indirectly) in the target regions.
>>
>> I don't see why you couldn't do that analysis in Sema.
>>
>> John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev
In reply to this post by David Blaikie via cfe-dev

Yes, I thought about this. But we need to delay the diagnostic until the Codegen phase. What I need is the way to associate the diagnostic with the function so that this diagnostic is available in CodeGen.

Also, we need to postpone the diagnotics not only for functions, but,for example, for some types. For example, __float128 type is not supported by CUDA. We can get error messages when we ran into something like `typedef __float128 SomeOtherType` (say, in some system header files) and get the error diagnostic when we compile for the device. Though, actually, this type is not used in the device code, the diagnostic is still emitted and we need to delay too and emit it only iff the type is used in the device code.

-------------
Best regards,
Alexey Bataev
15.01.2019 17:33, John McCall пишет:
On 15 Jan 2019, at 17:20, Alexey Bataev wrote:
This is not only for asm, we need to delay all target-specific diagnostics.
I'm not saying that we need to move the host diagnostic, only the diagnostic for the device compilation.
As for Cuda, it is a little but different. In Cuda the programmer must explicitly mark the device functions,  while in OpenMP it must be done implicitly. Thus, we cannot reuse the solution used for Cuda.

All it means is that you can't just use the solution used for CUDA "off the shelf".  The basic idea of associating diagnostics with the current function and then emitting those diagnostics later when you realize that you have to emit that function is still completely applicable.

John.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Delayed target-specific diagnostic when compiling for the devices.

David Blaikie via cfe-dev


On 16 Jan 2019, at 9:45, Alexey Bataev wrote:

> Yes, I thought about this. But we need to delay the diagnostic until the
> Codegen phase. What I need is the way to associate the diagnostic with
> the function so that this diagnostic is available in CodeGen.
>
> Also, we need to postpone the diagnotics not only for functions, but,for
> example, for some types. For example, __float128 type is not supported
> by CUDA. We can get error messages when we ran into something like
> `typedef __float128 SomeOtherType` (say, in some system header files)
> and get the error diagnostic when we compile for the device. Though,
> actually, this type is not used in the device code, the diagnostic is
> still emitted and we need to delay too and emit it only iff the type is
> used in the device code.

This is exactly what I mean.  We are not going to rewrite the compiler to
delay arbitrary diagnostics until IRGen.  You need to figure out a way to
do this that doesn't require that, and it probably starts with taking
advantage of the deferred-until-function-use infrastructure put in place
for CUDA.

John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
123