Why is #pragma STDC FENV_ACCESS not supported?


Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev

This sounds promising.

 

I would note that when I added the strictfp attribute, I intended that the front end would attach it to the call site of every function call within a scope that requires strict floating-point semantics. It was supposed to be a way of preventing calls to libm functions from being optimized as LibFunc calls without the front end needing to know which functions could be processed that way.

 

I think it is a natural extension to use this attribute on functions that contain code requiring strict floating point handling, but the documentation will need to be updated.

 

From: Hal Finkel [mailto:[hidden email]]
Sent: Thursday, August 31, 2017 3:08 PM
To: Richard Smith <[hidden email]>
Cc: Kaylor, Andrew <[hidden email]>; Marcus Johnson <[hidden email]>; Clang Dev <[hidden email]>; [hidden email]
Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

 

 

On 08/31/2017 05:02 PM, Richard Smith wrote:

On 31 August 2017 at 14:40, Hal Finkel via cfe-dev <[hidden email]> wrote:

On 08/31/2017 04:31 PM, Richard Smith via cfe-dev wrote:

I think that's also not enough; you'd get the same problem after inlining, and across modules with LTO. You would need to also prevent any interprocedural code motion across a FENV_ACCESS / non-FENV_ACCESS boundary.


Or we prevent inlining.

 

Sure, I was considering that to be a form of interprocedural code motion :)

And even that doesn't seem to be enough. Suppose that some scalar optimization pass finds a clever way to convert some integer operation into a floating-point operation, such that it can prove that the FP values never overflow (I believe Chandler has an example of this that comes up in some real crypto code). Now suppose there's a case where the integer operands are undef, but that the code in question is bypassed in that case. If the FP operations get hoisted, and you happen to have FP exceptions enabled, you have a potential miscompile.


Good point. However, that's not a new problem, and we currently deal with this by respecting the noimplicitfloat attribute (and I think we'd definitely need to use that attribute if we allow fooling with the FP environment).

 

OK, so the idea would be that we'd lower a function containing FENV_ACCESS (or possibly an outlined block of such a function) with intrinsics for all FP operations, specially-annotated libm function calls, and noimplicitfloat and strictfp attributes to prevent generation of new FP operations and inlining into non-strictfp functions. Right? (And we could imagine a verifier check that ensures that you don't have pure FP operations inside a strictfp function.)


Yes, exactly.


 

Given the function annotations, do we need special intrinsics at all, or could we instead require that passes check whether the enclosing function is marked strictfp before optimizing, in the same way that some optimizations must be gated by a check for noimplicitfloat?


That's another possible design. We decided that the intrinsics were less intrusive. The problem is that it's not just FP-specific optimizations that would need to check the attribute; other optimizations doing other kinds of code motion and value propagation would need to check it as well. Having IR-level operations that are side-effect-free, except when some special function attribute is present, seems undesirable.

 -Hal


 

 -Hal




 

Fundamentally, it seems to me that feenableexcept is unsound in the current LLVM IR model of floating point, if we assume that fadd, fmul, fsub etc do not have side-effects.

 

On 31 August 2017 at 14:20, Kaylor, Andrew via cfe-dev <[hidden email]> wrote:

If that’s the case, we may need to use the constrained intrinsics for all FP operations when FENV_ACCESS is enabled anywhere in a function.

 

From: Richard Smith [mailto:[hidden email]]
Sent: Thursday, August 31, 2017 2:18 PM
To: Kaylor, Andrew <[hidden email]>
Cc: Clang Dev <[hidden email]>; Marcus Johnson <[hidden email]>; [hidden email]


Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

 

On 31 August 2017 at 14:14, Kaylor, Andrew via cfe-dev <[hidden email]> wrote:

I believe that we will rely on fedisableexcept() being marked as having unmodeled side-effects to prevent a hoist like that.

 

fadd can be hoisted past *anything*, can't it?

 

From: Richard Smith [mailto:[hidden email]]
Sent: Thursday, August 31, 2017 2:09 PM
To: Kaylor, Andrew <[hidden email]>
Cc: Marcus Johnson <[hidden email]>; Clang Dev <[hidden email]>; [hidden email]


Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

 

On 31 August 2017 at 11:09, Kaylor, Andrew via cfe-dev <[hidden email]> wrote:

There are still a few things missing from the optimizer to get it completely robust, but I think there is enough in place for front end work to begin. As I think I’ve demonstrated in my recent attempt to contribute a clang patch, I’m not skilled enough with the front end to be the person to pull this off without an excessive amount of oversight, but as Erich indicated, we do have some good front end people here who have this on their TODO list. It’s just not at the top of the TODO list yet.

 

If anyone is interested in the details of the LLVM side of things, there are constrained FP intrinsics (still marked as experimental at this point) documented in the language reference.  The initial patch can be seen here:

 

https://reviews.llvm.org/D27028

 

I’ve since added another group of intrinsics to handle the libm-equivalent intrinsics, and more recently Wei Ding contributed an fma intrinsic.

 

The idea is that the front end will emit the constrained intrinsics in place of equivalent general FP operations or intrinsics in scopes where FENV_ACCESS is enabled.  This will prevent the optimizer from making optimizations that assume default fenv settings (which is what we want the optimizer to do in all other cases).  Eventually, we’ll want to go back and teach specific optimizations to understand the intrinsics so that where possible optimizations can be performed in a manner consistent with dynamic rounding modes and strict exception handling.

 

How do you deal with the hoisting-into-fenv_access problem? Eg:

 

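#define _GNU_SOURCE  /* feenableexcept/fedisableexcept are glibc extensions */
#include <fenv.h>

double f(double a, double b, double c) {
  double d;  /* declared outside the block so the return statement can use it */
  {
#pragma STDC FENV_ACCESS ON
    feenableexcept(FE_OVERFLOW);
    d = a * b;
    fedisableexcept(FE_OVERFLOW);
  }
  return c * d;
}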

 

What stops llvm from hoisting the second fmul up to before the fedisableexcept?

 

-Andy

 

From: Hal Finkel [mailto:[hidden email]]
Sent: Thursday, August 31, 2017 10:45 AM
To: Richard Smith <[hidden email]>; Marcus Johnson <[hidden email]>
Cc: Clang Dev <[hidden email]>; Kaylor, Andrew <[hidden email]>
Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

 

 

On 08/31/2017 12:10 PM, Richard Smith via cfe-dev wrote:

Because no-one has implemented it. Patches would be welcome, but will need to start with a design and implementation of the requisite llvm extensions.


Yes. This is what Andrew Kaylor has been working on (cc'd).

 -Hal

 

On 31 Aug 2017 10:06, "Marcus Johnson via cfe-dev" <[hidden email]> wrote:

^^^^^^


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

 

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
If everyone's happy with that approach, I agree we're ready to start looking at the frontend side of things :-) I think there are a few open questions:

* Is it ever worth outlining an FENV_ACCESS block from a function to minimize the scope in which optimizations are restricted? (Presumably in the long term, all the LLVM optimizations that can operate on FP operations will do the same thing to intrinsic calls that are in "never traps, rounds to nearest" mode, so it may well not be worthwhile.)
* What impact does FENV_ACCESS have on constant expression evaluation, particularly in C++11 onwards where some FP operations are required to be evaluated during compilation?
* What happens if FENV_ACCESS is enabled at the end of a module?
* How does FENV_ACCESS interact with namespaces? Does it only last until the `}` (like at block scope) or not?

I've taken some of these to the C++ committee to see if they have opinions about how this feature should work in C++.

On 31 August 2017 at 15:28, Kaylor, Andrew via cfe-dev <[hidden email]> wrote:

This sounds promising.

 


Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev


On 08/31/2017 05:41 PM, Richard Smith wrote:
If everyone's happy with that approach, I agree we're ready to start looking at the frontend side of things :-) I think there are a few open questions:

* Is it ever worth outlining an FENV_ACCESS block from a function to minimize the scope in which optimizations are restricted? (Presumably in the long term, all the LLVM optimizations that can operate on FP operations will do the same thing to intrinsic calls that are in "never traps, rounds to nearest" mode, so it may well not be worthwhile.)

Outlining is probably a reasonable idea if there are floating-point computations outside of the FENV_ACCESS blocks in the functions. The tradeoffs here could be tricky, however, and I'd suggest waiting until we find some motivating cases. Even long term, I don't expect parity (just because if we're going to teach all of the transformations to treat them identically, we might as well extend the semantics of the IR operations themselves, and it's not clear to me that will even be worthwhile).

* What impact does FENV_ACCESS have on constant expression evaluation, particularly in C++11 onwards where some FP operations are required to be evaluated during compilation?
* What happens if FENV_ACCESS is enabled at the end of a module?

Would you prefer that it stay within the module? I think I'd prefer that.

Thanks again,
Hal

* How does FENV_ACCESS interact with namespaces? Does it only last until the `}` (like at block scope) or not?

I've taken some of these to the C++ committee to see if they have opinions about how this feature should work in C++.


Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
On 31 August 2017 at 18:17, Hal Finkel via cfe-dev <[hidden email]> wrote:


On 08/31/2017 05:41 PM, Richard Smith wrote:
If everyone's happy with that approach, I agree we're ready to start looking at the frontend side of things :-) I think there are a few open questions:

* Is it ever worth outlining an FENV_ACCESS block from a function to minimize the scope in which optimizations are restricted? (Presumably in the long term, all the LLVM optimizations that can operate on FP operations will do the same thing to intrinsic calls that are in "never traps, rounds to nearest" mode, so it may well not be worthwhile.)

Outlining is probably a reasonable idea if there are floating-point computations outside of the FENV_ACCESS blocks in the functions. The tradeoffs here could be tricky, however, and I'd suggest waiting until we find some motivating cases. Even long term, I don't expect parity (just because if we're going to teach all of the transformations to treat them identically, we might as well extend the semantics of the IR operations themselves, and it's not clear to me that will even be worthwhile).

Hmm. I suppose there is still a semantic difference between, say, a "never trap, round to nearest mul" intrinsic and fmul -- the former cannot be reordered past a function call (because it might change the rounding mode), but the latter can. So perhaps there is some argument for outlining even in the hypothetical long term when LLVM can optimize the intrinsics well.
* What impact does FENV_ACCESS have on constant expression evaluation, particularly in C++11 onwards where some FP operations are required to be evaluated during compilation?
* What happens if FENV_ACCESS is enabled at the end of a module?

Would you prefer that it stay within the module? I think I'd prefer that.

Yes, with an enabled-by-default warning if a module ends with a non-default setting (on the basis that we want a modules build and a non-modules build using the same header to have the same semantics).

Thanks again,
Hal


* How does FENV_ACCESS interact with namespaces? Does it only last until the `}` (like at block scope) or not?

I've taken some of these to the C++ committee to see if they have opinions about how this feature should work in C++.

On 31 August 2017 at 15:28, Kaylor, Andrew via cfe-dev <[hidden email]> wrote:

This sounds promising.

 

I would note that when I added the strictfp attribute, I intended that the front end would attach this attribute to the callsite of all function calls within a scope that required strict floating point semantics. It was supposed to be a way of preventing calls to libm functions from being optimized as LibFunc calls without the front end needing to know which functions could be processed that way.

 

I think it is a natural extension to use this attribute on functions that contain code requiring strict floating point handling, but the documentation will need to be updated.

 

From: Hal Finkel [mailto:[hidden email]]
Sent: Thursday, August 31, 2017 3:08 PM
To: Richard Smith <[hidden email]>
Cc: Kaylor, Andrew <[hidden email]>; Marcus Johnson <[hidden email]>; Clang Dev <[hidden email]>; [hidden email]
Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

 

 

On 08/31/2017 05:02 PM, Richard Smith wrote:

On 31 August 2017 at 14:40, Hal Finkel via cfe-dev <[hidden email]> wrote:

On 08/31/2017 04:31 PM, Richard Smith via cfe-dev wrote:

I think that's also not enough; you'd get the same problem after inlining, and across modules with LTO. You would need to also prevent any interprocedural code motion across a FENV_ACCESS / non-FENV_ACCESS boundary.


Or we prevent inlining.

 

Sure, I was considering that to be a form of interprocedural code motion :)

And even that doesn't seem to be enough. Suppose that some scalar optimization pass finds a clever way to convert some integer operation into a floating-point operation, such that it can prove that the FP values never overflow (I believe Chandler has an example of this that comes up in some real crypto code). Now suppose there's a case where the integer operands are undef, but that the code in question is bypassed in that case. If the FP operations get hoisted, and you happen to have FP exceptions enabled, you have a potential miscompile.


Good point. However, that's not a new problem, and we currently deal with this by respecting the noimplicitfloat attribute (and I think we'd definitely need to use that attribute if we allow fooling with the FP environment).

 

OK, so the idea would be that we'd lower a function containing FENV_ACCESS (or possibly an outlined block of such a function) with intrinsics for all FP operations, specially-annotated libm function calls, and noimplicitfloat and strictfp attributes to prevent generation of new FP operations and inlining into non-strictfp functions. Right? (And we could imagine a verifier check that ensures that you don't have pure FP operations inside a strictfp function.)


Yes, exactly.


 

Given the function annotations, do we need special intrinsics at all, or could we instead require that passes check whether the enclosing function is marked strictfp before optimizing, in the same way that some optimizations must be gated by a check for noimplicitfloat?


That's another possible design. We decided that the intrinsics were less intrusive. The problem is that it's not just FP-specific optimizations that would need to check the attribute, it is also other optimizations doing other kinds of code motion and value propagation. Having IR-level operations that are side-effect-free, except when some special function attribute is present, seems undesirable.

 -Hal


 

 -Hal




 

Fundamentally, it seems to me that feenableexcept is unsound in the current LLVM IR model of floating point, if we assume that fadd, fmul, fsub etc do not have side-effects.

 

On 31 August 2017 at 14:20, Kaylor, Andrew via cfe-dev <[hidden email]> wrote:

If that’s the case, we may need to use the constrained intrinsics for all FP operations when FENV_ACCESS is enabled anywhere in a function.

 

From: Richard Smith [mailto:[hidden email]]
Sent: Thursday, August 31, 2017 2:18 PM
To: Kaylor, Andrew <[hidden email]>
Cc: Clang Dev <[hidden email]>; Marcus Johnson <[hidden email]>; [hidden email]
Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

 

On 31 August 2017 at 14:14, Kaylor, Andrew via cfe-dev <[hidden email]> wrote:

I believe that we will rely on fedisableexcept() being marked as having unmodeled side-effects to prevent a hoist like that.

 

fadd can be hoisted past *anything*, can't it?

 

From: Richard Smith [mailto:[hidden email]]
Sent: Thursday, August 31, 2017 2:09 PM
To: Kaylor, Andrew <[hidden email]>
Cc: Marcus Johnson <[hidden email]>; Clang Dev <[hidden email]>; [hidden email]
Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

 

On 31 August 2017 at 11:09, Kaylor, Andrew via cfe-dev <[hidden email]> wrote:

There are still a few things missing from the optimizer to get it completely robust, but I think there is enough in place for front end work to begin.  As I think I’ve demonstrated in my recent attempt to contribute a clang patch I’m not skilled enough with the front end to be the person to pull this off without an excessive amount of oversight, but as Erich indicated we do have some good front end people here who have this on their TODO list.  It’s just not at the top of the TODO list yet.

 

If anyone is interested in the details of the LLVM side of things, there are constrained FP intrinsics (still marked as experimental at this point) documented in the language reference. The initial patch can be seen here:

 

https://reviews.llvm.org/D27028

 

I’ve since added another group of intrinsics to handle the libm-equivalent intrinsics, and more recently Wei Ding contributed an fma intrinsic.

 

The idea is that the front end will emit the constrained intrinsics in place of equivalent general FP operations or intrinsics in scopes where FENV_ACCESS is enabled.  This will prevent the optimizer from making optimizations that assume default fenv settings (which is what we want the optimizer to do in all other cases).  Eventually, we’ll want to go back and teach specific optimizations to understand the intrinsics so that where possible optimizations can be performed in a manner consistent with dynamic rounding modes and strict exception handling.

 

How do you deal with the hoisting-into-fenv_access problem? Eg:

 

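#define _GNU_SOURCE  /* feenableexcept/fedisableexcept are glibc extensions */
#include <fenv.h>

double f(double a, double b, double c) {
  double d;  /* declared outside the block so it is still in scope at the return */
  {
#pragma STDC FENV_ACCESS ON
    feenableexcept(FE_OVERFLOW);
    d = a * b;
    fedisableexcept(FE_OVERFLOW);
  }
  return c * d;  /* the multiply that must not be hoisted into the block above */
}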

 

What stops llvm from hoisting the second fmul up to before the fedisableexcept?

 

-Andy

 

From: Hal Finkel [mailto:[hidden email]]
Sent: Thursday, August 31, 2017 10:45 AM
To: Richard Smith <[hidden email]>; Marcus Johnson <[hidden email]>
Cc: Clang Dev <[hidden email]>; Kaylor, Andrew <[hidden email]>
Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

 

 

On 08/31/2017 12:10 PM, Richard Smith via cfe-dev wrote:

Because no-one has implemented it. Patches would be welcome, but will need to start with a design and implementation of the requisite llvm extensions.


Yes. This is what Andrew Kaylor has been working on (cc'd).

 -Hal

 

On 31 Aug 2017 10:06, "Marcus Johnson via cfe-dev" <[hidden email]> wrote:

^^^^^^


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

 


 

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev
On Thu, Aug 31, 2017 at 05:03:17PM -0500, Hal Finkel via cfe-dev wrote:
>    To be clear, we've had several extensive discussions about this, on and
>    off list, and Andy has started adding the corresponding intrinsics into
>    the IR. There was a presumption about a lack of mixing, however, and we
>    do need to work out how to prevent mixing the native IR operations with
>    the intrinsics (although, perhaps we just did that).
>     -Hal

What's the current status of this work? My employer very much needs this
work done sooner rather than later, and I've been tasked with helping make
it happen.

What, exactly, still needs to be done to complete this work? I've seen
some of the discussions about it, and I've seen the documentation on the
new llvm constrained floating point intrinsics. But I don't think clang
supports them yet, fptosi is not on the list anyway, and I'm not sure what
else is needed. So I'm asking, what all is needed and what can I work on
to move this forward?

Is there any work in progress code that anyone would be willing to share?
For example, any code using the new intrinsics? Andy?


The specific case we're running into today is that we have code being
reordered in ways that trigger traps when handling a NaN. This code:

#include <math.h>

int foo(double d) {
   int x = (!isnan(d) ? (int)d : 45);
   return x;
}

... becomes this:

define signext i32 @foo(double) local_unnamed_addr #0 !dbg !10 {
  tail call void @llvm.dbg.value(metadata double %0, i64 0, metadata !15, metadata !17), !dbg !18
  %2 = tail call signext i32 @__isnan(double %0) #3, !dbg !19
  %3 = icmp eq i32 %2, 0, !dbg !19
  %4 = fptosi double %0 to i32, !dbg !20
  %5 = select i1 %3, i32 %4, i32 45, !dbg !19
  tail call void @llvm.dbg.value(metadata i32 %5, i64 0, metadata !16, metadata !17), !dbg !21
  ret i32 %5, !dbg !22
}

So the fptosi gets moved _above_ the select and the trap happens. This is
in code that was written to avoid a trap in exactly this case.

We're compiling with clang 5.0.0 "-g -O1" targeting SystemZ.
--
Kevin P. Neal                                http://www.pobox.com/~kpn/
      'Concerns about "rights" and "ownership" of domains are inappropriate.  
 It is appropriate to be concerned about "responsibilities" and "service"
 to the community.' -- RFC 1591, page 4: March 1994

Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
Hi Kevin,

Thanks for reaching out about this, and thanks especially for offering to help. I've had some other priorities that have prevented me from making progress on this recently.

As far as I know, there is no support at all in clang for handling the FENV_ACCESS pragma. I have a sample patch somewhere that I created to demonstrate how the front end would create the constrained intrinsics instead of normal FP operations, but correctly implementing support for the pragma requires more front end and language expertise than I possess. I believe Clark Nelson, who does have such expertise, has this on his long term TODO list but I don't know anything about the actual timeframe when the work will begin.

On the LLVM side of things there are a few significant holes. As you've noticed, the FP to integer conversion operations still need intrinsics, as do fcmp, fptrunc, and fpext. There are probably others that I'm overlooking. The FP to SI conversion has an additional wrinkle that needs to be worked out in that the default lowering of this conversion to machine instructions is not exception safe.

In general, the current "strict FP" handling stops at instruction selection. At the MachineIR level we don't currently have a mechanism to prevent inappropriate optimizations based on floating point constraints, or indeed to convey such constraints to the backend. Implicit register use modeling may provide some restriction on some architectures, but this is definitely lacking for X86 targets. On the other hand, I'm not aware of any specific current problems, so in many cases we may "get lucky" and have the correct thing happen by chance. Obviously that's not a viable long term solution. I have a rough plan for adding improved register modeling to the X86 backend, which should take care of instruction scheduling issues, but we'd still need a mechanism to prevent constant folding optimizations and such.

As for what you could begin work on, it should be a fairly straight-forward task to implement the intrinsics for fptosi, fptoui, fcmp, fptrunc, and fpext. That would be a gentle introduction. Beyond that, it would be very helpful to have some pathfinding work done to solidify exactly what the remaining shortcomings are. I have a patch somewhere (stale by now, but I could refresh it pretty easily) that unconditionally converts all FP operations to the equivalent constrained intrinsics. You could use that to do testing and find out what's broken.

Thanks,
Andy



Re: [llvm-dev] Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
On 8 January 2018 at 11:15, Kaylor, Andrew via llvm-dev <[hidden email]> wrote:
Hi Kevin,

Thanks for reaching out about this, and thanks especially for offering to help. I've had some other priorities that have prevented me from making progress on this recently.

As far as I know, there is no support at all in clang for handling the FENV_ACCESS pragma. I have a sample patch somewhere that I created to demonstrate how the front end would create the constrained intrinsics instead of normal FP operations, but correctly implementing support for the pragma requires more front end and language expertise than I possess. I believe Clark Nelson, who does have such expertise, has this on his long term TODO list but I don't know anything about the actual timeframe when the work will begin.

If you want to work on this side of things, the place to start would be teaching the lexer to recognize the pragma and produce a suitable annotation token, then teaching the parser to parse the token in the places where the pragma can appear and to track the current FENV_ACCESS state. Then you'll need to find a suitable AST representation for the pragma (I have some ideas on this, feel free to ask), both for the affected compound statements and for the affected floating-point operations, build those representations when necessary, and teach the various AST consumers (LLVM IR generation and constant expression evaluation immediately spring to mind) how to handle them.
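As a rough illustration of the "track the current FENV_ACCESS state" step, here is a toy model in C of the scoping rule from the C standard: a pragma inside a compound statement stays in effect until another FENV_ACCESS pragma or the end of that compound statement, where the previous state is restored. Everything here (FenvTracker, the tracker_* functions, MAX_DEPTH) is hypothetical and is not how Clang represents this; the default of OFF is also an assumption, since the standard leaves the default implementation-defined.

```c
#include <stdbool.h>

#define MAX_DEPTH 64  /* arbitrary nesting limit for this sketch */

/* Hypothetical tracker; not a Clang data structure. */
typedef struct {
    bool saved[MAX_DEPTH];  /* state in effect just before each open block */
    int depth;              /* number of currently open blocks */
    bool current;           /* FENV_ACCESS state at the current position */
} FenvTracker;

/* Start at file scope with FENV_ACCESS assumed OFF. */
void tracker_init(FenvTracker *t) {
    t->depth = 0;
    t->current = false;
}

/* On '{': remember the state so it can be restored at the matching '}'.
   Returns false if the nesting limit would be exceeded. */
bool tracker_enter_block(FenvTracker *t) {
    if (t->depth >= MAX_DEPTH)
        return false;
    t->saved[t->depth++] = t->current;
    return true;
}

/* On '#pragma STDC FENV_ACCESS ON|OFF': change the state from here onward. */
void tracker_pragma(FenvTracker *t, bool on) {
    t->current = on;
}

/* On '}': restore the state that was in effect before the block.
   Returns false if there is no open block. */
bool tracker_exit_block(FenvTracker *t) {
    if (t->depth == 0)
        return false;
    t->current = t->saved[--t->depth];
    return true;
}
```

A parser-side implementation would presumably tie this to its existing scope handling instead of a separate stack; the point of the sketch is only that the state is a property of a source position, not of the whole function.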


Re: [llvm-dev] Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev


On 01/08/2018 07:06 PM, Richard Smith via llvm-dev wrote:
On 8 January 2018 at 11:15, Kaylor, Andrew via llvm-dev <[hidden email]> wrote:
Hi Kevin,

Thanks for reaching out about this, and thanks especially for offering to help. I've had some other priorities that have prevented me from making progress on this recently.

As far as I know, there is no support at all in clang for handling the FENV_ACCESS pragma. I have a sample patch somewhere that I created to demonstrate how the front end would create the constrained intrinsics instead of normal FP operations, but correctly implementing support for the pragma requires more front end and language expertise than I possess. I believe Clark Nelson, who does have such expertise, has this on his long term TODO list but I don't know anything about the actual timeframe when the work will begin.

If you want to work on this side of things, the place to start would be teaching the lexer to recognize the pragma and produce a suitable annotation token, then teaching the parser to parse the token in the places where the pragma can appear and to track the current FENV_ACCESS state. Then you'll need to find a suitable AST representation for the pragma (I have some ideas on this, feel free to ask), both for the affected compound statements and for the affected floating-point operations, build those representations when necessary, and teach the various AST consumers (LLVM IR generation and constant expression evaluation immediately spring to mind) how to handle them.

FWIW, I think it would be nice for the IRBuilder to have a kind of "strict FP" state, kind of like how we have a "fast math" state for adding fast-math flags, that will cause CreateFAdd and friends to produce the associated intrinsics, instead of the IR instructions, when strictness is enabled.
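To sketch the shape of that idea without touching the real IRBuilder, here is a toy model in C. ToyBuilder and the toy_create_* functions are invented for illustration and only return the name of the operation a builder in that state would emit; the constrained intrinsic names are the experimental ones documented in the language reference.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a builder with a "strict FP" state;
   this is not the LLVM IRBuilder API. */
typedef struct {
    bool strict_fp;  /* when true, emit constrained intrinsics instead of plain FP ops */
} ToyBuilder;

/* Name of the operation this builder would emit for an FP add. */
const char *toy_create_fadd(const ToyBuilder *b) {
    return b->strict_fp ? "llvm.experimental.constrained.fadd" : "fadd";
}

/* Name of the operation this builder would emit for an FP multiply. */
const char *toy_create_fmul(const ToyBuilder *b) {
    return b->strict_fp ? "llvm.experimental.constrained.fmul" : "fmul";
}
```

The appeal of this design is that callers keep using the same creation functions, and the choice between instruction and intrinsic is made in one place based on the builder's state.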

 -Hal



-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: [llvm-dev] Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
>    On 01/08/2018 07:06 PM, Richard Smith via llvm-dev wrote:
>
>    On 8 January 2018 at 11:15, Kaylor, Andrew via llvm-dev
>    <[hidden email]> wrote:
>
>      Hi Kevin,
>      Thanks for reaching out about this, and thanks especially for
>      offering to help. I've had some other priorities that have prevented
>      me from making progress on this recently.
>      As far as I know, there is no support at all in clang for handling
>      the FENV_ACCESS pragma. I have a sample patch somewhere that I
>      created to demonstrate how the front end would create the
>      constrained intrinsics instead of normal FP operations, but
>      correctly implementing support for the pragma requires more front
>      end and language expertise than I possess. I believe Clark Nelson,
>      who does have such expertise, has this on his long term TODO list
>      but I don't know anything about the actual timeframe when the work
>      will begin.
>
>    If you want to work on this side of things, the place to start would be
>    teaching the lexer to recognize the pragma and produce a suitable
>    annotation token, then teaching the parser to parse the token in the
>    places where the pragma can appear and to track the current FENV_ACCESS
>    state. Then you'll need to find a suitable AST representation for the
>    pragma (I have some ideas on this, feel free to ask), both for the
>    affected compound statements and for the affected floating-point
>    operations, build those representations when necessary, and teach the
>    various AST consumers (LLVM IR generation and constant expression
>    evaluation immediately spring to mind) how to handle them.
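The "track the current FENV_ACCESS state" step can be pictured with a small scope stack. This is only a sketch: `FenvAccessTracker` and its methods are hypothetical names, not Clang's actual parser interface, and the default state is assumed to be OFF (the C standard leaves the default implementation-defined).

```cpp
#include <vector>

// Hypothetical helper, not Clang code: records whether FENV_ACCESS is ON
// at the current point in the source while the parser walks scopes.
class FenvAccessTracker {
public:
  // Called at '{': the inner scope starts with the enclosing state.
  void enterScope() { stack_.push_back(current()); }

  // Called at '}': the enclosing scope's state becomes current again.
  void exitScope() {
    if (stack_.size() > 1) stack_.pop_back();
  }

  // Called when the pragma's annotation token is parsed.
  void setPragma(bool on) { stack_.back() = on; }

  // State that applies to the next floating-point operation.
  bool current() const { return stack_.back(); }

private:
  std::vector<bool> stack_{false};  // file scope; assumed default is OFF
};
```

An AST node built while `current()` is true would then carry a "strict" marker that consumers such as IR generation and constant evaluation can check.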

On Mon, Jan 08, 2018 at 09:49:47PM -0600, Hal Finkel via cfe-dev wrote:
>    FWIW, I think it would be nice for the IRBuilder to have a kind of
>    "strict FP" state, kind of like how we have a "fast math" state for
>    adding fast-math flags, that will cause CreateFAdd and friends to
>    produce the associated intrinsics, instead of the IR instructions, when
>    strictness is enabled.
>     -Hal
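To make the suggestion concrete, here is a toy model of a builder with a "strict FP" switch. `ToyBuilder`, `setStrictFP` and `createFAdd` are made-up names for illustration and the class only formats IR text; it is not the LLVM IRBuilder API. The rounding and exception metadata values are assumed to be the most conservative ones.

```cpp
#include <string>

// Toy stand-in for an IR builder. It returns textual IR so the effect of
// the strict-FP state is visible without linking against LLVM.
class ToyBuilder {
public:
  // Hypothetical switch, analogous in spirit to the fast-math state.
  void setStrictFP(bool strict) { strict_ = strict; }

  // Emits a plain fadd normally, or the constrained intrinsic when strict.
  std::string createFAdd(const std::string& lhs, const std::string& rhs) const {
    if (!strict_) {
      return "fadd double " + lhs + ", " + rhs;
    }
    return "call double @llvm.experimental.constrained.fadd.f64(double " +
           lhs + ", double " + rhs +
           ", metadata !\"round.dynamic\", metadata !\"fpexcept.strict\")";
  }

private:
  bool strict_ = false;  // default: ordinary FP instructions
};
```

The point of the design is that callers keep writing `createFAdd(a, b)` and the builder state decides which form is produced.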

I've been doing compiler backend work for 17 years now, but I'm new to
LLVM. I also haven't done much front end work. So if a question seems
obvious then that's why...

Are Richard's and Hal's suggestions different parts of the same suggestion?
Is the "fast math" state part of the AST and therefore available to
AST consumers that way? I wouldn't guess that since the -ffast-math option
would be compilation wide.

Would having the pragma be part of the AST solve problems where the pragma
is in the middle of a function and shouldn't apply to source that comes
before the pragma?

--
Kevin P. Neal                                http://www.pobox.com/~kpn/
           On the community of supercomputer fans:
"But what we lack in size we make up for in eccentricity."
  from Steve Gombosi, comp.sys.super, 31 Jul 2000 11:22:43 -0600

Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev

Andrew Kaylor wrote:

>In general, the current "strict FP" handling stops at instruction
>selection. At the MachineIR level we don't currently have a mechanism
>to prevent inappropriate optimizations based on floating point
>constraints, or indeed to convey such constraints to the backend.
>Implicit register use modeling may provide some restriction on some
>architectures, but this is definitely lacking for X86 targets. On the
>other hand, I'm not aware of any specific current problems, so in many
>cases we may "get lucky" and have the correct thing happen by chance.
>Obviously that's not a viable long term solution. I have a rough plan
>for adding improved register modeling to the X86 backend, which should
>take care of instruction scheduling issues, but we'd still need a
>mechanism to prevent constant folding optimizations and such.

Given that Kevin intends to target SystemZ, I'll be happy to work on the SystemZ back-end support for this feature. I agree that we should be using implicit control register dependencies, which will at least prevent moving floating-point operations across instructions that e.g. change rounding modes. However, the main property we need to model is that floating-point operations may *trap*. I guess this can be done using UnmodeledSideEffects, but I'm not quite clear on how to make this dependent on whether or not a "strict" operation is requested (without duplicating all the instruction patterns ...).


Once we do use something like UnmodeledSideEffects, I think MachineIR passes should handle everything correctly; in the end, the requirements are not really different from those of other trapping instructions. B.t.w. I don't think anybody does constant folding on floating-point constants at the MachineIR level anyway ... have you seen this anywhere?
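A small source-level illustration of why an FP operation must not move across a rounding-mode change: the same division gives different results depending on the mode in effect when it executes. This is a sketch, assuming a target where `FE_UPWARD` and `FE_DOWNWARD` are defined; the function name is made up, and `volatile` is used only to keep the compiler from folding or moving the division in this demo.

```cpp
#include <cfenv>

// Divides x by y while the given rounding mode is active, then restores
// the previous mode. If the division were scheduled before the first
// fesetround or after the second, the result could change.
double divide_with_rounding(int mode, double x, double y) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile double vx = x, vy = y;   // forces run-time operands
  volatile double q = vx / vy;      // must execute under 'mode'
  std::fesetround(saved);
  return q;
}
```

For example, 1.0 / 3.0 is inexact, so rounding upward yields a strictly larger double than rounding downward.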


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU/Linux compilers and toolchain
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294



Re: [llvm-dev] Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev
I know next to nothing about the AST, so I'll leave commentary on that to someone who does except to say that I don't believe there is a strong connection between Richard's suggestion and Hal's. An IRBuilder can be created anywhere (and is frequently used in the optimizer). When the front end support is implemented it may use an IRBuilder that leverages the state Hal is suggesting, but representing the pragma in the AST is, I think, more about the mechanism that will indicate how to set that state.

I will say that I like Hal's suggestion very much.

-Andy


Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev

I think we’re going to need to create a new mechanism to communicate strict FP modes to the backend. I think we need to avoid doing anything that will require re-inventing or duplicating all of the pattern matching that goes on in instruction selection (which is the reason we’re currently dropping that information). I’m out of my depth on this transition, but I think maybe we could handle it with some kind of attribute on the MBB.

 

In C/C++, at least, it’s my understanding that the pragmas always apply at the scope-level (as opposed to having the possibility of being instruction-specific), and we’ve previously agreed that our implementation will really need to apply the rules across entire functions in the sense that if any part of a function uses the constrained intrinsics all FP operations in the function will need to use them (though different metadata arguments may be used in different scopes). So I think that opens our options a bit.

 

Regarding constant folding, I think you are correct that it isn’t happening anywhere in the backends at the moment. There is some constant folding done during instruction selection, but the existing mechanism prevents that. My concern is that given LLVM’s development model, if there is nothing in place to prevent constant folding and no consensus that it shouldn’t be allowed then we should probably believe that someone will eventually do it.

 

-Andy

 


Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev

On Jan 9, 2018, at 1:53 PM, Kaylor, Andrew via cfe-dev <[hidden email]> wrote:

> I think we’re going to need to create a new mechanism to communicate strict FP modes to the backend. I think we need to avoid doing anything that will require re-inventing or duplicating all of the pattern matching that goes on in instruction selection (which is the reason we’re currently dropping that information). I’m out of my depth on this transition, but I think maybe we could handle it with some kind of attribute on the MBB.
>
> In C/C++, at least, it’s my understanding that the pragmas always apply at the scope-level (as opposed to having the possibility of being instruction-specific), and we’ve previously agreed that our implementation will really need to apply the rules across entire functions in the sense that if any part of a function uses the constrained intrinsics all FP operations in the function will need to use them (though different metadata arguments may be used in different scopes). So I think that opens our options a bit.
>
> Regarding constant folding, I think you are correct that it isn’t happening anywhere in the backends at the moment. There is some constant folding done during instruction selection, but the existing mechanism prevents that. My concern is that given LLVM’s development model, if there is nothing in place to prevent constant folding and no consensus that it shouldn’t be allowed then we should probably believe that someone will eventually do it.

The standard argument against trying to introduce "scope-like" mechanisms to LLVM IR is inlining; unless you're going to prevent functions that use stricter/laxer FP rules from being inlined into each other (which sounds disastrous), you're going to need to communicate strictness on an instruction-by-instruction basis.  If the backend wants to handle that by using the strictest rule that it sees in use anywhere in the function because pattern-matching is otherwise too error-prone, ok, that's its right; but the IR really should be per-instruction.

John.

 

Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev
On Tue, Jan 09, 2018 at 06:53:51PM +0000, Kaylor, Andrew via cfe-dev wrote:

>    I think we're going to need to create a new mechanism to communicate
>    strict FP modes to the backend. I think we need to avoid doing anything
>    that will require re-inventing or duplicating all of the pattern
>    matching that goes on in instruction selection (which is the reason
>    we're currently dropping that information). I'm out of my depth on this
>    transition, but I think maybe we could handle it with some kind of
>    attribute on the MBB.
>
>
>    In C/C++, at least, it's my understanding that the pragmas always apply
>    at the scope-level (as opposed to having the possibility of being
>    instruction-specific), and we've previously agreed that our
>    implementation will really need to apply the rules across entire
>    functions in the sense that if any part of a function uses the
>    constrained intrinsics all FP operations in the function will need to
>    use them (though different metadata arguments may be used in different
>    scopes). So I think that opens our options a bit.

If the pragma applies to the entire function then would it be as simple
as a pass to convert intrinsic calls into whatstheterm SNodes (?) after
the optimization passes have run? Meaning, we can bypass any changes to
the AST?

That would still leave the backend changes to be done, of course.

BTW, I thought that optimization passes were allowed to drop metadata. So
what happens to the metadata on the constrained intrinsic calls? Or am
I mixing up two different metadatas?

>    Regarding constant folding, I think you are correct that it isn't
>    happening anywhere in the backends at the moment. There is some
>    constant folding done during instruction selection, but the existing
>    mechanism prevents that. My concern is that given LLVM's development
>    model, if there is nothing in place to prevent constant folding and no
>    consensus that it shouldn't be allowed then we should probably believe
>    that someone will eventually do it.

How would you prevent it?

--
Kevin P. Neal                                http://www.pobox.com/~kpn/

"What is mathematics? The age-old answer is, of course, that mathematics
 is what mathematicians do." - Donald Knuth


Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev

>The standard argument against trying to introduce "scope-like" mechanisms to LLVM IR is inlining;
>unless you're going to prevent functions that use stricter/laxer FP rules from being inlined into
>each other (which sounds disastrous), you're going to need to communicate strictness on an
>instruction-by-instruction basis.  If the backend wants to handle that by using the strictest
>rule that it sees in use anywhere in the function because pattern-matching is otherwise too
>error-prone, ok, that's its right; but the IR really should be per-instruction.

 

I added a function level attribute, strictfp, which is meant to help with this. I don’t believe the inlining handling of the attribute is implemented yet, but what I’m thinking is that we would never inline a function that had the strictfp attribute and if we inlined a non-strictfp function into a strictfp function, we would transform any FP operations into their constrained equivalents at that time. In the short term, we’d probably just block both types of inlining.
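The inlining policy described in this paragraph can be written down as a small decision table. This is only a sketch of the idea as stated here: the enum and function names are hypothetical and nothing below is taken from LLVM's inliner.

```cpp
// Possible outcomes when the inliner considers a call site.
enum class InlineAction {
  Inline,              // ordinary inlining, no FP rewriting needed
  InlineAndConstrain,  // inline, rewriting callee FP ops to constrained form
  Block                // do not inline
};

// Longer-term idea: a strictfp callee is never inlined; a non-strictfp
// callee inlined into a strictfp caller has its FP operations converted
// to their constrained equivalents at that time.
InlineAction decideInline(bool callerStrict, bool calleeStrict) {
  if (calleeStrict) return InlineAction::Block;
  if (callerStrict) return InlineAction::InlineAndConstrain;
  return InlineAction::Inline;
}

// Short-term fallback: block inlining whenever strictfp is involved.
InlineAction decideInlineShortTerm(bool callerStrict, bool calleeStrict) {
  return (callerStrict || calleeStrict) ? InlineAction::Block
                                        : InlineAction::Inline;
}
```

Either way, code with no strictfp functions is unaffected: both functions return `Inline` when neither side is strict.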

 

It may sound disastrous, but I think there’s an understanding that using strict FP semantics is going to significantly inhibit optimizations. In the short term, that’s actually the purpose of the constrained intrinsics -- to disable all optimizations until we can teach the optimizer to do things correctly. The plan is that once this is all implemented to produce correct results, we’ll go back and try to re-enable as many optimizations as possible, which may eventually include doing something more intelligent with inlining.

 

With regard to your “instruction level” comments, my intention is that the use of the intrinsics will impose the necessary restrictions at the instruction level. Optimizations (other than inlining) should never need to check the function level attribute. But if we mixed “raw” FP operations and constrained intrinsics within a single function there would be no way to prevent motion of the “raw” operations across the intrinsics.

 

The reason I brought up the scope level nature of the pragma was just to suggest that it might be a property that we could take advantage of to handle the transition from IR to MIR. I haven’t come up with a way to bake the strict FP information into the instructions across the ISel boundary, but I think it might be possible to temporarily add it to a block and then have an early machine code pass that used this information in some way once the MIR was all in place. I’m open to the possibility that that was a bad idea.

 

-Andy

 

 


Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev
> If the pragma applies to the entire function then would it be as simple as a pass to
> convert intrinsic calls into whatstheterm SNodes (?) after the optimization passes
> have run? Meaning, we can bypass any changes to the AST?

The pragma doesn't necessarily apply to the entire function. As I said, the language standard says that the pragma applies at a scope level. The thing that applies to the entire function (and this is an implementation choice) is whether or not we're going to require constrained intrinsics to be used.
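For reference, a minimal sketch of the scope rule at the source level. In C the pragma is only allowed at file scope or at the start of a compound statement, and in the latter case it lasts until the end of that block. The function name is made up; a compiler that does not implement the pragma may warn and ignore it, which is why the operands are `volatile` here so the demo does not depend on pragma support.

```cpp
#include <cfenv>

// Returns true if dividing num by den raised the divide-by-zero flag.
bool division_sets_divbyzero(double num, double den) {
  // Code here is compiled under the enclosing (default) FENV_ACCESS state.
  bool raised = false;
  {
#pragma STDC FENV_ACCESS ON
    // From here to the closing brace, the floating-point environment
    // (flags, rounding mode) may be read and written by the program.
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double n = num, d = den;
    volatile double q = n / d;
    (void)q;
    raised = std::fetestexcept(FE_DIVBYZERO) != 0;
  }
  // Back to the enclosing state after the block ends.
  return raised;
}
```

Even though only the inner block asks for strict semantics, the implementation choice described above would have every FP operation in this function emitted as a constrained intrinsic.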

The intrinsics do get converted to selection DAG nodes after the optimization passes have been run. This is a separate issue from the AST. I think that's about controlling where the FP constraints need to be applied. As I said, that's beyond my skill set so I should stop talking now.

> BTW, I thought that optimization passes were allowed to drop metadata.
> So what happens to the metadata on the constrained intrinsic calls? Or am
> I mixing up two different metadatas?

The metadata in this case is a little special because it's being used as an argument to a function call. That won't ever be dropped.

> How would you prevent [constant folding in machine IR]?

To be honest, I don't know. I don't think there is an existing mechanism to prevent it. Given that no one has thought it was important to be able to do this kind of optimization in Machine IR, maybe it would be best just to state it as a rule that it isn't allowed. As far as I can tell, this is all that prevents fast-math kinds of operations, so maybe it is sufficient here too.

The trapping considerations are really more significant in that there are other optimizations we want to be able to perform that can violate our exception safety expectations. We need to be able to allow those kinds of optimizations by default but prevent them when strict FP semantics are needed. Constant folding, on the other hand, doesn't appear to be critical to performance at the MIR level.

-Andy

-----Original Message-----
From: Kevin P. Neal [mailto:[hidden email]]
Sent: Tuesday, January 09, 2018 11:37 AM
To: [hidden email]
Cc: Kaylor, Andrew <[hidden email]>; Ulrich Weigand <[hidden email]>; [hidden email]; [hidden email]; llvm-dev <[hidden email]>; Richard Smith <[hidden email]>
Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

On Tue, Jan 09, 2018 at 06:53:51PM +0000, Kaylor, Andrew via cfe-dev wrote:

>    I think we're going to need to create a new mechanism to communicate
>    strict FP modes to the backend. I think we need to avoid doing anything
>    that will require re-inventing or duplicating all of the pattern
>    matching that goes on in instruction selection (which is the reason
>    we're currently dropping that information). I'm out of my depth on this
>    transition, but I think maybe we could handle it with some kind of
>    attribute on the MBB.
>
>
>    In C/C++, at least, it's my understanding that the pragmas always apply
>    at the scope-level (as opposed to having the possibility of being
>    instruction-specific), and we've previously agreed that our
>    implementation will really need to apply the rules across entire
>    functions in the sense that if any part of a function uses the
>    constrained intrinsics all FP operations in the function will need to
>    use them (though different metadata arguments may be used in different
>    scopes). So I think that opens our options a bit.

If the pragma applies to the entire function then would it be as simple as a pass to convert intrinsic calls into whatstheterm SNodes (?) after the optimization passes have run? Meaning, we can bypass any changes to the AST?

That would still leave the backend changes to be done, of course.

BTW, I thought that optimization passes were allowed to drop metadata. So what happens to the metadata on the constrained intrinsic calls? Or am I mixing up two different metadatas?

>    Regarding constant folding, I think you are correct that it isn't
>    happening anywhere in the backends at the moment. There is some
>    constant folding done during instruction selection, but the existing
>    mechanism prevents that. My concern is that given LLVM's development
>    model, if there is nothing in place to prevent constant folding and no
>    consensus that it shouldn't be allowed then we should probably believe
>    that someone will eventually do it.

How would you prevent it?

>    -Andy
>
>
>    From: Ulrich Weigand [mailto:[hidden email]]
>    Sent: Tuesday, January 09, 2018 9:59 AM
>    To: Kaylor, Andrew <[hidden email]>; [hidden email]
>    Cc: Hal Finkel <[hidden email]>; Richard Smith
>    <[hidden email]>; [hidden email];
>    [hidden email]; [hidden email]; [hidden email];
>    llvm-dev <[hidden email]>
>    Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?
>
>
>    Andrew Kaylor wrote:
>    >In general, the current "strict FP" handling stops at instruction
>    >selection. At the MachineIR level we don't currently have a mechanism
>    >to prevent inappropriate optimizations based on floating point
>    >constraints, or indeed to convey such constraints to the backend.
>    >Implicit register use modeling may provide some restriction on some
>    >architectures, but this is definitely lacking for X86 targets. On the
>    >other hand, I'm not aware of any specific current problems, so in many
>    >cases we may "get lucky" and have the correct thing happen by chance.
>    >Obviously that's not a viable long term solution. I have a rough plan
>    >for adding improved register modeling to the X86 backend, which should
>    >take care of instruction scheduling issues, but we'd still need a
>    >mechanism to prevent constant folding optimizations and such.
>    Given that Kevin intends to target SystemZ, I'll be happy to work on
>    the SystemZ back-end support for this feature. I agree that we should
>    be using implicit control register dependencies, which will at least
>    prevent moving floating-point operations across instructions that e.g.
>    change rounding modes. However, the main property we need to model is
>    that floating-point operations may *trap*. I guess this can be done
>    using UnmodeledSideEffects, but I'm not quite clear on how to make this
>    dependent on whether or not a "strict" operation is requested (without
>    duplicating all the instruction patterns ...).
>    Once we do use something like UnmodeledSideEffects, I think MachineIR
>    passes should handle everything correctly; in the end, the requirements
>    are not really different from those of other trapping instructions.
>    B.t.w. I don't think anybody does constant folding on floating-point
>    constants at the MachineIR level anyway ... have you seen this
>    anywhere?
>    Mit freundlichen Gruessen / Best Regards
>    Ulrich Weigand
>    --
>    Dr. Ulrich Weigand | Phone: +49-7031/16-3727
>    STSM, GNU/Linux compilers and toolchain
>    IBM Deutschland Research & Development GmbH
>    Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung:
>    Dirk Wittkopp
>    Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
>    Stuttgart, HRB 243294


--
Kevin P. Neal                                http://www.pobox.com/~kpn/

"What is mathematics? The age-old answer is, of course, that mathematics  is what mathematicians do." - Donald Knuth

Re: [llvm-dev] Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev

> On Jan 9, 2018, at 11:04 AM, Kevin P. Neal via cfe-dev <[hidden email]> wrote:
>
>>   On 01/08/2018 07:06 PM, Richard Smith via llvm-dev wrote:
>>
>>   On 8 January 2018 at 11:15, Kaylor, Andrew via llvm-dev
>>   <[hidden email]> wrote:
>>
>>     Hi Kevin,
>>     Thanks for reaching out about this, and thanks especially for
>>     offering to help. I've had some other priorities that have prevented
>>     me from making progress on this recently.
>>     As far as I know, there is no support at all in clang for handling
>>     the FENV_ACCESS pragma. I have a sample patch somewhere that I
>>     created to demonstrate how the front end would create the
>>     constrained intrinsics instead of normal FP operations, but
>>     correctly implementing support for the pragma requires more front
>>     end and language expertise than I possess. I believe Clark Nelson,
>>     who does have such expertise, has this on his long term TODO list
>>     but I don't know anything about the actual timeframe when the work
>>     will begin.
>>
>>   If you want to work on this side of things, the place to start would be
>>   teaching the lexer to recognize the attribute and produce a suitable
>>   annotation token, then teaching the parser to parse the token in the
>>   places where the pragma can appear and to track the current FENV_ACCESS
>>   state. Then you'll need to find a suitable AST representation for the
>>   pragma (I have some ideas on this, feel free to ask), both for the
>>   affected compound statements and for the affected floating-point
>>   operations, build those representations when necessary, and teach the
>>   various AST consumers (LLVM IR generation and constant expression
>>   evaluation immediately spring to mind) how to handle them.
>
> On Mon, Jan 08, 2018 at 09:49:47PM -0600, Hal Finkel via cfe-dev wrote:
>>   FWIW, I think it would be nice for the IRBuilder to have a kind of
>>   "strict FP" state, kind of like how we have a "fast math" state for
>>   adding fast-math flags, that will cause CreateFAdd and friends to
>>   produce the associated intrinsics, instead of the IR instructions, when
>>   strictness is enabled.
>>    -Hal
>
> I've been doing compiler backend work for 17 years now, but I'm new to
> llvm. I also haven't done much front end work. So if a question seems
> obvious then that's why...
>
> Are Richard's and Hal's suggestions different parts of the same suggestion?
> Is the "fast math" state part of the AST and therefore available to
> AST consumers that way? I wouldn't guess that since the -ffast-math option
> would be compilation wide.

fp_contract provides a very similar set of language features, with both a global flag
and a #pragma that can be used at either global or local scope to control semantics
thenceforth.  We handle that in the AST by storing FPOptions on individual arithmetic
expressions, and IRGen propagates that to the individual LLVM instructions it creates.
I think that's still the right representation for the AST here, and in fact the way fp_contract
is implemented means that your job in Sema will be very easy.  The only difference is
that IRGen can't be quite as literal because you're not tracking strictness per instruction
in the IR (I think; it's not clear to me how much of this isn't solved by using intrinsics).
Regardless, that's easy enough to handle; you can just remember the strictest semantics
you tried to use in the current function and then set that as the function-level attribute
after the function definition has been fully emitted.  Plus, if you do decide to track it
per-instruction in the future, you'll be in a good position to do so.

John.
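
A minimal sketch of the bookkeeping described in the message above, assuming a two-level notion of strictness. All names here (FPMode, FunctionEmitter, and the three functions) are hypothetical and are not Clang's actual IRGen API. Each emitted FP operation reports the mode taken from its AST node, the emitter remembers the strictest one seen, and the function-level attribute is chosen once the body is complete.

```c
/* Hypothetical strictness levels, ordered from laxest to strictest. */
typedef enum { FP_RELAXED = 0, FP_STRICT = 1 } FPMode;

/* Hypothetical per-function emission state. */
typedef struct {
    FPMode strictest_seen; /* strictest mode requested by any operation so far */
    int ops_emitted;       /* number of FP operations emitted */
} FunctionEmitter;

void begin_function(FunctionEmitter *fe) {
    fe->strictest_seen = FP_RELAXED;
    fe->ops_emitted = 0;
}

/* Called once per FP operation with the mode stored on its AST node. */
void emit_fp_op(FunctionEmitter *fe, FPMode mode_from_ast) {
    if (mode_from_ast > fe->strictest_seen)
        fe->strictest_seen = mode_from_ast;
    fe->ops_emitted++;
}

/* After the whole body is emitted, pick the function-level attribute. */
const char *finish_function(const FunctionEmitter *fe) {
    return fe->strictest_seen == FP_STRICT ? "strictfp" : "";
}
```

Because the mode still arrives per operation, the same structure would carry over if strictness were later tracked per instruction in the IR.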

Re: [llvm-dev] Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev
On 8 Jan 2018 19:50, "Hal Finkel via cfe-dev" <[hidden email]> wrote:
On 01/08/2018 07:06 PM, Richard Smith via llvm-dev wrote:
On 8 January 2018 at 11:15, Kaylor, Andrew via llvm-dev <[hidden email]> wrote:
Hi Kevin,

Thanks for reaching out about this, and thanks especially for offering to help. I've had some other priorities that have prevented me from making progress on this recently.

As far as I know, there is no support at all in clang for handling the FENV_ACCESS pragma. I have a sample patch somewhere that I created to demonstrate how the front end would create the constrained intrinsics instead of normal FP operations, but correctly implementing support for the pragma requires more front end and language expertise than I possess. I believe Clark Nelson, who does have such expertise, has this on his long term TODO list but I don't know anything about the actual timeframe when the work will begin.

If you want to work on this side of things, the place to start would be teaching the lexer to recognize the attribute and produce a suitable annotation token, then teaching the parser to parse the token in the places where the pragma can appear and to track the current FENV_ACCESS state. Then you'll need to find a suitable AST representation for the pragma (I have some ideas on this, feel free to ask), both for the affected compound statements and for the affected floating-point operations, build those representations when necessary, and teach the various AST consumers (LLVM IR generation and constant expression evaluation immediately spring to mind) how to handle them.

FWIW, I think it would be nice for the IRBuilder to have a kind of "strict FP" state, kind of like how we have a "fast math" state for adding fast-math flags, that will cause CreateFAdd and friends to produce the associated intrinsics, instead of the IR instructions, when strictness is enabled.

I expect we'll need a "non-default FP environment" marker on each floating-point AST node (both on operations and on calls that could resolve to FP builtins) in order to properly handle constant expression evaluation for those things, so we will presumably have the relevant information to hand at the point of emitting an expression anyway. That said, this functionality may be useful if the IR representation needs to be different for FP operations appearing outside an FENV_ACCESS region but within a function that contains an FENV_ACCESS region (eg, using intrinsics rather than instructions).

 -Hal



On the LLVM side of things there are a few significant holes. As you've noticed, the FP to integer conversion operations still need intrinsics, as do fcmp, fptrunc, and fpext. There are probably others that I'm overlooking. The FP to SI conversion has an additional wrinkle that needs to be worked out in that the default lowering of this conversion to machine instructions is not exception safe.

In general, the current "strict FP" handling stops at instruction selection. At the MachineIR level we don't currently have a mechanism to prevent inappropriate optimizations based on floating point constraints, or indeed to convey such constraints to the backend. Implicit register use modeling may provide some restriction on some architectures, but this is definitely lacking for X86 targets. On the other hand, I'm not aware of any specific current problems, so in many cases we may "get lucky" and have the correct thing happen by chance. Obviously that's not a viable long term solution. I have a rough plan for adding improved register modeling to the X86 backend, which should take care of instruction scheduling issues, but we'd still need a mechanism to prevent constant folding optimizations and such.

As for what you could begin work on, it should be a fairly straight-forward task to implement the intrinsics for fptosi, fptoui, fcmp, fptrunc, and fpext. That would be a gentle introduction. Beyond that, it would be very helpful to have some pathfinding work done to solidify exactly what the remaining shortcomings are. I have a patch somewhere (stale by now, but I could refresh it pretty easily) that unconditionally converts all FP operations to the equivalent constrained intrinsics. You could use that to do testing and find out what's broken.

Thanks,
Andy


-----Original Message-----
From: Kevin P. Neal [mailto:[hidden email]]
Sent: Monday, January 08, 2018 6:41 AM
To: Hal Finkel via cfe-dev <[hidden email]>
Cc: Richard Smith <[hidden email]>; Kaylor, Andrew <[hidden email]>; Marcus Johnson <[hidden email]>; [hidden email]; Bob Huemmer <[hidden email]>
Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

On Thu, Aug 31, 2017 at 05:03:17PM -0500, Hal Finkel via cfe-dev wrote:
>    To be clear, we've had several extensive discussions about this, on and
>    off list, and Andy has started adding the corresponding intrinsics into
>    the IR. There was a presumption about a lack of mixing, however, and we
>    do need to work out how to prevent mixing the native IR operations with
>    the intrinsics (although, perhaps we just did that).
>     -Hal

What's the current status of this work? My employer very much needs this work done sooner rather than later, and I've been tasked with helping make it happen.

What, exactly, still needs to be done to complete this work? I've seen some of the discussions about it, and I've seen the documentation on the new llvm constrained floating point intrinsics. But I don't think clang supports them yet, fptosi is not on the list anyway, and I'm not sure what else is needed. So I'm asking, what all is needed and what can I work on to move this forward?

Is there any work in progress code that anyone would be willing to share?
For example, any code using the new intrinsics? Andy?


The specific case we're running into today is that we have code being reordered in ways that trigger traps when handling a NaN. This code:

#include <math.h>

int foo(double d) {
   int x = (!isnan(d) ? (int)d : 45);
   return x;
}

... becomes this:

define signext i32 @foo(double) local_unnamed_addr #0 !dbg !10 {
  tail call void @llvm.dbg.value(metadata double %0, i64 0, metadata !15, metadata !17), !dbg !18
  %2 = tail call signext i32 @__isnan(double %0) #3, !dbg !19
  %3 = icmp eq i32 %2, 0, !dbg !19
  %4 = fptosi double %0 to i32, !dbg !20
  %5 = select i1 %3, i32 %4, i32 45, !dbg !19
  tail call void @llvm.dbg.value(metadata i32 %5, i64 0, metadata !16, metadata !17), !dbg !21
  ret i32 %5, !dbg !22
}

So the fptosi gets moved _above_ the select and the trap happens. This is in code that was written to avoid a trap in exactly this case.

We're compiling with clang 5.0.0 "-g -O1" targeting SystemZ.
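
The difference between the source and the optimized IR can be modelled in plain C without relying on real hardware traps. This is only an illustrative sketch: trap_raised, trapping_fptosi, make_nan and the two foo_* variants are hypothetical names, and the "trap" is just a flag. foo_branch follows the source, converting only on the guarded path; foo_select mirrors the IR above, where the conversion runs unconditionally and a select picks the result afterwards.

```c
#include <stdbool.h>

/* Stands in for a hardware FP trap; set when a NaN is converted. */
bool trap_raised = false;

/* Produces a NaN at run time so the compiler cannot fold it away. */
double make_nan(void) {
    volatile double zero = 0.0;
    return zero / zero;
}

/* Models an fptosi that traps on NaN input. */
int trapping_fptosi(double d) {
    if (d != d) {          /* only a NaN compares unequal to itself */
        trap_raised = true;
        return 0;
    }
    return (int)d;
}

/* Source semantics: the conversion happens only when d is not a NaN. */
int foo_branch(double d) {
    if (d == d)
        return trapping_fptosi(d);
    return 45;
}

/* Shape of the optimized IR: convert first, then select the result. */
int foo_select(double d) {
    bool not_nan = (d == d);
    int converted = trapping_fptosi(d); /* speculated above the select */
    return not_nan ? converted : 45;
}
```

Both variants return 45 for a NaN, so the transformation is fine when traps are ignored. Only the select form sets trap_raised, which is the behaviour reported above.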
--
Kevin P. Neal                                http://www.pobox.com/~kpn/
      'Concerns about "rights" and "ownership" of domains are inappropriate.
 It is appropriate to be concerned about "responsibilities" and "service"
 to the community.' -- RFC 1591, page 4: March 1994
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: [llvm-dev] Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev


On 01/09/2018 04:30 PM, Richard Smith wrote:
On 8 Jan 2018 19:50, "Hal Finkel via cfe-dev" <[hidden email]> wrote:
On 01/08/2018 07:06 PM, Richard Smith via llvm-dev wrote:
On 8 January 2018 at 11:15, Kaylor, Andrew via llvm-dev <[hidden email]> wrote:
Hi Kevin,

Thanks for reaching out about this, and thanks especially for offering to help. I've had some other priorities that have prevented me from making progress on this recently.

As far as I know, there is no support at all in clang for handling the FENV_ACCESS pragma. I have a sample patch somewhere that I created to demonstrate how the front end would create the constrained intrinsics instead of normal FP operations, but correctly implementing support for the pragma requires more front end and language expertise than I possess. I believe Clark Nelson, who does have such expertise, has this on his long term TODO list but I don't know anything about the actual timeframe when the work will begin.

If you want to work on this side of things, the place to start would be teaching the lexer to recognize the attribute and produce a suitable annotation token, then teaching the parser to parse the token in the places where the pragma can appear and to track the current FENV_ACCESS state. Then you'll need to find a suitable AST representation for the pragma (I have some ideas on this, feel free to ask), both for the affected compound statements and for the affected floating-point operations, build those representations when necessary, and teach the various AST consumers (LLVM IR generation and constant expression evaluation immediately spring to mind) how to handle them.

FWIW, I think it would be nice for the IRBuilder to have a kind of "strict FP" state, kind of like how we have a "fast math" state for adding fast-math flags, that will cause CreateFAdd and friends to produce the associated intrinsics, instead of the IR instructions, when strictness is enabled.

I expect we'll need a "non-default FP environment" marker on each floating-point AST node (both on operations and on calls that could resolve to FP builtins) in order to properly handle constant expression evaluation for those things, so we will presumably have the relevant information to hand at the point of emitting an expression anyway. That said, this functionality may be useful if the IR representation needs to be different for FP operations appearing outside an FENV_ACCESS region but within a function that contains an FENV_ACCESS region (eg, using intrinsics rather than instructions).

I also think that an abstraction in the IRBuilder will make this easier to use by other frontends (otherwise we'll have the "if (strict) CreateIntrinsic else CreateFOp" logic repeated in multiple projects/places).

 -Hal
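
As a rough illustration of the builder-state idea, here is a sketch in C that emits textual IR. It is not LLVM's IRBuilder; the Builder struct and create_fadd are hypothetical stand-ins, and the choice of "round.dynamic" and "fpexcept.strict" as metadata arguments is an assumption made for the example. The point is only that the strict/non-strict decision lives in one place instead of at every call site.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for builder state; not LLVM's IRBuilder. */
typedef struct {
    bool strict_fp; /* when set, FP operations become constrained intrinsics */
} Builder;

/* Writes textual IR for "dst = lhs + rhs" on doubles into out. */
int create_fadd(const Builder *b, char *out, size_t n,
                const char *dst, const char *lhs, const char *rhs) {
    if (b->strict_fp) {
        /* Constrained form; the metadata values are assumed for illustration. */
        return snprintf(out, n,
            "%s = call double @llvm.experimental.constrained.fadd.f64("
            "double %s, double %s, "
            "metadata !\"round.dynamic\", metadata !\"fpexcept.strict\")",
            dst, lhs, rhs);
    }
    /* Default form: the ordinary IR instruction. */
    return snprintf(out, n, "%s = fadd double %s, %s", dst, lhs, rhs);
}
```

A front end would flip strict_fp when entering a function that needs strict semantics and leave every create_* call site unchanged.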


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: Why is #pragma STDC FENV_ACCESS not supported?

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev

On Jan 9, 2018, at 3:50 PM, Kaylor, Andrew <[hidden email]> wrote:

>The standard argument against trying to introduce "scope-like" mechanisms to LLVM IR is inlining;
>unless you're going to prevent functions that use stricter/laxer FP rules from being inlined into
>each other (which sounds disastrous), you're going to need to communicate strictness on an
>instruction-by-instruction basis.  If the backend wants to handle that by using the strictest
>rule that it sees in use anywhere in the function because pattern-matching is otherwise too
>error-prone, ok, that's its right; but the IR really should be per-instruction.
 
I added a function level attribute, strictfp, which is meant to help with this. I don’t believe the inlining handling of the attribute is implemented yet, but what I’m thinking is that we would never inline a function that had the strictfp attribute and if we inlined a non-strictfp function into a strictfp function, we would transform any FP operations into their constrained equivalents at that time. In the short term, we’d probably just block both types of inlining.
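
The inlining rules in the paragraph above can be written down as a small decision function. This is only a sketch of the stated policy with hypothetical names (InlineDecision, decide_inline), not code from LLVM's inliner; the short_term flag models the interim plan of blocking both kinds of inlining.

```c
#include <stdbool.h>

/* Hypothetical outcomes of an inlining query. */
typedef enum {
    INLINE_BLOCKED,         /* do not inline */
    INLINE_AS_IS,           /* inline without changes */
    INLINE_WITH_CONVERSION  /* inline, rewriting FP ops to constrained intrinsics */
} InlineDecision;

InlineDecision decide_inline(bool caller_strictfp, bool callee_strictfp,
                             bool short_term) {
    if (callee_strictfp)
        return INLINE_BLOCKED;  /* a strictfp function is never inlined */
    if (!caller_strictfp)
        return INLINE_AS_IS;    /* neither side is strict: nothing special */
    /* Non-strictfp callee into a strictfp caller. */
    return short_term ? INLINE_BLOCKED : INLINE_WITH_CONVERSION;
}
```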
 
It may sound disastrous, but I think there’s an understanding that using strict FP semantics is going to significantly inhibit optimizations. In the short term, that’s actually the purpose of the constrained intrinsics -- to disable all optimizations until we can teach the optimizer to do things correctly. The plan is that once this is all implemented to produce correct results, we’ll go back and try to re-enable as many optimizations as possible, which may eventually include doing something more intelligent with inlining.
 
With regard to your “instruction level” comments, my intention is that the use of the intrinsics will impose the necessary restrictions at the instruction level. Optimizations (other than inlining) should never need to check the function level attribute. But if we mixed “raw” FP operations and constrained intrinsics within a single function there would be no way to prevent motion of the “raw” operations across the intrinsics.

Is that a problem?  Semantics are guaranteed only for strictfp operations, i.e. ones that use the intrinsics.  Raw operations can get reordered across intrinsics and change semantics, but that seems allowable, right?

John.


 
The reason I brought up the scope level nature of the pragma was just to suggest that it might be a property that we could take advantage of to handle the transition from IR to MIR. I haven’t come up with a way to bake the strict FP information into the instructions across the ISel boundary, but I think it might be possible to temporarily add it to a block and then have an early machine code pass that used this information in some way once the MIR was all in place. I’m open to the possibility that that was a bad idea.
 
-Andy
 
From: [hidden email] [[hidden email]] 
Sent: Tuesday, January 09, 2018 11:12 AM
To: Kaylor, Andrew <[hidden email]>
Cc: Ulrich Weigand <[hidden email]>; [hidden email]; [hidden email]; [hidden email]; llvm-dev <[hidden email]>; Richard Smith <[hidden email]>; [hidden email]
Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?
 
 
On Jan 9, 2018, at 1:53 PM, Kaylor, Andrew via cfe-dev <[hidden email]> wrote:
 
I think we’re going to need to create a new mechanism to communicate strict FP modes to the backend. I think we need to avoid doing anything that will require re-inventing or duplicating all of the pattern matching that goes on in instruction selection (which is the reason we’re currently dropping that information). I’m out of my depth on this transition, but I think maybe we could handle it with some kind of attribute on the MBB.
 
In C/C++, at least, it’s my understanding that the pragmas always apply at the scope-level (as opposed to having the possibility of being instruction-specific), and we’ve previously agreed that our implementation will really need to apply the rules across entire functions in the sense that if any part of a function uses the constrained intrinsics all FP operations in the function will need to use them (though different metadata arguments may be used in different scopes). So I think that opens our options a bit.
 
Regarding constant folding, I think you are correct that it isn’t happening anywhere in the backends at the moment. There is some constant folding done during instruction selection, but the existing mechanism prevents that. My concern is that given LLVM’s development model, if there is nothing in place to prevent constant folding and no consensus that it shouldn’t be allowed then we should probably believe that someone will eventually do it.
 
The standard argument against trying to introduce "scope-like" mechanisms to LLVM IR is inlining; unless you're going to prevent functions that use stricter/laxer FP rules from being inlined into each other (which sounds disastrous), you're going to need to communicate strictness on an instruction-by-instruction basis.  If the backend wants to handle that by using the strictest rule that it sees in use anywhere in the function because pattern-matching is otherwise too error-prone, ok, that's its right; but the IR really should be per-instruction.
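
The inlining argument can be sketched with a toy model in C. This is only an illustration under stated assumptions: `struct fp_inst`, `enum fp_mode`, `inline_body`, and `strictest_in_function` are hypothetical names and do not correspond to LLVM data structures or APIs. The point it tries to show is that a per-instruction annotation survives inlining unchanged, while a backend is still free to collapse to the strictest mode it finds in the function.

```c
#include <stddef.h>

/* Hypothetical strictness levels, ordered from laxest to strictest. */
enum fp_mode { FPMODE_LAX = 0, FPMODE_MAYTRAP = 1, FPMODE_STRICT = 2 };

/* Hypothetical instruction record: the FP mode travels with the
   instruction itself instead of with an enclosing scope. */
struct fp_inst {
    const char  *opcode;
    enum fp_mode mode;
};

/* Toy "inlining": append the callee's instructions to the caller's
   buffer. The caller must provide room for n_dst + n_callee entries.
   Returns the new instruction count. */
size_t inline_body(struct fp_inst *dst, size_t n_dst,
                   const struct fp_inst *callee, size_t n_callee)
{
    for (size_t i = 0; i < n_callee; ++i)
        dst[n_dst + i] = callee[i];   /* annotation is copied as-is */
    return n_dst + n_callee;
}

/* A conservative backend can still use one rule for the whole
   function: the strictest mode seen on any instruction. */
enum fp_mode strictest_in_function(const struct fp_inst *insts, size_t n)
{
    enum fp_mode strictest = FPMODE_LAX;
    for (size_t i = 0; i < n; ++i)
        if (insts[i].mode > strictest)
            strictest = insts[i].mode;
    return strictest;
}
```

In this model, inlining a strict callee into a lax caller needs no special handling: the merged function simply contains both kinds of instruction, and `strictest_in_function` reports the strict mode for a backend that prefers a single function-wide rule.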
 
John.


 
-Andy
 
From: Ulrich Weigand [[hidden email]] 
Sent: Tuesday, January 09, 2018 9:59 AM
To: Kaylor, Andrew <[hidden email]>; [hidden email]
Cc: Hal Finkel <[hidden email]>; Richard Smith <[hidden email]>; [hidden email]; [hidden email]; [hidden email]; [hidden email]; llvm-dev <[hidden email]>
Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?
 
Andrew Kaylor wrote:

>In general, the current "strict FP" handling stops at instruction
>selection. At the MachineIR level we don't currently have a mechanism
>to prevent inappropriate optimizations based on floating point
>constraints, or indeed to convey such constraints to the backend.
>Implicit register use modeling may provide some restriction on some
>architectures, but this is definitely lacking for X86 targets. On the
>other hand, I'm not aware of any specific current problems, so in many
>cases we may "get lucky" and have the correct thing happen by chance.
>Obviously that's not a viable long term solution. I have a rough plan
>for adding improved register modeling to the X86 backend, which should
>take care of instruction scheduling issues, but we'd still need a
>mechanism to prevent constant folding optimizations and such.

Given that Kevin intends to target SystemZ, I'll be happy to work on the SystemZ back-end support for this feature. I agree that we should be using implicit control register dependencies, which will at least prevent moving floating-point operations across instructions that e.g. change rounding modes. However, the main property we need to model is that floating-point operations may *trap*. I guess this can be done using UnmodeledSideEffects, but I'm not quite clear on how to make this dependent on whether or not a "strict" operation is requested (without duplicating all the instruction patterns ...).


Once we do use something like UnmodeledSideEffects, I think MachineIR passes should handle everything correctly; in the end, the requirements are not really different from those of other trapping instructions. By the way, I don't think anybody does constant folding on floating-point constants at the MachineIR level anyway ... have you seen this anywhere?


Best Regards

Ulrich Weigand

-- 
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU/Linux compilers and toolchain
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

