fp-contract at -O0

Hans Wennborg via cfe-dev

Hi everyone,

Melanie Blower recently submitted a change that was intended to make the default set of floating point options in clang be consistent with the options that would be set by the -ffp-model=precise umbrella option. The only change needed was to make the default for fp-contract “on” instead of “off”. While not a trivial change, we thought this was reasonable, since fp-contract=on only allows contraction that is allowed by the language standard. Unfortunately, this change unleashed a surprising number of problems.

The most surprising problem, to me at least, was that this change caused FMA instructions to be generated at -O0.

There are a couple of things that need to be sorted out here, but I’d like to start with the -O0 behavior. Consider the following scenario, which was possible even before the recent change:

--------
test.c
--------
double f(double a, double b, double c) {
  return a * b + c;
}
--------
clang -c -O0 -ffp-contract=on test.c
--------

Since clang 5.0 this has produced a call to llvm.fmuladd, which for targets that support FMA will generally result in an FMA instruction. Arguably this is what the user asked for, since they explicitly enabled fp-contract. On the other hand, it is also an optimization, which they said they did not want. As a point of comparison, specifying -ffast-math will cause the front end to attach the “fast” flag to math operations (which also allows contraction), but will not lead to FMA formation.

What should we do with this? I see two possible solutions:

1. The driver should not pass the -ffp-contract=on flag by default at -O0 (still allows fmuladd formation if the user specifies -ffp-contract=on)

2. The front end should not form the llvm.fmuladd intrinsic at -O0

The second option seems preferable to me, but I don’t know how unnatural it might be for the front end to respond to optimization level.

Apart from the -O0 dilemma, the change in default fp-contract behavior seems to have led to other problems. It introduced some performance regressions in LNT on x86 and some accuracy-related test failures on PowerPC. There are likely other issues that I just haven’t heard about. So, I guess we should talk about whether we really want to enable this by default when optimizations are enabled. I don’t know anything about the PowerPC issues. I looked at the top x86 performance regression and it seems that the introduction of the fmuladd intrinsic changed the decision of the loop unroller (the key loop is unrolled by 8 instead of 4). I’m inclined to regard that as a fluke of the test case or possibly a problem in the loop unroller, but I wouldn’t see it as a reason to prefer to disable FP contraction.

FWIW, the fp-model option was intended to provide the same functionality as the /fp option in the MSVC compiler. The MSVC /fp:precise option enables FP contraction.

Input here would be appreciated.

Thanks,

Andy

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: fp-contract at -O0

Hans Wennborg via cfe-dev
On Fri, Feb 14, 2020 at 12:30:06AM +0000, Kaylor, Andrew via cfe-dev wrote:
> The most surprising problem, to me at least, was that this change caused FMA instructions to be generated at -O0.

Why is that a problem? As long as it doesn't create build performance
regressions, it seems to be semantically valid to do?

Joerg

Re: fp-contract at -O0

Hans Wennborg via cfe-dev
-O0 does not mean “do not optimize”. It means “Reduce compilation time and make debugging produce the expected results” (quoting the GCC manual, but it applies equally to clang).

From my perspective, this is absolutely the expected behavior.

– Steve

Re: fp-contract at -O0

Hans Wennborg via cfe-dev

That’s certainly a reasonable position, but it isn’t without problems. For instance: https://godbolt.org/z/9JtoPt

In this case, “-O0 -ffp-contract=on -march=haswell” results in an FMA instruction but “-O0 -ffp-contract=fast -march=haswell” does not.

I’m not opposed to allowing the explicit use of -ffp-contract=on to lead clang to generate a call to llvm.fmuladd, but I don’t think that should happen by default at -O0.

-Andy

Re: fp-contract at -O0

Hans Wennborg via cfe-dev
Why not? What situation are you trying to avoid?

I don’t see a problem with the godbolt link; is your concern simply that you think -ffp-contract=fast should fuse a super-set of what is done by =on, or is there something else?

If anything, preserving FMA formation at O0 _helps_ debuggability, because it means that numerical behavior is more likely to match what a user observed at Os, allowing them to debug the problem.

Re: fp-contract at -O0

Hans Wennborg via cfe-dev

> Why not? What situation are you trying to avoid?

It just seems unexpected. I write code with no explicit FMAs. I compile with no command line options, and I get FMA. It’s not what I’d expect, and someone else specifically complained about this behavior after Melanie’s patch landed.

> I don’t see a problem with the godbolt link; is your concern simply that you think -ffp-contract=fast should fuse a super-set of what is done by =on, or is there something else?

Yes, that is my concern. I think =fast should always produce at least as many FMAs as =on.

> If anything, preserving FMA formation at O0 _helps_ debuggability, because it means that numerical behavior is more likely to match what a user observed at Os, allowing them to debug the problem.

That’s an excellent point. I could definitely be persuaded by that argument.

-Andy

Re: fp-contract at -O0

Hans Wennborg via cfe-dev
On Feb 13, 2020, at 9:17 PM, Kaylor, Andrew <[hidden email]> wrote:

>> I don’t see a problem with the godbolt link; is your concern simply that you think -ffp-contract=fast should fuse a super-set of what is done by =on, or is there something else?
>
> Yes, that is my concern. I think =fast should always produce at least as many FMAs as =on.

I can imagine a few ways to handle this, if we really want to do something about it:

1 A diagnostic when combining -ffp-contract=fast with -O0 that you aren’t going to get FMA formation.
2 Make -ffp-contract=fast decay to =on under -O0.
3 Make -ffp-contract=fast always imply =on as well (so the frontend would form fmuladd nodes in both modes, but =fast would additionally license forming fma out of mul+add pairs).

Option 1 is easy but silly. Option 2 is only slightly more invasive and definitely fixes the “problem”, but is maybe a little too clever. Option 3 may be the best, but I haven’t thought through all the details, and it would require some experimentation.
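To keep the options straight, here is a toy model of the behavior described in this thread and of options 2 and 3. Every name in it is hypothetical and none of it is clang's actual code; "may fuse" refers only to a single-expression `a * b + c` on a target with FMA support.

```c
#include <stdbool.h>

/* Hypothetical names throughout; a toy model, not clang's implementation. */
typedef enum { CONTRACT_OFF, CONTRACT_ON, CONTRACT_FAST } ContractMode;

/* Behavior as reported in the thread: =on forms llvm.fmuladd in the front
   end at any optimization level; =fast only sets a flag that later passes
   may act on, so nothing is fused at -O0. */
bool current_may_fuse(ContractMode mode, int opt_level) {
  if (mode == CONTRACT_ON) return true;
  if (mode == CONTRACT_FAST) return opt_level > 0;
  return false;
}

/* Option 2: =fast decays to =on when compiling at -O0. */
ContractMode option2_effective_mode(ContractMode mode, int opt_level) {
  if (mode == CONTRACT_FAST && opt_level == 0) return CONTRACT_ON;
  return mode;
}

/* Option 3: =fast implies =on, so the front end forms fmuladd in both modes. */
bool option3_front_end_forms_fmuladd(ContractMode mode) {
  return mode == CONTRACT_ON || mode == CONTRACT_FAST;
}
```

In this model, `current_may_fuse(CONTRACT_FAST, 0)` is false while `current_may_fuse(option2_effective_mode(CONTRACT_FAST, 0), 0)` is true, which is the "=fast should fuse at least as much as =on" property being asked for.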

– Steve

Re: fp-contract at -O0

Hans Wennborg via cfe-dev

> 3 Make -ffp-contract=fast always imply =on as well (so the frontend would form fmuladd nodes in both modes, but =fast would additionally license forming fma out of mul+add pairs).

This option could potentially impede optimizations that are currently performed. Having the contract flag set on FP operations instead of using the fmuladd intrinsic gives the backend freedom to mix and match operations from different source expressions. I’ve come across a case recently where this is beneficial.
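As a rough illustration of the "mix and match" point: =on follows the C rule that contraction is only permitted within a single expression, whereas the contract flag used by =fast lets the backend pair a multiply with an add that came from a different statement. The function names are hypothetical, and whether an FMA is actually emitted depends on the target and optimization level.

```c
/* Multiply and add in one expression: eligible for contraction under
   -ffp-contract=on (the front end can emit llvm.fmuladd here). */
double same_expression(double a, double b, double c) {
  return a * b + c;
}

/* Multiply and add in separate statements: =on leaves these as two
   operations, while =fast permits the backend to fuse them anyway. */
double separate_statements(double a, double b, double c) {
  double product = a * b;
  return product + c;
}
```

If the front end always emitted fmuladd for the first form, the backend would receive one fixed pairing there rather than individual flagged operations it could regroup.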

Perhaps the problem is with my expectation. The option isn’t very well documented in clang (or gcc).

    “Form fused FP ops (e.g. FMAs): fast (everywhere) | on (according to FP_CONTRACT pragma) | off (never fuse). Default is ‘fast’ for CUDA/HIP and ‘on’ otherwise.”

Obviously, we don’t form fused FP ops “everywhere.” What this probably should say is that we form fused ops potentially anywhere, at the discretion of the compiler. A more verbose explanation would be good. With the right wording this would reasonably explain why such ops aren’t fused at -O0.

Having given it more thought, I’d be OK with option 0 -- leave things as they are (or recently have been/soon will be) with =on as the default and the front end forming fmuladd or setting the contract flag without regard to the optimization level.

BTW, I also noticed some time ago that the front end will form fmuladd with =fast if the code in question is subject to a pragma STDC FP_CONTRACT ON. That seemed wrong to me at the time but now seems reasonable and consistent.

-Andy

Re: fp-contract at -O0

Hans Wennborg via cfe-dev
"Kaylor, Andrew via cfe-dev" <[hidden email]> writes:

> --------
> test.c
> --------
> double f(double a, double b, double c) {
>   return a * b + c;
> }
> --------
> clang -c -O0 -ffp-contract=on test.c
> --------
>
> Since clang 5.0 this has produced a call to llvm.fmuladd, which for
> targets that support FMA will generally result in an FMA
> instruction. Arguably this is what the user asked for, since they
> explicitly enabled fp-contract. On the other hand, it is also an
> optimization, which they said they did not want. As a point of
> comparison, specifying -ffast-math will cause the front end to attach
> the "fast" flag to math operations (which also allows contraction),
> but will not lead to FMA formation.
>
> What should we do with this? I see two possible solutions:
>
> 1. The driver should not pass the -ffp-contract=on flag by default at
> -O0 (still allows fmuladd formation if the user specifies
> -ffp-contract=on)
>
> 2. The front end should not form the llvm.fmuladd intrinsic at -O0

I prefer option #1.  If the user explicitly adds -ffp-contract=on then
we should absolutely generate FMAs even at -O0.  To me this is the
principle of least surprise.  A "more specific" option
(-ffp-contract=on) overrides a "more general" option (-O0).

                       -David

Re: fp-contract at -O0

Hans Wennborg via cfe-dev
"Kaylor, Andrew via cfe-dev" <[hidden email]> writes:

> I’m not opposed to allowing the explicit use of -ffp-contract=on to
> lead clang to generate a call to llvm.fmuladd, but I don’t think that
> should happen by default at -O0.

+1.  This seems like the most reasonable behavior to me.

                    -David

Re: fp-contract at -O0

Hans Wennborg via cfe-dev
Stephen Canon via cfe-dev <[hidden email]> writes:

> If anything, preserving FMA formation at O0 _helps_ debuggability,
> because it means that numerical behavior is more likely to match what
> a user observed at Os, allowing them to debug the problem.

The user can always pass -ffp-contract=on to do that.

There are many cases where FMA is not desired and most users don't
expect fused operations at -O0 unless they specifically ask for it.

                     -David

Re: fp-contract at -O0

Hans Wennborg via cfe-dev


> On Feb 18, 2020, at 2:01 PM, David Greene <[hidden email]> wrote:
>
> Stephen Canon via cfe-dev <[hidden email]> writes:
>
>> If anything, preserving FMA formation at O0 _helps_ debuggability,
>> because it means that numerical behavior is more likely to match what
>> a user observed at Os, allowing them to debug the problem.
>
> The user can always pass -ffp-contract=on to do that.
>
> There are many cases where FMA is not desired and most users don't
> expect fused operations at -O0 unless they specifically ask for it.

If FMA is not desired, -ffp-contract=off or the pragma should be used to disable it. -O0 is the wrong tool for that job.
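For reference, a minimal sketch of the pragma route: the standard C `#pragma STDC FP_CONTRACT` can turn contraction off for a block of code. The function name is made up, and this assumes a compiler that honors the pragma (clang does; some compilers ignore it, possibly with a warning). How the pragma interacts with -ffp-contract=fast may vary by compiler version, so treat this as applying to the =on default.

```c
/* Contraction is disabled for this function body, so a compiler that
   honors the pragma rounds a * b to double before adding c. */
double mul_add_unfused(double a, double b, double c) {
#pragma STDC FP_CONTRACT OFF
  return a * b + c;
}
```

The pragma has to appear at the start of the compound statement (or at file scope) to be valid, which is why it sits before the return.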

– Steve

Re: fp-contract at -O0

Hans Wennborg via cfe-dev
Stephen Canon <[hidden email]> writes:

>> On Feb 18, 2020, at 2:01 PM, David Greene <[hidden email]> wrote:
>>
>> There are many cases where FMA is not desired and most users don't
>> expect fused operations at -O0 unless they specifically ask for it.
>
> If FMA is not desired, -ffp-contract=off or the pragma should be used
> to disable it. -O0 is the wrong tool for that job.

Why?  Many of our customers would be surprised to see FMAs at -O0.

                   -David

Re: fp-contract at -O0

Hans Wennborg via cfe-dev
On Wed, Feb 19, 2020 at 12:22 PM David Greene via cfe-dev <[hidden email]> wrote:
> Stephen Canon <[hidden email]> writes:
>>> On Feb 18, 2020, at 2:01 PM, David Greene <[hidden email]> wrote:
>>>
>>> There are many cases where FMA is not desired and most users don't
>>> expect fused operations at -O0 unless they specifically ask for it.
>>
>> If FMA is not desired, -ffp-contract=off or the pragma should be used
>> to disable it. -O0 is the wrong tool for that job.
>
> Why?  Many of our customers would be surprised to see FMAs at -O0.

Jumping in with my opinion, as this thread doesn't seem to be dying of its own accord:

The -O0 level is supposed to be the compiler's "default" optimization level — that is, the "simplest possible" optimization level, the fastest one, the one that just flows through the compiler without taking any unnecessary detours or side quests. -O0 is the level where you get the thing that just works, without applying any additional post-processing to it.

In fact, film "post-processing" is a good way to think about optimization. -O0 codegen is like the dailies straight from the camera. Optimization options, -ffp-contract=whatever, and so on, are all inputs (from the human "director-producer") to the guy who does the post-processing, saying "take this raw footage, as it came from the camera, and— look for some extra FMAs, or lower the ones that basic codegen already put there, or whatever."

The innards of the compiler always look basically like this:

    do_some_codegen();
    if (some_option) {
        postprocess_the_codegen_to_satisfy_a_whim_of_the_director();
    }

The "-O0" path is by definition the path that does not take that `if` branch.  I don't care if the whim is "I want more FMAs" or "I want fewer FMAs" or "I want more spills to stack" or "I want fewer spills to stack" or whatever. The "-O0" path is by definition the path that does not cater to any whim except "I want to see the dailies as soon as possible."

Which is to say, Stephen Canon and Joerg Sonnenberger are correct.

–Arthur
