incorrect floating point accrued exception flags with -O

Michael Clark via cfe-dev
Hi,

First, apologies if this is not the right place to post.

I am seeing unexpected values in the floating point accrued exception flags with clang generated programs. My original issue is seeing FE_INEXACT after an exact float to unsigned int conversion within a ternary expression. This issue does not occur with gcc. In trying to isolate the problem I wrote a simple test program, which results in completely opposite behaviour. FE_INEXACT is not getting set for an inexact conversion when optimisation is enabled.

Because I’m not yet seeing predictable results for the accrued exception flags, I have stopped trying to reproduce my original issue (FE_INEXACT for an exact conversion). First I need to know which floating point optimisations are being enabled, and under what conditions updates to the accrued exception flags are optimised away; otherwise I can’t be sure of isolating my first problem.

I have two versions of a simple test program below, one of which returns incorrect results even with gcc. The tests below were run on Linux using the Debian vendor build of clang 3.8.1 and on macOS with the Xcode 8.3.1 vendor build of clang. I don’t have -ffast-math enabled, so I would expect standards-compliant behaviour. I would like to know which optimisations are preventing the floating point accrued exception flags from being set and how to disable these optimisations so that I get deterministic results; then I can try to reproduce my first issue in isolation.

- fcvt1.c triggers the same issue with gcc (FE_INEXACT not set for inexact conversion)
- fcvt2.c triggers the issue only with clang (FE_INEXACT not set for inexact conversion)
- no reproducer yet… (FE_INEXACT set after exact conversion)

Happy Holidays,

Michael.

$ gcc --version
gcc (Debian 6.3.0-6) 6.3.0 20170205
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -O0 -lm fcvt1.c
$ ./a.out
1 exact
1 inexact
$ gcc -O3 -lm fcvt1.c
$ ./a.out
1 exact
1 exact
$ gcc -O0 -lm fcvt2.c
$ ./a.out
1 exact
1 inexact
$ gcc -O3 -lm fcvt2.c
$ ./a.out
1 exact
1 inexact

$ clang --version
clang version 3.8.1-16 (tags/RELEASE_381/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ clang -O0 -lm fcvt1.c
$ ./a.out
1 exact
1 inexact
$ clang -O3 -lm fcvt1.c
$ ./a.out
1 exact
1 exact
$ clang -O0 -lm fcvt2.c
$ ./a.out
1 exact
1 inexact
$ clang -O3 -lm fcvt2.c
$ ./a.out
1 exact
1 exact

$ clang --version
Apple LLVM version 8.1.0 (clang-802.0.41)
Target: x86_64-apple-darwin16.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ cc -O0 fcvt1.c
$ ./a.out
1 exact
1 inexact
$ cc -O3 fcvt1.c
$ ./a.out
1 exact
1 exact
$ cc -O0 fcvt2.c
$ ./a.out
1 exact
1 inexact
$ cc -O3 fcvt2.c
$ ./a.out
1 exact
1 exact


$ cat fcvt1.c
#include <stdio.h>
#include <fenv.h>

unsigned fcvt(float a)
{
        return (unsigned)a;
}

int main()
{
        fesetround(FE_TONEAREST);

        feclearexcept(FE_ALL_EXCEPT);
        printf("%d ", fcvt(1.0f));
        printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");

        feclearexcept(FE_ALL_EXCEPT);
        printf("%d ", fcvt(1.1f));
        printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}


$ cat fcvt2.c
#include <stdio.h>
#include <fenv.h>

unsigned fcvt(float a)
{
        return (unsigned)a;
}

void test_fcvt(float a)
{
        feclearexcept(FE_ALL_EXCEPT);
        printf("%d ", fcvt(a));
        printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}

int main()
{
        fesetround(FE_TONEAREST);

        test_fcvt(1.0f);
        test_fcvt(1.1f);
}

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: incorrect floating point accrued exception flags with -O

Stephen Canon via cfe-dev
Hi Michael —

You’re dancing around a real issue in clang (and most other compilers), but it’s camouflaged by a few issues in your code. I’ll address those first:

1. If you want to read or set the floating-point environment, your code must contain:

        #pragma STDC FENV_ACCESS ON

If you do not have this pragma, all bets are off. The compiler is free to re-arrange your calls to fe* functions, treat the floating-point environment as constant, or eliminate them altogether. See §7.6.1 of the C standard for more details, in particular the following sentence:

> If part of a program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined.


If you add this pragma to your code example, you’ll get a helpful warning from clang that FENV_ACCESS is not [yet] supported.
 
2. Also in §7.6, you will note the following sentence (third bullet in paragraph 3):

> a function call is assumed to have the potential for raising floating-point exceptions, unless its documentation promises otherwise.

In particular, your code calls `printf` between `feclearexcept` and `fetestexcept`. To the best of my recollection, `printf` is not documented as not modifying the floating-point environment, so once you call it, all bets are off w.r.t. the floating-point state, even if you set FENV_ACCESS ON.

OK, now the real issue in clang: it doesn’t [yet] support FENV_ACCESS. Neither does GCC. There’s been some motion recently toward adding support for FENV_ACCESS, but it’s a largish project, and it hasn’t happened yet. Both compilers, when optimization is enabled, simply replace your call to fcvt(1.1f) with 1 (because they don’t support FENV_ACCESS). GCC happens to “work” in your second example because it inlines `fcvt` into `test_fcvt` but doesn’t inline `test_fcvt` into `main`; clang inlines both, does constant propagation, and no flags are raised.

godbolt.org is a good resource to see what’s going on here, though it won’t tell you *why*:
https://godbolt.org/g/Zb8Eoc

Best,
– Steve


Re: incorrect floating point accrued exception flags with -O

Michael Clark via cfe-dev

On 18 Apr 2017, at 1:08 AM, Stephen Canon <[hidden email]> wrote:

> Hi Michael,
>
> You’re dancing around a real issue in clang (and most other compilers), but it’s camouflaged by a few issues in your code. I’ll address those first:
>
> 1. If you want to read or set the floating-point environment, your code must contain:
>
>         #pragma STDC FENV_ACCESS ON

Yes, I tried that first and got the warning.

> If you do not have this pragma, all bets are off. The compiler is free to re-arrange your calls to fe* functions, treat the floating-point environment as constant, or eliminate them altogether. See §7.6.1 of the C standard for more details, in particular the following sentence:
>
> > If part of a program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined.
>
> If you add this pragma to your code example, you’ll get a helpful warning from clang that FENV_ACCESS is not [yet] supported.

Interesting. I’m sure the scientific computing folk will be interested in having this working. Many IEEE-754 compliant ISAs support floating point accrued exceptions. In fact I am working on a RISC-V simulator and binary translator, so ultimately the C code will be translated to x86_64 asm and I’ll read MXCSR directly; however, I’m currently reversing the compiler asm output for the (working) conversions. I wanted the C cast based conversions to work reliably on gcc and clang for a reference interpreter that I am using to test a binary translating JIT engine.

> 2. Also in §7.6, you will note the following sentence (third bullet in paragraph 3):
>
> > a function call is assumed to have the potential for raising floating-point exceptions, unless its documentation promises otherwise.
>
> In particular, your code calls `printf` between `feclearexcept` and `fetestexcept`. To the best of my recollection, `printf` is not documented as not modifying the floating-point environment, so once you call it, all bets are off w.r.t. the floating-point state, even if you set FENV_ACCESS ON.

I can modify the test to fetch the exception before the printf but I don’t believe it will make any difference as I am only printing an integer not a double. In the code where the problem exists, I explicitly save and restore the floating point accrued exception state in logging routines as I’ve already encountered the issue where printf with a double stomps on the floating point accrued exception state. I’ve in fact ported gdtoa and friends to C++ from FreeBSD’s libc. However, in this case I am only printing integers so it should have no effect on the floating point accrued exception state.

Indeed. I have a variadic template formatter replacement for snprintf that does not use varargs. It is derived from FreeBSD’s snprintf and David M Gay’s gdtoa. It has been updated to type box arguments using a variadic template wrapper. It emits a fixed size stack frame and it buffers in std::string <https://github.com/michaeljclark/c-fmt/>. It relies on the wrapper being inlined. Note: the code is missing extern inline, and I’ve since moved part of the implementation from headers into compiled modules but have not yet updated c-fmt.

As an aside, a C++2n string formatter that does not depend on iostream/stringstream would be a nice addition to the standard. A familiar snprintf style interface using format strings, but without all of the buffer woes. It also needs to support formatting QP (Quad Precision) so I intend to update gdtoa to a template that is parameterised for variable exponent and significand using type information structs:


> OK, now the real issue in clang: it doesn’t [yet] support FENV_ACCESS. Neither does GCC. There’s been some motion recently toward adding support for FENV_ACCESS, but it’s a largish project, and it hasn’t happened yet. Both compilers, when optimization is enabled, simply replace your call to fcvt(1.1f) with 1 (because they don’t support FENV_ACCESS). GCC happens to “work” in your second example because it inlines `fcvt` into `test_fcvt` but doesn’t inline `test_fcvt` into `main`; clang inlines both, does constant propagation, and no flags are raised.

I knew it was inlining, which is why I moved the code to a (default visibility extern) function, which gcc seems to handle, and I have been dumping asm output from both compilers. It would be interesting if there were a mode where default visibility extern functions were not inlined unless they were declared extern inline. I can understand static functions or template instantiations being inlined, but default visibility extern is a different issue. gcc seems to be more conservative with “non static” functions.

> godbolt.org is a good resource to see what’s going on here, though it won’t tell you *why*:
> https://godbolt.org/g/Zb8Eoc

Yes, Matt Godbolt’s tool is very useful. I use objdump (and otool -tV on macOS) a lot too, but I thought there might be a compiler flag for conservative handling of floating point that retains the floating point accrued exceptions. I was unaware of the level of support for floating point accrued exceptions. I’ve added __attribute__ ((noinline)) to the second version and it now works with -O3. There should be a flag, e.g. -fenv-ieee754, that somehow carries exception state even when inlining, or disables inlining for functions that perform conversions or use any operations that require rounding of floating point values.


I’ll work on reproducing my original issue (FE_INEXACT for exact conversion) in isolation using __attribute__ ((noinline)) …

Thanks,
Michael.


FE_INEXACT being set for an exact conversion from float to unsigned long long

Michael Clark via cfe-dev
Hi,

I’ve reproduced my original issue. This issue is FE_INEXACT set for an exact conversion from float to unsigned long long.

The prior issue was eager inlining and constant folding causing missing updates to the floating point accrued exception flags when optimisation was enabled.

This second issue appears not to be an eager optimisation or constant folding issue.

- float to unsigned int conversion appears to be okay. 
- float to unsigned long long conversion appears to incorrectly update the accrued exception flags. 

Note the code explicitly casts from float to unsigned and then to signed. The first cast is to select float conversion to unsigned, and the outer cast is a sign extension indicator as all RISC-V integers are canonically sign extended to the width of the widest type (unlike x86). Returning a signed type of a smaller width will automatically sign extend when assigned to a larger signed type (the code came from a template) which is why we have extra casts. While the sign extension is redundant on 64-bit it isn’t for u128 and s128 which we intend to support.
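
The double cast can be sketched in plain C with fixed-width types. The function name echoes `fcvt_wu` but the code is an illustration, not the template from the simulator. Converting an out-of-range unsigned value to a signed type is implementation-defined in C, and the sketch assumes the usual two's-complement wrap:

```c
#include <stdint.h>

/* Convert as unsigned 32-bit, reinterpret as signed 32-bit, then widen.
   The widening step sign-extends, as RISC-V does for 32-bit results
   held in 64-bit registers. Assumes 0 <= f < 2^32. */
int64_t fcvt_wu_sext(float f)
{
        uint32_t u = (uint32_t)f;   /* selects float -> unsigned conversion */
        int32_t  s = (int32_t)u;    /* same bits, now signed */
        return (int64_t)s;          /* sign extension to 64 bits */
}
```

For example, 4294967040.0f (0xFFFFFF00 as a u32) comes back as -256, i.e. 0xFFFFFFFFFFFFFF00 in a 64-bit register.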


Any insight would be greatly appreciated.

Michael.


$ g++ -O3 -lm fcvt.cc 
$ ./a.out 
1 exact
1 inexact
1 exact
1 inexact


$ clang++ -O3 -lm fcvt.cc 
$ ./a.out 
1 exact
1 inexact
1 inexact
1 inexact


$ cat fcvt.cc
#include <cstdio>
#include <cmath>
#include <cfenv>
#include <limits>

typedef signed int         s32;
typedef unsigned int       u32;
typedef signed long long   s64;
typedef unsigned long long u64;

__attribute__ ((noinline)) s32 fcvt_wu(float f)
{
        return (std::isnan(f) | ((f >= 0) & std::isinf(f)))
                ? std::numeric_limits<u32>::max()
                : s32(u32(f));
}

__attribute__ ((noinline)) s64 fcvt_lu(float f)
{
        return (std::isnan(f) | ((f >= 0) & std::isinf(f)))
                ? std::numeric_limits<u64>::max()
                : s64(u64(f));
}

void test_fcvt_wu(float a)
{
        feclearexcept(FE_ALL_EXCEPT);
        printf("%d ", fcvt_wu(a));
        printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}

void test_fcvt_lu(float a)
{
        feclearexcept(FE_ALL_EXCEPT);
        printf("%lld ", fcvt_lu(a));
        printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}

int main()
{
        fesetround(FE_TONEAREST);

        test_fcvt_wu(1.0f);
        test_fcvt_wu(1.1f);
        test_fcvt_lu(1.0f);
        test_fcvt_lu(1.1f);
}


On 18 Apr 2017, at 10:51 AM, Michael Clark <[hidden email]> wrote:


On 18 Apr 2017, at 1:08 AM, Stephen Canon <[hidden email]> wrote:

Hi Michael —

You’re dancing around a real issue in clang (and most other compilers), but it’s camouflaged by a few issues in your code. I’ll address those first:

1. If you want to read or set the floating-point environment, your code must contain:

#pragma STDC FENV_ACCESS ON

Yes, I tried that first and got the warning.

If you do not have this pragma, all bets are off. The compiler is free to re-arrange your calls to fe* functions, treat the floating-point environment as constant, or eliminate them all together. See §7.6.1 of the C standard for more details, in particular, the following sentence:

If part of a program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined.


If you add this pragma to your code example, you’ll get a helpful warning from clang that FENV_ACCESS is not [yet] supported.

Interesting. I’m sure the scientific computing folk will be interested in having this working. Many IEEE-754 compliant ISAs support floating point accrued exceptions. In fact I am working on a RISC-V simulator and binary translator so ultimately the C code will be translated to x86_64 asm and I’ll read MXCSR directly however I’m currently reversing the compiler asm output for the (working) conversions. I wanted the C cast based conversions to work reliably on gcc and clang for a reference interpreter that I am using to test a binary translating JIT engine.

2. Also in §7.6, you will note the following sentence (third bullet in paragraph 3):

a function call is assumed to have the potential for raising floating-point exceptions, unless its documentation promises otherwise.

In particular, your code calls `printf` between `feclearexcept` and `fetestexcept`. To the best of my recollection, `printf` is not documented as not modifying the floating-point environment, so once you call it, all bets are off w.r.t. the floating-point state, even if you set FENV_ACCESS ON.

I can modify the test to fetch the exception before the printf but I don’t believe it will make any difference as I am only printing an integer not a double. In the code where the problem exists, I explicitly save and restore the floating point accrued exception state in logging routines as I’ve already encountered the issue where printf with a double stomps on the floating point accrued exception state. I’ve in fact ported gdtoa and friends to C++ from FreeBSD’s libc. However, in this case I am only printing integers so it should have no effect on the floating point accrued exception state.

Indeed. I have a variadic template formatter replacement for snprintf that does not use varargs. It is derived from FreeBSD’s snprintf and David M Gay’s gdtoa. It has been updated to type box arguments using a variadic template wrapper. It emits a fixed size stack frame and it buffers in std::string  <https://github.com/michaeljclark/c-fmt/>. It relies on the wrapper being inlined. Note: the code is missing extern inline and I’ve since moved part of the implementation from headers into compiled modules but have not yet updated c+fmt.

As an aside, a C++2n string formatter that does not depend on iostream/stringstream would be a nice addition to the standard. A familiar snprintf style interface using format strings, but without all of the buffer woes. It also needs to support formatting QP (Quad Precision) so I intend to update gdtoa to a template that is parameterised for variable exponent and significand using type information structs:


OK, now the real issue in clang: it doesn’t [yet] support FENV_ACCESS. Neither does GCC. There’s been some motion recently toward adding support for FENV_ACCESS, but it’s a largish project, and it hasn’t happened yet. Both compilers, when optimization is enabled, simply replace your call to fcvt(1.1) with 1 (because they don’t support FENV_ACCESS). GCC happens to “work” in your second example because it inlines `fcvt` into `test_fcvt`, but doesn’t inline `test_fcvt` into `main`, clang inlines both, does constant propagation, and no flags are raised.

I knew it was inlining which is why I moved the code to an (default visibility extern) function which gcc seems to handle and I have been dumping asm output from both of the compilers. It would be interesting if there was a mode where default visibility extern functions where not inlined unless they were declared extern inline. I can understand static functions or template instantiation being inlined, but default visibility extern is a different issue. gcc seems to be more conservative with “non static" functions.

godbolt.org is a good resource to see what’s going on here, though it won’t tell you *why*:
https://godbolt.org/g/Zb8Eoc

Yes Matt Godbolt’s tools is very useful. I use objdump (and otool -tV on macos) a lot too, but I thought there might be a compiler flag for conservative handling of floating point to retain floating point accrued exceptions. I was unaware of the level of support for floating point accrued exceptions. I’ve added __attribute__ ((noinline)) to the second version and it now works with -O3. There should be a flag e.g. -fenv-ieee745 that somehow carries exception state even when inlining or disables inlining for functions that perform conversions or use any operations that require rounding of floating point values.


I’ll work on reproducing my original issue (FE_INEXACT for exact conversion) in isolation using __attribute__ ((noinline)) …

Thanks,
Michael.

Best,
– Steve

On Apr 15, 2017, at 5:51 PM, Michael Clark via cfe-dev <[hidden email]> wrote:

Hi,

First, apologies if this is not the right place to post.

I am seeing unexpected values in the floating point accrued exception flags with clang generated programs. My original issue is seeing FE_INEXACT after an exact float to unsigned int conversion within a ternary expression. This issue does not occur with gcc. In trying to isolate the problem I wrote a simple test program, which results in completely opposite behaviour. FE_INEXACT is not getting set for an inexact conversion when optimisation is enabled.

Given I’m not yet seeing predictable results for accrued exception flags, I gave up trying to reproduce my original issue (FE_INEXACT for exact conversion) until I am certain which floating point optimisations are being enabled, and under what conditions floating point accrued exceptions are optimised away, otherwise I can’t be sure to isolate my first problem.

I have two versions of a simple test program below, one which even returns incorrect results in gcc. The tests below run on Linux using Debian vendor build of clang 3.8.1 and on macos with the Xcode 8.3.1 vendor build of clang. I don’t have -fast-math enabled so I would expect standards compliant behaviour. I would like to know what optimisations are preventing floating point accrued exceptions from being set and how to disable these optimisation so that I am get deterministic results, then I can try to reproduce my first issue in isolation.

- fcvt1.c triggers the same issue with gcc (FE_INEXACT not set for inexact conversion)
- fcvt2.c triggers the issue only with clang (FE_INEXACT not set for inexact conversion)
- no reproducer yet… (FE_INEXACT set after exact conversion)

Happy Holidays,

Michael.

$ gcc --version
gcc (Debian 6.3.0-6) 6.3.0 20170205
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -O0 -lm fcvt1.c
$ ./a.out
1 exact
1 inexact
$ gcc -O3 -lm fcvt1.c
$ ./a.out
1 exact
1 exact
$ gcc -O0 -lm fcvt2.c
$ ./a.out
1 exact
1 inexact
$ gcc -O3 -lm fcvt2.c
$ ./a.out
1 exact
1 inexact

$ clang --version
clang version 3.8.1-16 (tags/RELEASE_381/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ clang -O0 -lm fcvt1.c
$ ./a.out
1 exact
1 inexact
$ clang -O3 -lm fcvt1.c
$ ./a.out
1 exact
1 exact
$ clang -O0 -lm fcvt2.c
$ ./a.out
1 exact
1 inexact
$ clang -O3 -lm fcvt2.c
$ ./a.out
1 exact
1 exact

$ clang --version
Apple LLVM version 8.1.0 (clang-802.0.41)
Target: x86_64-apple-darwin16.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ cc -O0 fcvt1.c
$ ./a.out
1 exact
1 inexact
$ cc -O3 fcvt1.c
$ ./a.out
1 exact
1 exact
$ cc -O0 fcvt2.c
$ ./a.out
1 exact
1 inexact
$ cc -O3 fcvt2.c
$ ./a.out
1 exact
1 exact


$ cat fcvt1.c
#include <stdio.h>
#include <fenv.h>

unsigned fcvt(float a)
{
      return (unsigned)a;
}

int main()
{
      fesetround(FE_TONEAREST);

      feclearexcept(FE_ALL_EXCEPT);
      printf("%d ", fcvt(1.0f));
      printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");

      feclearexcept(FE_ALL_EXCEPT);
      printf("%d ", fcvt(1.1f));
      printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}


$ cat fcvt2.c
#include <stdio.h>
#include <fenv.h>

unsigned fcvt(float a)
{
      return (unsigned)a;
}

void test_fcvt(float a)
{
      feclearexcept(FE_ALL_EXCEPT);
      printf("%u ", fcvt(a));
      printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}

int main()
{
      fesetround(FE_TONEAREST);

      test_fcvt(1.0f);
      test_fcvt(1.1f);
}

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev





Re: FE_INEXACT being set for an exact conversion from float to unsigned long long

Hal Finkel via cfe-dev
You’re hitting https://bugs.llvm.org/show_bug.cgi?id=17686.

Which is actually precisely the same “compiler does not model FENV_ACCESS” bug, just in the compiler’s built-in lowering for fp-to-unsigned conversion instead of in your code (because x86—pre AVX-512F—does not have a native float-to-unsigned conversion).

The real fix for all of these issues is to implement FENV_ACCESS.

FWIW the "std::isnan(f) | ((f >= 0) & std::isinf(f))) ? std::numeric_limits<u64>::max()” dance in the rest of your conversion gives me pause; what are you trying to do? It’s pretty odd to clamp nan and inf to u32::max but leave the result for all values between UINT64_MAX + 1 and infinity undefined.

– Steve
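
To make the lowering point concrete, here is a minimal sketch in C of how an unsigned conversion built from the signed one can raise a flag the source-level conversion never asked for. It assumes one plausible branch-free shape (signed conversion plus an unconditional 2^63 adjustment); it is an illustration only, not the instruction sequence clang actually emits, and `u64_from_float_branchless` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from clang: converts 0 <= f < 2^64 to
   uint64_t using only the signed conversion. Inputs outside that range
   (including NaN) are not handled. */
uint64_t u64_from_float_branchless(float f)
{
    const float two63 = 0x1p63f;
    int big = f >= two63;

    /* Executed for every input. volatile keeps the compiler from folding
       the subtraction away or sinking it into one arm of the select.
       For f = 1.0f the exact difference 1 - 2^63 does not fit in a float,
       so this subtraction rounds and raises FE_INEXACT even though the
       conversion of 1.0f itself is exact. */
    volatile float shifted = f - two63;

    float src = big ? shifted : f;            /* pick the in-range operand */
    uint64_t bits = (uint64_t)(int64_t)src;   /* signed conversion only */

    /* For large inputs, add the 2^63 back by setting the top bit. */
    return big ? (bits ^ (UINT64_C(1) << 63)) : bits;
}
```

The returned value is correct for every in-range input; only the flag side effect differs. Wrapping a call with 1.0f between feclearexcept(FE_ALL_EXCEPT) and fetestexcept(FE_INEXACT) would be expected to report the flag as set on typical hardware, which matches the symptom reported for the 64-bit case.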

> On Apr 18, 2017, at 4:56 PM, Michael Clark <[hidden email]> wrote:
>
> Hi,
>
> I’ve reproduced my original issue. This issue is FE_INEXACT set for an exact conversion from float to unsigned long long.
>
> The prior issue was eager inlining and constant folding causing missing updates to the floating point accrued exception flags when optimisation was enabled.
>
> This second issue appears not to be an eager optimisation or constant folding issue.
>
> - float to unsigned int conversion appears to be okay.
> - float to unsigned long long conversion appears to incorrectly update the accrued exception flags.
>
> Note the code explicitly casts from float to unsigned and then to signed. The first cast is to select float conversion to unsigned, and the outer cast is a sign extension indicator as all RISC-V integers are canonically sign extended to the width of the widest type (unlike x86). Returning a signed type of a smaller width will automatically sign extend when assigned to a larger signed type (the code came from a template) which is why we have extra casts. While the sign extension is redundant on 64-bit it isn’t for u128 and s128 which we intend to support.
>
> - https://godbolt.org/g/kvSm5J
>
> Any insight would be greatly appreciated.
>
> Michael.
>
>
> $ g++ -O3 -lm fcvt.cc
> $ ./a.out
> 1 exact
> 1 inexact
> 1 exact
> 1 inexact
>
>
> $ clang++ -O3 -lm fcvt.cc
> $ ./a.out
> 1 exact
> 1 inexact
> 1 inexact
> 1 inexact
>
>
> $ cat fcvt.cc
> #include <cstdio>
> #include <cmath>
> #include <cfenv>
> #include <limits>
>
> typedef signed int         s32;
> typedef unsigned int       u32;
> typedef signed long long   s64;
> typedef unsigned long long u64;
>
> __attribute__ ((noinline)) s32 fcvt_wu(float f)
> {
>     return (std::isnan(f) | ((f >= 0) & std::isinf(f)))
>         ? std::numeric_limits<u32>::max()
>         : s32(u32(f));
> }
>
> __attribute__ ((noinline)) s64 fcvt_lu(float f)
> {
>     return (std::isnan(f) | ((f >= 0) & std::isinf(f)))
>         ? std::numeric_limits<u64>::max()
>         : s64(u64(f));
> }
>
> void test_fcvt_wu(float a)
> {
>     feclearexcept(FE_ALL_EXCEPT);
>     printf("%d ", fcvt_wu(a));
>     printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
> }
>
> void test_fcvt_lu(float a)
> {
>     feclearexcept(FE_ALL_EXCEPT);
>     printf("%lld ", fcvt_lu(a));
>     printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
> }
>
> int main()
> {
>     fesetround(FE_TONEAREST);
>
>     test_fcvt_wu(1.0f);
>     test_fcvt_wu(1.1f);
>     test_fcvt_lu(1.0f);
>     test_fcvt_lu(1.1f);
> }
>
>
>> On 18 Apr 2017, at 10:51 AM, Michael Clark <[hidden email]> wrote:
>>
>>
>>> On 18 Apr 2017, at 1:08 AM, Stephen Canon <[hidden email]> wrote:
>>>
>>> Hi Michael —
>>>
>>> You’re dancing around a real issue in clang (and most other compilers), but it’s camouflaged by a few issues in your code. I’ll address those first:
>>>
>>> 1. If you want to read or set the floating-point environment, your code must contain:
>>>
>>> #pragma STDC FENV_ACCESS ON
>>
>> Yes, I tried that first and got the warning.
>>
>>> If you do not have this pragma, all bets are off. The compiler is free to re-arrange your calls to fe* functions, treat the floating-point environment as constant, or eliminate them altogether. See §7.6.1 of the C standard for more details, in particular, the following sentence:
>>>
>>>> If part of a program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined.
>>>
>>>
>>> If you add this pragma to your code example, you’ll get a helpful warning from clang that FENV_ACCESS is not [yet] supported.
>>
>> Interesting. I’m sure the scientific computing folk will be interested in having this working. Many IEEE-754 compliant ISAs support floating point accrued exceptions. In fact I am working on a RISC-V simulator and binary translator, so ultimately the C code will be translated to x86_64 asm and I’ll read MXCSR directly; however, I’m currently reverse-engineering the compiler asm output for the (working) conversions. I wanted the C cast-based conversions to work reliably on gcc and clang for a reference interpreter that I am using to test a binary translating JIT engine.
>>
>>> 2. Also in §7.6, you will note the following sentence (third bullet in paragraph 3):
>>>
>>>> a function call is assumed to have the potential for raising floating-point exceptions, unless its documentation promises otherwise.
>>>
>>> In particular, your code calls `printf` between `feclearexcept` and `fetestexcept`. To the best of my recollection, `printf` is not documented as not modifying the floating-point environment, so once you call it, all bets are off w.r.t. the floating-point state, even if you set FENV_ACCESS ON.
>>
>> I can modify the test to fetch the exception before the printf but I don’t believe it will make any difference as I am only printing an integer not a double. In the code where the problem exists, I explicitly save and restore the floating point accrued exception state in logging routines as I’ve already encountered the issue where printf with a double stomps on the floating point accrued exception state. I’ve in fact ported gdtoa and friends to C++ from FreeBSD’s libc. However, in this case I am only printing integers so it should have no effect on the floating point accrued exception state.
>>
>> Indeed. I have a variadic template formatter replacement for snprintf that does not use varargs. It is derived from FreeBSD’s snprintf and David M Gay’s gdtoa. It has been updated to type-box arguments using a variadic template wrapper. It emits a fixed-size stack frame and it buffers in std::string <https://github.com/michaeljclark/c-fmt/>. It relies on the wrapper being inlined. Note: the code is missing extern inline, and I’ve since moved part of the implementation from headers into compiled modules but have not yet updated c-fmt.
>>
>> As an aside, a C++2n string formatter that does not depend on iostream/stringstream would be a nice addition to the standard. A familiar snprintf style interface using format strings, but without all of the buffer woes. It also needs to support formatting QP (Quad Precision) so I intend to update gdtoa to a template that is parameterised for variable exponent and significand using type information structs:
>>
>> https://github.com/michaeljclark/riscv-meta/blob/07d3af92b235b0e366c5af76ff65805c49812392/src/asm/fpu.h#L46-L110
>>
>>> OK, now the real issue in clang: it doesn’t [yet] support FENV_ACCESS. Neither does GCC. There’s been some motion recently toward adding support for FENV_ACCESS, but it’s a largish project, and it hasn’t happened yet. Both compilers, when optimization is enabled, simply replace your call to fcvt(1.1) with 1 (because they don’t support FENV_ACCESS). GCC happens to “work” in your second example because it inlines `fcvt` into `test_fcvt`, but doesn’t inline `test_fcvt` into `main`, clang inlines both, does constant propagation, and no flags are raised.
>>
>> I knew it was inlining, which is why I moved the code to a (default-visibility extern) function, which gcc seems to handle, and I have been dumping asm output from both compilers. It would be interesting if there were a mode where default-visibility extern functions were not inlined unless they were declared extern inline. I can understand static functions or template instantiations being inlined, but default-visibility extern is a different issue. gcc seems to be more conservative with “non-static” functions.
>>
>>> godbolt.org is a good resource to see what’s going on here, though it won’t tell you *why*:
>>> https://godbolt.org/g/Zb8Eoc
>>
>> Yes, Matt Godbolt’s tool is very useful. I use objdump (and otool -tV on macOS) a lot too, but I thought there might be a compiler flag for conservative handling of floating point that retains floating point accrued exceptions. I was unaware of the level of support for floating point accrued exceptions. I’ve added __attribute__ ((noinline)) to the second version and it now works with -O3. There should be a flag, e.g. -fenv-ieee754, that somehow carries exception state even when inlining, or disables inlining for functions that perform conversions or use any operations that require rounding of floating point values.
>>
>> - https://godbolt.org/g/PH60E3
>>
>> I’ll work on reproducing my original issue (FE_INEXACT for exact conversion) in isolation using __attribute__ ((noinline)) …
>>
>> Thanks,
>> Michael.
>>
>>> Best,
>>> – Steve
>>>


Re: FE_INEXACT being set for an exact conversion from float to unsigned long long

Hal Finkel via cfe-dev
Forgot to reply all.

On 19 Apr 2017, at 10:03 AM, Michael Clark <[hidden email]> wrote:

> On 19 Apr 2017, at 9:52 AM, Stephen Canon <[hidden email]> wrote:
>
> You’re hitting https://bugs.llvm.org/show_bug.cgi?id=17686.
>
> Which is actually precisely the same “compiler does not model FENV_ACCESS” bug, just in the compiler’s built-in lowering for fp-to-unsigned conversion instead of in your code (because x86, pre AVX-512F, does not have a native float-to-unsigned conversion).

I’m aware.

> The real fix for all of these issues is to implement FENV_ACCESS.

I think I’ll need inline asm then.

> FWIW the "std::isnan(f) | ((f >= 0) & std::isinf(f))) ? std::numeric_limits<u64>::max()” dance in the rest of your conversion gives me pause; what are you trying to do? It’s pretty odd to clamp nan and inf to u32::max but leave the result for all values between UINT64_MAX + 1 and infinity undefined.

The defined behaviour for RISC-V is to convert NaN and positive infinity to UINT_MAX, while the remainder is already handled within the behaviour of the intrinsic conversion i.e. the signed positive wraps around to produce the values between INT_MAX and UINT_MAX. It could be tightened up a little. Values below -1 are clamped to 0 (-1 can round up to 0). It’s actually a subset of the whole expression which I trimmed for the test case. It’s so that the conversion passes the RISC-V compliance test suite which has different defined behaviour, whereas the behaviour in C may be undefined.

It’s going to be asm anyway… as we are writing a JIT.
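
As an illustration of the clamping described above, here is a minimal sketch in C of a fully saturating float to u32 conversion followed by the sign extension performed by the outer signed cast. It assumes the saturating behaviour of the RISC-V unsigned conversion (NaN and out-of-range positive inputs give the maximum value, out-of-range negative inputs give 0); `fcvt_wu_sat` and `fcvt_wu_reg` are hypothetical names, rounding is plain C truncation rather than the dynamic rounding mode, and no exception flags are modelled.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: saturating float -> u32 with no undefined
   behaviour for any input. */
uint32_t fcvt_wu_sat(float f)
{
    if (isnan(f) || f >= 0x1p32f)   /* NaN, +inf, and anything >= 2^32 */
        return UINT32_MAX;
    if (f <= -1.0f)                 /* -inf and everything at or below -1 */
        return 0;
    return (uint32_t)f;             /* (-1, 2^32): truncates toward zero */
}

/* Sign-extend the 32-bit result to 64 bits, which is what the outer
   signed cast in the test case is there to express. */
int64_t fcvt_wu_reg(float f)
{
    uint32_t u = fcvt_wu_sat(f);
    /* Written arithmetically to avoid relying on the
       implementation-defined unsigned -> signed narrowing cast. */
    return u <= INT32_MAX ? (int64_t)u : (int64_t)u - (INT64_C(1) << 32);
}
```

Handling the whole out-of-range region in the guard, rather than only NaN and infinity, means the final cast is only ever reached with a value whose truncated integer part fits in 32 bits.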


Re: FE_INEXACT being set for an exact conversion from float to unsigned long long

Hal Finkel via cfe-dev
When I get time I’ll try to figure out what’s happening in the x86_64 float to uint64_t conversion codegen. Clang AST through to LLVM lowering is new to me, so it might take me some time.

While I understand FENV_ACCESS is a large milestone, clang already generates code on x86_64 that correctly sets the accrued exception state for a large number of the floating point conversions I have been testing. The only way towards completing a milestone is via fixing a number of small issues along the way…

Michael.

On 19 Apr 2017, at 10:06 AM, Michael Clark <[hidden email]> wrote:

Forgot to reply all.

On 19 Apr 2017, at 10:03 AM, Michael Clark <[hidden email]> wrote:


On 19 Apr 2017, at 9:52 AM, Stephen Canon <[hidden email]> wrote:

You’re hitting https://bugs.llvm.org/show_bug.cgi?id=17686.

Which is actually precisely the same “compiler does not model FENV_ACCESS” bug, just in the compiler’s built-in lowering for fp-to-unsigned conversion instead of in your code (because x86—pre AVX-512F—does not have a native float-to-unsigned conversion).

I’m aware.


The real fix for all of these issues is to implement FENV_ACCESS.

I think I’ll need inline asm then.

FWIW the "std::isnan(f) | ((f >= 0) & std::isinf(f))) ? std::numeric_limits<u64>::max()” dance in the rest of your conversion gives me pause; what are you trying to do? It’s pretty odd to clamp nan and inf to u32::max but leave the result for all values between UINT64_MAX + 1 and infinity undefined.

The defined behaviour for RISC-V is to convert NaN and positive infinity to UINT_MAX, while the remainder is already handled within the behaviour of the intrinsic conversion i.e. the signed positive wraps around to produce the values between INT_MAX and UINT_MAX. It could be tightened up a little. Values below -1 are clamped to 0 (-1 can round up to 0). It’s actually a subset of the whole expression which I trimmed for the test case. It’s so that the conversion passes the RISC-V compliance test suite which has different defined behaviour, whereas the behaviour in C may be undefined.

It’s going to be asm anyway… as we are writing a JIT.

– Steve

On Apr 18, 2017, at 4:56 PM, Michael Clark <[hidden email]> wrote:

Hi,

I’ve reproduced my original issue. This issue is FE_INEXACT set for an exact conversion from float to unsigned long long.

The prior issue was eager inlining and constant folding causing missing updates to the floating point accrued exception flags when optimisation was enabled.

This second issue appears not to be an eager optimisation or constant folding issue.

- float to unsigned int conversion appears to be okay.
- float to unsigned long long conversion appears to incorrectly update the accrued exception flags.

Note the code explicitly casts from float to unsigned and then to signed. The first cast is to select float conversion to unsigned, and the outer cast is a sign extension indicator as all RISC-V integers are canonically sign extended to the width of the widest type (unlike x86). Returning a signed type of a smaller width will automatically sign extend when assigned to a larger signed type (the code came from a template) which is why we have extra casts. While the sign extension is redundant on 64-bit it isn’t for u128 and s128 which we intend to support.

- https://godbolt.org/g/kvSm5J

Any insight would be greatly appreciated.

Michael.


$ g++ -O3 -lm fcvt.cc
$ ./a.out
1 exact
1 inexact
1 exact
1 inexact


$ clang++ -O3 -lm fcvt.cc
$ ./a.out
1 exact
1 inexact
1 inexact
1 inexact


$ cat fcvt.cc
#include <cstdio>
#include <cmath>
#include <cfenv>
#include <limits>

typedef signed int         s32;
typedef unsigned int       u32;
typedef signed long long   s64;
typedef unsigned long long u64;

__attribute__ ((noinline)) s32 fcvt_wu(float f)
{
    return (std::isnan(f) | ((f >= 0) & std::isinf(f)))
        ? std::numeric_limits<u32>::max()
        : s32(u32(f));
}

__attribute__ ((noinline)) s64 fcvt_lu(float f)
{
    return (std::isnan(f) | ((f >= 0) & std::isinf(f)))
        ? std::numeric_limits<u64>::max()
        : s64(u64(f));
}

void test_fcvt_wu(float a)
{
    feclearexcept(FE_ALL_EXCEPT);
    printf("%d ", fcvt_wu(a));
    printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}

void test_fcvt_lu(float a)
{
    feclearexcept(FE_ALL_EXCEPT);
    printf("%lld ", fcvt_lu(a));
    printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}

int main()
{
    fesetround(FE_TONEAREST);

    test_fcvt_wu(1.0f);
    test_fcvt_wu(1.1f);
    test_fcvt_lu(1.0f);
    test_fcvt_lu(1.1f);
}


On 18 Apr 2017, at 10:51 AM, Michael Clark <[hidden email]> wrote:


On 18 Apr 2017, at 1:08 AM, Stephen Canon <[hidden email]> wrote:

Hi Michael —

You’re dancing around a real issue in clang (and most other compilers), but it’s camouflaged by a few issues in your code. I’ll address those first:

1. If you want to read or set the floating-point environment, your code must contain:

#pragma STDC FENV_ACCESS ON

Yes, I tried that first and got the warning.

If you do not have this pragma, all bets are off. The compiler is free to re-arrange your calls to fe* functions, treat the floating-point environment as constant, or eliminate them altogether. See §7.6.1 of the C standard for more details, in particular, the following sentence:

If part of a program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined.


If you add this pragma to your code example, you’ll get a helpful warning from clang that FENV_ACCESS is not [yet] supported.

Interesting. I’m sure the scientific computing folk will be interested in having this working. Many IEEE-754 compliant ISAs support floating point accrued exceptions. In fact I am working on a RISC-V simulator and binary translator, so ultimately the C code will be translated to x86_64 asm and I’ll read MXCSR directly; however, I’m currently reversing the compiler asm output for the (working) conversions. I wanted the C cast based conversions to work reliably on gcc and clang for a reference interpreter that I am using to test a binary translating JIT engine.

2. Also in §7.6, you will note the following sentence (third bullet in paragraph 3):

a function call is assumed to have the potential for raising floating-point exceptions, unless its documentation promises otherwise.

In particular, your code calls `printf` between `feclearexcept` and `fetestexcept`. To the best of my recollection, `printf` is not documented as not modifying the floating-point environment, so once you call it, all bets are off w.r.t. the floating-point state, even if you set FENV_ACCESS ON.

I can modify the test to fetch the exception before the printf but I don’t believe it will make any difference as I am only printing an integer not a double. In the code where the problem exists, I explicitly save and restore the floating point accrued exception state in logging routines as I’ve already encountered the issue where printf with a double stomps on the floating point accrued exception state. I’ve in fact ported gdtoa and friends to C++ from FreeBSD’s libc. However, in this case I am only printing integers so it should have no effect on the floating point accrued exception state.
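
A minimal sketch of the save-and-restore idea for logging routines, using only the standard `<cfenv>` calls `fegetexceptflag` and `fesetexceptflag`. The function name `log_double_preserving_flags` is hypothetical and is not taken from the code base discussed here; it assumes the compiler does not reorder the calls, which is not guaranteed without FENV_ACCESS support.

```cpp
#include <cfenv>
#include <cstdio>

// Hypothetical logging helper: prints a double, then puts the accrued
// exception flags back exactly as they were on entry, so that any flags
// raised (or cleared) inside printf do not leak to the caller.
void log_double_preserving_flags(const char *label, double value)
{
    std::fexcept_t saved;
    std::fegetexceptflag(&saved, FE_ALL_EXCEPT);  // snapshot current flags
    std::printf("%s = %.17g\n", label, value);    // may disturb the flags
    std::fesetexceptflag(&saved, FE_ALL_EXCEPT);  // restore the snapshot
}
```

A caller can then clear the flags, run the operation under test, log intermediate values with this helper, and still test the flags afterwards.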

Indeed. I have a variadic template formatter replacement for snprintf that does not use varargs. It is derived from FreeBSD’s snprintf and David M Gay’s gdtoa. It has been updated to type box arguments using a variadic template wrapper. It emits a fixed size stack frame and it buffers in std::string <https://github.com/michaeljclark/c-fmt/>. It relies on the wrapper being inlined. Note: the code is missing extern inline and I’ve since moved part of the implementation from headers into compiled modules but have not yet updated c-fmt.

As an aside, a C++2n string formatter that does not depend on iostream/stringstream would be a nice addition to the standard. A familiar snprintf style interface using format strings, but without all of the buffer woes. It also needs to support formatting QP (Quad Precision) so I intend to update gdtoa to a template that is parameterised for variable exponent and significand using type information structs:

https://github.com/michaeljclark/riscv-meta/blob/07d3af92b235b0e366c5af76ff65805c49812392/src/asm/fpu.h#L46-L110

OK, now the real issue in clang: it doesn’t [yet] support FENV_ACCESS. Neither does GCC. There’s been some motion recently toward adding support for FENV_ACCESS, but it’s a largish project, and it hasn’t happened yet. Both compilers, when optimization is enabled, simply replace your call to fcvt(1.1) with 1 (because they don’t support FENV_ACCESS). GCC happens to “work” in your second example because it inlines `fcvt` into `test_fcvt`, but doesn’t inline `test_fcvt` into `main`, clang inlines both, does constant propagation, and no flags are raised.

I knew it was inlining, which is why I moved the code to a (default visibility extern) function, which gcc seems to handle, and I have been dumping asm output from both of the compilers. It would be interesting if there was a mode where default visibility extern functions were not inlined unless they were declared extern inline. I can understand static functions or template instantiation being inlined, but default visibility extern is a different issue. gcc seems to be more conservative with “non static” functions.

godbolt.org is a good resource to see what’s going on here, though it won’t tell you *why*:
https://godbolt.org/g/Zb8Eoc

Yes, Matt Godbolt’s tool is very useful. I use objdump (and otool -tV on macos) a lot too, but I thought there might be a compiler flag for conservative handling of floating point to retain floating point accrued exceptions. I was unaware of the level of support for floating point accrued exceptions. I’ve added __attribute__ ((noinline)) to the second version and it now works with -O3. There should be a flag, e.g. -fenv-ieee754, that somehow carries exception state even when inlining, or disables inlining for functions that perform conversions or use any operations that require rounding of floating point values.

- https://godbolt.org/g/PH60E3
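
A minimal sketch of the noinline workaround, combined with volatile objects so the conversion cannot be constant folded, and with the flags read before any I/O. The name `fcvt_wu_flags` is hypothetical. Without FENV_ACCESS support this remains best effort rather than something the compiler guarantees.

```cpp
#include <cfenv>

// Hypothetical helper: converts f to unsigned and reports which flags the
// conversion raised. noinline keeps the body out of its callers; the
// volatile objects force a real load, a real conversion and a real store
// between the two fenv calls.
__attribute__((noinline)) int fcvt_wu_flags(float f, unsigned *out)
{
    volatile float in = f;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile unsigned result = static_cast<unsigned>(in);
    int raised = std::fetestexcept(FE_ALL_EXCEPT);  // read before any I/O
    *out = result;
    return raised;
}
```

The caller tests the returned mask against FE_INEXACT and only then prints, so printf cannot affect the value that was observed.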

I’ll work on reproducing my original issue (FE_INEXACT for exact conversion) in isolation using __attribute__ ((noinline)) …

Thanks,
Michael.

Best,
– Steve

On Apr 15, 2017, at 5:51 PM, Michael Clark via cfe-dev <[hidden email]> wrote:

Hi,

First, apologies if this is not the right place to post.

I am seeing unexpected values in the floating point accrued exception flags with clang generated programs. My original issue is seeing FE_INEXACT after an exact float to unsigned int conversion within a ternary expression. This issue does not occur with gcc. In trying to isolate the problem I wrote a simple test program, which results in completely opposite behaviour. FE_INEXACT is not getting set for an inexact conversion when optimisation is enabled.

Given I’m not yet seeing predictable results for accrued exception flags, I gave up trying to reproduce my original issue (FE_INEXACT for exact conversion) until I am certain which floating point optimisations are being enabled, and under what conditions floating point accrued exceptions are optimised away, otherwise I can’t be sure to isolate my first problem.

I have two versions of a simple test program below, one of which even returns incorrect results in gcc. The tests below run on Linux using the Debian vendor build of clang 3.8.1 and on macos with the Xcode 8.3.1 vendor build of clang. I don’t have -ffast-math enabled so I would expect standards compliant behaviour. I would like to know what optimisations are preventing floating point accrued exceptions from being set and how to disable these optimisations so that I get deterministic results, then I can try to reproduce my first issue in isolation.

- fcvt1.c triggers the same issue with gcc (FE_INEXACT not set for inexact conversion)
- fcvt2.c triggers the issue only with clang (FE_INEXACT not set for inexact conversion)
- no reproducer yet… (FE_INEXACT set after exact conversion)

Happy Holidays,

Michael.

$ gcc --version
gcc (Debian 6.3.0-6) 6.3.0 20170205
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -O0 -lm fcvt1.c
$ ./a.out
1 exact
1 inexact
$ gcc -O3 -lm fcvt1.c
$ ./a.out
1 exact
1 exact
$ gcc -O0 -lm fcvt2.c
$ ./a.out
1 exact
1 inexact
$ gcc -O3 -lm fcvt2.c
$ ./a.out
1 exact
1 inexact

$ clang --version
clang version 3.8.1-16 (tags/RELEASE_381/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ clang -O0 -lm fcvt1.c
$ ./a.out
1 exact
1 inexact
$ clang -O3 -lm fcvt1.c
$ ./a.out
1 exact
1 exact
$ clang -O0 -lm fcvt2.c
$ ./a.out
1 exact
1 inexact
$ clang -O3 -lm fcvt2.c
$ ./a.out
1 exact
1 exact

$ clang --version
Apple LLVM version 8.1.0 (clang-802.0.41)
Target: x86_64-apple-darwin16.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ cc -O0 fcvt1.c
$ ./a.out
1 exact
1 inexact
$ cc -O3 fcvt1.c
$ ./a.out
1 exact
1 exact
$ cc -O0 fcvt2.c
$ ./a.out
1 exact
1 inexact
$ cc -O3 fcvt2.c
$ ./a.out
1 exact
1 exact


$ cat fcvt1.c
#include <stdio.h>
#include <fenv.h>

unsigned fcvt(float a)
{
    return (unsigned)a;
}

int main()
{
    fesetround(FE_TONEAREST);

    feclearexcept(FE_ALL_EXCEPT);
    printf("%d ", fcvt(1.0f));
    printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");

    feclearexcept(FE_ALL_EXCEPT);
    printf("%d ", fcvt(1.1f));
    printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}


$ cat fcvt2.c
#include <stdio.h>
#include <fenv.h>

unsigned fcvt(float a)
{
    return (unsigned)a;
}

void test_fcvt(float a)
{
    feclearexcept(FE_ALL_EXCEPT);
    printf("%d ", fcvt(a));
    printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" : "exact");
}

int main()
{
    fesetround(FE_TONEAREST);

    test_fcvt(1.0f);
    test_fcvt(1.1f);
}

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev









Re: FE_INEXACT being set for an exact conversion from float to unsigned long long

Hal Finkel via cfe-dev
On 18 April 2017 at 15:54, Michael Clark via cfe-dev
<[hidden email]> wrote:
> The only way towards completing a milestone is via fixing a number of small issues along
> the way…

I believe there's more to it than that. None of LLVM's optimizations
are aware of this extra side-channel of information (with possible
exceptions like avoiding speculating fdiv because of unavoidable
exceptions).

From what I remember, the real proposal is to replace all
floating-point IR with intrinsics when FENV_ACCESS is on, which the
optimizers by default won't have a clue about and will treat
conservatively (essentially like they're modifying external memory).

So be careful with drawing conclusions from small snippets; you're
probably not seeing the full range of LLVM's behaviour.

Cheers.

Tim.

Re: FE_INEXACT being set for an exact conversion from float to unsigned long long

Hal Finkel via cfe-dev

On 19 Apr 2017, at 1:14 PM, Tim Northover <[hidden email]> wrote:

On 18 April 2017 at 15:54, Michael Clark via cfe-dev
<[hidden email]> wrote:
The only way towards completing a milestone is via fixing a number of small issues along
the way…

I believe there's more to it than that. None of LLVM's optimizations
are aware of this extra side-channel of information (with possible
exceptions like avoiding speculating fdiv because of unavoidable
exceptions).

From what I remember, the real proposal is to replace all
floating-point IR with intrinsics when FENV_ACCESS is on, which the
optimizers by default won't have a clue about and will treat
conservatively (essentially like they're modifying external memory).

So be careful with drawing conclusions from small snippets; you're
probably not seeing the full range of LLVM's behaviour.


Yes. I’m sure.

It reproduces with just the cast on its own: https://godbolt.org/g/myUoL2

It appears to be in the LLVM lowering of the fptoui instruction, so it must be MC layer optimisations.

; Function Attrs: noinline nounwind uwtable
define i64 @_Z7fcvt_luf(float %f) #0 {
  %1 = alloca float, align 4
  store float %f, float* %1, align 4
  %2 = load float, float* %1, align 4
  %3 = fptoui float %2 to i64
  ret i64 %3
}

GCC performs a comparison with ucomiss and branches whereas Clang computes both forms and predicates the result using a conditional move. One of the conversions obviously is setting the INEXACT MXCSR flag.

Clang lowering (inexact set when result is exact):

fcvt_lu(float):
        movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero
        movaps  xmm2, xmm0
        subss   xmm2, xmm1
        cvttss2si       rax, xmm2
        movabs  rcx, -9223372036854775808
        xor     rcx, rax
        cvttss2si       rax, xmm0
        ucomiss xmm0, xmm1
        cmovae  rax, rcx
        ret

GCC lowering (sets flags correctly):

fcvt_lu(float):
        ucomiss xmm0, DWORD PTR .LC0[rip]
        jnb     .L4
        cvttss2si       rax, xmm0
        ret
.L4:
        subss   xmm0, DWORD PTR .LC0[rip]
        movabs  rdx, -9223372036854775808
        cvttss2si       rax, xmm0
        xor     rax, rdx
        ret
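
A hedged illustration of why executing both paths matters, assuming the constant at .LCPI1_0 is 2^63. For a small input such as 1.0f, the subtraction on the "large value" path is itself inexact (1 - 2^63 is not representable as a float), so it raises FE_INEXACT even though the result that is finally selected is exact. The function names below are hypothetical, and the sketch relies on volatile objects rather than on compiler FENV_ACCESS support.

```cpp
#include <cfenv>
#include <cstdint>

// 2^63 as a float; exactly representable.
static const float kTwo63 = 9223372036854775808.0f;

// Flags raised by the subtraction that the branchless sequence always
// executes, regardless of which result is selected afterwards.
int inexact_from_speculated_subtract(float f)
{
    volatile float in = f;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile float diff = in - kTwo63;   // corresponds to subss
    (void)diff;
    return std::fetestexcept(FE_INEXACT);
}

// Flags raised by the only conversion a branching sequence would execute
// for inputs below 2^63 (callers must stay within the int64 range).
int inexact_from_direct_convert(float f)
{
    volatile float in = f;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile std::int64_t r = static_cast<std::int64_t>(in);  // cvttss2si
    (void)r;
    return std::fetestexcept(FE_INEXACT);
}
```

For 1.0f the first function is expected to report FE_INEXACT while the second does not, which matches the difference observed between the two lowerings.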


Re: FE_INEXACT being set for an exact conversion from float to unsigned long long

Hal Finkel via cfe-dev
I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI.

I found a comment around the rationale for using a conditional move versus a branch. I believe the predicate logic using a conditional move is causing INEXACT to be set from the other side of the predicate as the lowered x86_64 code executes both conversions whereas GCC uses a branch. That seems to be the difference.

I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out what the cast gets renamed to in the target layer so I can find where the sequence is emitted.


$ more llvm/lib/Target/X86//README-X86-64.txt
Are we better off using branches instead of cmove to implement FP to
unsigned i64?

_conv:
        ucomiss LC0(%rip), %xmm0
        cvttss2siq      %xmm0, %rdx
        jb      L3
        subss   LC0(%rip), %xmm0
        movabsq $-9223372036854775808, %rax
        cvttss2siq      %xmm0, %rdx
        xorq    %rax, %rdx
L3:
        movq    %rdx, %rax
        ret

instead of

_conv:
        movss LCPI1_0(%rip), %xmm1
        cvttss2siq %xmm0, %rcx
        movaps %xmm0, %xmm2
        subss %xmm1, %xmm2
        cvttss2siq %xmm2, %rax
        movabsq $-9223372036854775808, %rdx
        xorq %rdx, %rax
        ucomiss %xmm1, %xmm0
        cmovb %rcx, %rax
        ret
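
For reference, the branching strategy from the README can be written out in C++ roughly as follows. This is a sketch, not the LLVM implementation: the name `fptoui_branching` is hypothetical, and it is only valid for inputs in the range 0 <= f < 2^64 (NaN and out-of-range values are not handled and would be undefined).

```cpp
#include <cstdint>

// Sketch of float -> u64 built from signed conversions only. Exactly one
// conversion executes per call, so no flags are raised by a path whose
// result is thrown away.
std::uint64_t fptoui_branching(float f)
{
    const float two63 = 9223372036854775808.0f;  // 2^63, exact in float
    if (f < two63)                                // fits a signed convert
        return static_cast<std::uint64_t>(static_cast<std::int64_t>(f));
    // Rebase into the signed range, convert, then restore the top bit.
    std::int64_t low = static_cast<std::int64_t>(f - two63);
    return static_cast<std::uint64_t>(low) ^ (std::uint64_t{1} << 63);
}
```

For inputs at or above 2^63 the subtraction is exact, because both operands share the same exponent range, so the only flag source on that path is the conversion itself.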


