Converting float to int with FJCVTZS


Converting float to int with FJCVTZS

Johannes Hoff via cfe-dev
Hi!

I'm working on a code base where a simulation needs to produce the exact same result on Aarch64 and x86_64 architectures.

This is indeed the case for the whole codebase, with one exception: Rounding floats to integers. Specifically, when we're in undefined behavior territory. In that case, you notice the difference between the emitted fcvtzu instruction on aarch64 (saturating cast) and cvttss2si on x86 (wrap-around).

Now, I know undefined behavior is not the main business of LLVM, but I wonder if it would be possible to ask it to emit FJCVTZS instead, which behaves like x86 outside of the integer range. Of course, this would be an opt-in flag.

What do you think? If it's not something that would be valuable for clang, do you have any pointers on how to patch it myself?

Of course, I can just use the compiler intrinsic __builtin_arm_jcvt to trigger this behavior, but then I need to be sure to catch all the places, and be sure that everyone on the team remembers to do the same in the future.

Thanks,
Johannes

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: Converting float to int with FJCVTZS

Craig Topper via cfe-dev
Hi Johannes,

I don't think cvttss2si wraps around. Instead it returns 0x80000000 for large values. "If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised. If this exception is masked, the indefinite integer value (80000000H or 80000000_00000000H if operand size is 64 bits) is returned."

Also isn't fcvtzu an unsigned conversion while cvttss2si and FJCVTZS are signed conversions? Am I missing something?

~Craig
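
As a rough illustration of the behaviour quoted from the manual above, the 32-bit signed conversion with the invalid exception masked can be modelled in portable C++. This is only a sketch: the name cvttss2si32_model is made up for this note, and it models the returned value only, not the exception flags.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical software model of the 32-bit CVTTSS2SI result when the
// invalid exception is masked. Not an intrinsic; the name is illustrative.
std::int32_t cvttss2si32_model(float x) {
    // 2^31 is exactly representable as a float.
    const float two31 = 2147483648.0f;
    // In-range values truncate toward zero. NaN fails both comparisons.
    if (x >= -two31 && x < two31) {
        return static_cast<std::int32_t>(x);
    }
    // Everything else yields the "integer indefinite" value 0x80000000.
    return std::numeric_limits<std::int32_t>::min();
}
```

Under this model a large input such as 3.0e9f does not wrap; it produces 0x80000000.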


On Wed, Mar 10, 2021 at 9:03 AM Johannes Hoff via cfe-dev <[hidden email]> wrote:
Hi!

I'm working on a code base where a simulation needs to produce the exact same result on Aarch64 and x86_64 architectures.

This is indeed the case for the whole codebase, with one exception: Rounding floats to integers. Specifically, when we're in undefined behavior territory. In that case, you notice the difference between the emitted fcvtzu instruction on aarch64 (saturating cast) and cvttss2si on x86 (wrap-around).

Now, I know undefined behavior is not the main business of LLVM, but I wonder if it would be possible to ask it to emit FJCVTZS instead, which behaves like x86 outside of the integer range. Of course, this would be an opt-in flag.

What do you think? If it's not something that would be valuable for clang, do you have any pointers on how to patch it myself?

Of course, I can just use the compiler intrinsic __builtin_arm_jcvt to trigger this behavior, but then I need to be sure to catch all the places, and be sure that everyone on the team remembers to do the same in the future.

Thanks,
Johannes


Re: Converting float to int with FJCVTZS

Craig Topper via cfe-dev
Oh, I think I understand now. For float to unsigned int, x86-64 uses a 64-bit cvttss2si instruction and drops the upper 32 bits, because there's no 32-bit unsigned conversion instruction without AVX-512.

So are you asking for AArch64 to also do a 64-bit conversion and truncate the result? Replacing a 32-bit fcvtzu with a 32-bit fjcvtzs wouldn't work, would it?

~Craig
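
The sequence described above (a 64-bit signed conversion whose upper 32 bits are then discarded) can be modelled in portable C++ roughly as follows. This is a sketch under the assumption that the 64-bit cvttss2si returns 0x8000000000000000 for NaN and out-of-range inputs, as the manual text quoted earlier in the thread states; x86_float_to_u32_model is a made-up name, not a real API.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical model of float -> uint32_t as described for x86-64 without
// AVX-512: convert to a signed 64-bit integer, then keep the low 32 bits.
std::uint32_t x86_float_to_u32_model(float x) {
    // 2^63 is exactly representable as a float.
    const float two63 = 9223372036854775808.0f;
    std::int64_t wide;
    if (x >= -two63 && x < two63) {
        // In range: truncate toward zero. NaN never takes this branch.
        wide = static_cast<std::int64_t>(x);
    } else {
        // Out of range or NaN: the 64-bit "integer indefinite" value.
        wide = std::numeric_limits<std::int64_t>::min();
    }
    // Dropping the upper 32 bits is a well-defined modular conversion.
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(wide));
}
```

For an input of 4294967808.0f (2^32 + 512) the model returns 512, which is why the result can look like a wrap-around even though the instruction itself does not wrap.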


On Wed, Mar 10, 2021 at 9:26 AM Craig Topper <[hidden email]> wrote:
Hi Johannes,

I don't think cvttss2si wraps around. Instead it returns 0x80000000 for large values. "If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised. If this exception is masked, the indefinite integer value (80000000H or 80000000_00000000H if operand size is 64 bits) is returned."

Also isn't fcvtzu an unsigned conversion while cvttss2si and FJCVTZS are signed conversions? Am I missing something?

~Craig



Re: Converting float to int with FJCVTZS

Roman Lebedev via cfe-dev
LLVM already has support for UB-free float2int conversions:
https://llvm.org/docs/LangRef.html#saturating-floating-point-to-integer-conversions
Rather than trying to herd each backend to conditionally do the same thing,
I think a much more straight-forward solution would be
to expose those intrinsics as clang builtins.


Roman.
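
For reference, the semantics that the LangRef section linked above gives for llvm.fptoui.sat (NaN becomes 0, out-of-range values clamp to the nearest bound, everything else truncates toward zero) can be written by hand. A minimal sketch, where fptoui_sat_u32 is a hypothetical helper name and not an existing clang builtin:

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hand-written saturating float -> uint32_t conversion, following the
// llvm.fptoui.sat rules described in the LangRef. The name is illustrative.
std::uint32_t fptoui_sat_u32(float x) {
    if (std::isnan(x)) {
        return 0;  // NaN maps to zero
    }
    if (x <= 0.0f) {
        return 0;  // negative values (and -inf) clamp to the minimum
    }
    if (x >= 4294967296.0f) {  // 2^32, exactly representable as a float
        return std::numeric_limits<std::uint32_t>::max();  // clamp to maximum
    }
    return static_cast<std::uint32_t>(x);  // in range: truncate toward zero
}
```

Because every input has a defined result, a helper like this gives the same answer on every target, at the cost of a few comparisons per conversion.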

On Wed, Mar 10, 2021 at 8:37 PM Craig Topper via cfe-dev
<[hidden email]> wrote:

>
> Oh, I think I understand now. For float to unsigned int, x86-64 uses a 64-bit cvttss2si instruction and drops the upper 32 bits, because there's no 32-bit unsigned conversion instruction without AVX-512.
>
> So are you asking for AArch64 to also do a 64-bit conversion and truncate the result? Replacing a 32-bit fcvtzu with a 32-bit fjcvtzs wouldn't work would it?
>
> ~Craig

Re: Converting float to int with FJCVTZS

Johannes Hoff via cfe-dev
Hi, Craig!

Thanks for your reply. It seems I have indeed conflated some behavior here: on x86_64 the conversion is done with a 64-bit cvttss2si and the result is then truncated to 32 bits.

Replacing fcvtzu with fjcvtzs does indeed produce the same result as x86_64 in this case; however, it does not generalize to other conversions from floating point to integer.

So my proposed solution to always use fjcvtzs was not great. Might there be some other way to get similar behavior? I will try to use intrinsics to get the same behavior no matter the conversion, but the question remains whether it's possible to do this with compiler flags instead of changing the code.

For a motivating example, see https://godbolt.org/z/sjeE6M

#include <stdio.h>
#include <cstdint>

void cast(float value) {
  printf("uint32_t(%.2f) = %u\n", value, uint32_t(value));
}

int main() {
  cast(4294967808.);
}

// output on x86_64:  uint32_t(4294967808.00) = 512
// output on aarch64: uint32_t(4294967808.00) = 4294967295

Replacing uint32_t(value) with __builtin_arm_jcvt(value) on aarch64 makes it behave like x86_64:

// output on aarch64: __builtin_arm_jcvt(4294967808.00) = 512
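
If call sites have to be changed anyway, the builtin can be hidden behind one helper that also works on targets without FJCVTZS. The sketch below is not something proposed in the thread: to_int32_wrapping and to_uint32_wrapping are hypothetical names, the __ARM_FEATURE_JCVT check assumes the compiler defines that ACLE feature macro when the instruction is available, and the fallback branch reimplements the wrap-modulo-2^32 result in portable C++.

```cpp
#include <cmath>
#include <cstdint>

// Convert a double to int32_t with wrap-around (modulo 2^32) semantics.
// NaN and infinities map to 0. Helper names are illustrative only.
std::int32_t to_int32_wrapping(double x) {
#if defined(__ARM_FEATURE_JCVT)
    // Assumption: this macro is defined when FJCVTZS can be emitted.
    return __builtin_arm_jcvt(x);
#else
    if (!std::isfinite(x)) {
        return 0;
    }
    const double two32 = 4294967296.0;
    // Truncate toward zero, then reduce into (-2^32, 2^32); fmod is exact.
    double m = std::fmod(std::trunc(x), two32);
    if (m < 0.0) {
        m += two32;  // shift into [0, 2^32)
    }
    const std::uint32_t u = static_cast<std::uint32_t>(m);
    // Reinterpret the low 32 bits as signed without relying on
    // implementation-defined narrowing.
    const std::int64_t wide = static_cast<std::int64_t>(u);
    return static_cast<std::int32_t>(u >= 0x80000000u ? wide - 4294967296LL : wide);
#endif
}

// Unsigned view of the same 32 bits.
std::uint32_t to_uint32_wrapping(double x) {
    return static_cast<std::uint32_t>(to_int32_wrapping(x));
}
```

With this helper, to_uint32_wrapping(4294967808.0) yields 512 on either branch, matching the x86_64 output shown above.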


Re: Converting float to int with FJCVTZS

Johannes Hoff via cfe-dev
Hi, Roman!

Thanks for your reply. I think exposing those would be great indeed.

Also, having the possibility to pick a UB-free implementation with compiler flags would be great, be it saturating or anything else.

To be sure, neither of these is possible today?

Johannes

> On 10 Mar 2021, at 18:40, Roman Lebedev <[hidden email]> wrote:
>
> LLVM already has support for UB-free float2int conversions:
> https://llvm.org/docs/LangRef.html#saturating-floating-point-to-integer-conversions
> Rather than trying to herd each backend to conditionally do the same thing,
> I think a much more straight-forward solution would be
> to expose those intrinsics as clang builtins.
>
>
> Roman.

Re: Converting float to int with FJCVTZS

Roman Lebedev via cfe-dev
On Thu, Mar 11, 2021 at 6:10 PM Johannes Hoff <[hidden email]> wrote:
>
> Hi, Roman!
>
> Thanks for your reply. I think exposing those would be great indeed.
>
> Also, having the possibility to pick a UB-free implementation with compiler flags would be great, be it saturating or anything else.
>
> To be sure, neither of these is possible today?
I don't recall seeing any such patches on the clang side, no.

> Johannes
Roman


Re: Converting float to int with FJCVTZS

Dimitry Andric via cfe-dev
Using "-fsanitize=float-cast-overflow -fsanitize-trap=float-cast-overflow" will ensure a UB-free float-to-int conversion on all targets. (See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html)

-Eli
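
With those flags an out-of-range conversion traps instead of producing a target-specific value. For code that would rather handle the failure than trap, a hand-written range check in the same spirit might look like the sketch below. The helper name try_float_to_u32 is hypothetical, and the exact check the sanitizer inserts is an assumption here, not something taken from its documentation.

```cpp
#include <cstdint>

// Flags quoted from the message above:
//   -fsanitize=float-cast-overflow -fsanitize-trap=float-cast-overflow
//
// Manual equivalent: convert only when the truncated value fits in uint32_t.
// Returns false (and leaves `out` untouched) for NaN and out-of-range input.
bool try_float_to_u32(float x, std::uint32_t& out) {
    // Values in (-1, 0) truncate to 0, so they are still in range.
    // 2^32 is exactly representable as a float. NaN fails both comparisons.
    if (x > -1.0f && x < 4294967296.0f) {
        out = static_cast<std::uint32_t>(x);
        return true;
    }
    return false;
}
```

The value from the earlier example, 4294967808.0f, is rejected by this check on every target, so the divergence between x86_64 and AArch64 never becomes observable.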
