
[RFC] implementation of _Float16


[RFC] implementation of _Float16

Sjoerd Meijer via cfe-dev

Hi,

 

ARMv8.2-A introduces, as an optional extension, half-precision data-processing instructions for Advanced SIMD and floating-point in both the AArch64 and AArch32 execution states [1]. We are looking into implementing C/C++ language support for these new ARMv8.2-A half-precision instructions.

 

We would like to introduce a new Clang type. The reason is that we cannot use, for example, the type __fp16 (defined in the ARM C Language Extensions [2]), because it is a storage-only type: with the standard C operators, values of type __fp16 promote to float in arithmetic operations, which is exactly what we would like to avoid for the ARMv8.2-A half-precision instructions. Please note that LLVM IR already has a half-precision type, onto which __fp16, for example, is mapped, so no changes or additions to the LLVM IR are required.
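
For illustration, a minimal sketch of the promotion semantics just described (assuming a target where __fp16 is available):

__fp16 scale(__fp16 a, __fp16 b) {
  /* Both operands are promoted to float, the multiply is performed in
     single precision, and the result is truncated back to half: */
  return a * b;  /* effectively (__fp16)((float)a * (float)b) */
}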

 

As the new Clang type we would like to propose _Float16, as defined in the C11 extension TS 18661-3 [3]. Arithmetic on it is well defined; it is not merely a storage type like __fp16. Our question is whether a partial implementation, implementing just this type and not claiming (full) C11 conformance, is acceptable?
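
In contrast, with the proposed type the same operation would be a genuine half-precision operation (a sketch of the intended behaviour):

_Float16 scale16(_Float16 a, _Float16 b) {
  return a * b;  /* a true half-precision multiply, no promotion to float */
}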

 

[1] ARM ARM: https://static.docs.arm.com/ddi0487/b/DDI0487B_a_armv8_arm.pdf

[2] ARM C Language Extensions 2.1: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053d/IHI0053D_acle_2_1.pdf

[3] ISO/IEC TS 18661-3, interchange and extended types: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1945.pdf

 

Cheers,

Sjoerd.


Re: [RFC] implementation of _Float16

Hal Finkel via cfe-dev


On 05/10/2017 05:18 AM, Sjoerd Meijer via cfe-dev wrote:

Our question is whether a partial implementation, implementing just this type and not claiming (full) C11 conformance, is acceptable?


I would very much like to see fp16 as a first-class floating-point type in Clang and LLVM (i.e., handled as more than just a storage type). Doing this in Clang in the way specified by C11 seems like the right approach. I don't see why implementing this should be predicated on implementing other parts of C11.

 -Hal


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: [RFC] implementation of _Float16

Martin J. O'Riordan via cfe-dev

Our out-of-tree target implements fully native FP16 operations based on ‘__fp16’ (scalar and SIMD vector). So is the issue for ARM that ‘__fp16’ is already used to implement a storage-only type, and that another type is needed to differentiate between a native and a storage-only type?  Once the ‘f16’ type appears in the IR (and the vector variants), the code generation is straightforward enough.

 

Certainly we have had to make many changes to Clang and to LLVM to implement this fully, including suppressing the implicit conversion to ‘double’, but nothing scary or obscure.  Many of these changes simply enable for C and C++ something that is already normal for OpenCL.

 

More controversially, we also added a “synonym” for this using ‘short float’ rather than ‘_Float16’ (or OpenCL’s ‘half’), and created a parallel set of the ISO C library functions using ‘s’ to suffix the usual names (e.g. ‘tan’, ‘tanf’, ‘tanl’, plus ‘tans’).  The ‘s’ suffix was unambiguous (though we actually use the double-underscore prefix, e.g. ‘__tans’, to avoid conflicts with the user’s names), and the type ‘short float’ was available too without breaking anything.  Enabling the ‘h’ suffix for FP constants (again from OpenCL) makes the whole thing fit smoothly with the normal FP types.
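
As an illustrative sketch of this scheme (the declarations below are examples of the naming convention, not the exact out-of-tree headers; ‘__sins’ is an assumed name following the stated pattern):

/* Parallel half-precision math functions, using the reserved
   double-underscore prefix plus the 's' suffix: */
short float __tans(short float x);
short float __sins(short float x);

/* OpenCL-style 'h' suffix on floating-point constants: */
short float quarter = 0.25h;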

 

However, for variadic functions (such as ‘printf’) we do promote to ‘double’, because there are no format specifiers for ‘half’, any more than there are for ‘float’; it is also consistent with ‘va_arg’ usage, where ‘char’ and ‘short’ are passed as ‘int’.  My feeling is that the set of implementation-defined types ‘float’, ‘double’ and ‘long double’ can be extended to include ‘short float’ without dictating that any of them have particular bit-sizes (e.g. FP16 for ‘half’).
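
A sketch of the variadic behaviour (assuming the ‘short float’ extension described above):

#include <stdio.h>

void print_half(short float h) {
  /* Per the extension described above, h undergoes the default argument
     promotions when passed through '...', arriving as a double, so the
     ordinary %f conversion applies: */
  printf("value = %f\n", h);
}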

 

This solution has worked very well over the past few years and is symmetric with the other floating-point data types.

 

There are some issues with C++ and overloading, because conversions from ‘__fp16’ to the other FP types (and integer types) are not ranked in exactly the same way as, for example, conversions from ‘float’ are; but this is really only because it is not a first-class citizen of the type system, and the rules would need to be specified to make this valid.  I have not tried to fix this as it works reasonably well as it is, and it would really be an issue for the C++ committee to decide if they ever choose to adopt another FP data type.  I did, however, add it to the type traits in the C++ library so that it is considered a legal floating-point type.

 

I’d love to see this adopted as a formal type in a future version of ISO C and ISO C++.

 

            MartinO

 


Re: [RFC] implementation of _Float16

Sjoerd Meijer via cfe-dev

Hi Hal, Martin,

 

Thanks for the feedback.

Yes, the issue indeed is that ‘__fp16’ is already used to implement a storage-only type.  Earlier I wrote that I don’t expect LLVM IR changes, but now I am not so sure anymore whether both types can map onto the same half LLVM IR type. With two half-precision types, __fp16 and _Float16, where one is a storage-only type and the other a native type, the distinction between the two must somehow be preserved, I think.

 

Cheers,

Sjoerd.

 


Re: [RFC] implementation of _Float16

Martin J. O'Riordan via cfe-dev

Yes, I can see how this would be an issue if it is necessary to keep the storage-only and native types separate.

 

At the moment I have ‘short float’ internally associated with OpenCL’s ‘half’, but I do not enable ‘half’ as a keyword.  Independently, I have made ‘__fp16’, when used with our target, also a synonym for ‘short float’/‘half’ (simply to avoid adding a new keyword).  This in turn is bound to IEEE FP16 using ‘HalfFormat = &llvm::APFloat::IEEEhalf();’.

 

In our case it is always a native type and never a storage-only type, so coupling ‘__fp16’ to ‘half’ made sense.  Certainly, if the native and storage-only variants were distinct, then this association would have to be decoupled (not a big deal).

 

Another approach might be to always treat FP16 as if it were native, but to provide only load/store instructions in the TableGen descriptions for FP16, and to adapt lowering so that the arithmetic is always performed in FP32 when the selected target does not support native FP16 - would that be feasible in your case?  In this way it is not really any different from how targets that have no FPU can use an alternative integer-based implementation (with the help of ‘compiler-rt’).

 

I can certainly see how something like ‘ADD’ of ‘f16’ could be changed to use ‘Expand’ in lowering rather than ‘Legal’, as a function of the selected target (or some other target-specific option) - we just marked it ‘Legal’ and provided the corresponding instructions in TableGen, with very little custom lowering necessary.  I have a mild concern that LLVM would then have to have one ‘f16’ which is native and another kind of ‘f16’ restricted to being storage only.
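
As a sketch of that target-dependent choice (a fragment from a hypothetical <Target>ISelLowering constructor; ‘hasNativeFP16()’ is an assumed subtarget predicate, and ‘Promote’ is the legalize action that performs the arithmetic in the next larger type, f32):

// With native FP16, mark the node Legal so the TableGen patterns can
// select the hardware instruction; otherwise let the legalizer perform
// the arithmetic in f32, as suggested above.
if (Subtarget.hasNativeFP16())
  setOperationAction(ISD::FADD, MVT::f16, Legal);
else
  setOperationAction(ISD::FADD, MVT::f16, Promote);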

 

Thanks,

 

            MartinO

 


Re: [RFC] implementation of _Float16

Hal Finkel via cfe-dev


On 05/10/2017 09:01 AM, Martin J. O'Riordan wrote:

I have a mild concern that LLVM would then have to have one ‘f16’ which is native and another kind of ‘f16’ restricted to being storage only.


Why? That should only be true if they have different semantics.

 -Hal

 


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: [RFC] implementation of _Float16

Sjoerd Meijer via cfe-dev

The thing that confused me again is that for simple expressions/examples like this:

 

__fp16 MyAdd(__fp16 a, __fp16 b) {
  return a + b;
}

 

The IR does not include the promotions/truncations you would expect (given that the operations are supposed to be performed on floats):

 

define half @MyAdd(half %a, half %b) local_unnamed_addr #0 {
entry:
  %0 = fadd half %a, %b
  ret half %0
}

 

But that is only because there is an optimisation that omits them when it can prove that the result is the same with or without these converts; in other cases the promotes/truncates are there as expected.
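
For example, a hand-written sketch of the form with the conversions made explicit:

define half @MyAddExplicit(half %a, half %b) {
entry:
  %0 = fpext half %a to float
  %1 = fpext half %b to float
  %2 = fadd float %0, %1
  %3 = fptrunc float %2 to half
  ret half %3
}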

 

This means that Clang produces the necessary promotions when needed, and that a new _Float16 type can also be mapped onto the LLVM IR half type, I think (no changes needed). Yes, then the approach could indeed be to treat it as a native type, and only promote operands to float when required.

 

Cheers,

Sjoerd.

 


Re: [RFC] implementation of _Float16

Hal Finkel via cfe-dev


On 05/10/2017 11:15 AM, Sjoerd Meijer wrote:

This means that Clang produces the necessary promotions when needed, and that a new _Float16 type can also be mapped onto the LLVM IR half type, I think (no changes needed). Yes, then the approach could indeed be to treat it as a native type, and only promote operands to float when required.


By "when required", do you mean when the result would be the same as if the operation had been performed in single precision? If so, then no, we need different semantics. That having been said, I'd be in favor of changing the current semantics to require explicit promotions/truncations, changing the existing optimization to elide them when they're provably redundant (as we do with other such things), and then having only a single, true, half-precision type. I suspect that we'd need to figure out how to auto-upgrade, but that seems doable.

 -Hal

 


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: [RFC] implementation of _Float16

Hubert Tong via cfe-dev
On Wed, May 10, 2017 at 9:13 AM, Martin J. O'Riordan via cfe-dev <[hidden email]> wrote:
More controversially we also added a “synonym” for this using ‘short float’ rather than ‘_Float16’ (or OpenCL’s ‘half’), and created a parallel set of the ISO C library functions using ‘s’ to suffix the usual names (e.g. ‘tan’, ‘tanf’, ‘tanl’ plus ‘tans’).
Perhaps a bit of a tangent here with regard to language semantics: I am going to guess that, in general, short float is not necessarily the same format as _Float16. It follows that if both are present, then _Float16 and short float are types which are not compatible with each other (e.g., for _Generic). This would be consistent with _Float32 and "plain" float.
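
A sketch of that consequence (assuming a compiler providing _Float16; 'short float', not being standard C, is left to the default case here):

#include <stdio.h>

#define FP_KIND(x) _Generic((x), \
    float:    "float",           \
    _Float16: "_Float16",        \
    default:  "some other type")

int main(void) {
  _Float16 h = 1.0;
  printf("%s\n", FP_KIND(h));  /* selects the _Float16 association */
  return 0;
}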

-- HT


Re: [RFC] implementation of _Float16

Steve via cfe-dev
Note that ISO/IEC TS 18661-3 (linked in the first email in this thread) already defines a standard suffix for math functions operating on _Float16, in section 12.3:

        _Float16 acosf16(_Float16 x);
        _Float16 asinf16(_Float16 x);
        …

There’s no need to invent something new for this.

– Steve

> On May 10, 2017, at 5:22 PM, Hubert Tong via cfe-dev <[hidden email]> wrote:
>
>> On Wed, May 10, 2017 at 9:13 AM, Martin J. O'Riordan via cfe-dev <[hidden email]> wrote:
>> More controversially we also added a “synonym” for this using ‘short float’ rather than ‘_Float16’ (or OpenCL’s ‘half’), and created a parallel set of the ISO C library functions using ‘s’ to suffix the usual names (e.g. ‘tan’, ‘tanf’, ‘tanl’ plus ‘tans’).
>
> Perhaps a bit of a tangent here with regards to language semantics. I am going to guess that, in general, short float is not necessarily the same format as _Float16. So it follows that if both are present, then _Float16 and short float are types which are not compatible (e.g., for _Generic) with each other. This would be consistent with _Float32 and "plain" float.

Re: [RFC] implementation of _Float16

Sjoerd Meijer via cfe-dev

Hi Hal,

 

You mentioned “I'd be in favor of changing the current semantics”. Just checking: do you mean the semantics of __fp16?

Because that is exactly what we are trying to avoid by introducing a new true half type; changing the semantics of __fp16 would break backward compatibility.

 

> By "when required", do you mean when the result would
> be the same as if the operation had been performed in single
> precision? If so, then no, we need different semantics.

 

I think that is indeed the case, but I am double-checking that.

 

Cheers,

Sjoerd.

 

From: Hal Finkel [mailto:[hidden email]]
Sent: 10 May 2017 17:40
To: Sjoerd Meijer; Martin J. O'Riordan
Cc: 'clang developer list'; nd
Subject: Re: [cfe-dev] [RFC] implementation of _Float16

 

 

On 05/10/2017 11:15 AM, Sjoerd Meijer wrote:

The thing that confused me again is that for simple expressions/examples like this:

 

__fp16 MyAdd(__fp16 a, __fp16 b) {

  return a + b;

}

 

The IR does not include promotions/truncations which you would expect (because operations are done on floats):

 

define half @MyAdd(half %a, half %b) local_unnamed_addr #0 {

entry:

  %0 = fadd half %a, %b

  ret half %0

}

 

But that is only because there is this optimisation that does not include them if it can prove that the result with/without these converts is the same, so in other cases the promotes/truncates are there as expected.

 

This means that Clang produces the necessary promotions when needed, and that a new _Float16 type can also be mapped onto the LLRM IR half type I think (no changes needed). Yes, then the approach could indeed be to treat it as a native type, and only promote operands to floats when required.


By "when required", do you mean when the result would be the same as if the operation had been performed in single precision? If so, then no, we need different semantics. That having been said, I'd be in favor of changing the current semantics to require explicit promotions/truncations, change the existing optimization to elide them when they're provably redundant (as we do with other such things), and then only have a single, true, half-precision type. I suspect that we'd need to figure out how to auto-upgrade, but that seems doable.

 -Hal


 


Re: [RFC] implementation of _Float16

Martin J. O'Riordan via cfe-dev


On 05/11/2017 06:22 AM, Sjoerd Meijer wrote:

Hi Hal,

 

You mentioned “I'd be in favor of changing the current semantics”. Just checking: do you mean the semantics of __fp16?

Because that is exactly what we are trying to avoid by introducing a new true half type; changing the semantics of fp16 would break backward compatibility.


I don't mean changing the semantics of __fp16, the source-language type. I mean changing the semantics of the IR-level half type. I suspect we can do this with an auto-upgrade feature that does not break backwards compatibility: insert extend/truncate operations around half operations in old IR, as I described, to reproduce the existing semantics you described.

 -Hal

 

> By "when required", do you mean when the result would

> be the same as if the operation had been performed in single

> precision? If so, then no, we need different semantics

 

I think that is indeed the case, but I am double checking that.

 

Cheers,

Sjoerd.

 


Re: [RFC] implementation of _Float16

Martin J. O'Riordan via cfe-dev
On 10 May 2017 at 09:39, Hal Finkel via cfe-dev <[hidden email]> wrote:
> By "when required", do you mean when the result would be the same as if the
> operation had been performed in single precision? If so, then no, we need
> different semantics.

I don't follow here. I've discussed this before (with Steve Canon, so
if he says anything that contradicts me, ignore me), and I was
convinced that all of LLVM's primitive operations on half have the
same semantics as when promoted to float and then truncated again
(including sqrt, but not fma, I believe).

So as far as I know, the situation right now is that we miscompile
@llvm.fma.f16 with an extra fpext/fptrunc. That could be a problem if
Clang emits that for __fp16, but I haven't managed to make it do so
yet ("fp contract" seems to give a fmuladd with correct promotions
etc.).
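
Concretely, the problematic lowering looks roughly like this (an
illustrative sketch; the promoted form rounds twice, once to float
and once to half, while @llvm.fma.f16 asks for a single rounding
to half):

%r = call half @llvm.fma.f16(half %a, half %b, half %c)

; effectively becomes:
%a.ext = fpext half %a to float
%b.ext = fpext half %b to float
%c.ext = fpext half %c to float
%t = call float @llvm.fma.f32(float %a.ext, float %b.ext, float %c.ext)
%r2 = fptrunc float %t to half  ; second rounding - may differ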

Other than that, transcendentals are pattern-matched so shouldn't
cause compatibility issues.

Do you have any more specific worries?

Cheers.

Tim.

Re: [RFC] implementation of _Float16

Martin J. O'Riordan via cfe-dev

On 05/11/2017 05:54 PM, Tim Northover wrote:
> I was convinced that all of LLVM's primitive operations on half have
> the same semantics as when promoted to float and then truncated again
> (including sqrt, but not fma, I believe).

That's what's been asserted here as well. The question is: if we're
going to want a type that represents half precision without the
implied extend/truncate operations, do we (a) introduce a new type
that is "really" a half, or (b) change half not to imply the
extend/truncate and then auto-upgrade?

  -Hal


Re: [RFC] implementation of _Float16

Martin J. O'Riordan via cfe-dev

On May 11, 2017, at 7:11 PM, Hal Finkel via cfe-dev <[hidden email]> wrote:

That's what's been asserted here as well. The question is: if we're going to want a type that represents half precision without the implied extend/truncate operations, do we (a) introduce a new type that is "really" a half, or (b) change half not to imply the extend/truncate and then auto-upgrade?

Just to try to be precise, I want to broaden this slightly and try to sketch out all the questions around this. Apologies if the answers to these are obvious or you feel like they’re already settled. I’d like to make sure we define the scope of the decisions pretty clearly before bike-shedding it to death =)

(a) For targets that do not have fp16 hardware support, what is FLT_EVAL_METHOD (I’m using the C-language bindings here so that there are semi-formal definitions that people can look up, but this is at least partially a non-language specific policy decision)?

- We could choose FLT_EVAL_METHOD = 0, which requires us to “simulate” _Float16 operations by upconverting to a legal type (float), doing the operation in float, and converting back to _Float16 for every operation (this works for all the arithmetic instructions, except fma, which would require a libcall or other special handling, but we would want fma formation from mul + add to still be licensed when allowed by program semantics).

- We could choose FLT_EVAL_METHOD = 32, which allows us to maintain extra precision by eliding the conversions to/from _Float16 around each operation (leaving intermediate results in `float`).

The second option obviously yields better performance on many targets, but slightly reduces portability: targets without _Float16 support now get different answers than targets that have _Float16 support for basic arithmetic. The second option matches (I think?) the intended behavior of the ARM __fp16 extension.
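
To make the difference concrete, a sketch (hypothetical function; the comments spell out what each option would have the compiler compute):

_Float16 madd(_Float16 a, _Float16 b, _Float16 c) {
  /* First option:  t = (_Float16)((float)a * (float)b);
                    r = (_Float16)((float)t + (float)c);           */
  /* Second option: r = (_Float16)((float)a * (float)b + (float)c); */
  return a * b + c;
}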

(b) For targets that have fp16 hardware support, we still get to choose FLT_EVAL_METHOD.

- Use the fp16 hardware. FLT_EVAL_METHOD = 0.

- The other choice is FLT_EVAL_METHOD = 32 (matching the existing behavior of __fp16, but making it much harder for people to take advantage of the shiny new instructions—they would have to use intrinsics—and severely hampering the autovectorizer’s options).

It sounds like everyone is settled on the first choice (and I agree with that), but let’s be clear that this *is* a decision that we’re making.

(c) Assuming FLT_EVAL_METHOD = 0 for targets with fp16 hardware, do we need to support a type with the __fp16 extension semantics of “implicitly promote everything to float” for the purposes of source compatibility?

Sounds like “yes”, at least for some toolchains.

(d) If yes, does that actually require a separate type at LLVM IR layer?

I don’t immediately see that it would, but I am not an expert.
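
For reference, a sketch of the source-level contrast between the two types (__fp16 per ACLE, _Float16 per TS 18661-3):

__fp16   a, b;        /* storage-only: operands promote to float  */
_Float16 c, d;        /* arithmetic type: operations stay in half */

float    f = a + b;   /* float addition; result has type float    */
_Float16 h = c + d;   /* half addition; result has type _Float16  */

Both could plausibly map onto the IR half type, with the __fp16 promotions written as explicit fpext/fptrunc.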

Anything I missed?
– Steve


Re: [RFC] implementation of _Float16

Martin J. O'Riordan via cfe-dev

Hi Steve,

 

Thanks for the explanations and pointers!

 

> Anything I missed?

 

Maybe. Reading your reply and also N1945 again, we got confused (my colleague Simon Tatham helped me here). You seem to be interpreting FLT_EVAL_METHOD=0 as saying that fp16 operations are done in fp16 precision, and FLT_EVAL_METHOD=32 as saying that fp16 operations are done in fp32 precision.

 

But by our reading of N1945, FLT_EVAL_METHOD=0 and FLT_EVAL_METHOD=32 mean the same thing, at least on a platform where float is 32 bits wide. Each one says that for some type T, expressions whose semantic type is T or smaller are evaluated in type T; and for FLT_EVAL_METHOD=0, T = float, whereas for FLT_EVAL_METHOD=32, T = _Float32, i.e. the same physical type in both cases.

 

So surely, to indicate that fp16 operations are done in fp16 precision, we would have to set FLT_EVAL_METHOD to 16, not to 0?
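
A small example of the difference, under that reading (assuming 32-bit float):

_Float16 x, y, z;
/* FLT_EVAL_METHOD == 16: x + y is evaluated in _Float16.      */
/* FLT_EVAL_METHOD == 0:  x + y is evaluated to the range and
   precision of float, then truncated on assignment to z.      */
z = x + y;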

 

Cheers,

Sjoerd.

 

 


Re: [RFC] implementation of _Float16

Martin J. O'Riordan via cfe-dev
Yes, you’re right. That’s what I get for writing it up from memory instead of just re-reading the spec.

The choice is really between FLT_EVAL_METHOD=16 (evaluate _Float16 in _Float16) and FLT_EVAL_METHOD=0 (evaluate _Float16 in float).

– Steve
