Clang builtins for C++20 STL features

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev
Hi Clang devs,

WG21 has voted in a bunch of C++20 STL features that need compiler support via builtins/intrinsics. As usual, MSVC's STL would like to use identically-named builtins for Clang, C1XX, and EDG, so I wanted to ask if you've chosen any names (and interfaces) yet. Also as usual, I have utterly no opinion about naming - any name that gets the compiler to do my work for me is amazing (as long as all compilers are consistent). :-)

* P0595R2 std::is_constant_evaluated()
Should this be __is_constant_evaluated() or __builtin_is_constant_evaluated() or something else?

* P0482R6 char8_t
Given std::is_constant_evaluated(), we might not need anything new here. Otherwise, should there be analogues of __builtin_memcmp(), __builtin_strlen(), and __builtin_char_memchr() for constexpr char_traits<char8_t>?

* P0476R2 std::bit_cast()
This came up a month ago, where Richard Smith suggested __builtin_bit_cast(To, value) or __bit_cast<To>(value), preferring the former (for C friendliness). Was a final name chosen?

* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

* P0758R1 std::is_nothrow_convertible
This can be implemented without an intrinsic (std::is_nothrow_invocable_r already demands it; std::is_convertible plus noexcept plus library cleverness works), but an intrinsic is higher throughput (and simpler for third-party libraries that want to imitate the STL without using the STL for whatever reason). MSVC's spelling for the plain trait is __is_convertible_to(From, To); should the new trait be __is_nothrow_convertible_to(From, To) or __is_nothrow_convertible(From, To)?

* P1007R3 std::assume_aligned()
MSVC supports a general __assume() although I'm unsure if it's applicable/desirable here. Should there be a dedicated builtin?

* I think this list is complete but I might be missing some features.

Thanks,
STL
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev


On 2018-11-29 5:55 PM, Stephan T. Lavavej via cfe-dev wrote:

> Hi Clang devs,
>
> WG21 has voted in a bunch of C++20 STL features that need compiler support via builtins/intrinsics. As usual, MSVC's STL would like to use identically-named builtins for Clang, C1XX, and EDG, so I wanted to ask if you've chosen any names (and interfaces) yet. Also as usual, I have utterly no opinion about naming - any name that gets the compiler to do my work for me is amazing (as long as all compilers are consistent). :-)
>
> * P0595R2 std::is_constant_evaluated()
> Should this be __is_constant_evaluated() or __builtin_is_constant_evaluated() or something else?
>
> * P0482R6 char8_t
> Given std::is_constant_evaluated(), we might not need anything new here. Otherwise, should there be analogues of __builtin_memcmp(), __builtin_strlen(), and __builtin_char_memchr() for constexpr char_traits<char8_t>?
>
> * P0476R2 std::bit_cast()
> This came up a month ago, where Richard Smith suggested __builtin_bit_cast(To, value) or __bit_cast<To>(value), preferring the former (for C friendliness). Was a final name chosen?

Not formally, but I'm working on a patch that uses the
__builtin_bit_cast(To, value) spelling, so lets just go with that.

>
> * P0528R3 Atomic Compare-And-Exchange With Padding Bits
> We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."
>
> * P0758R1 std::is_nothrow_convertible
> This can be implemented without an intrinsic (std::is_nothrow_invocable_r already demands it; std::is_convertible plus noexcept plus library cleverness works), but an intrinsic is higher throughput (and simpler for third-party libraries that want to imitate the STL without using the STL for whatever reason). MSVC's spelling for the plain trait is __is_convertible_to(From, To); should the new trait be __is_nothrow_convertible_to(From, To) or __is_nothrow_convertible(From, To)?
>
> * P1007R3 std::assume_aligned()
> MSVC supports a general __assume() although I'm unsure if it's applicable/desirable here. Should there be a dedicated builtin?
>
> * I think this list is complete but I might be missing some features.
>
> Thanks,
> STL
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev
In reply to this post by Oleg Smolsky via cfe-dev
On Thu, 29 Nov 2018 at 17:55, Stephan T. Lavavej via cfe-dev <[hidden email]> wrote:
Hi Clang devs,

WG21 has voted in a bunch of C++20 STL features that need compiler support via builtins/intrinsics. As usual, MSVC's STL would like to use identically-named builtins for Clang, C1XX, and EDG, so I wanted to ask if you've chosen any names (and interfaces) yet. Also as usual, I have utterly no opinion about naming - any name that gets the compiler to do my work for me is amazing (as long as all compilers are consistent). :-)

* P0595R2 std::is_constant_evaluated()
Should this be __is_constant_evaluated() or __builtin_is_constant_evaluated() or something else?

For us (and I'd guess for GCC), __builtin_is_constant_evaluated() would be the most natural choice.

* P0482R6 char8_t
Given std::is_constant_evaluated(), we might not need anything new here. Otherwise, should there be analogues of __builtin_memcmp(), __builtin_strlen(), and __builtin_char_memchr() for constexpr char_traits<char8_t>?

These seem low-priority given std::is_constant_evaluated(), but I think it might still be nice to have the builtins even if you don't formally need them. Our __builtin_mem* and __builtin_str* are a lot faster to evaluate than the equivalent hand-rolled C++ code would be. (Clang has a __has_builtin builtin macro to allow these to be detected and used if available. Does MSVC have anything similar?)

Approximately following the naming convention of wcscmp etc, maybe __builtin_u8scmp, __builtin_u8slen, __builtin_u8scpy, ...? (Should we also add __builtin_u16s* and __builtin_u32s* while we're here?)

* P0476R2 std::bit_cast()
This came up a month ago, where Richard Smith suggested __builtin_bit_cast(To, value) or __bit_cast<To>(value), preferring the former (for C friendliness). Was a final name chosen?

(I need to keep reminding myself: this can't be __builtin_bit_cast(&dest, &src) because the To type might not be default-constructible.)

Unless someone wants to provide a counterargument, let's go with __builtin_bit_cast(To, value).

constexpr T __builtin_bit_cast(typename T, const U &src)
Effects: Bit-cast the value of src to type T. Ill-formed if T and U are of different sizes. Only guaranteed to be usable in constant expressions in the conditions specified for std::bit_cast.
 
* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

void __builtin_clear_padding(T *ptr)
Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

Do we need to allow this to be called in constant expressions?
 
* P0758R1 std::is_nothrow_convertible
This can be implemented without an intrinsic (std::is_nothrow_invocable_r already demands it; std::is_convertible plus noexcept plus library cleverness works), but an intrinsic is higher throughput (and simpler for third-party libraries that want to imitate the STL without using the STL for whatever reason). MSVC's spelling for the plain trait is __is_convertible_to(From, To); should the new trait be __is_nothrow_convertible_to(From, To) or __is_nothrow_convertible(From, To)?

I would generally prefer that we expose traits that exactly match the library requirements with the same name as the library trait with a leading dunder, with an argument list matching the library trait. So:
Add __is_convertible(From, To) and __is_nothrow_convertible(From, To)
Make __is_convertible_to a (deprecated) synonym for __is_convertible.

* P1007R3 std::assume_aligned()
MSVC supports a general __assume() although I'm unsure if it's applicable/desirable here. Should there be a dedicated builtin?

Well, Clang already supports __assume() for MSVC compatibility (only in MS mode) and __builtin_assume() (our preferred spelling, available in general). But a general assume intrinsic is probably not the best choice here.

Clang and GCC also already have:

void *__builtin_assume_aligned(const void *p, size_t align, size_t offset_from_aligned = 0)

That's not ideal because it's not const-correct.

Even for library wording, the spec is weak on specifying when a call to std::assume_aligned is allowed in constant expressions. For Clang at least, we treat a call to __builtin_assume_aligned as non-constant if we cannot prove the object is suitably aligned (that is, if the complete object's alignment and the offset within it don't result in a suitable alignment) even if we happen to know that (for instance) a global variable will typically be aligned to 16 bytes in practice.
 
* I think this list is complete but I might be missing some features.

How about:
* invocation type traits from Library Fundamentals V1 
* source_location from Library Fundamentals V2

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev

[Erik Pilkington]

> Not formally, but I'm working on a patch that uses the __builtin_bit_cast(To, value) spelling, so lets just go with that.

 

[Richard Smith]

> (I need to keep reminding myself: this can't be __builtin_bit_cast(&dest, &src) because the To type might not be default-constructible.)

> Unless someone wants to provide a counterargument, let's go with __builtin_bit_cast(To, value).

> constexpr T __builtin_bit_cast(typename T, const U &src)

> Effects: Bit-cast the value of src to type T. Ill-formed if T and U are of different sizes. Only guaranteed to be usable in constant expressions in the conditions specified for std::bit_cast.

 

Great, thanks!

 

> For us (and I'd guess for GCC), __builtin_is_constant_evaluated() would be the most natural choice.

 

Got it.

 

> These seem low-priority given std::is_constant_evaluated(), but I think it might still be nice to have the builtins even if you don't formally need them. Our __builtin_mem* and __builtin_str* are a lot faster to evaluate than the equivalent hand-rolled C++ code would be.

 

Ah, throughput is an excellent reason. (I can already imagine generated code stressing constexpr char_traits<char8_t> with enormous strings.)

 

> (Clang has a __has_builtin builtin macro to allow these to be detected and used if available. Does MSVC have anything similar?)

 

Not at the moment.

 

> Approximately following the naming convention of wcscmp etc, maybe __builtin_u8scmp, __builtin_u8slen, __builtin_u8scpy, ...?

 

Nice and systematic.

 

> (Should we also add __builtin_u16s* and __builtin_u32s* while we're here?)

 

That's https://bugs.llvm.org/show_bug.cgi?id=35165 "Consider providing string builtins for char16_t" which I filed last year.

 

> I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

> void __builtin_clear_padding(T *ptr)

> Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

 

Great, I agree with the rationale.

 

> Do we need to allow this to be called in constant expressions?

 

No, because atomic isn't constexpr.

 

> I would generally prefer that we expose traits that exactly match the library requirements with the same name as the library trait with a leading dunder, with an argument list matching the library trait. So:

> Add __is_convertible(From, To) and __is_nothrow_convertible(From, To)

> Make __is_convertible_to a (deprecated) synonym for __is_convertible.

 

Sounds good (although switching intrinsics is a headache due to getting all compilers updated, we can do it).

 

> Well, Clang already supports __assume() for MSVC compatibility (only in MS mode) and __builtin_assume() (our preferred spelling, available in general). But a general assume intrinsic is probably not the best choice here.

> Clang and GCC also already have:

> void *__builtin_assume_aligned(const void *p, size_t align, size_t offset_from_aligned = 0)

> See https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

> That's not ideal because it's not const-correct.

> Even for library wording, the spec is weak on specifying when a call to std::assume_aligned is allowed in constant expressions.

> For Clang at least, we treat a call to __builtin_assume_aligned as non-constant if we cannot prove the object is suitably aligned

> (that is, if the complete object's alignment and the offset within it don't result in a suitable alignment) even if we happen to know that (for instance) a global variable will typically be aligned to 16 bytes in practice.

 

I'll mark this as "more design needed", then.

 

> How about:

> * invocation type traits from Library Fundamentals V1

> * source_location from Library Fundamentals V2

 

Ah, good catches. We aren't implementing LibFun at the moment so I don't need intrinsics for these (C++23 at the earliest). If Clang chooses anything, we'll follow that precedent.

 

I'll send these names/interfaces over to the C1XX team (and I'll ask them to comment here if anything is truly unacceptable, although it all sounds good to me).

 

Thanks!

STL

 

From: Richard Smith <[hidden email]>
Sent: Thursday, November 29, 2018 6:53 PM
To: Stephan T. Lavavej <[hidden email]>
Cc: Clang Dev <[hidden email]>
Subject: Re: [cfe-dev] Clang builtins for C++20 STL features

 

On Thu, 29 Nov 2018 at 17:55, Stephan T. Lavavej via cfe-dev <[hidden email]> wrote:

Hi Clang devs,

WG21 has voted in a bunch of C++20 STL features that need compiler support via builtins/intrinsics. As usual, MSVC's STL would like to use identically-named builtins for Clang, C1XX, and EDG, so I wanted to ask if you've chosen any names (and interfaces) yet. Also as usual, I have utterly no opinion about naming - any name that gets the compiler to do my work for me is amazing (as long as all compilers are consistent). :-)

* P0595R2 std::is_constant_evaluated()
Should this be __is_constant_evaluated() or __builtin_is_constant_evaluated() or something else?

 

For us (and I'd guess for GCC), __builtin_is_constant_evaluated() would be the most natural choice.

 

* P0482R6 char8_t
Given std::is_constant_evaluated(), we might not need anything new here. Otherwise, should there be analogues of __builtin_memcmp(), __builtin_strlen(), and __builtin_char_memchr() for constexpr char_traits<char8_t>?

 

These seem low-priority given std::is_constant_evaluated(), but I think it might still be nice to have the builtins even if you don't formally need them. Our __builtin_mem* and __builtin_str* are a lot faster to evaluate than the equivalent hand-rolled C++ code would be. (Clang has a __has_builtin builtin macro to allow these to be detected and used if available. Does MSVC have anything similar?)

 

Approximately following the naming convention of wcscmp etc, maybe __builtin_u8scmp, __builtin_u8slen, __builtin_u8scpy, ...? (Should we also add __builtin_u16s* and __builtin_u32s* while we're here?)

 

* P0476R2 std::bit_cast()
This came up a month ago, where Richard Smith suggested __builtin_bit_cast(To, value) or __bit_cast<To>(value), preferring the former (for C friendliness). Was a final name chosen?

 

(I need to keep reminding myself: this can't be __builtin_bit_cast(&dest, &src) because the To type might not be default-constructible.)

 

Unless someone wants to provide a counterargument, let's go with __builtin_bit_cast(To, value).

 

constexpr T __builtin_bit_cast(typename T, const U &src)

Effects: Bit-cast the value of src to type T. Ill-formed if T and U are of different sizes. Only guaranteed to be usable in constant expressions in the conditions specified for std::bit_cast.

 

* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

 

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

 

void __builtin_clear_padding(T *ptr)

Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

 

Do we need to allow this to be called in constant expressions?

 

* P0758R1 std::is_nothrow_convertible
This can be implemented without an intrinsic (std::is_nothrow_invocable_r already demands it; std::is_convertible plus noexcept plus library cleverness works), but an intrinsic is higher throughput (and simpler for third-party libraries that want to imitate the STL without using the STL for whatever reason). MSVC's spelling for the plain trait is __is_convertible_to(From, To); should the new trait be __is_nothrow_convertible_to(From, To) or __is_nothrow_convertible(From, To)?

 

I would generally prefer that we expose traits that exactly match the library requirements with the same name as the library trait with a leading dunder, with an argument list matching the library trait. So:

Add __is_convertible(From, To) and __is_nothrow_convertible(From, To)

Make __is_convertible_to a (deprecated) synonym for __is_convertible.

 

* P1007R3 std::assume_aligned()
MSVC supports a general __assume() although I'm unsure if it's applicable/desirable here. Should there be a dedicated builtin?

 

Well, Clang already supports __assume() for MSVC compatibility (only in MS mode) and __builtin_assume() (our preferred spelling, available in general). But a general assume intrinsic is probably not the best choice here.

 

Clang and GCC also already have:

 

void *__builtin_assume_aligned(const void *p, size_t align, size_t offset_from_aligned = 0)

 

That's not ideal because it's not const-correct.

 

Even for library wording, the spec is weak on specifying when a call to std::assume_aligned is allowed in constant expressions. For Clang at least, we treat a call to __builtin_assume_aligned as non-constant if we cannot prove the object is suitably aligned (that is, if the complete object's alignment and the offset within it don't result in a suitable alignment) even if we happen to know that (for instance) a global variable will typically be aligned to 16 bytes in practice.

 

* I think this list is complete but I might be missing some features.

 

How about:

* invocation type traits from Library Fundamentals V1 

* source_location from Library Fundamentals V2


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev

> Approximately following the naming convention of wcscmp etc, maybe __builtin_u8scmp, __builtin_u8slen, __builtin_u8scpy, ...?

 

Actually, memcmp and memchr are non-null-terminated while strlen is null-terminated, so the names should reflect that. Existing names:

 

__builtin_memcmp()

__builtin_strlen()

__builtin_char_memchr()

 

__builtin_wmemcmp()

__builtin_wcslen()

__builtin_wmemchr()

 

How about the following for char8_t, plus char16_t/char32_t if those are implemented? (This differs from my suggestion in LLVM#35165.)

 

__builtin_u8memcmp()

__builtin_u8strlen()

__builtin_u8memchr()

 

__builtin_u16memcmp()

__builtin_u16strlen()

__builtin_u16memchr()

 

__builtin_u32memcmp()

__builtin_u32strlen()

__builtin_u32memchr()

 

Thanks,

STL

 

From: Stephan T. Lavavej
Sent: Thursday, November 29, 2018 7:22 PM
To: 'Richard Smith' <[hidden email]>; 'Erik Pilkington' <[hidden email]>
Cc: Clang Dev <[hidden email]>
Subject: RE: [cfe-dev] Clang builtins for C++20 STL features

 

[Erik Pilkington]

> Not formally, but I'm working on a patch that uses the __builtin_bit_cast(To, value) spelling, so lets just go with that.

 

[Richard Smith]

> (I need to keep reminding myself: this can't be __builtin_bit_cast(&dest, &src) because the To type might not be default-constructible.)

> Unless someone wants to provide a counterargument, let's go with __builtin_bit_cast(To, value).

> constexpr T __builtin_bit_cast(typename T, const U &src)

> Effects: Bit-cast the value of src to type T. Ill-formed if T and U are of different sizes. Only guaranteed to be usable in constant expressions in the conditions specified for std::bit_cast.

 

Great, thanks!

 

> For us (and I'd guess for GCC), __builtin_is_constant_evaluated() would be the most natural choice.

 

Got it.

 

> These seem low-priority given std::is_constant_evaluated(), but I think it might still be nice to have the builtins even if you don't formally need them. Our __builtin_mem* and __builtin_str* are a lot faster to evaluate than the equivalent hand-rolled C++ code would be.

 

Ah, throughput is an excellent reason. (I can already imagine generated code stressing constexpr char_traits<char8_t> with enormous strings.)

 

> (Clang has a __has_builtin builtin macro to allow these to be detected and used if available. Does MSVC have anything similar?)

 

Not at the moment.

 

> Approximately following the naming convention of wcscmp etc, maybe __builtin_u8scmp, __builtin_u8slen, __builtin_u8scpy, ...?

 

Nice and systematic.

 

> (Should we also add __builtin_u16s* and __builtin_u32s* while we're here?)

 

That's https://bugs.llvm.org/show_bug.cgi?id=35165 "Consider providing string builtins for char16_t" which I filed last year.

 

> I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

> void __builtin_clear_padding(T *ptr)

> Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

 

Great, I agree with the rationale.

 

> Do we need to allow this to be called in constant expressions?

 

No, because atomic isn't constexpr.

 

> I would generally prefer that we expose traits that exactly match the library requirements with the same name as the library trait with a leading dunder, with an argument list matching the library trait. So:

> Add __is_convertible(From, To) and __is_nothrow_convertible(From, To)

> Make __is_convertible_to a (deprecated) synonym for __is_convertible.

 

Sounds good (although switching intrinsics is a headache due to getting all compilers updated, we can do it).

 

> Well, Clang already supports __assume() for MSVC compatibility (only in MS mode) and __builtin_assume() (our preferred spelling, available in general). But a general assume intrinsic is probably not the best choice here.

> Clang and GCC also already have:

> void *__builtin_assume_aligned(const void *p, size_t align, size_t offset_from_aligned = 0)

> See https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

> That's not ideal because it's not const-correct.

> Even for library wording, the spec is weak on specifying when a call to std::assume_aligned is allowed in constant expressions.

> For Clang at least, we treat a call to __builtin_assume_aligned as non-constant if we cannot prove the object is suitably aligned

> (that is, if the complete object's alignment and the offset within it don't result in a suitable alignment) even if we happen to know that (for instance) a global variable will typically be aligned to 16 bytes in practice.

 

I'll mark this as "more design needed", then.

 

> How about:

> * invocation type traits from Library Fundamentals V1

> * source_location from Library Fundamentals V2

 

Ah, good catches. We aren't implementing LibFun at the moment so I don't need intrinsics for these (C++23 at the earliest). If Clang chooses anything, we'll follow that precedent.

 

I'll send these names/interfaces over to the C1XX team (and I'll ask them to comment here if anything is truly unacceptable, although it all sounds good to me).

 

Thanks!

STL

 

From: Richard Smith <[hidden email]>
Sent: Thursday, November 29, 2018 6:53 PM
To: Stephan T. Lavavej <[hidden email]>
Cc: Clang Dev <[hidden email]>
Subject: Re: [cfe-dev] Clang builtins for C++20 STL features

 

On Thu, 29 Nov 2018 at 17:55, Stephan T. Lavavej via cfe-dev <[hidden email]> wrote:

Hi Clang devs,

WG21 has voted in a bunch of C++20 STL features that need compiler support via builtins/intrinsics. As usual, MSVC's STL would like to use identically-named builtins for Clang, C1XX, and EDG, so I wanted to ask if you've chosen any names (and interfaces) yet. Also as usual, I have utterly no opinion about naming - any name that gets the compiler to do my work for me is amazing (as long as all compilers are consistent). :-)

* P0595R2 std::is_constant_evaluated()
Should this be __is_constant_evaluated() or __builtin_is_constant_evaluated() or something else?

 

For us (and I'd guess for GCC), __builtin_is_constant_evaluated() would be the most natural choice.

 

* P0482R6 char8_t
Given std::is_constant_evaluated(), we might not need anything new here. Otherwise, should there be analogues of __builtin_memcmp(), __builtin_strlen(), and __builtin_char_memchr() for constexpr char_traits<char8_t>?

 

These seem low-priority given std::is_constant_evaluated(), but I think it might still be nice to have the builtins even if you don't formally need them. Our __builtin_mem* and __builtin_str* are a lot faster to evaluate than the equivalent hand-rolled C++ code would be. (Clang has a __has_builtin builtin macro to allow these to be detected and used if available. Does MSVC have anything similar?)

 

Approximately following the naming convention of wcscmp etc, maybe __builtin_u8scmp, __builtin_u8slen, __builtin_u8scpy, ...? (Should we also add __builtin_u16s* and __builtin_u32s* while we're here?)

 

* P0476R2 std::bit_cast()
This came up a month ago, where Richard Smith suggested __builtin_bit_cast(To, value) or __bit_cast<To>(value), preferring the former (for C friendliness). Was a final name chosen?

 

(I need to keep reminding myself: this can't be __builtin_bit_cast(&dest, &src) because the To type might not be default-constructible.)

 

Unless someone wants to provide a counterargument, let's go with __builtin_bit_cast(To, value).

 

constexpr T __builtin_bit_cast(typename T, const U &src)

Effects: Bit-cast the value of src to type T. Ill-formed if T and U are of different sizes. Only guaranteed to be usable in constant expressions in the conditions specified for std::bit_cast.

 

* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

 

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

 

void __builtin_clear_padding(T *ptr)

Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

 

Do we need to allow this to be called in constant expressions?

 

* P0758R1 std::is_nothrow_convertible
This can be implemented without an intrinsic (std::is_nothrow_invocable_r already demands it; std::is_convertible plus noexcept plus library cleverness works), but an intrinsic is higher throughput (and simpler for third-party libraries that want to imitate the STL without using the STL for whatever reason). MSVC's spelling for the plain trait is __is_convertible_to(From, To); should the new trait be __is_nothrow_convertible_to(From, To) or __is_nothrow_convertible(From, To)?

 

I would generally prefer that we expose traits that exactly match the library requirements with the same name as the library trait with a leading dunder, with an argument list matching the library trait. So:

Add __is_convertible(From, To) and __is_nothrow_convertible(From, To)

Make __is_convertible_to a (deprecated) synonym for __is_convertible.

 

* P1007R3 std::assume_aligned()
MSVC supports a general __assume() although I'm unsure if it's applicable/desirable here. Should there be a dedicated builtin?

 

Well, Clang already supports __assume() for MSVC compatibility (only in MS mode) and __builtin_assume() (our preferred spelling, available in general). But a general assume intrinsic is probably not the best choice here.

 

Clang and GCC also already have:

 

void *__builtin_assume_aligned(const void *p, size_t align, size_t offset_from_aligned = 0)

 

That's not ideal because it's not const-correct.

 

Even for library wording, the spec is weak on specifying when a call to std::assume_aligned is allowed in constant expressions. For Clang at least, we treat a call to __builtin_assume_aligned as non-constant if we cannot prove the object is suitably aligned (that is, if the complete object's alignment and the offset within it don't result in a suitable alignment) even if we happen to know that (for instance) a global variable will typically be aligned to 16 bytes in practice.

 

* I think this list is complete but I might be missing some features.

 

How about:

* invocation type traits from Library Fundamentals V1 

* source_location from Library Fundamentals V2


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev
In reply to this post by Oleg Smolsky via cfe-dev
On Fri, Nov 30, 2018 at 5:53 AM Richard Smith via cfe-dev
<[hidden email]> wrote:
>
> On Thu, 29 Nov 2018 at 17:55, Stephan T. Lavavej via cfe-dev <[hidden email]> wrote:
>> * P1007R3 std::assume_aligned()
>> MSVC supports a general __assume() although I'm unsure if it's applicable/desirable here. Should there be a dedicated builtin?
>
>
> Well, Clang already supports __assume() for MSVC compatibility (only in MS mode) and __builtin_assume() (our preferred spelling, available in general). But a general assume intrinsic is probably not the best choice here.
Whatever it ends up being, *please* *don't* just use generic
__builtin_assume(), but a special standalone builtin.
It's better for sanitization reasons.

> Clang and GCC also already have:
>
> void *__builtin_assume_aligned(const void *p, size_t align, size_t offset_from_aligned = 0)
> See https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
>
> That's not ideal because it's not const-correct.
>
> Even for library wording, the spec is weak on specifying when a call to std::assume_aligned is allowed in constant expressions. For Clang at least, we treat a call to __builtin_assume_aligned as non-constant if we cannot prove the object is suitably aligned (that is, if the complete object's alignment and the offset within it don't result in a suitable alignment) even if we happen to know that (for instance) a global variable will typically be aligned to 16 bytes in practice.


Roman.

> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev
In reply to this post by Oleg Smolsky via cfe-dev


On Thu, Nov 29, 2018 at 9:53 PM Richard Smith via cfe-dev <[hidden email]> wrote:
On Thu, 29 Nov 2018 at 17:55, Stephan T. Lavavej via cfe-dev <[hidden email]> wrote:
Hi Clang devs,

WG21 has voted in a bunch of C++20 STL features that need compiler support via builtins/intrinsics. As usual, MSVC's STL would like to use identically-named builtins for Clang, C1XX, and EDG, so I wanted to ask if you've chosen any names (and interfaces) yet. Also as usual, I have utterly no opinion about naming - any name that gets the compiler to do my work for me is amazing (as long as all compilers are consistent). :-)

* P0595R2 std::is_constant_evaluated()
Should this be __is_constant_evaluated() or __builtin_is_constant_evaluated() or something else?

For us (and I'd guess for GCC), __builtin_is_constant_evaluated() would be the most natural choice.

* P0482R6 char8_t
Given std::is_constant_evaluated(), we might not need anything new here. Otherwise, should there be analogues of __builtin_memcmp(), __builtin_strlen(), and __builtin_char_memchr() for constexpr char_traits<char8_t>?

These seem low-priority given std::is_constant_evaluated(), but I think it might still be nice to have the builtins even if you don't formally need them. Our __builtin_mem* and __builtin_str* are a lot faster to evaluate than the equivalent hand-rolled C++ code would be. (Clang has a __has_builtin builtin macro to allow these to be detected and used if available. Does MSVC have anything similar?)

Approximately following the naming convention of wcscmp etc, maybe __builtin_u8scmp, __builtin_u8slen, __builtin_u8scpy, ...? (Should we also add __builtin_u16s* and __builtin_u32s* while we're here?)

* P0476R2 std::bit_cast()
This came up a month ago, where Richard Smith suggested __builtin_bit_cast(To, value) or __bit_cast<To>(value), preferring the former (for C friendliness). Was a final name chosen?

(I need to keep reminding myself: this can't be __builtin_bit_cast(&dest, &src) because the To type might not be default-constructible.)

Unless someone wants to provide a counterargument, let's go with __builtin_bit_cast(To, value).

constexpr T __builtin_bit_cast(typename T, const U &src)
Effects: Bit-cast the value of src to type T. Ill-formed if T and U are of different sizes. Only guaranteed to be usable in constant expressions in the conditions specified for std::bit_cast.
 
* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

void __builtin_clear_padding(T *ptr)
Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

Do we need to allow this to be called in constant expressions?
 
* P0758R1 std::is_nothrow_convertible
This can be implemented without an intrinsic (std::is_nothrow_invocable_r already demands it; std::is_convertible plus noexcept plus library cleverness works), but an intrinsic is higher throughput (and simpler for third-party libraries that want to imitate the STL without using the STL for whatever reason). MSVC's spelling for the plain trait is __is_convertible_to(From, To); should the new trait be __is_nothrow_convertible_to(From, To) or __is_nothrow_convertible(From, To)?

I would generally prefer that we expose traits that exactly match the library requirements with the same name as the library trait with a leading dunder, with an argument list matching the library trait. So:
Add __is_convertible(From, To) and __is_nothrow_convertible(From, To)
Make __is_convertible_to a (deprecated) synonym for __is_convertible.

* P1007R3 std::assume_aligned()
MSVC supports a general __assume() although I'm unsure if it's applicable/desirable here. Should there be a dedicated builtin?

Well, Clang already supports __assume() for MSVC compatibility (only in MS mode) and __builtin_assume() (our preferred spelling, available in general). But a general assume intrinsic is probably not the best choice here.

Clang and GCC also already have:

void *__builtin_assume_aligned(const void *p, size_t align, size_t offset_from_aligned = 0)

That's not ideal because it's not const-correct.

Even for library wording, the spec is weak on specifying when a call to std::assume_aligned is allowed in constant expressions. For Clang at least, we treat a call to __builtin_assume_aligned as non-constant if we cannot prove the object is suitably aligned (that is, if the complete object's alignment and the offset within it don't result in a suitable alignment) even if we happen to know that (for instance) a global variable will typically be aligned to 16 bytes in practice.

Could `std::assume_aligned` be correctly implemented on top of `__attribute__((assume_aligned(N)))` instead? This solves the constexpr and const-correctness problems, and it doesn't depend on `std::assume_aligned` getting inlined to work.
 
 
* I think this list is complete but I might be missing some features.

How about:
* invocation type traits from Library Fundamentals V1 
* source_location from Library Fundamentals V2
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev
In reply to this post by Oleg Smolsky via cfe-dev
On Thu, Nov 29, 2018, 9:53 PM Richard Smith via cfe-dev <[hidden email] wrote:
* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

void __builtin_clear_padding(T *ptr)
Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

Is the intent here that every value stored into a std::atomic (e.g. via store, exchange, or compare_exchange) would be passed through __builtin_clear_padding first, before being stored into the atomic object? And presumably the same would need to occur implicitly for C-style `_Atomic T`?


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev
On Fri, Nov 30, 2018 at 12:35 PM James Y Knight via cfe-dev <[hidden email]> wrote:
On Thu, Nov 29, 2018, 9:53 PM Richard Smith via cfe-dev <[hidden email] wrote:
* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

void __builtin_clear_padding(T *ptr)
Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

Is the intent here that every value stored into a std::atomic (e.g. via store, exchange, or compare_exchange) would be passed through __builtin_clear_padding first, before being stored into the atomic object? And presumably the same would need to occur implicitly for C-style `_Atomic T`?

Yes; see P0582R3 "The Curious Case of Padding Bits" (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html) for details. The intent is to make compare-exchange work for types with padding bits by zeroing those padding bits before storing a value in an atomic or comparing with the value stored in an atomic.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev


On Nov 30, 2018, at 12:39 PM, Casey Carter via cfe-dev <[hidden email]> wrote:

On Fri, Nov 30, 2018 at 12:35 PM James Y Knight via cfe-dev <[hidden email]> wrote:
On Thu, Nov 29, 2018, 9:53 PM Richard Smith via cfe-dev <[hidden email] wrote:
* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

void __builtin_clear_padding(T *ptr)
Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

Is the intent here that every value stored into a std::atomic (e.g. via store, exchange, or compare_exchange) would be passed through __builtin_clear_padding first, before being stored into the atomic object? And presumably the same would need to occur implicitly for C-style `_Atomic T`?

Yes; see P0582R3 "The Curious Case of Padding Bits" (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html) for details. The intent is to make compare-exchange work for types with padding bits by zeroing those padding bits before storing a value in an atomic or comparing with the value stored in an atomic.

I think you rather want to zero out padding bits in the atomic, not in the value, and you want to do it after storing into the atomic (unless you’re doing zeroing, and then copying the non-padding elements one at a time, in which case you could just memset instead of having a builtin).



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev


On Fri, Nov 30, 2018 at 4:06 PM JF Bastien <[hidden email]> wrote:


On Nov 30, 2018, at 12:39 PM, Casey Carter via cfe-dev <[hidden email]> wrote:

On Fri, Nov 30, 2018 at 12:35 PM James Y Knight via cfe-dev <[hidden email]> wrote:
On Thu, Nov 29, 2018, 9:53 PM Richard Smith via cfe-dev <[hidden email] wrote:
* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

void __builtin_clear_padding(T *ptr)
Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

Is the intent here that every value stored into a std::atomic (e.g. via store, exchange, or compare_exchange) would be passed through __builtin_clear_padding first, before being stored into the atomic object? And presumably the same would need to occur implicitly for C-style `_Atomic T`?

Yes; see P0582R3 "The Curious Case of Padding Bits" (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html) for details. The intent is to make compare-exchange work for types with padding bits by zeroing those padding bits before storing a value in an atomic or comparing with the value stored in an atomic.

I think you rather want to zero out padding bits in the atomic, not in the value, and you want to do it after storing into the atomic (unless you’re doing zeroing, and then copying the non-padding elements one at a time, in which case you could just memset instead of having a builtin).

Errr...certainly you can't do additional stores into an atomic after storing a new value into it, that would kinda ruin the "atomic" part.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev


On Nov 30, 2018, at 1:10 PM, James Y Knight <[hidden email]> wrote:



On Fri, Nov 30, 2018 at 4:06 PM JF Bastien <[hidden email]> wrote:


On Nov 30, 2018, at 12:39 PM, Casey Carter via cfe-dev <[hidden email]> wrote:

On Fri, Nov 30, 2018 at 12:35 PM James Y Knight via cfe-dev <[hidden email]> wrote:
On Thu, Nov 29, 2018, 9:53 PM Richard Smith via cfe-dev <[hidden email] wrote:
* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

void __builtin_clear_padding(T *ptr)
Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

Is the intent here that every value stored into a std::atomic (e.g. via store, exchange, or compare_exchange) would be passed through __builtin_clear_padding first, before being stored into the atomic object? And presumably the same would need to occur implicitly for C-style `_Atomic T`?

Yes; see P0582R3 "The Curious Case of Padding Bits" (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html) for details. The intent is to make compare-exchange work for types with padding bits by zeroing those padding bits before storing a value in an atomic or comparing with the value stored in an atomic.

I think you rather want to zero out padding bits in the atomic, not in the value, and you want to do it after storing into the atomic (unless you’re doing zeroing, and then copying the non-padding elements one at a time, in which case you could just memset instead of having a builtin).

Errr...certainly you can't do additional stores into an atomic after storing a new value into it, that would kinda ruin the "atomic" part.

Haha yeah you’re totally right! Ignore me, no idea what I was thinking.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev
After considering this a bit, I'm not sure the __builtin_clear_padding proposal is workable. Worse, I'm not sure that the standards proposal itself is workable, either.

1. Regarding general implementability of P0528:

Given a call to
  bool atomic_compare_exchange_strong(std::atomic<T>* obj, T* expected, T desired)
we need to first, compare OBJ to EXPECTED -- ignoring padding. Then:
a. if they're equal, store DESIRED into the bytes pointed to by OBJ.
b. if they're not equal, copy the bytes pointed to by OBJ into the bytes pointed to by EXPECTED.

The problem is: how to do a comparison ignoring padding? We cannot reasonably modify the comparison operation itself (it's either a hardware instruction, or else a library routine taking only a size). The obvious solution is to clear the padding on both sides of the comparison (thus __builtin_clear_padding).

We can clear the padding in EXPECTED, although we'll need to make a copy first, as we're not allowed to store into it unless the compare-exchange fails. But, we cannot modify OBJ except as part of the "exchange", when the comparison has already succeeded. 

If we can ensure that the existing value in OBJ has always had its padding cleared, we're okay. We can be fully in-control of all stores that occur into a std::atomic, and, I believe that's true for C11 _Atomic types as well. (it's unclear to me whether using memcpy to non-atomically overwrite a C11 "_Atomic struct Foo" is allowable, but I'll assume that it is not.)

However, it seems infeasible to make this guarantee for std::atomic_ref, as that can be created pointing to any existing object. Said existing object may have its padding in a non-zero state. Of course, <http://wg21.link/P0528r3> doesn't mention atomic_ref, but the atomic_ref paper, <http://wg21.link/P0019r8> mentions this concern, and simply states "We require that the resolution of padding bits follow [P0528r2]."

It's also infeasible to ensure for the GCC __atomic_compare_exchange builtin, as it also operates on arbitrary existing objects. (Of course, this is irrelevant to the C++ standard.)

I believe a workable alternative would be to generate an extra load of OBJ into a temporary memory, copy only the non-padding bytes of EXPECTED on top of that, and then do the compare_exchange operation of OBJ with the temporary. Finally, copy the temporary back into EXPECTED upon a fail result. While that seems like it could work, requiring an extra load before compare-exchange operations is rather poor.

Which leads into...

2. Do we really need to make this change?

The initial paper gives a sample program in http://wg21.link/P0528r0 asserting that it might infinite loop. That would be quite unfortunate.

Repeating the example here, it's basically:
```
  T desired = ...;
  T expected;
  while (
    !atomic->compare_exchange_strong(
      expected,
      desired // Padding bits added and removed here
  ));
```

However, that padding bits in DESIRED may be modified is not super problematic. As long as padding bits in EXPECTED (passed by reference) are not spuriously-modified, we should be fine.

That is, in practice, I believe the loop should execute at most twice, not infinitely. Upon the first failure, all the bits (including padding) pointed to by ATOMIC are copied into the object pointed to by EXPECTED (a reference parameter). At the next iteration of the loop, the call will compare all the bits, including the padding, which was copied as-is from OBJ. As nothing writes to EXPECTED between the two calls, the padding bits will then remain unchanged, and therefore the comparison will succeed on the second iteration.

However, Richard Smith informs me that while the above is true in Clang, apparently in theory it is broken -- nothing in the C++ standard prohibits padding bits of an object from be discarded or modified arbitrarily, at any time! That is, according to the spec (but not Clang), the below function "foo" could arbitrarily return non-zero, randomly! (I used "my_mem*" simply for demonstrating that even if memcpy/memcmp calls themselves are simply function calls and not specially-handled, the problem remains.). One could imagine this occurring in a non-pathologically-evil compiler, if it were to copy EXPECTED into registers and then back into memory between the two calls -- and while doing so, only copies the non-padding parts of the object, for efficiency.

```
struct Foo { char x; /* padding exists here */ long y; };

int foo(char *mem) {
  Foo expected;
  my_memcpy((char*)&expected, mem, sizeof(Foo));
  return my_memcmp((char*)&expected, mem, sizeof(Foo));
}
```

The C standard does appear to prohibit this behavior (C11 6.2.6.1/6), specifying that the padding bytes get set to unspecified values only upon writes to the object.

In summary: in C++, it would appear that the motivating example can in theory infinite-loop, but in Clang, and the C standard, it'd seem that it cannot.

Changing C++ to require the same semantics for padding stability as C could be a reasonable alternative resolution to the original issue, especially if no implementations actually make use of this allowance today. You may still have a single spurious failure, but it would never infinite-loop.


3. Finally, if the decision is to go forward with P0528 anyways, and if padding bits *can* arbitrarily change, then `__builtin_clear_padding(T* ptr)` is not a theoretically sound interface, because the padding bits could change again, after the call to clear them returns.

We'd need to either implement the support within the atomic builtins themselves (which, as noted, cannot be done within the existing GCC builtins), or define some alternative interface.


On Fri, Nov 30, 2018 at 4:13 PM JF Bastien <[hidden email]> wrote:


On Nov 30, 2018, at 1:10 PM, James Y Knight <[hidden email]> wrote:



On Fri, Nov 30, 2018 at 4:06 PM JF Bastien <[hidden email]> wrote:


On Nov 30, 2018, at 12:39 PM, Casey Carter via cfe-dev <[hidden email]> wrote:

On Fri, Nov 30, 2018 at 12:35 PM James Y Knight via cfe-dev <[hidden email]> wrote:
On Thu, Nov 29, 2018, 9:53 PM Richard Smith via cfe-dev <[hidden email] wrote:
* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

void __builtin_clear_padding(T *ptr)
Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

Is the intent here that every value stored into a std::atomic (e.g. via store, exchange, or compare_exchange) would be passed through __builtin_clear_padding first, before being stored into the atomic object? And presumably the same would need to occur implicitly for C-style `_Atomic T`?

Yes; see P0582R3 "The Curious Case of Padding Bits" (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html) for details. The intent is to make compare-exchange work for types with padding bits by zeroing those padding bits before storing a value in an atomic or comparing with the value stored in an atomic.

I think you rather want to zero out padding bits in the atomic, not in the value, and you want to do it after storing into the atomic (unless you’re doing zeroing, and then copying the non-padding elements one at a time, in which case you could just memset instead of having a builtin).

Errr...certainly you can't do additional stores into an atomic after storing a new value into it, that would kinda ruin the "atomic" part.

Haha yeah you’re totally right! Ignore me, no idea what I was thinking.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev
FWIW, I've always thought that the simplest solution is just to expose something like "std::is_trivially_equality_comparable_v<T>" (which would be true whenever comparing objects of type T for equality could be done as-if by memcpy), and then constrain "std::atomic<T>::compare_exchange_foo" to be provided iff "std::is_trivially_equality_comparable_v<T>".  P0528 was already accepted and merged by the time I started talking about this, though.
"std::is_trivially_equality_comparable_v<T>" would be similar, but IIRC not 100% identical, to the existing "std::has_unique_object_representations_v<T>".  As of C++2a, it is detectable by the compiler even for class types (as long as the programmer =default's their type's operator==).

Anyway, I don't think programmers have any business trying to compare-and-swap structs with padding bits. (Or floats.)

–Arthur


On Tue, Dec 11, 2018 at 6:51 PM James Y Knight via cfe-dev <[hidden email]> wrote:
After considering this a bit, I'm not sure the __builtin_clear_padding proposal is workable. Worse, I'm not sure that the standards proposal itself is workable, either.

1. Regarding general implementability of P0528:

Given a call to
  bool atomic_compare_exchange_strong(std::atomic<T>* obj, T* expected, T desired)
we need to first, compare OBJ to EXPECTED -- ignoring padding. Then:
a. if they're equal, store DESIRED into the bytes pointed to by OBJ.
b. if they're not equal, copy the bytes pointed to by OBJ into the bytes pointed to by EXPECTED.

The problem is: how to do a comparison ignoring padding? We cannot reasonably modify the comparison operation itself (it's either a hardware instruction, or else a library routine taking only a size). The obvious solution is to clear the padding on both sides of the comparison (thus __builtin_clear_padding).

We can clear the padding in EXPECTED, although we'll need to make a copy first, as we're not allowed to store into it unless the compare-exchange fails. But, we cannot modify OBJ except as part of the "exchange", when the comparison has already succeeded. 

If we can ensure that the existing value in OBJ has always had its padding cleared, we're okay. We can be fully in-control of all stores that occur into a std::atomic, and, I believe that's true for C11 _Atomic types as well. (it's unclear to me whether using memcpy to non-atomically overwrite a C11 "_Atomic struct Foo" is allowable, but I'll assume that it is not.)

However, it seems infeasible to make this guarantee for std::atomic_ref, as that can be created pointing to any existing object. Said existing object may have its padding in a non-zero state. Of course, <http://wg21.link/P0528r3> doesn't mention atomic_ref, but the atomic_ref paper, <http://wg21.link/P0019r8> mentions this concern, and simply states "We require that the resolution of padding bits follow [P0528r2]."

It's also infeasible to ensure for the GCC __atomic_compare_exchange builtin, as it also operates on arbitrary existing objects. (Of course, this is irrelevant to the C++ standard.)

I believe a workable alternative would be to generate an extra load of OBJ into a temporary memory, copy only the non-padding bytes of EXPECTED on top of that, and then do the compare_exchange operation of OBJ with the temporary. Finally, copy the temporary back into EXPECTED upon a fail result. While that seems like it could work, requiring an extra load before compare-exchange operations is rather poor.

Which leads into...

2. Do we really need to make this change?

The initial paper gives a sample program in http://wg21.link/P0528r0 asserting that it might infinite loop. That would be quite unfortunate.

Repeating the example here, it's basically:
```
  T desired = ...;
  T expected;
  while (
    !atomic->compare_exchange_strong(
      expected,
      desired // Padding bits added and removed here
  ));
```

However, that padding bits in DESIRED may be modified is not super problematic. As long as padding bits in EXPECTED (passed by reference) are not spuriously-modified, we should be fine.

That is, in practice, I believe the loop should execute at most twice, not infinitely. Upon the first failure, all the bits (including padding) pointed to by ATOMIC are copied into the object pointed to by EXPECTED (a reference parameter). At the next iteration of the loop, the call will compare all the bits, including the padding, which was copied as-is from OBJ. As nothing writes to EXPECTED between the two calls, the padding bits will then remain unchanged, and therefore the comparison will succeed on the second iteration.

However, Richard Smith informs me that while the above is true in Clang, apparently in theory it is broken -- nothing in the C++ standard prohibits padding bits of an object from be discarded or modified arbitrarily, at any time! That is, according to the spec (but not Clang), the below function "foo" could arbitrarily return non-zero, randomly! (I used "my_mem*" simply for demonstrating that even if memcpy/memcmp calls themselves are simply function calls and not specially-handled, the problem remains.). One could imagine this occurring in a non-pathologically-evil compiler, if it were to copy EXPECTED into registers and then back into memory between the two calls -- and while doing so, only copies the non-padding parts of the object, for efficiency.

```
struct Foo { char x; /* padding exists here */ long y; };

int foo(char *mem) {
  Foo expected;
  my_memcpy((char*)&expected, mem, sizeof(Foo));
  return my_memcmp((char*)&expected, mem, sizeof(Foo));
}
```

The C standard does appear to prohibit this behavior (C11 6.2.6.1/6), specifying that the padding bytes get set to unspecified values only upon writes to the object.

In summary: in C++, it would appear that the motivating example can in theory infinite-loop, but in Clang, and the C standard, it'd seem that it cannot.

Changing C++ to require the same semantics for padding stability as C could be a reasonable alternative resolution to the original issue, especially if no implementations actually make use of this allowance today. You may still have a single spurious failure, but it would never infinite-loop.


3. Finally, if the decision is to go forward with P0528 anyways, and if padding bits *can* arbitrarily change, then `__builtin_clear_padding(T* ptr)` is not a theoretically sound interface, because the padding bits could change again, after the call to clear them returns.

We'd need to either implement the support within the atomic builtins themselves (which, as noted, cannot be done within the existing GCC builtins), or define some alternative interface.


On Fri, Nov 30, 2018 at 4:13 PM JF Bastien <[hidden email]> wrote:


On Nov 30, 2018, at 1:10 PM, James Y Knight <[hidden email]> wrote:



On Fri, Nov 30, 2018 at 4:06 PM JF Bastien <[hidden email]> wrote:


On Nov 30, 2018, at 12:39 PM, Casey Carter via cfe-dev <[hidden email]> wrote:

On Fri, Nov 30, 2018 at 12:35 PM James Y Knight via cfe-dev <[hidden email]> wrote:
On Thu, Nov 29, 2018, 9:53 PM Richard Smith via cfe-dev <[hidden email] wrote:
* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

void __builtin_clear_padding(T *ptr)
Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

Is the intent here that every value stored into a std::atomic (e.g. via store, exchange, or compare_exchange) would be passed through __builtin_clear_padding first, before being stored into the atomic object? And presumably the same would need to occur implicitly for C-style `_Atomic T`?

Yes; see P0582R3 "The Curious Case of Padding Bits" (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html) for details. The intent is to make compare-exchange work for types with padding bits by zeroing those padding bits before storing a value in an atomic or comparing with the value stored in an atomic.

I think you rather want to zero out padding bits in the atomic, not in the value, and you want to do it after storing into the atomic (unless you’re doing zeroing, and then copying the non-padding elements one at a time, in which case you could just memset instead of having a builtin).

Errr...certainly you can't do additional stores into an atomic after storing a new value into it, that would kinda ruin the "atomic" part.

Haha yeah you’re totally right! Ignore me, no idea what I was thinking.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev

On Dec 11, 2018, at 5:37 PM, Arthur O'Dwyer <[hidden email]> wrote:

FWIW, I've always thought that the simplest solution is just to expose something like "std::is_trivially_equality_comparable_v<T>" (which would be true whenever comparing objects of type T for equality could be done as-if by memcpy), and then constrain "std::atomic<T>::compare_exchange_foo" to be provided iff "std::is_trivially_equality_comparable_v<T>".  P0528 was already accepted and merged by the time I started talking about this, though.

You should read the first version of the paper, which suggested exactly this. You should then read the EWG discussion, which led to the currently adopted solution.


"std::is_trivially_equality_comparable_v<T>" would be similar, but IIRC not 100% identical, to the existing "std::has_unique_object_representations_v<T>".  As of C++2a, it is detectable by the compiler even for class types (as long as the programmer =default's their type's operator==).

Anyway, I don't think programmers have any business trying to compare-and-swap structs with padding bits. (Or floats.)

–Arthur


On Tue, Dec 11, 2018 at 6:51 PM James Y Knight via cfe-dev <[hidden email]> wrote:
After considering this a bit, I'm not sure the __builtin_clear_padding proposal is workable. Worse, I'm not sure that the standards proposal itself is workable, either.

1. Regarding general implementability of P0528:

Given a call to
  bool atomic_compare_exchange_strong(std::atomic<T>* obj, T* expected, T desired)
we need to first, compare OBJ to EXPECTED -- ignoring padding. Then:
a. if they're equal, store DESIRED into the bytes pointed to by OBJ.
b. if they're not equal, copy the bytes pointed to by OBJ into the bytes pointed to by EXPECTED.

The problem is: how to do a comparison ignoring padding? We cannot reasonably modify the comparison operation itself (it's either a hardware instruction, or else a library routine taking only a size). The obvious solution is to clear the padding on both sides of the comparison (thus __builtin_clear_padding).

We can clear the padding in EXPECTED, although we'll need to make a copy first, as we're not allowed to store into it unless the compare-exchange fails. But, we cannot modify OBJ except as part of the "exchange", when the comparison has already succeeded. 

If we can ensure that the existing value in OBJ has always had its padding cleared, we're okay. We can be fully in-control of all stores that occur into a std::atomic, and, I believe that's true for C11 _Atomic types as well. (it's unclear to me whether using memcpy to non-atomically overwrite a C11 "_Atomic struct Foo" is allowable, but I'll assume that it is not.)

However, it seems infeasible to make this guarantee for std::atomic_ref, as that can be created pointing to any existing object. Said existing object may have its padding in a non-zero state. Of course, <http://wg21.link/P0528r3> doesn't mention atomic_ref, but the atomic_ref paper, <http://wg21.link/P0019r8> mentions this concern, and simply states "We require that the resolution of padding bits follow [P0528r2]."

It's also infeasible to ensure for the GCC __atomic_compare_exchange builtin, as it also operates on arbitrary existing objects. (Of course, this is irrelevant to the C++ standard.)

I believe a workable alternative would be to generate an extra load of OBJ into a temporary memory, copy only the non-padding bytes of EXPECTED on top of that, and then do the compare_exchange operation of OBJ with the temporary. Finally, copy the temporary back into EXPECTED upon a fail result. While that seems like it could work, requiring an extra load before compare-exchange operations is rather poor.

I suggest you synchronize with your company’s C++ representatives, who were present at the EWG discussion which led to this change. They were strongly in favor of this direction, and AFAIK it was understood that there would be a cost for structs with padding bits.


Which leads into...

2. Do we really need to make this change?

The initial paper gives a sample program in http://wg21.link/P0528r0 asserting that it might infinite loop. That would be quite unfortunate.

Repeating the example here, it's basically:
```
  T desired = ...;
  T expected;
  while (
    !atomic->compare_exchange_strong(
      expected,
      desired // Padding bits added and removed here
  ));
```

However, that padding bits in DESIRED may be modified is not super problematic. As long as padding bits in EXPECTED (passed by reference) are not spuriously-modified, we should be fine.

That is, in practice, I believe the loop should execute at most twice, not infinitely. Upon the first failure, all the bits (including padding) pointed to by ATOMIC are copied into the object pointed to by EXPECTED (a reference parameter). At the next iteration of the loop, the call will compare all the bits, including the padding, which was copied as-is from OBJ. As nothing writes to EXPECTED between the two calls, the padding bits will then remain unchanged, and therefore the comparison will succeed on the second iteration.

However, Richard Smith informs me that while the above is true in Clang, apparently in theory it is broken -- nothing in the C++ standard prohibits padding bits of an object from be discarded or modified arbitrarily, at any time! That is, according to the spec (but not Clang), the below function "foo" could arbitrarily return non-zero, randomly! (I used "my_mem*" simply for demonstrating that even if memcpy/memcmp calls themselves are simply function calls and not specially-handled, the problem remains.). One could imagine this occurring in a non-pathologically-evil compiler, if it were to copy EXPECTED into registers and then back into memory between the two calls -- and while doing so, only copies the non-padding parts of the object, for efficiency.

```
struct Foo { char x; /* padding exists here */ long y; };

int foo(char *mem) {
  Foo expected;
  my_memcpy((char*)&expected, mem, sizeof(Foo));
  return my_memcmp((char*)&expected, mem, sizeof(Foo));
}
```

The C standard does appear to prohibit this behavior (C11 6.2.6.1/6), specifying that the padding bytes get set to unspecified values only upon writes to the object.

In summary: in C++, it would appear that the motivating example can in theory infinite-loop, but in Clang, and the C standard, it'd seem that it cannot.

Changing C++ to require the same semantics for padding stability as C could be a reasonable alternative resolution to the original issue, especially if no implementations actually make use of this allowance today. You may still have a single spurious failure, but it would never infinite-loop.


3. Finally, if the decision is to go forward with P0528 anyways, and if padding bits *can* arbitrarily change, then `__builtin_clear_padding(T* ptr)` is not a theoretically sound interface, because the padding bits could change again, after the call to clear them returns.

We'd need to either implement the support within the atomic builtins themselves (which, as noted, cannot be done within the existing GCC builtins), or define some alternative interface.


On Fri, Nov 30, 2018 at 4:13 PM JF Bastien <[hidden email]> wrote:


On Nov 30, 2018, at 1:10 PM, James Y Knight <[hidden email]> wrote:



On Fri, Nov 30, 2018 at 4:06 PM JF Bastien <[hidden email]> wrote:


On Nov 30, 2018, at 12:39 PM, Casey Carter via cfe-dev <[hidden email]> wrote:

On Fri, Nov 30, 2018 at 12:35 PM James Y Knight via cfe-dev <[hidden email]> wrote:
On Thu, Nov 29, 2018, 9:53 PM Richard Smith via cfe-dev <[hidden email] wrote:
* P0528R3 Atomic Compare-And-Exchange With Padding Bits
We need compiler magic here, in some form. Billy O'Neal wrote to the C1XX team: "To implement the new atomic_ref as well as the change to compare the value representation of atomics only, the library needs a way to zero out the padding in arbitrary T, which we can't discover with library tech alone. We would like an intrinsic that accepts a trivially-copyable T and produces a copy with the padding zeroed, or takes a T* and zeros the padding inside that T, or similar."

I think this should be done in-place in memory; producing a copy has the problem that you're passing around a value of type T, and that might permit the padding bits to become undefined again.

void __builtin_clear_padding(T *ptr)
Effects: Set to zero all bits that are padding bits in the representation of every value of type T.

Is the intent here that every value stored into a std::atomic (e.g. via store, exchange, or compare_exchange) would be passed through __builtin_clear_padding first, before being stored into the atomic object? And presumably the same would need to occur implicitly for C-style `_Atomic T`?

Yes; see P0582R3 "The Curious Case of Padding Bits" (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html) for details. The intent is to make compare-exchange work for types with padding bits by zeroing those padding bits before storing a value in an atomic or comparing with the value stored in an atomic.

I think you rather want to zero out padding bits in the atomic, not in the value, and you want to do it after storing into the atomic (unless you’re doing zeroing, and then copying the non-padding elements one at a time, in which case you could just memset instead of having a builtin).

Errr...certainly you can't do additional stores into an atomic after storing a new value into it, that would kinda ruin the "atomic" part.

Haha yeah you’re totally right! Ignore me, no idea what I was thinking.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev
On Thu, Dec 13, 2018 at 2:45 PM JF Bastien <[hidden email]> wrote:
On Tue, Dec 11, 2018 at 6:51 PM James Y Knight via cfe-dev <[hidden email]> wrote:
After considering this a bit, I'm not sure the __builtin_clear_padding proposal is workable. Worse, I'm not sure that the standards proposal itself is workable, either.

1. Regarding general implementability of P0528:

Given a call to
  bool atomic_compare_exchange_strong(std::atomic<T>* obj, T* expected, T desired)
we need to first, compare OBJ to EXPECTED -- ignoring padding. Then:
a. if they're equal, store DESIRED into the bytes pointed to by OBJ.
b. if they're not equal, copy the bytes pointed to by OBJ into the bytes pointed to by EXPECTED.

The problem is: how to do a comparison ignoring padding? We cannot reasonably modify the comparison operation itself (it's either a hardware instruction, or else a library routine taking only a size). The obvious solution is to clear the padding on both sides of the comparison (thus __builtin_clear_padding).

We can clear the padding in EXPECTED, although we'll need to make a copy first, as we're not allowed to store into it unless the compare-exchange fails. But, we cannot modify OBJ except as part of the "exchange", when the comparison has already succeeded. 

If we can ensure that the existing value in OBJ has always had its padding cleared, we're okay. We can be fully in-control of all stores that occur into a std::atomic, and, I believe that's true for C11 _Atomic types as well. (it's unclear to me whether using memcpy to non-atomically overwrite a C11 "_Atomic struct Foo" is allowable, but I'll assume that it is not.)

However, it seems infeasible to make this guarantee for std::atomic_ref, as that can be created pointing to any existing object. Said existing object may have its padding in a non-zero state. Of course, <http://wg21.link/P0528r3> doesn't mention atomic_ref, but the atomic_ref paper, <http://wg21.link/P0019r8> mentions this concern, and simply states "We require that the resolution of padding bits follow [P0528r2]."

It's also infeasible to ensure for the GCC __atomic_compare_exchange builtin, as it also operates on arbitrary existing objects. (Of course, this is irrelevant to the C++ standard.)

I believe a workable alternative would be to generate an extra load of OBJ into a temporary memory, copy only the non-padding bytes of EXPECTED on top of that, and then do the compare_exchange operation of OBJ with the temporary. Finally, copy the temporary back into EXPECTED upon a fail result. While that seems like it could work, requiring an extra load before compare-exchange operations is rather poor.

I suggest you synchronize with your company’s C++ representatives, who were present at the EWG discussion which led to this change. They were strongly in favor of this direction, and AFAIK it was understood that there would be a cost for structs with padding bits.

I'm not on the standards committee myself, and, while I am employed by Google, I do not generally coordinate my llvm-dev emails with my company, nor do I have a list of all Google employees who attend which standards body meetings readily available. However, certainly some of said representatives are on this list and can read this message, and could chime in with their own opinions if they so desire. Or, if you tell me which individuals you mean, I could contact them directly.

Was requiring that the previous value of the atomic be loaded during a compare_exchange the understood implementation strategy in committee discussions? From what I can tell from the outside, it seemed like people were in favor of the change because it was thought it would be easy and cheap to implement. Adding an extraneous load of the atomic value seems to me a whole different matter in terms of cost, and not one I would expect that people would've been readily willing to agree to.

Had it been discussed that the example in the R0 paper was not an infinite-loop because padding is unspecified during object initialization object copies, as the paper suggests is the case? But, that there is a potential infinite loop only if the C++ standard allows the padding bits in "expected" to arbitrarily modify themselves at any time between the iterations of the loop, which, unfortunately, it would appear to allow?

Which leads into...

2. Do we really need to make this change?

The initial paper gives a sample program in http://wg21.link/P0528r0 asserting that it might infinite loop. That would be quite unfortunate.

Repeating the example here, it's basically:
```
  T desired = ...;
  T expected;
  while (
    !atomic->compare_exchange_strong(
      expected,
      desired // Padding bits added and removed here
  ));
```

However, that padding bits in DESIRED may be modified is not super problematic. As long as padding bits in EXPECTED (passed by reference) are not spuriously-modified, we should be fine.

That is, in practice, I believe the loop should execute at most twice, not infinitely. Upon the first failure, all the bits (including padding) pointed to by ATOMIC are copied into the object pointed to by EXPECTED (a reference parameter). At the next iteration of the loop, the call will compare all the bits, including the padding, which was copied as-is from OBJ. As nothing writes to EXPECTED between the two calls, the padding bits will then remain unchanged, and therefore the comparison will succeed on the second iteration.

However, Richard Smith informs me that while the above is true in Clang, apparently in theory it is broken -- nothing in the C++ standard prohibits padding bits of an object from be discarded or modified arbitrarily, at any time! That is, according to the spec (but not Clang), the below function "foo" could arbitrarily return non-zero, randomly! (I used "my_mem*" simply for demonstrating that even if memcpy/memcmp calls themselves are simply function calls and not specially-handled, the problem remains.). One could imagine this occurring in a non-pathologically-evil compiler, if it were to copy EXPECTED into registers and then back into memory between the two calls -- and while doing so, only copies the non-padding parts of the object, for efficiency.

```
struct Foo { char x; /* padding exists here */ long y; };

int foo(char *mem) {
  Foo expected;
  my_memcpy((char*)&expected, mem, sizeof(Foo));
  return my_memcmp((char*)&expected, mem, sizeof(Foo));
}
```

The C standard does appear to prohibit this behavior (C11 6.2.6.1/6), specifying that the padding bytes get set to unspecified values only upon writes to the object.

In summary: in C++, it would appear that the motivating example can in theory infinite-loop, but in Clang, and the C standard, it'd seem that it cannot.

Changing C++ to require the same semantics for padding stability as C could be a reasonable alternative resolution to the original issue, especially if no implementations actually make use of this allowance today. You may still have a single spurious failure, but it would never infinite-loop.


3. Finally, if the decision is to go forward with P0528 anyways, and if padding bits *can* arbitrarily change, then `__builtin_clear_padding(T* ptr)` is not a theoretically sound interface, because the padding bits could change again, after the call to clear them returns.

We'd need to either implement the support within the atomic builtins themselves (which, as noted, cannot be done within the existing GCC builtins), or define some alternative interface.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang builtins for C++20 STL features

Oleg Smolsky via cfe-dev


On Dec 14, 2018, at 6:34 AM, James Y Knight <[hidden email]> wrote:

On Thu, Dec 13, 2018 at 2:45 PM JF Bastien <[hidden email]> wrote:
On Tue, Dec 11, 2018 at 6:51 PM James Y Knight via cfe-dev <[hidden email]> wrote:
After considering this a bit, I'm not sure the __builtin_clear_padding proposal is workable. Worse, I'm not sure that the standards proposal itself is workable, either.

1. Regarding general implementability of P0528:

Given a call to
  bool atomic_compare_exchange_strong(std::atomic<T>* obj, T* expected, T desired)
we need to first, compare OBJ to EXPECTED -- ignoring padding. Then:
a. if they're equal, store DESIRED into the bytes pointed to by OBJ.
b. if they're not equal, copy the bytes pointed to by OBJ into the bytes pointed to by EXPECTED.

The problem is: how to do a comparison ignoring padding? We cannot reasonably modify the comparison operation itself (it's either a hardware instruction, or else a library routine taking only a size). The obvious solution is to clear the padding on both sides of the comparison (thus __builtin_clear_padding).

We can clear the padding in EXPECTED, although we'll need to make a copy first, as we're not allowed to store into it unless the compare-exchange fails. But, we cannot modify OBJ except as part of the "exchange", when the comparison has already succeeded. 

If we can ensure that the existing value in OBJ has always had its padding cleared, we're okay. We can be fully in-control of all stores that occur into a std::atomic, and, I believe that's true for C11 _Atomic types as well. (it's unclear to me whether using memcpy to non-atomically overwrite a C11 "_Atomic struct Foo" is allowable, but I'll assume that it is not.)

However, it seems infeasible to make this guarantee for std::atomic_ref, as that can be created pointing to any existing object. Said existing object may have its padding in a non-zero state. Of course, <http://wg21.link/P0528r3> doesn't mention atomic_ref, but the atomic_ref paper, <http://wg21.link/P0019r8> mentions this concern, and simply states "We require that the resolution of padding bits follow [P0528r2]."

It's also infeasible to ensure for the GCC __atomic_compare_exchange builtin, as it also operates on arbitrary existing objects. (Of course, this is irrelevant to the C++ standard.)

I believe a workable alternative would be to generate an extra load of OBJ into a temporary memory, copy only the non-padding bytes of EXPECTED on top of that, and then do the compare_exchange operation of OBJ with the temporary. Finally, copy the temporary back into EXPECTED upon a fail result. While that seems like it could work, requiring an extra load before compare-exchange operations is rather poor.

I suggest you synchronize with your company’s C++ representatives, who were present at the EWG discussion which led to this change. They were strongly in favor of this direction, and AFAIK it was understood that there would be a cost for structs with padding bits.

I'm not on the standards committee myself, and, while I am employed by Google, I do not generally coordinate my llvm-dev emails with my company, nor do I have a list of all Google employees who attend which standards body meetings readily available. However, certainly some of said representatives are on this list and can read this message, and could chime in with their own opinions if they so desire. Or, if you tell me which individuals you mean, I could contact them directly.

The main Google representative is Chandler. I suggest talking to him or Richard (who you’ve talked to) because they have access to the discussion’s transcripts, which aren’t public and therefore can’t go to cfe-dev. They can provide the context you’re asking for. I don’t think I can provide that context on a public list without asking everyone who participated.


Was requiring that the previous value of the atomic be loaded during a compare_exchange the understood implementation strategy in committee discussions?

It was *one of* the understood implementation strategies. Another was specific HW support, and yet another was the compiler just always “doing the right thing” w.r.t. padding (which is the approach you seem to want to take).


From what I can tell from the outside, it seemed like people were in favor of the change because it was thought it would be easy and cheap to implement. Adding an extraneous load of the atomic value seems to me a whole different matter in terms of cost, and not one I would expect that people would've been readily willing to agree to.

An extra load for cmpxchg of atomic types with padding… which occurs ~never, and likely never on purpose. People agreed that it didn’t matter, but should Just Work, and if there’s a very minor cost then people didn’t really care.


Had it been discussed that the example in the R0 paper was not an infinite-loop because padding is unspecified during object initialization object copies, as the paper suggests is the case?

?


But, that there is a potential infinite loop only if the C++ standard allows the padding bits in "expected" to arbitrarily modify themselves at any time between the iterations of the loop, which, unfortunately, it would appear to allow?

This is exactly what the paper is addressing, yes.


Which leads into...

2. Do we really need to make this change?

The initial paper gives a sample program in http://wg21.link/P0528r0 asserting that it might infinite loop. That would be quite unfortunate.

Repeating the example here, it's basically:
```
  T desired = ...;
  T expected;
  while (
    !atomic->compare_exchange_strong(
      expected,
      desired // Padding bits added and removed here
  ));
```

However, that padding bits in DESIRED may be modified is not super problematic. As long as padding bits in EXPECTED (passed by reference) are not spuriously-modified, we should be fine.

That is, in practice, I believe the loop should execute at most twice, not infinitely. Upon the first failure, all the bits (including padding) pointed to by ATOMIC are copied into the object pointed to by EXPECTED (a reference parameter). At the next iteration of the loop, the call will compare all the bits, including the padding, which was copied as-is from OBJ. As nothing writes to EXPECTED between the two calls, the padding bits will then remain unchanged, and therefore the comparison will succeed on the second iteration.

However, Richard Smith informs me that while the above is true in Clang, apparently in theory it is broken -- nothing in the C++ standard prohibits padding bits of an object from be discarded or modified arbitrarily, at any time! That is, according to the spec (but not Clang), the below function "foo" could arbitrarily return non-zero, randomly! (I used "my_mem*" simply for demonstrating that even if memcpy/memcmp calls themselves are simply function calls and not specially-handled, the problem remains.). One could imagine this occurring in a non-pathologically-evil compiler, if it were to copy EXPECTED into registers and then back into memory between the two calls -- and while doing so, only copies the non-padding parts of the object, for efficiency.

```
struct Foo { char x; /* padding exists here */ long y; };

int foo(char *mem) {
  Foo expected;
  my_memcpy((char*)&expected, mem, sizeof(Foo));
  return my_memcmp((char*)&expected, mem, sizeof(Foo));
}
```

The C standard does appear to prohibit this behavior (C11 6.2.6.1/6), specifying that the padding bytes get set to unspecified values only upon writes to the object.

In summary: in C++, it would appear that the motivating example can in theory infinite-loop, but in Clang, and the C standard, it'd seem that it cannot.

Changing C++ to require the same semantics for padding stability as C could be a reasonable alternative resolution to the original issue, especially if no implementations actually make use of this allowance today. You may still have a single spurious failure, but it would never infinite-loop.


3. Finally, if the decision is to go forward with P0528 anyways, and if padding bits *can* arbitrarily change, then `__builtin_clear_padding(T* ptr)` is not a theoretically sound interface, because the padding bits could change again, after the call to clear them returns.

We'd need to either implement the support within the atomic builtins themselves (which, as noted, cannot be done within the existing GCC builtins), or define some alternative interface.


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev