[RFC] volatile mem* builtins

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[RFC] volatile mem* builtins

Fangrui Song via cfe-dev
Hi fans of volatility!

I’d like to add volatile overloads to mem* builtins, and authored a patch: https://reviews.llvm.org/D79279

The mem* builtins are often used (or should be used) in places where time-of-check time-of-use security issues are important (e.g. copying from untrusted buffers), because it prevents multiple reads / multiple writes from occurring at the untrusted memory location. The current builtins don't accept volatile pointee parameters in C++, and merely warn about such parameters in C, which leads to confusion. In these settings, it's useful to overload the builtin and permit volatile pointee parameters. The code generation then directly emits the existing volatile variant of the mem* builtin function call, which ensures that the affected memory location is only accessed once (thereby preventing double-reads under an adversarial memory mapping).

Side-note: yes, ToCToU avoidance is a valid use for volatile.

My patch currently only affects:
  • __builtin_memcpy
  • __builtin_memmove
  • __builtin_memset
There’s a bunch more “mem-like” functions such as bzero, but those 3 are the ones I expect to be used the most at the moment. We can add others later.

John brought up the following: __builtin_memcpy is a library builtin, which means its primary use pattern is #define tricks in the C standard library headers that redirect calls to the memcpy library function. So doing what you're suggesting to __builtin_memcpy is also changing the semantics of memcpy, which is not something we should do lightly. If we were talking about changing a non-library builtin function, or introducing a new builtin, the considerations would be very different.

I can instead add __builtin_volatile_* functions which are overloaded on at least one pointee parameter being volatile.

What do y’all think?

Thanks,

JF

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] volatile mem* builtins

Fangrui Song via cfe-dev

On 6 May 2020, at 18:40, JF Bastien wrote:

Hi fans of volatility!

I’d like to add volatile overloads to mem* builtins, and authored a patch: https://reviews.llvm.org/D79279 <https://reviews.llvm.org/D79279>

The mem* builtins are often used (or should be used) in places where time-of-check time-of-use security issues are important (e.g. copying from untrusted buffers), because it prevents multiple reads / multiple writes from occurring at the untrusted memory location. The current builtins don't accept volatile pointee parameters in C++, and merely warn about such parameters in C, which leads to confusion. In these settings, it's useful to overload the builtin and permit volatile pointee parameters. The code generation then directly emits the existing volatile variant of the mem* builtin function call, which ensures that the affected memory location is only accessed once (thereby preventing double-reads under an adversarial memory mapping).

Side-note: yes, ToCToU avoidance is a valid use for volatile <https://wg21.link/p1152r0#uses>.

My patch currently only affects:
__builtin_memcpy
__builtin_memmove
__builtin_memset
There’s a bunch more “mem-like” functions such as bzero, but those 3 are the ones I expect to be used the most at the moment. We can add others later.

John brought up the following: __builtin_memcpy is a library builtin, which means its primary use pattern is #define tricks in the C standard library headers that redirect calls to the memcpy library function. So doing what you're suggesting to __builtin_memcpy is also changing the semantics of memcpy, which is not something we should do lightly. If we were talking about changing a non-library builtin function, or introducing a new builtin, the considerations would be very different.

I can instead add __builtin_volatile_* functions which are overloaded on at least one pointee parameter being volatile.

So, to be clear, you would like there to be some way to request a volatile memcpy (etc.). You don’t need it to specifically be __builtin_memcpy (etc.) — i.e. you’re not relying on this automatically triggering when users call memcpy — you just need some way to spell it.

A few thoughts:

  • A memcpy/memmove is conceptually a load from one address and a store to another. It is potentially valuable to know that e.g. only the store is volatile. We can’t express that in today’s LLVM intrinsics, but it’s certainly imaginable that we could express it in the future, the same way that we added the ability to record different alignments for both sides. So I think it would be nice if whatever we do here allows us to pick up on the difference, e.g. by triggering based on the qualification of the source/dest pointers.

  • There are other qualifiers that can meaningfully contribute to the operation here besides volatile, such as restrict and (more importantly) address spaces. And again, for the copy operations these might differ between the two pointer types.

In both cases, I’d say that the logical design is to allow the pointers to be to arbitrarily-qualified types. We can then propagate that information from the builtin into the LLVM intrinsic call as best as we’re allowed. So I think you should make builtins called something like __builtin_overloaded_memcpy (name to be decided) and just have their semantics be type-directed.

I do think it would treacherous to actually apply these semantics to memcpy via __builtin_memcpy, though.

John.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] volatile mem* builtins

Fangrui Song via cfe-dev


On May 6, 2020, at 4:23 PM, John McCall <[hidden email]> wrote:

On 6 May 2020, at 18:40, JF Bastien wrote:

Hi fans of volatility!

I’d like to add volatile overloads to mem* builtins, and authored a patch: https://reviews.llvm.org/D79279 <https://reviews.llvm.org/D79279>

The mem* builtins are often used (or should be used) in places where time-of-check time-of-use security issues are important (e.g. copying from untrusted buffers), because it prevents multiple reads / multiple writes from occurring at the untrusted memory location. The current builtins don't accept volatile pointee parameters in C++, and merely warn about such parameters in C, which leads to confusion. In these settings, it's useful to overload the builtin and permit volatile pointee parameters. The code generation then directly emits the existing volatile variant of the mem* builtin function call, which ensures that the affected memory location is only accessed once (thereby preventing double-reads under an adversarial memory mapping).

Side-note: yes, ToCToU avoidance is a valid use for volatile <https://wg21.link/p1152r0#uses>.

My patch currently only affects:
__builtin_memcpy
__builtin_memmove
__builtin_memset
There’s a bunch more “mem-like” functions such as bzero, but those 3 are the ones I expect to be used the most at the moment. We can add others later.

John brought up the following: __builtin_memcpy is a library builtin, which means its primary use pattern is #define tricks in the C standard library headers that redirect calls to the memcpy library function. So doing what you're suggesting to __builtin_memcpy is also changing the semantics of memcpy, which is not something we should do lightly. If we were talking about changing a non-library builtin function, or introducing a new builtin, the considerations would be very different.

I can instead add __builtin_volatile_* functions which are overloaded on at least one pointee parameter being volatile.

So, to be clear, you would like there to be some way to request a volatile memcpy (etc.). You don’t need it to specifically be __builtin_memcpy (etc.) — i.e. you’re not relying on this automatically triggering when users call memcpy — you just need some way to spell it.


Yup.


A few thoughts:

  • A memcpy/memmove is conceptually a load from one address and a store to another. It is potentially valuable to know that e.g. only the store is volatile. We can’t express that in today’s LLVM intrinsics, but it’s certainly imaginable that we could express it in the future, the same way that we added the ability to record different alignments for both sides. So I think it would be nice if whatever we do here allows us to pick up on the difference, e.g. by triggering based on the qualification of the source/dest pointers.


Indeed, that’s what I went with overloading.


  • There are other qualifiers that can meaningfully contribute to the operation here besides volatile, such as restrict and (more importantly) address spaces. And again, for the copy operations these might differ between the two pointer types.

In both cases, I’d say that the logical design is to allow the pointers to be to arbitrarily-qualified types. We can then propagate that information from the builtin into the LLVM intrinsic call as best as we’re allowed. So I think you should make builtins called something like __builtin_overloaded_memcpy (name to be decided) and just have their semantics be type-directed.


Ah yes, I’d like to hear what others think of this. I hadn’t thought about it before you brought it up, and it sounds like a good idea.


I do think it would treacherous to actually apply these semantics to memcpy via __builtin_memcpy, though.


Yeah I think you’ve convinced me.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] volatile mem* builtins

Fangrui Song via cfe-dev
In reply to this post by Fangrui Song via cfe-dev
On Wed, May 06, 2020 at 03:40:43PM -0700, JF Bastien via cfe-dev wrote:
> I’d like to add volatile overloads to mem* builtins, and authored a patch: https://reviews.llvm.org/D79279 <https://reviews.llvm.org/D79279>

The major issue I have here is that it seems to seriously underspecify
what it is actually happening with the memory. Consider an
implementation of memset for example. It is entirely legal and
potentially even useful to check for length being >= two registers and
in that case implement it as
    write to [start, start+reglen)
    write to [start+len-reglen,start+len)
    write to (start+reglen)&(reglen-1) in reglen blocks until reaching
    the end

or variations to try to hide the unaligned head and tail case in the
main loop. This would violate the normal assumptions around volatile,
e.g. when using device memory.

Joerg
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] volatile mem* builtins

Fangrui Song via cfe-dev
Volatile memcpy/memset/etc are indeed rather underspecified. As LangRef states, "If the isvolatile parameter is true, the llvm.memcpy call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it."

I think the intent at the moment is that the mem-operation as a whole should be treated as volatile -- so the memory copy, as a whole, will happen only once -- but that there's no guarantee as to what loads/stores are used to implement that, including potentially reading/writing some bytes multiple times, with different access sizes. I think those semantics make sense, but really ought to be fully nailed down.




On Thu, May 7, 2020 at 5:13 PM Joerg Sonnenberger via cfe-dev <[hidden email]> wrote:
On Wed, May 06, 2020 at 03:40:43PM -0700, JF Bastien via cfe-dev wrote:
> I’d like to add volatile overloads to mem* builtins, and authored a patch: https://reviews.llvm.org/D79279 <https://reviews.llvm.org/D79279>

The major issue I have here is that it seems to seriously underspecify
what it is actually happening with the memory. Consider an
implementation of memset for example. It is entirely legal and
potentially even useful to check for length being >= two registers and
in that case implement it as
    write to [start, start+reglen)
    write to [start+len-reglen,start+len)
    write to (start+reglen)&(reglen-1) in reglen blocks until reaching
    the end

or variations to try to hide the unaligned head and tail case in the
main loop. This would violate the normal assumptions around volatile,
e.g. when using device memory.

Joerg
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] volatile mem* builtins

Fangrui Song via cfe-dev
I indeed think that this is fine for the purpose I’ve stated.

Fundamentally, if you’re interacting with device memory you want to know that accesses are performed in a certain order, at a certain size granularity. Today’s C and C++ implementations volatile on scalars effectively provide this knowledge, even if the Standard doesn’t say so. Volatile on anything “large” (more than one or two registers) doesn’t, and ought not to be used for device memory with specific semantics.

However, the volatile usage for ToCToU doesn’t care if what you describe happens! An adversary can race all they want, as long as any validity check occurs after the copy has completed then we’re fine.

Maybe this argues for calling the builtin something else than volatile, but still mapping it to volatile memcpy IR for now?


On May 7, 2020, at 3:07 PM, James Y Knight via cfe-dev <[hidden email]> wrote:

Volatile memcpy/memset/etc are indeed rather underspecified. As LangRef states, "If the isvolatile parameter is true, the llvm.memcpy call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it."

I think the intent at the moment is that the mem-operation as a whole should be treated as volatile -- so the memory copy, as a whole, will happen only once -- but that there's no guarantee as to what loads/stores are used to implement that, including potentially reading/writing some bytes multiple times, with different access sizes. I think those semantics make sense, but really ought to be fully nailed down.




On Thu, May 7, 2020 at 5:13 PM Joerg Sonnenberger via cfe-dev <[hidden email]> wrote:
On Wed, May 06, 2020 at 03:40:43PM -0700, JF Bastien via cfe-dev wrote:
> I’d like to add volatile overloads to mem* builtins, and authored a patch: https://reviews.llvm.org/D79279 <https://reviews.llvm.org/D79279>

The major issue I have here is that it seems to seriously underspecify
what it is actually happening with the memory. Consider an
implementation of memset for example. It is entirely legal and
potentially even useful to check for length being >= two registers and
in that case implement it as
    write to [start, start+reglen)
    write to [start+len-reglen,start+len)
    write to (start+reglen)&(reglen-1) in reglen blocks until reaching
    the end

or variations to try to hide the unaligned head and tail case in the
main loop. This would violate the normal assumptions around volatile,
e.g. when using device memory.

Joerg
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] volatile mem* builtins

Fangrui Song via cfe-dev
IMO, it's quite fine to call this "volatile", and it's actually the desirable semantics -- even for talking to devices.

E.g. you may want to memcpy data out of a device's buffer, and then do a volatile write to tell the device you're done with the buffer. Here, you want the memcpy to be volatile so it cannot be reordered across the "done" write. But, other than ensuring the correct ordering w.r.t. the subsequent volatile write, it's quite irrelevant exactly how the data is read from the buffer -- doing it in the most efficient way is desirable.

All I want is that this specification of what exactly is (and perhaps more importantly) isn't guaranteed is actually be written down, both in Clang docs for the builtins, and LLVM langref for the IR intrinsics. Other than that, this seems like a fine and useful addition.


On Fri, May 8, 2020 at 1:07 PM JF Bastien <[hidden email]> wrote:
I indeed think that this is fine for the purpose I’ve stated.

Fundamentally, if you’re interacting with device memory you want to know that accesses are performed in a certain order, at a certain size granularity. Today’s C and C++ implementations volatile on scalars effectively provide this knowledge, even if the Standard doesn’t say so. Volatile on anything “large” (more than one or two registers) doesn’t, and ought not to be used for device memory with specific semantics.

However, the volatile usage for ToCToU doesn’t care if what you describe happens! An adversary can race all they want, as long as any validity check occurs after the copy has completed then we’re fine.

Maybe this argues for calling the builtin something else than volatile, but still mapping it to volatile memcpy IR for now?


On May 7, 2020, at 3:07 PM, James Y Knight via cfe-dev <[hidden email]> wrote:

Volatile memcpy/memset/etc are indeed rather underspecified. As LangRef states, "If the isvolatile parameter is true, the llvm.memcpy call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it."

I think the intent at the moment is that the mem-operation as a whole should be treated as volatile -- so the memory copy, as a whole, will happen only once -- but that there's no guarantee as to what loads/stores are used to implement that, including potentially reading/writing some bytes multiple times, with different access sizes. I think those semantics make sense, but really ought to be fully nailed down.




On Thu, May 7, 2020 at 5:13 PM Joerg Sonnenberger via cfe-dev <[hidden email]> wrote:
On Wed, May 06, 2020 at 03:40:43PM -0700, JF Bastien via cfe-dev wrote:
> I’d like to add volatile overloads to mem* builtins, and authored a patch: https://reviews.llvm.org/D79279 <https://reviews.llvm.org/D79279>

The major issue I have here is that it seems to seriously underspecify
what it is actually happening with the memory. Consider an
implementation of memset for example. It is entirely legal and
potentially even useful to check for length being >= two registers and
in that case implement it as
    write to [start, start+reglen)
    write to [start+len-reglen,start+len)
    write to (start+reglen)&(reglen-1) in reglen blocks until reaching
    the end

or variations to try to hide the unaligned head and tail case in the
main loop. This would violate the normal assumptions around volatile,
e.g. when using device memory.

Joerg
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] volatile mem* builtins

Fangrui Song via cfe-dev
On Fri, May 08, 2020 at 03:33:18PM -0400, James Y Knight wrote:
> E.g. you may want to memcpy data out of a device's buffer, and then do a
> volatile write to tell the device you're done with the buffer. Here, you
> want the memcpy to be volatile so it cannot be reordered across the "done"
> write. But, other than ensuring the correct ordering w.r.t. the subsequent
> volatile write, it's quite irrelevant exactly how the data is read from the
> buffer -- doing it in the most efficient way is desirable.

I was wondering whether it might make sense to make this a memory
ordering constraint instead or a barrier.

Joerg
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev