Smart Pointer Lifetime Optimizations

classic Classic list List threaded Threaded
32 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev

Hello all,


I'm planning to do some work to add lifetime optimization passes for smart pointers and reference-counted objects. I'll use this email as a sort of proposal for what I hope to do. 


Scope


As I'm developing the pass, I'm trying to keep it general and create utilities that could work across multiple smart pointers. But, right now, I'm focussing on unique_ptr and applying specific ownership optimizations to unique_ptr only. 


unique_ptr Optimzations


The pass I'm currently developing adds a single, simple, optimization: constant fold the destructor based on ownership information. unique_ptr has a lot of ownership information communicated with reference semantics. When a unique_ptr is moved into another function, that function takes over ownership of the unique_ptr, and subsequent destructors can be eliminated (because they will be no-ops). Otherwise, branchless functions are often complicated after inlining unique_ptr's destructor so, this optimization should be fairly beneficial.


unique_ptr's reset and release methods both complicate this optimization a bit. Because they are also able to transfer and remove ownership, all unknown instructions must be ignored. However, in the future, knowledge of those methods might be able to make the pass more robust. 


With unique_ptr, it's difficult to prove liveness. So, it is hard to constant fold the destructor call to always be there. Maybe in the future, this would be possible, though (with sufficient analysis). 


Last, an optimization that I hope to do is lowering the unique_ptr to a raw pointer if all lifetime paths are known. I think removing this layer of abstraction would make it easier for other optimization passes to be successful. Eventually, we may even be able to specialize functions that used to take a unique_ptr to now take a raw pointer, if the argument's lifetime was also able to be fully analyzed. 


Lifetime Annotations


Right now, the pass relies on (pre-inlined) function calls to generate ownership information. Another approach would be to add ownership annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start). 


ARC Optimizations


There are a huge number of large and small ARC optimizations already in LLVM. For unique_ptr specifically, I'm not sure these are of any benefit because unique_ptr doesn't actually do any reference counting. But, later on, when I start working on generalizing this pass to support more smart pointers (specifically shared_ptr) I think the ARC optimization pass, and especially the utilities it contains, could be very beneficial. If anyone has experience with ARC optimizations, I'd love to hear your thoughts on extending them to other reference counted objects. 


trivial_abi and Hidden References


Arthur O'Dwyer made a good point, which is that a lot of these optimizations can be applied when with the trivial_abi attribute. However, given that's not a standard attribute and these optimizations only happen to work with trivial_abi (i.e., in a more complicated program, they may not continue to work). I think lifetime utilities and specific lifetime optimization passes are still beneficial (especially if they can be applied to other smart pointers in the future). 


Because all smart pointers have non-trivial destructors, they are always passed by hidden references. With unique_ptr, this is as simple as bit-casting the pointer member to unique_ptr, which would allow for it to be lowered to a single raw pointer instead of a stack-allocated object. Even without the trival_abi attribute, I think this is an optimization that could be done.


Results


Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart pointer lifetime optimizations pass.

For reference, here are the before and after results:

Clang trunk (four branches): Compiler Explorer

With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru


Best,

Zoe


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev

On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:

Hello all,


I'm planning to do some work to add lifetime optimization passes for smart
pointers and reference-counted objects. I'll use this email as a sort of
proposal for what I hope to do.


*Scope*


As I'm developing the pass, I'm trying to keep it general and create
utilities that could work across multiple smart pointers. But, right now,
I'm focussing on unique_ptr and applying specific ownership optimizations to
unique_ptr only.


*unique_ptr Optimzations*


The pass I'm currently developing adds a single, simple, optimization:
constant fold the destructor based on ownership information. unique_ptr has
a lot of ownership information communicated with reference semantics. When a
unique_ptr is moved into another function, that function takes over
ownership of the unique_ptr, and subsequent destructors can be eliminated
(because they will be no-ops). Otherwise, branchless functions are often
complicated after inlining unique_ptr's destructor so, this optimization
should be fairly beneficial.


unique_ptr's reset and release methods both complicate this optimization a
bit. Because they are also able to transfer and remove ownership, all
unknown instructions must be ignored. However, in the future, knowledge of
those methods might be able to make the pass more robust.


With unique_ptr, it's difficult to prove liveness. So, it is hard to
constant fold the destructor call to always be there. Maybe in the future,
this would be possible, though (with sufficient analysis).


Last, an optimization that I hope to do is lowering the unique_ptr to a raw
pointer if all lifetime paths are known. I think removing this layer of
abstraction would make it easier for other optimization passes to be
successful. Eventually, we may even be able to specialize functions that
used to take a unique_ptr to now take a raw pointer, if the argument's
lifetime was also able to be fully analyzed.


*Lifetime Annotations*


Right now, the pass relies on (pre-inlined) function calls to generate
ownership information. Another approach would be to add ownership
annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).


*ARC Optimizations*


There are a huge number of large and small ARC optimizations already in
LLVM. For unique_ptr specifically, I'm not sure these are of any benefit
because unique_ptr doesn't actually do any reference counting. But, later
on, when I start working on generalizing this pass to support more smart
pointers (specifically shared_ptr) I think the ARC optimization pass, and
especially the utilities it contains, could be very beneficial. If anyone
has experience with ARC optimizations, I'd love to hear your thoughts on
extending them to other reference counted objects.


*trivial_abi and Hidden References*


Arthur O'Dwyer made a good point, which is that a lot of these
optimizations can be applied when with the trivial_abi attribute. However,
given that's not a standard attribute and these optimizations only *happen*
to work with trivial_abi (i.e., in a more complicated program, they may not
continue to work). I think lifetime utilities and specific lifetime
optimization passes are still beneficial (especially if they can be applied
to other smart pointers in the future).


Because all smart pointers have non-trivial destructors, they are always
passed by hidden references. With unique_ptr, this is as simple as
bit-casting the pointer member to unique_ptr, which would allow for it to
be lowered to a single raw pointer instead of a stack-allocated object.
Even without the trival_abi attribute, I think this is an optimization that
could be done.


*Results*


Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart
pointer lifetime optimizations pass. <https://reviews.llvm.org/D81288>

For reference, here are the before and after results:

Clang trunk (four branches): Compiler Explorer
<https://godbolt.org/z/bsJFty>

With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru

Unfortunately, these are not legal optimizations for your test case:

  • guaranteed is permitted to escape a reference (or pointer) to the
    object it was passed. Tat references and pointers remain valid
    until the object goes out of scope.

  • The object can be mutated through that reference because the underlying
    object is not const. Being passed a const reference is not a
    semantic contract in C++.

  • Through a combination of the above, the call to owner may change
    the value of p, and so the caller may not rely on it still being
    in a trivially-destructible state after that call.

  • owner may leave the value of its parameter object in a
    non-trivially-destructible state, and under the Itanium C++ ABI, cleaning
    up that object is the caller’s responsibility. I agree that this is a
    bad rule for optimization purposes, but it’s the rule. This can only be
    optimized with a more global, interprocedural optimization that shifts
    responsibility to owner to destroy its argument.

John.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
John,

Thanks, those are good points. I think we can still remove one of the destructors (which could also be done by a more powerful DSE+load propagation) but, you're right; one needs to stay.

> This can only be optimized with a more global, interprocedural optimization that shifts responsibility to owner to destroy its argument.

I'll think about implementing something like this, but I suspect any possible optimizations will already happen with inlining and analysis.

Thanks for the response,
Zoe

On Fri, Jun 5, 2020 at 1:09 PM John McCall <[hidden email]> wrote:

On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:

Hello all,


I'm planning to do some work to add lifetime optimization passes for smart
pointers and reference-counted objects. I'll use this email as a sort of
proposal for what I hope to do.


*Scope*


As I'm developing the pass, I'm trying to keep it general and create
utilities that could work across multiple smart pointers. But, right now,
I'm focussing on unique_ptr and applying specific ownership optimizations to
unique_ptr only.


*unique_ptr Optimzations*


The pass I'm currently developing adds a single, simple, optimization:
constant fold the destructor based on ownership information. unique_ptr has
a lot of ownership information communicated with reference semantics. When a
unique_ptr is moved into another function, that function takes over
ownership of the unique_ptr, and subsequent destructors can be eliminated
(because they will be no-ops). Otherwise, branchless functions are often
complicated after inlining unique_ptr's destructor so, this optimization
should be fairly beneficial.


unique_ptr's reset and release methods both complicate this optimization a
bit. Because they are also able to transfer and remove ownership, all
unknown instructions must be ignored. However, in the future, knowledge of
those methods might be able to make the pass more robust.


With unique_ptr, it's difficult to prove liveness. So, it is hard to
constant fold the destructor call to always be there. Maybe in the future,
this would be possible, though (with sufficient analysis).


Last, an optimization that I hope to do is lowering the unique_ptr to a raw
pointer if all lifetime paths are known. I think removing this layer of
abstraction would make it easier for other optimization passes to be
successful. Eventually, we may even be able to specialize functions that
used to take a unique_ptr to now take a raw pointer, if the argument's
lifetime was also able to be fully analyzed.


*Lifetime Annotations*


Right now, the pass relies on (pre-inlined) function calls to generate
ownership information. Another approach would be to add ownership
annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).


*ARC Optimizations*


There are a huge number of large and small ARC optimizations already in
LLVM. For unique_ptr specifically, I'm not sure these are of any benefit
because unique_ptr doesn't actually do any reference counting. But, later
on, when I start working on generalizing this pass to support more smart
pointers (specifically shared_ptr) I think the ARC optimization pass, and
especially the utilities it contains, could be very beneficial. If anyone
has experience with ARC optimizations, I'd love to hear your thoughts on
extending them to other reference counted objects.


*trivial_abi and Hidden References*


Arthur O'Dwyer made a good point, which is that a lot of these
optimizations can be applied when with the trivial_abi attribute. However,
given that's not a standard attribute and these optimizations only *happen*
to work with trivial_abi (i.e., in a more complicated program, they may not
continue to work). I think lifetime utilities and specific lifetime
optimization passes are still beneficial (especially if they can be applied
to other smart pointers in the future).


Because all smart pointers have non-trivial destructors, they are always
passed by hidden references. With unique_ptr, this is as simple as
bit-casting the pointer member to unique_ptr, which would allow for it to
be lowered to a single raw pointer instead of a stack-allocated object.
Even without the trival_abi attribute, I think this is an optimization that
could be done.


*Results*


Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart
pointer lifetime optimizations pass. <https://reviews.llvm.org/D81288>

For reference, here are the before and after results:

Clang trunk (four branches): Compiler Explorer
<https://godbolt.org/z/bsJFty>

With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru

Unfortunately, these are not legal optimizations for your test case:

  • guaranteed is permitted to escape a reference (or pointer) to the
    object it was passed. Tat references and pointers remain valid
    until the object goes out of scope.

  • The object can be mutated through that reference because the underlying
    object is not const. Being passed a const reference is not a
    semantic contract in C++.

  • Through a combination of the above, the call to owner may change
    the value of p, and so the caller may not rely on it still being
    in a trivially-destructible state after that call.

  • owner may leave the value of its parameter object in a
    non-trivially-destructible state, and under the Itanium C++ ABI, cleaning
    up that object is the caller’s responsibility. I agree that this is a
    bad rule for optimization purposes, but it’s the rule. This can only be
    optimized with a more global, interprocedural optimization that shifts
    responsibility to owner to destroy its argument.

John.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev

On 6 Jun 2020, at 13:47, Zoe Carver wrote:

John,

Thanks, those are good points. I think we can still remove one of the
destructors (which could also be done by a more powerful DSE+load
propagation) but, you're right; one needs to stay.

Can you explain in more detail which destructor you think you can
eliminate?

This can only be optimized with a more global, interprocedural

optimization that shifts responsibility to owner to destroy its argument.

I'll think about implementing something like this, but I suspect any
possible optimizations will already happen with inlining and analysis.

Yeah. For the narrow case of std::unique_ptr, since its operations
are easily inlined and can be easily optimized after copy propagation,
there’s not much more that can be done at a high level.

Note that trivial_abi (if it could be adopted on std::unique_ptr)
also changes the ABI to make the callee responsible for destruction.
So as part of getting a more efficient low-level ABI, you also get a
more optimizable high-level one.

One idea I’ve personally been kicking around is some way to mark
declarations as having an “unstable ABI”: basically, a guarantee that
all the code that uses them will be compiled with a single toolchain,
and therefore a license for the implementation to alter the ABI however
it likes with any code that uses any of those declarations.

A type would be unstable if it was composed even partially from a
declaration marked unstable. So class Unstable would be unstable,
but so would const Unstable * — and, crucially, so would
std::unique_ptr<Unstable>. But for soundness reasons, this would
need to ignore type sugar (so no marking typedefs), and it wouldn’t
be able to automatically descend into fields.

There are a few ways that we could use that directly in the compiler.
The big restriction is that you’re not breaking ABI globally and so
you always need an unstable “contaminant” that permits using the
unstable ABI. For example, we can’t just change function ABIs
for all unstable functions because function pointers have to remain
compatible. On the other hand, programs aren’t allowed to call
function pointers under the wrong type, so if the function type is
unstable, we can change anything we want about its ABI.

(For functions specifically, there’s another option: we could emit
the functions with an unstable ABI and then introduce thunks that
adapt the calling convention when the address is taken. But that’s
a non-trivial code-size hit that we might have to do unconditionally.
It also can’t adapt a callee-destroy ABI into a caller-destroy one
without introducing an extra move, which isn’t necessarily semantically
allowed.)

Probably more importantly, though, we could allow unstable-ness to
be detected with a type trait, and that would allow the standard
library to take advantage of it. So std::unique_ptr<int> would
be stuck with the stable ABI, but std::unique_ptr<Unstable> could
switch to be trivial_abi.

That does leave the problem of actually doing the annotation.
Adding an attribute to every class is probably beyond what people
would accept. There are several ways to do mass annotation. Pragmas
are problematic because you don’t want to accidentally leave the
pragma on when you exit a file and then have it cover a system
include. We do have some pragmas that prevent file changes while
the pragma is active, which is a decent solution for that problem.
An alternative is to mark namespaces. That probably needs to be
lexical: that is, you wouldn’t be able to mark the entire clang
namespace, you would mark a specific namespace clang declaration
in a single header. But that’s still much more manageable, and
after all, the cost to missing an annotation is just a missed
optimization.

We could also implicitly make all anonymous-namespace declarations
unstable.

John.

Thanks for the response,
Zoe

On Fri, Jun 5, 2020 at 1:09 PM John McCall <[hidden email]> wrote:

On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:

Hello all,


I'm planning to do some work to add lifetime optimization passes for smart
pointers and reference-counted objects. I'll use this email as a sort of
proposal for what I hope to do.


*Scope*


As I'm developing the pass, I'm trying to keep it general and create
utilities that could work across multiple smart pointers. But, right now,
I'm focussing on unique_ptr and applying specific ownership optimizations
to
unique_ptr only.


*unique_ptr Optimzations*


The pass I'm currently developing adds a single, simple, optimization:
constant fold the destructor based on ownership information. unique_ptr has
a lot of ownership information communicated with reference semantics. When
a
unique_ptr is moved into another function, that function takes over
ownership of the unique_ptr, and subsequent destructors can be eliminated
(because they will be no-ops). Otherwise, branchless functions are often
complicated after inlining unique_ptr's destructor so, this optimization
should be fairly beneficial.


unique_ptr's reset and release methods both complicate this optimization a
bit. Because they are also able to transfer and remove ownership, all
unknown instructions must be ignored. However, in the future, knowledge of
those methods might be able to make the pass more robust.


With unique_ptr, it's difficult to prove liveness. So, it is hard to
constant fold the destructor call to always be there. Maybe in the future,
this would be possible, though (with sufficient analysis).


Last, an optimization that I hope to do is lowering the unique_ptr to a raw
pointer if all lifetime paths are known. I think removing this layer of
abstraction would make it easier for other optimization passes to be
successful. Eventually, we may even be able to specialize functions that
used to take a unique_ptr to now take a raw pointer, if the argument's
lifetime was also able to be fully analyzed.


*Lifetime Annotations*


Right now, the pass relies on (pre-inlined) function calls to generate
ownership information. Another approach would be to add ownership
annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).


*ARC Optimizations*


There are a huge number of large and small ARC optimizations already in
LLVM. For unique_ptr specifically, I'm not sure these are of any benefit
because unique_ptr doesn't actually do any reference counting. But, later
on, when I start working on generalizing this pass to support more smart
pointers (specifically shared_ptr) I think the ARC optimization pass, and
especially the utilities it contains, could be very beneficial. If anyone
has experience with ARC optimizations, I'd love to hear your thoughts on
extending them to other reference counted objects.


*trivial_abi and Hidden References*


Arthur O'Dwyer made a good point, which is that a lot of these
optimizations can be applied when with the trivial_abi attribute. However,
given that's not a standard attribute and these optimizations only *happen*
to work with trivial_abi (i.e., in a more complicated program, they may not
continue to work). I think lifetime utilities and specific lifetime
optimization passes are still beneficial (especially if they can be applied
to other smart pointers in the future).


Because all smart pointers have non-trivial destructors, they are always
passed by hidden references. With unique_ptr, this is as simple as
bit-casting the pointer member to unique_ptr, which would allow for it to
be lowered to a single raw pointer instead of a stack-allocated object.
Even without the trival_abi attribute, I think this is an optimization that
could be done.


*Results*


Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart
pointer lifetime optimizations pass. <https://reviews.llvm.org/D81288>

For reference, here are the before and after results:

Clang trunk (four branches): Compiler Explorer
<https://godbolt.org/z/bsJFty>

With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru

Unfortunately, these are not legal optimizations for your test case:

-

guaranteed is permitted to escape a reference (or pointer) to the
object it was passed. Tat references and pointers remain valid
until the object goes out of scope.
-

The object can be mutated through that reference because the underlying
object is not const. Being passed a const reference is not a
semantic contract in C++.
-

Through a combination of the above, the call to owner may change
the value of p, and so the caller may not rely on it still being
in a trivially-destructible state after that call.
-

owner may leave the value of its parameter object in a
non-trivially-destructible state, and under the Itanium C++ ABI,
cleaning
up that object is the caller’s responsibility. I agree that this is a
bad rule for optimization purposes, but it’s the rule. This can only be
optimized with a more global, interprocedural optimization that shifts
responsibility to owner to destroy its argument.

John.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev

John, sorry for my delayed response.


> Can you explain in more detail which destructor you think you can eliminate?


In the Godbolt link in my original post, there are two unique_ptrs, %3 and %4. They are both passed to the move constructor (as the source and destination). In the move constructor, the source's held pointer is nulled out. If the next use of the pointer is the destructor (as is the case in the Godbolt example) then, the destructor will be a no-op. In the optimized code, it's a bit more difficult to see. But, the dead store is still there. Two instructions before the call to owner, null is stored into %8. Then, in the 3rd block (%17) that same pointer (%8) is loaded and compared against null (which is always true). 


> One idea I’ve personally been kicking around is some way to mark declarations as having an “unstable ABI”...


This is a super interesting idea. When I become more familiar with clang's internals, I'd be interested in helping to implement it.


> Probably more importantly, though, we could allow unstable-ness to be detected with a type trait, and that would allow the standard library to take advantage of it. 


We could actually do this for trivial_abi types too. If we added a builtin type trait to check if a type has the trivial_abi attribute, libc++ could conditionally give unique_ptr the trivial_abi attribute if its base type also had the attribute. Additionally, we could add a config macro that would do this globally when libc++ is in unstable ABI mode. 


Best,

Zoe


On Sat, Jun 6, 2020 at 2:07 PM John McCall <[hidden email]> wrote:

On 6 Jun 2020, at 13:47, Zoe Carver wrote:

John,

Thanks, those are good points. I think we can still remove one of the
destructors (which could also be done by a more powerful DSE+load
propagation) but, you're right; one needs to stay.

Can you explain in more detail which destructor you think you can
eliminate?

This can only be optimized with a more global, interprocedural

optimization that shifts responsibility to owner to destroy its argument.

I'll think about implementing something like this, but I suspect any
possible optimizations will already happen with inlining and analysis.

Yeah. For the narrow case of std::unique_ptr, since its operations
are easily inlined and can be easily optimized after copy propagation,
there’s not much more that can be done at a high level.

Note that trivial_abi (if it could be adopted on std::unique_ptr)
also changes the ABI to make the callee responsible for destruction.
So as part of getting a more efficient low-level ABI, you also get a
more optimizable high-level one.

One idea I’ve personally been kicking around is some way to mark
declarations as having an “unstable ABI”: basically, a guarantee that
all the code that uses them will be compiled with a single toolchain,
and therefore a license for the implementation to alter the ABI however
it likes with any code that uses any of those declarations.

A type would be unstable if it was composed even partially from a
declaration marked unstable. So class Unstable would be unstable,
but so would const Unstable * — and, crucially, so would
std::unique_ptr<Unstable>. But for soundness reasons, this would
need to ignore type sugar (so no marking typedefs), and it wouldn’t
be able to automatically descend into fields.

There are a few ways that we could use that directly in the compiler.
The big restriction is that you’re not breaking ABI globally and so
you always need an unstable “contaminant” that permits using the
unstable ABI. For example, we can’t just change function ABIs
for all unstable functions because function pointers have to remain
compatible. On the other hand, programs aren’t allowed to call
function pointers under the wrong type, so if the function type is
unstable, we can change anything we want about its ABI.

(For functions specifically, there’s another option: we could emit
the functions with an unstable ABI and then introduce thunks that
adapt the calling convention when the address is taken. But that’s
a non-trivial code-size hit that we might have to do unconditionally.
It also can’t adapt a callee-destroy ABI into a caller-destroy one
without introducing an extra move, which isn’t necessarily semantically
allowed.)

Probably more importantly, though, we could allow unstable-ness to
be detected with a type trait, and that would allow the standard
library to take advantage of it. So std::unique_ptr<int> would
be stuck with the stable ABI, but std::unique_ptr<Unstable> could
switch to be trivial_abi.

That does leave the problem of actually doing the annotation.
Adding an attribute to every class is probably beyond what people
would accept. There are several ways to do mass annotation. Pragmas
are problematic because you don’t want to accidentally leave the
pragma on when you exit a file and then have it cover a system
include. We do have some pragmas that prevent file changes while
the pragma is active, which is a decent solution for that problem.
An alternative is to mark namespaces. That probably needs to be
lexical: that is, you wouldn’t be able to mark the entire clang
namespace, you would mark a specific namespace clang declaration
in a single header. But that’s still much more manageable, and
after all, the cost to missing an annotation is just a missed
optimization.

We could also implicitly make all anonymous-namespace declarations
unstable.

John.

Thanks for the response,
Zoe

On Fri, Jun 5, 2020 at 1:09 PM John McCall <[hidden email]> wrote:

On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:

Hello all,


I'm planning to do some work to add lifetime optimization passes for smart
pointers and reference-counted objects. I'll use this email as a sort of
proposal for what I hope to do.


*Scope*


As I'm developing the pass, I'm trying to keep it general and create
utilities that could work across multiple smart pointers. But, right now,
I'm focussing on unique_ptr and applying specific ownership optimizations
to
unique_ptr only.


*unique_ptr Optimzations*


The pass I'm currently developing adds a single, simple, optimization:
constant fold the destructor based on ownership information. unique_ptr has
a lot of ownership information communicated with reference semantics. When
a
unique_ptr is moved into another function, that function takes over
ownership of the unique_ptr, and subsequent destructors can be eliminated
(because they will be no-ops). Otherwise, branchless functions are often
complicated after inlining unique_ptr's destructor so, this optimization
should be fairly beneficial.


unique_ptr's reset and release methods both complicate this optimization a
bit. Because they are also able to transfer and remove ownership, all
unknown instructions must be ignored. However, in the future, knowledge of
those methods might be able to make the pass more robust.


With unique_ptr, it's difficult to prove liveness. So, it is hard to
constant fold the destructor call to always be there. Maybe in the future,
this would be possible, though (with sufficient analysis).


Last, an optimization that I hope to do is lowering the unique_ptr to a raw
pointer if all lifetime paths are known. I think removing this layer of
abstraction would make it easier for other optimization passes to be
successful. Eventually, we may even be able to specialize functions that
used to take a unique_ptr to now take a raw pointer, if the argument's
lifetime was also able to be fully analyzed.


*Lifetime Annotations*


Right now, the pass relies on (pre-inlined) function calls to generate
ownership information. Another approach would be to add ownership
annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).


*ARC Optimizations*


There are a huge number of large and small ARC optimizations already in
LLVM. For unique_ptr specifically, I'm not sure these are of any benefit
because unique_ptr doesn't actually do any reference counting. But, later
on, when I start working on generalizing this pass to support more smart
pointers (specifically shared_ptr) I think the ARC optimization pass, and
especially the utilities it contains, could be very beneficial. If anyone
has experience with ARC optimizations, I'd love to hear your thoughts on
extending them to other reference counted objects.


*trivial_abi and Hidden References*


Arthur O'Dwyer made a good point, which is that a lot of these
optimizations can be applied when with the trivial_abi attribute. However,
given that's not a standard attribute and these optimizations only *happen*
to work with trivial_abi (i.e., in a more complicated program, they may not
continue to work). I think lifetime utilities and specific lifetime
optimization passes are still beneficial (especially if they can be applied
to other smart pointers in the future).


Because all smart pointers have non-trivial destructors, they are always
passed by hidden references. With unique_ptr, this is as simple as
bit-casting the pointer member to unique_ptr, which would allow for it to
be lowered to a single raw pointer instead of a stack-allocated object.
Even without the trival_abi attribute, I think this is an optimization that
could be done.


*Results*


Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart
pointer lifetime optimizations pass. <https://reviews.llvm.org/D81288>

For reference, here are the before and after results:

Clang trunk (four branches): Compiler Explorer
<https://godbolt.org/z/bsJFty>

With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru

Unfortunately, these are not legal optimizations for your test case:

-

guaranteed is permitted to escape a reference (or pointer) to the
object it was passed. Tat references and pointers remain valid
until the object goes out of scope.
-

The object can be mutated through that reference because the underlying
object is not const. Being passed a const reference is not a
semantic contract in C++.
-

Through a combination of the above, the call to owner may change
the value of p, and so the caller may not rely on it still being
in a trivially-destructible state after that call.
-

owner may leave the value of its parameter object in a
non-trivially-destructible state, and under the Itanium C++ ABI,
cleaning
up that object is the caller’s responsibility. I agree that this is a
bad rule for optimization purposes, but it’s the rule. This can only be
optimized with a more global, interprocedural optimization that shifts
responsibility to owner to destroy its argument.

John.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev

On 7 Jun 2020, at 18:52, Zoe Carver wrote:

John, sorry for my delayed response.


*> Can you explain in more detail which destructor you think you can
eliminate?*


In the Godbolt link in my original post, there are two unique_ptrs, %3 and %
4. They are both passed to the move constructor (as the source and
destination). In the move constructor, the source's held pointer is nulled
out. If the next use of the pointer is the destructor (as is the case in
the Godbolt example) then, the destructor will be a no-op. In the optimized
code, it's a bit more difficult to see. But, the dead store is still there.
Two instructions before the call to owner, null is stored into %8. Then, in
the 3rd block (%17) that same pointer (%8) is loaded and compared against
null (which is always true).

Yes, the destructor for the source (the local variable, as opposed to the
temporary) is the one that you can’t eliminate without proving that a
reference to it isn’t escaped by the first call and then mutated during the
second.

You wouldn’t be the first person to be surprised by the result of this sort
of analysis, but I’m afraid it’s what we’re working with.

Unfortunately, there’s really no way to eliminate this one without either
interprocedural information or language changes. trivial_abi eliminates
the other one because it changes the convention for passing by value, but to
pass an “immutably borrowed” value in C++ we have to pass by reference, which
allows the reference to be escaped and accessed (and even mutated, if the
original object wasn’t declared const) as long as those accesses happen
before destruction.

*> Probably more importantly, though, we could allow unstable-ness to be
detected with a type trait, and that would allow the standard library to
take advantage of it. *


We could actually do this for trivial_abi types too. If we added a builtin
type trait to check if a type has the trivial_abi attribute, libc++ could
conditionally give unique_ptr the trivial_abi attribute if its base type
also had the attribute. Additionally, we could add a config macro that
would do this globally when libc++ is in unstable ABI mode.

Hmm. That doesn’t just fall out from any analysis I see. trivial_abi
is an existing, ABI-stable attribute, so changing the ABI of std::unique_ptr
for types that are already trivial_abi is just as much of an ABI break
as changing it in general would be. You could try to justify it by saying
that there just aren’t very many trivial_abi types yet, or that trivial_abi
is a vendor-specific attribute that’s unlikely to be used on a type with a
stable ABI because non-Clang implementations wouldn’t be able to compile
it compatibly, but those aren’t terribly convincing arguments to me.

John.

Best,

Zoe

On Sat, Jun 6, 2020 at 2:07 PM John McCall <[hidden email]> wrote:

On 6 Jun 2020, at 13:47, Zoe Carver wrote:

John,

Thanks, those are good points. I think we can still remove one of the
destructors (which could also be done by a more powerful DSE+load
propagation) but, you're right; one needs to stay.

Can you explain in more detail which destructor you think you can
eliminate?

This can only be optimized with a more global, interprocedural

optimization that shifts responsibility to owner to destroy its argument.

I'll think about implementing something like this, but I suspect any
possible optimizations will already happen with inlining and analysis.

Yeah. For the narrow case of std::unique_ptr, since its operations
are easily inlined and can be easily optimized after copy propagation,
there’s not much more that can be done at a high level.

Note that trivial_abi (if it could be adopted on std::unique_ptr)
also changes the ABI to make the callee responsible for destruction.
So as part of getting a more efficient low-level ABI, you also get a
more optimizable high-level one.

One idea I’ve personally been kicking around is some way to mark
declarations as having an “unstable ABI”: basically, a guarantee that
all the code that uses them will be compiled with a single toolchain,
and therefore a license for the implementation to alter the ABI however
it likes with any code that uses any of those declarations.

A type would be unstable if it was composed even partially from a
declaration marked unstable. So class Unstable would be unstable,
but so would const Unstable * — and, crucially, so would
std::unique_ptr<Unstable>. But for soundness reasons, this would
need to ignore type sugar (so no marking typedefs), and it wouldn’t
be able to automatically descend into fields.

There are a few ways that we could use that directly in the compiler.
The big restriction is that you’re not breaking ABI globally and so
you always need an unstable “contaminant” that permits using the
unstable ABI. For example, we can’t just change function ABIs
for all unstable functions because function pointers have to remain
compatible. On the other hand, programs aren’t allowed to call
function pointers under the wrong type, so if the function type is
unstable, we can change anything we want about its ABI.

(For functions specifically, there’s another option: we could emit
the functions with an unstable ABI and then introduce thunks that
adapt the calling convention when the address is taken. But that’s
a non-trivial code-size hit that we might have to do unconditionally.
It also can’t adapt a callee-destroy ABI into a caller-destroy one
without introducing an extra move, which isn’t necessarily semantically
allowed.)

Probably more importantly, though, we could allow unstable-ness to
be detected with a type trait, and that would allow the standard
library to take advantage of it. So std::unique_ptr<int> would
be stuck with the stable ABI, but std::unique_ptr<Unstable> could
switch to be trivial_abi.

That does leave the problem of actually doing the annotation.
Adding an attribute to every class is probably beyond what people
would accept. There are several ways to do mass annotation. Pragmas
are problematic because you don’t want to accidentally leave the
pragma on when you exit a file and then have it cover a system
include. We do have some pragmas that prevent file changes while
the pragma is active, which is a decent solution for that problem.
An alternative is to mark namespaces. That probably needs to be
lexical: that is, you wouldn’t be able to mark the entire clang
namespace, you would mark a specific namespace clang declaration
in a single header. But that’s still much more manageable, and
after all, the cost to missing an annotation is just a missed
optimization.

We could also implicitly make all anonymous-namespace declarations
unstable.

John.

Thanks for the response,
Zoe

On Fri, Jun 5, 2020 at 1:09 PM John McCall <[hidden email]> wrote:

On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:

Hello all,


I'm planning to do some work to add lifetime optimization passes for smart
pointers and reference-counted objects. I'll use this email as a sort of
proposal for what I hope to do.


*Scope*


As I'm developing the pass, I'm trying to keep it general and create
utilities that could work across multiple smart pointers. But, right now,
I'm focussing on unique_ptr and applying specific ownership optimizations
to
unique_ptr only.


*unique_ptr Optimzations*


The pass I'm currently developing adds a single, simple, optimization:
constant fold the destructor based on ownership information. unique_ptr has
a lot of ownership information communicated with reference semantics. When
a
unique_ptr is moved into another function, that function takes over
ownership of the unique_ptr, and subsequent destructors can be eliminated
(because they will be no-ops). Otherwise, branchless functions are often
complicated after inlining unique_ptr's destructor so, this optimization
should be fairly beneficial.


unique_ptr's reset and release methods both complicate this optimization a
bit. Because they are also able to transfer and remove ownership, all
unknown instructions must be ignored. However, in the future, knowledge of
those methods might be able to make the pass more robust.


With unique_ptr, it's difficult to prove liveness. So, it is hard to
constant fold the destructor call to always be there. Maybe in the future,
this would be possible, though (with sufficient analysis).


Last, an optimization that I hope to do is lowering the unique_ptr to a raw
pointer if all lifetime paths are known. I think removing this layer of
abstraction would make it easier for other optimization passes to be
successful. Eventually, we may even be able to specialize functions that
used to take a unique_ptr to now take a raw pointer, if the argument's
lifetime was also able to be fully analyzed.


*Lifetime Annotations*


Right now, the pass relies on (pre-inlined) function calls to generate
ownership information. Another approach would be to add ownership
annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).


*ARC Optimizations*


There are a huge number of large and small ARC optimizations already in
LLVM. For unique_ptr specifically, I'm not sure these are of any benefit
because unique_ptr doesn't actually do any reference counting. But, later
on, when I start working on generalizing this pass to support more smart
pointers (specifically shared_ptr) I think the ARC optimization pass, and
especially the utilities it contains, could be very beneficial. If anyone
has experience with ARC optimizations, I'd love to hear your thoughts on
extending them to other reference counted objects.


*trivial_abi and Hidden References*


Arthur O'Dwyer made a good point, which is that a lot of these
optimizations can be applied when with the trivial_abi attribute. However,
given that's not a standard attribute and these optimizations only *happen*
to work with trivial_abi (i.e., in a more complicated program, they may not
continue to work). I think lifetime utilities and specific lifetime
optimization passes are still beneficial (especially if they can be applied
to other smart pointers in the future).


Because all smart pointers have non-trivial destructors, they are always
passed by hidden references. With unique_ptr, this is as simple as
bit-casting the pointer member to unique_ptr, which would allow for it to
be lowered to a single raw pointer instead of a stack-allocated object.
Even without the trival_abi attribute, I think this is an optimization that
could be done.


*Results*


Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart
pointer lifetime optimizations pass. <https://reviews.llvm.org/D81288>

For reference, here are the before and after results:

Clang trunk (four branches): Compiler Explorer
<https://godbolt.org/z/bsJFty>

With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru

Unfortunately, these are not legal optimizations for your test case:

-

guaranteed is permitted to escape a reference (or pointer) to the
object it was passed. Tat references and pointers remain valid
until the object goes out of scope.
-

The object can be mutated through that reference because the underlying
object is not const. Being passed a const reference is not a
semantic contract in C++.
-

Through a combination of the above, the call to owner may change
the value of p, and so the caller may not rely on it still being
in a trivially-destructible state after that call.
-

owner may leave the value of its parameter object in a
non-trivially-destructible state, and under the Itanium C++ ABI,
cleaning
up that object is the caller’s responsibility. I agree that this is a
bad rule for optimization purposes, but it’s the rule. This can only be
optimized with a more global, interprocedural optimization that shifts
responsibility to owner to destroy its argument.

John.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
On Mon, 8 Jun 2020 at 00:22, John McCall via cfe-dev <[hidden email]> wrote:

On 7 Jun 2020, at 18:52, Zoe Carver wrote:

John, sorry for my delayed response.


*> Can you explain in more detail which destructor you think you can
eliminate?*


In the Godbolt link in my original post, there are two unique_ptrs, %3 and %
4. They are both passed to the move constructor (as the source and
destination). In the move constructor, the source's held pointer is nulled
out. If the next use of the pointer is the destructor (as is the case in
the Godbolt example) then, the destructor will be a no-op. In the optimized
code, it's a bit more difficult to see. But, the dead store is still there.
Two instructions before the call to owner, null is stored into %8. Then, in
the 3rd block (%17) that same pointer (%8) is loaded and compared against
null (which is always true).

Yes, the destructor for the source (the local variable, as opposed to the
temporary) is the one that you can’t eliminate without proving that a
reference to it isn’t escaped by the first call and then mutated during the
second.


Here, both destructors run after the call to 'owner' returns, and they destroy two different objects. You can't eliminate either.

You wouldn’t be the first person to be surprised by the result of this sort
of analysis, but I’m afraid it’s what we’re working with.

Unfortunately, there’s really no way to eliminate this one without either
interprocedural information or language changes. trivial_abi eliminates
the other one because it changes the convention for passing by value, but to
pass an “immutably borrowed” value in C++ we have to pass by reference, which
allows the reference to be escaped and accessed (and even mutated, if the
original object wasn’t declared const) as long as those accesses happen
before destruction.

Perhaps we should expose LLVM's nocapture attribute to the source level? 

*> Probably more importantly, though, we could allow unstable-ness to be
detected with a type trait, and that would allow the standard library to
take advantage of it. *


We could actually do this for trivial_abi types too. If we added a builtin
type trait to check if a type has the trivial_abi attribute, libc++ could
conditionally give unique_ptr the trivial_abi attribute if its base type
also had the attribute. Additionally, we could add a config macro that
would do this globally when libc++ is in unstable ABI mode.

Hmm. That doesn’t just fall out from any analysis I see. trivial_abi
is an existing, ABI-stable attribute, so changing the ABI of std::unique_ptr
for types that are already trivial_abi is just as much of an ABI break
as changing it in general would be. You could try to justify it by saying
that there just aren’t very many trivial_abi types yet, or that trivial_abi
is a vendor-specific attribute that’s unlikely to be used on a type with a
stable ABI because non-Clang implementations wouldn’t be able to compile
it compatibly, but those aren’t terribly convincing arguments to me.

I guess I should finish https://reviews.llvm.org/D63748 at some point. (Though I think we probably shouldn't enable it in libc++ unstable ABI configurations by default, since it also changes observable program semantics due to altering destruction order, and is arguably non-conforming for the same reason.)

John.

Best,

Zoe

On Sat, Jun 6, 2020 at 2:07 PM John McCall <[hidden email]> wrote:

On 6 Jun 2020, at 13:47, Zoe Carver wrote:

John,

Thanks, those are good points. I think we can still remove one of the
destructors (which could also be done by a more powerful DSE+load
propagation) but, you're right; one needs to stay.

Can you explain in more detail which destructor you think you can
eliminate?

This can only be optimized with a more global, interprocedural

optimization that shifts responsibility to owner to destroy its argument.

I'll think about implementing something like this, but I suspect any
possible optimizations will already happen with inlining and analysis.

Yeah. For the narrow case of std::unique_ptr, since its operations
are easily inlined and can be easily optimized after copy propagation,
there’s not much more that can be done at a high level.

Note that trivial_abi (if it could be adopted on std::unique_ptr)
also changes the ABI to make the callee responsible for destruction.
So as part of getting a more efficient low-level ABI, you also get a
more optimizable high-level one.

One idea I’ve personally been kicking around is some way to mark
declarations as having an “unstable ABI”: basically, a guarantee that
all the code that uses them will be compiled with a single toolchain,
and therefore a license for the implementation to alter the ABI however
it likes with any code that uses any of those declarations.

A type would be unstable if it was composed even partially from a
declaration marked unstable. So class Unstable would be unstable,
but so would const Unstable * — and, crucially, so would
std::unique_ptr<Unstable>. But for soundness reasons, this would
need to ignore type sugar (so no marking typedefs), and it wouldn’t
be able to automatically descend into fields.

There are a few ways that we could use that directly in the compiler.
The big restriction is that you’re not breaking ABI globally and so
you always need an unstable “contaminant” that permits using the
unstable ABI. For example, we can’t just change function ABIs
for all unstable functions because function pointers have to remain
compatible. On the other hand, programs aren’t allowed to call
function pointers under the wrong type, so if the function type is
unstable, we can change anything we want about its ABI.

(For functions specifically, there’s another option: we could emit
the functions with an unstable ABI and then introduce thunks that
adapt the calling convention when the address is taken. But that’s
a non-trivial code-size hit that we might have to do unconditionally.
It also can’t adapt a callee-destroy ABI into a caller-destroy one
without introducing an extra move, which isn’t necessarily semantically
allowed.)

Probably more importantly, though, we could allow unstable-ness to
be detected with a type trait, and that would allow the standard
library to take advantage of it. So std::unique_ptr<int> would
be stuck with the stable ABI, but std::unique_ptr<Unstable> could
switch to be trivial_abi.

That does leave the problem of actually doing the annotation.
Adding an attribute to every class is probably beyond what people
would accept. There are several ways to do mass annotation. Pragmas
are problematic because you don’t want to accidentally leave the
pragma on when you exit a file and then have it cover a system
include. We do have some pragmas that prevent file changes while
the pragma is active, which is a decent solution for that problem.
An alternative is to mark namespaces. That probably needs to be
lexical: that is, you wouldn’t be able to mark the entire clang
namespace, you would mark a specific namespace clang declaration
in a single header. But that’s still much more manageable, and
after all, the cost to missing an annotation is just a missed
optimization.

We could also implicitly make all anonymous-namespace declarations
unstable.

John.

Thanks for the response,
Zoe

On Fri, Jun 5, 2020 at 1:09 PM John McCall <[hidden email]> wrote:

On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:

Hello all,


I'm planning to do some work to add lifetime optimization passes for smart
pointers and reference-counted objects. I'll use this email as a sort of
proposal for what I hope to do.


*Scope*


As I'm developing the pass, I'm trying to keep it general and create
utilities that could work across multiple smart pointers. But, right now,
I'm focussing on unique_ptr and applying specific ownership optimizations
to
unique_ptr only.


*unique_ptr Optimzations*


The pass I'm currently developing adds a single, simple, optimization:
constant fold the destructor based on ownership information. unique_ptr has
a lot of ownership information communicated with reference semantics. When
a
unique_ptr is moved into another function, that function takes over
ownership of the unique_ptr, and subsequent destructors can be eliminated
(because they will be no-ops). Otherwise, branchless functions are often
complicated after inlining unique_ptr's destructor so, this optimization
should be fairly beneficial.


unique_ptr's reset and release methods both complicate this optimization a
bit. Because they are also able to transfer and remove ownership, all
unknown instructions must be ignored. However, in the future, knowledge of
those methods might be able to make the pass more robust.


With unique_ptr, it's difficult to prove liveness. So, it is hard to
constant fold the destructor call to always be there. Maybe in the future,
this would be possible, though (with sufficient analysis).


Last, an optimization that I hope to do is lowering the unique_ptr to a raw
pointer if all lifetime paths are known. I think removing this layer of
abstraction would make it easier for other optimization passes to be
successful. Eventually, we may even be able to specialize functions that
used to take a unique_ptr to now take a raw pointer, if the argument's
lifetime was also able to be fully analyzed.


*Lifetime Annotations*


Right now, the pass relies on (pre-inlined) function calls to generate
ownership information. Another approach would be to add ownership
annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).


*ARC Optimizations*


There are a huge number of large and small ARC optimizations already in
LLVM. For unique_ptr specifically, I'm not sure these are of any benefit
because unique_ptr doesn't actually do any reference counting. But, later
on, when I start working on generalizing this pass to support more smart
pointers (specifically shared_ptr) I think the ARC optimization pass, and
especially the utilities it contains, could be very beneficial. If anyone
has experience with ARC optimizations, I'd love to hear your thoughts on
extending them to other reference counted objects.


*trivial_abi and Hidden References*


Arthur O'Dwyer made a good point, which is that a lot of these
optimizations can be applied when with the trivial_abi attribute. However,
given that's not a standard attribute and these optimizations only *happen*
to work with trivial_abi (i.e., in a more complicated program, they may not
continue to work). I think lifetime utilities and specific lifetime
optimization passes are still beneficial (especially if they can be applied
to other smart pointers in the future).


Because all smart pointers have non-trivial destructors, they are always
passed by hidden references. With unique_ptr, this is as simple as
bit-casting the pointer member to unique_ptr, which would allow for it to
be lowered to a single raw pointer instead of a stack-allocated object.
Even without the trival_abi attribute, I think this is an optimization that
could be done.


*Results*


Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart
pointer lifetime optimizations pass. <https://reviews.llvm.org/D81288>

For reference, here are the before and after results:

Clang trunk (four branches): Compiler Explorer
<https://godbolt.org/z/bsJFty>

With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru

Unfortunately, these are not legal optimizations for your test case:

-

guaranteed is permitted to escape a reference (or pointer) to the
object it was passed. Tat references and pointers remain valid
until the object goes out of scope.
-

The object can be mutated through that reference because the underlying
object is not const. Being passed a const reference is not a
semantic contract in C++.
-

Through a combination of the above, the call to owner may change
the value of p, and so the caller may not rely on it still being
in a trivially-destructible state after that call.
-

owner may leave the value of its parameter object in a
non-trivially-destructible state, and under the Itanium C++ ABI,
cleaning
up that object is the caller’s responsibility. I agree that this is a
bad rule for optimization purposes, but it’s the rule. This can only be
optimized with a more global, interprocedural optimization that shifts
responsibility to owner to destroy its argument.

John.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
Richard and John, thanks for those thoughts and comments.

> Yes, the destructor for the source (the local variable, as opposed to the temporary) is the one that you can’t eliminate without proving that a reference to it isn’t escaped by the first call and then mutated during the second.

and

> Here, both destructors run after the call to 'owner' returns, and they destroy two different objects. You can't eliminate either.

I get it now. The link really helped, thanks, Richard. 

> Perhaps we should expose LLVM's nocapture attribute to the source level?

This would be very interesting and probably not too difficult to do. I can add it to my backlog of things to investigate. 

> trivial_abi is an existing, ABI-stable attribute, so changing the ABI of std::unique_ptr for types that are already trivial_abi is just as much of an ABI break as changing it in general would be.

We could make it unstable-ABI only. Or add another config flag to enable it. Looks like Richard's patch takes this a step further anyway, though.

On Mon, Jun 8, 2020 at 6:14 PM Richard Smith <[hidden email]> wrote:
On Mon, 8 Jun 2020 at 00:22, John McCall via cfe-dev <[hidden email]> wrote:

On 7 Jun 2020, at 18:52, Zoe Carver wrote:

John, sorry for my delayed response.


*> Can you explain in more detail which destructor you think you can
eliminate?*


In the Godbolt link in my original post, there are two unique_ptrs, %3 and %
4. They are both passed to the move constructor (as the source and
destination). In the move constructor, the source's held pointer is nulled
out. If the next use of the pointer is the destructor (as is the case in
the Godbolt example) then, the destructor will be a no-op. In the optimized
code, it's a bit more difficult to see. But, the dead store is still there.
Two instructions before the call to owner, null is stored into %8. Then, in
the 3rd block (%17) that same pointer (%8) is loaded and compared against
null (which is always true).

Yes, the destructor for the source (the local variable, as opposed to the
temporary) is the one that you can’t eliminate without proving that a
reference to it isn’t escaped by the first call and then mutated during the
second.


Here, both destructors run after the call to 'owner' returns, and they destroy two different objects. You can't eliminate either.

You wouldn’t be the first person to be surprised by the result of this sort
of analysis, but I’m afraid it’s what we’re working with.

Unfortunately, there’s really no way to eliminate this one without either
interprocedural information or language changes. trivial_abi eliminates
the other one because it changes the convention for passing by value, but to
pass an “immutably borrowed” value in C++ we have to pass by reference, which
allows the reference to be escaped and accessed (and even mutated, if the
original object wasn’t declared const) as long as those accesses happen
before destruction.

Perhaps we should expose LLVM's nocapture attribute to the source level? 

*> Probably more importantly, though, we could allow unstable-ness to be
detected with a type trait, and that would allow the standard library to
take advantage of it. *


We could actually do this for trivial_abi types too. If we added a builtin
type trait to check if a type has the trivial_abi attribute, libc++ could
conditionally give unique_ptr the trivial_abi attribute if its base type
also had the attribute. Additionally, we could add a config macro that
would do this globally when libc++ is in unstable ABI mode.

Hmm. That doesn’t just fall out from any analysis I see. trivial_abi
is an existing, ABI-stable attribute, so changing the ABI of std::unique_ptr
for types that are already trivial_abi is just as much of an ABI break
as changing it in general would be. You could try to justify it by saying
that there just aren’t very many trivial_abi types yet, or that trivial_abi
is a vendor-specific attribute that’s unlikely to be used on a type with a
stable ABI because non-Clang implementations wouldn’t be able to compile
it compatibly, but those aren’t terribly convincing arguments to me.

I guess I should finish https://reviews.llvm.org/D63748 at some point. (Though I think we probably shouldn't enable it in libc++ unstable ABI configurations by default, since it also changes observable program semantics due to altering destruction order, and is arguably non-conforming for the same reason.)

John.

Best,

Zoe

On Sat, Jun 6, 2020 at 2:07 PM John McCall <[hidden email]> wrote:

On 6 Jun 2020, at 13:47, Zoe Carver wrote:

John,

Thanks, those are good points. I think we can still remove one of the
destructors (which could also be done by a more powerful DSE+load
propagation) but, you're right; one needs to stay.

Can you explain in more detail which destructor you think you can
eliminate?

This can only be optimized with a more global, interprocedural

optimization that shifts responsibility to owner to destroy its argument.

I'll think about implementing something like this, but I suspect any
possible optimizations will already happen with inlining and analysis.

Yeah. For the narrow case of std::unique_ptr, since its operations
are easily inlined and can be easily optimized after copy propagation,
there’s not much more that can be done at a high level.

Note that trivial_abi (if it could be adopted on std::unique_ptr)
also changes the ABI to make the callee responsible for destruction.
So as part of getting a more efficient low-level ABI, you also get a
more optimizable high-level one.

One idea I’ve personally been kicking around is some way to mark
declarations as having an “unstable ABI”: basically, a guarantee that
all the code that uses them will be compiled with a single toolchain,
and therefore a license for the implementation to alter the ABI however
it likes with any code that uses any of those declarations.

A type would be unstable if it was composed even partially from a
declaration marked unstable. So class Unstable would be unstable,
but so would const Unstable * — and, crucially, so would
std::unique_ptr<Unstable>. But for soundness reasons, this would
need to ignore type sugar (so no marking typedefs), and it wouldn’t
be able to automatically descend into fields.

There are a few ways that we could use that directly in the compiler.
The big restriction is that you’re not breaking ABI globally and so
you always need an unstable “contaminant” that permits using the
unstable ABI. For example, we can’t just change function ABIs
for all unstable functions because function pointers have to remain
compatible. On the other hand, programs aren’t allowed to call
function pointers under the wrong type, so if the function type is
unstable, we can change anything we want about its ABI.

(For functions specifically, there’s another option: we could emit
the functions with an unstable ABI and then introduce thunks that
adapt the calling convention when the address is taken. But that’s
a non-trivial code-size hit that we might have to do unconditionally.
It also can’t adapt a callee-destroy ABI into a caller-destroy one
without introducing an extra move, which isn’t necessarily semantically
allowed.)

Probably more importantly, though, we could allow unstable-ness to
be detected with a type trait, and that would allow the standard
library to take advantage of it. So std::unique_ptr<int> would
be stuck with the stable ABI, but std::unique_ptr<Unstable> could
switch to be trivial_abi.

That does leave the problem of actually doing the annotation.
Adding an attribute to every class is probably beyond what people
would accept. There are several ways to do mass annotation. Pragmas
are problematic because you don’t want to accidentally leave the
pragma on when you exit a file and then have it cover a system
include. We do have some pragmas that prevent file changes while
the pragma is active, which is a decent solution for that problem.
An alternative is to mark namespaces. That probably needs to be
lexical: that is, you wouldn’t be able to mark the entire clang
namespace, you would mark a specific namespace clang declaration
in a single header. But that’s still much more manageable, and
after all, the cost to missing an annotation is just a missed
optimization.

We could also implicitly make all anonymous-namespace declarations
unstable.

John.

Thanks for the response,
Zoe

On Fri, Jun 5, 2020 at 1:09 PM John McCall <[hidden email]> wrote:

On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:

Hello all,


I'm planning to do some work to add lifetime optimization passes for smart
pointers and reference-counted objects. I'll use this email as a sort of
proposal for what I hope to do.


*Scope*


As I'm developing the pass, I'm trying to keep it general and create
utilities that could work across multiple smart pointers. But, right now,
I'm focussing on unique_ptr and applying specific ownership optimizations
to
unique_ptr only.


*unique_ptr Optimzations*


The pass I'm currently developing adds a single, simple, optimization:
constant fold the destructor based on ownership information. unique_ptr has
a lot of ownership information communicated with reference semantics. When
a
unique_ptr is moved into another function, that function takes over
ownership of the unique_ptr, and subsequent destructors can be eliminated
(because they will be no-ops). Otherwise, branchless functions are often
complicated after inlining unique_ptr's destructor so, this optimization
should be fairly beneficial.


unique_ptr's reset and release methods both complicate this optimization a
bit. Because they are also able to transfer and remove ownership, all
unknown instructions must be ignored. However, in the future, knowledge of
those methods might be able to make the pass more robust.


With unique_ptr, it's difficult to prove liveness. So, it is hard to
constant fold the destructor call to always be there. Maybe in the future,
this would be possible, though (with sufficient analysis).


Last, an optimization that I hope to do is lowering the unique_ptr to a raw
pointer if all lifetime paths are known. I think removing this layer of
abstraction would make it easier for other optimization passes to be
successful. Eventually, we may even be able to specialize functions that
used to take a unique_ptr to now take a raw pointer, if the argument's
lifetime was also able to be fully analyzed.


*Lifetime Annotations*


Right now, the pass relies on (pre-inlined) function calls to generate
ownership information. Another approach would be to add ownership
annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).


*ARC Optimizations*


There are a huge number of large and small ARC optimizations already in
LLVM. For unique_ptr specifically, I'm not sure these are of any benefit
because unique_ptr doesn't actually do any reference counting. But, later
on, when I start working on generalizing this pass to support more smart
pointers (specifically shared_ptr) I think the ARC optimization pass, and
especially the utilities it contains, could be very beneficial. If anyone
has experience with ARC optimizations, I'd love to hear your thoughts on
extending them to other reference counted objects.


*trivial_abi and Hidden References*


Arthur O'Dwyer made a good point, which is that a lot of these
optimizations can be applied when with the trivial_abi attribute. However,
given that's not a standard attribute and these optimizations only *happen*
to work with trivial_abi (i.e., in a more complicated program, they may not
continue to work). I think lifetime utilities and specific lifetime
optimization passes are still beneficial (especially if they can be applied
to other smart pointers in the future).


Because all smart pointers have non-trivial destructors, they are always
passed by hidden references. With unique_ptr, this is as simple as
bit-casting the pointer member to unique_ptr, which would allow for it to
be lowered to a single raw pointer instead of a stack-allocated object.
Even without the trival_abi attribute, I think this is an optimization that
could be done.


*Results*


Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart
pointer lifetime optimizations pass. <https://reviews.llvm.org/D81288>

For reference, here are the before and after results:

Clang trunk (four branches): Compiler Explorer
<https://godbolt.org/z/bsJFty>

With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru

Unfortunately, these are not legal optimizations for your test case:

-

guaranteed is permitted to escape a reference (or pointer) to the
object it was passed. Tat references and pointers remain valid
until the object goes out of scope.
-

The object can be mutated through that reference because the underlying
object is not const. Being passed a const reference is not a
semantic contract in C++.
-

Through a combination of the above, the call to owner may change
the value of p, and so the caller may not rely on it still being
in a trivially-destructible state after that call.
-

owner may leave the value of its parameter object in a
non-trivially-destructible state, and under the Itanium C++ ABI,
cleaning
up that object is the caller’s responsibility. I agree that this is a
bad rule for optimization purposes, but it’s the rule. This can only be
optimized with a more global, interprocedural optimization that shifts
responsibility to owner to destroy its argument.

John.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
In reply to this post by Vassil Vassilev via cfe-dev

On 8 Jun 2020, at 21:13, Richard Smith wrote:

On Mon, 8 Jun 2020 at 00:22, John McCall via cfe-dev <[hidden email]>
wrote:

You wouldn’t be the first person to be surprised by the result of this sort
of analysis, but I’m afraid it’s what we’re working with.

Unfortunately, there’s really no way to eliminate this one without either
interprocedural information or language changes. trivial_abi eliminates
the other one because it changes the convention for passing by value, but
to
pass an “immutably borrowed” value in C++ we have to pass by reference,
which
allows the reference to be escaped and accessed (and even mutated, if the
original object wasn’t declared const) as long as those accesses happen
before destruction.

Perhaps we should expose LLVM's nocapture attribute to the source level?

I think we have with __attribute__((noescape)). Of course, adopting it
systematically would be hugely invasive.

*> Probably more importantly, though, we could allow unstable-ness to be
detected with a type trait, and that would allow the standard library to
take advantage of it. *


We could actually do this for trivial_abi types too. If we added a builtin
type trait to check if a type has the trivial_abi attribute, libc++ could
conditionally give unique_ptr the trivial_abi attribute if its base type
also had the attribute. Additionally, we could add a config macro that
would do this globally when libc++ is in unstable ABI mode.

Hmm. That doesn’t just fall out from any analysis I see. trivial_abi
is an existing, ABI-stable attribute, so changing the ABI of
std::unique_ptr
for types that are already trivial_abi is just as much of an ABI break
as changing it in general would be. You could try to justify it by saying
that there just aren’t very many trivial_abi types yet, or that
trivial_abi
is a vendor-specific attribute that’s unlikely to be used on a type with a
stable ABI because non-Clang implementations wouldn’t be able to compile
it compatibly, but those aren’t terribly convincing arguments to me.

I guess I should finish https://reviews.llvm.org/D63748 at some point.
(Though I think we probably shouldn't enable it in libc++ unstable ABI
configurations by default, since it also changes observable program
semantics due to altering destruction order, and is arguably non-conforming
for the same reason.)

It definitely changes observable semantics, but it’s not obviously
non-conforming; [expr.call]p7 gives us a lot of flexibility here:

It is implementation-defined whether the lifetime of a parameter
ends when the function in which it is defined returns or at the
end of the enclosing full-expression.

And note that MSVC has traditionally destroyed parameters in the callee.
IIRC the standard actually originally specified that parameters were
always destroyed at the end of the call and only changed it due to
Itanium doing otherwise.

Now, it’s possible that the copy-elision rules have an unfortunate
impact here. IIRC an object initialized with an elided copy is supposed
to take on the longer of the two natural lifetimes. Does that mean that
if you have a parameter initialized by an elided copy from a temporary,
the parameter needs to live until the end of the calling full-expression
like the temporary would have? If so, you either wouldn’t be able to
use a callee-destroy ABI or you wouldn’t be allowed to elide copies
into parameters, and the latter seems unacceptable.

Even if it’s conforming, I’m sure there are bugs about e.g. the proper
ordering with things like function-try-blocks and exception specifications.

John.

John.

Best,

Zoe

On Sat, Jun 6, 2020 at 2:07 PM John McCall <[hidden email]> wrote:

On 6 Jun 2020, at 13:47, Zoe Carver wrote:

John,

Thanks, those are good points. I think we can still remove one of the
destructors (which could also be done by a more powerful DSE+load
propagation) but, you're right; one needs to stay.

Can you explain in more detail which destructor you think you can
eliminate?

This can only be optimized with a more global, interprocedural

optimization that shifts responsibility to owner to destroy its argument.

I'll think about implementing something like this, but I suspect any
possible optimizations will already happen with inlining and analysis.

Yeah. For the narrow case of std::unique_ptr, since its operations
are easily inlined and can be easily optimized after copy propagation,
there’s not much more that can be done at a high level.

Note that trivial_abi (if it could be adopted on std::unique_ptr)
also changes the ABI to make the callee responsible for destruction.
So as part of getting a more efficient low-level ABI, you also get a
more optimizable high-level one.

One idea I’ve personally been kicking around is some way to mark
declarations as having an “unstable ABI”: basically, a guarantee that
all the code that uses them will be compiled with a single toolchain,
and therefore a license for the implementation to alter the ABI however
it likes with any code that uses any of those declarations.

A type would be unstable if it was composed even partially from a
declaration marked unstable. So class Unstable would be unstable,
but so would const Unstable * — and, crucially, so would
std::unique_ptr<Unstable>. But for soundness reasons, this would
need to ignore type sugar (so no marking typedefs), and it wouldn’t
be able to automatically descend into fields.

There are a few ways that we could use that directly in the compiler.
The big restriction is that you’re not breaking ABI globally and so
you always need an unstable “contaminant” that permits using the
unstable ABI. For example, we can’t just change function ABIs
for all unstable functions because function pointers have to remain
compatible. On the other hand, programs aren’t allowed to call
function pointers under the wrong type, so if the function type is
unstable, we can change anything we want about its ABI.

(For functions specifically, there’s another option: we could emit
the functions with an unstable ABI and then introduce thunks that
adapt the calling convention when the address is taken. But that’s
a non-trivial code-size hit that we might have to do unconditionally.
It also can’t adapt a callee-destroy ABI into a caller-destroy one
without introducing an extra move, which isn’t necessarily semantically
allowed.)

Probably more importantly, though, we could allow unstable-ness to
be detected with a type trait, and that would allow the standard
library to take advantage of it. So std::unique_ptr<int> would
be stuck with the stable ABI, but std::unique_ptr<Unstable> could
switch to be trivial_abi.

That does leave the problem of actually doing the annotation.
Adding an attribute to every class is probably beyond what people
would accept. There are several ways to do mass annotation. Pragmas
are problematic because you don’t want to accidentally leave the
pragma on when you exit a file and then have it cover a system
include. We do have some pragmas that prevent file changes while
the pragma is active, which is a decent solution for that problem.
An alternative is to mark namespaces. That probably needs to be
lexical: that is, you wouldn’t be able to mark the entire clang
namespace, you would mark a specific namespace clang declaration
in a single header. But that’s still much more manageable, and
after all, the cost to missing an annotation is just a missed
optimization.

We could also implicitly make all anonymous-namespace declarations
unstable.

John.

Thanks for the response,
Zoe

On Fri, Jun 5, 2020 at 1:09 PM John McCall <[hidden email]> wrote:

On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:

Hello all,


I'm planning to do some work to add lifetime optimization passes for smart
pointers and reference-counted objects. I'll use this email as a sort of
proposal for what I hope to do.


*Scope*


As I'm developing the pass, I'm trying to keep it general and create
utilities that could work across multiple smart pointers. But, right now,
I'm focussing on unique_ptr and applying specific ownership optimizations
to
unique_ptr only.


*unique_ptr Optimzations*


The pass I'm currently developing adds a single, simple, optimization:
constant fold the destructor based on ownership information. unique_ptr has
a lot of ownership information communicated with reference semantics. When
a
unique_ptr is moved into another function, that function takes over
ownership of the unique_ptr, and subsequent destructors can be eliminated
(because they will be no-ops). Otherwise, branchless functions are often
complicated after inlining unique_ptr's destructor so, this optimization
should be fairly beneficial.


unique_ptr's reset and release methods both complicate this optimization a
bit. Because they are also able to transfer and remove ownership, all
unknown instructions must be ignored. However, in the future, knowledge of
those methods might be able to make the pass more robust.


With unique_ptr, it's difficult to prove liveness. So, it is hard to
constant fold the destructor call to always be there. Maybe in the future,
this would be possible, though (with sufficient analysis).


Last, an optimization that I hope to do is lowering the unique_ptr to a raw
pointer if all lifetime paths are known. I think removing this layer of
abstraction would make it easier for other optimization passes to be
successful. Eventually, we may even be able to specialize functions that
used to take a unique_ptr to now take a raw pointer, if the argument's
lifetime was also able to be fully analyzed.


*Lifetime Annotations*


Right now, the pass relies on (pre-inlined) function calls to generate
ownership information. Another approach would be to add ownership
annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).


*ARC Optimizations*


There are a huge number of large and small ARC optimizations already in
LLVM. For unique_ptr specifically, I'm not sure these are of any benefit
because unique_ptr doesn't actually do any reference counting. But, later
on, when I start working on generalizing this pass to support more smart
pointers (specifically shared_ptr) I think the ARC optimization pass, and
especially the utilities it contains, could be very beneficial. If anyone
has experience with ARC optimizations, I'd love to hear your thoughts on
extending them to other reference counted objects.


*trivial_abi and Hidden References*


Arthur O'Dwyer made a good point, which is that a lot of these
optimizations can be applied when with the trivial_abi attribute. However,
given that's not a standard attribute and these optimizations only *happen*
to work with trivial_abi (i.e., in a more complicated program, they may not
continue to work). I think lifetime utilities and specific lifetime
optimization passes are still beneficial (especially if they can be applied
to other smart pointers in the future).


Because all smart pointers have non-trivial destructors, they are always
passed by hidden references. With unique_ptr, this is as simple as
bit-casting the pointer member to unique_ptr, which would allow for it to
be lowered to a single raw pointer instead of a stack-allocated object.
Even without the trival_abi attribute, I think this is an optimization that
could be done.


*Results*


Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart
pointer lifetime optimizations pass. <https://reviews.llvm.org/D81288>

For reference, here are the before and after results:

Clang trunk (four branches): Compiler Explorer
<https://godbolt.org/z/bsJFty>

With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru

Unfortunately, these are not legal optimizations for your test case:

-

guaranteed is permitted to escape a reference (or pointer) to the
object it was passed. Tat references and pointers remain valid
until the object goes out of scope.
-

The object can be mutated through that reference because the underlying
object is not const. Being passed a const reference is not a
semantic contract in C++.
-

Through a combination of the above, the call to owner may change
the value of p, and so the caller may not rely on it still being
in a trivially-destructible state after that call.
-

owner may leave the value of its parameter object in a
non-trivially-destructible state, and under the Itanium C++ ABI,
cleaning
up that object is the caller’s responsibility. I agree that this is a
bad rule for optimization purposes, but it’s the rule. This can only be
optimized with a more global, interprocedural optimization that shifts
responsibility to owner to destroy its argument.

John.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
On 8 Jun 2020, at 22:52, John McCall wrote:

> On 8 Jun 2020, at 21:13, Richard Smith wrote:
>> On Mon, 8 Jun 2020 at 00:22, John McCall via cfe-dev
>> <[hidden email]>
>> wrote:
>>> *> Probably more importantly, though, we could allow unstable-ness
>>> to be
>>> detected with a type trait, and that would allow the standard
>>> library to
>>> take advantage of it. *
>>>
>>>
>>> We could actually do this for trivial_abi types too. If we added a
>>> builtin
>>> type trait to check if a type has the trivial_abi attribute, libc++
>>> could
>>> conditionally give unique_ptr the trivial_abi attribute if its base
>>> type
>>> also had the attribute. Additionally, we could add a config macro
>>> that
>>> would do this globally when libc++ is in unstable ABI mode.
>>>
>>> Hmm. That doesn’t just fall out from any analysis I see.
>>> trivial_abi
>>> is an existing, ABI-stable attribute, so changing the ABI of
>>> std::unique_ptr
>>> for types that are already trivial_abi is just as much of an ABI
>>> break
>>> as changing it in general would be. You could try to justify it by
>>> saying
>>> that there just aren’t very many trivial_abi types yet, or that
>>> trivial_abi
>>> is a vendor-specific attribute that’s unlikely to be used on a
>>> type with a
>>> stable ABI because non-Clang implementations wouldn’t be able to
>>> compile
>>> it compatibly, but those aren’t terribly convincing arguments to
>>> me.
>>>
>> I guess I should finish https://reviews.llvm.org/D63748 at some
>> point.
>> (Though I think we probably shouldn't enable it in libc++ unstable
>> ABI
>> configurations by default, since it also changes observable program
>> semantics due to altering destruction order, and is arguably
>> non-conforming
>> for the same reason.)
>
> It definitely changes observable semantics, but it’s not *obviously*
> non-conforming; [expr.call]p7 gives us a lot of flexibility here:
>
>   It is implementation-defined whether the lifetime of a parameter
>   ends when the function in which it is defined returns or at the
>   end of the enclosing full-expression.
>
> And note that MSVC has traditionally destroyed parameters in the
> callee.
> IIRC the standard actually originally specified that parameters were
> always destroyed at the end of the call and only changed it due to
> Itanium doing otherwise.
>
> Now, it’s possible that the copy-elision rules have an unfortunate
> impact here.  IIRC an object initialized with an elided copy is
> supposed
> to take on the longer of the two natural lifetimes.  Does that mean
> that
> if you have a parameter initialized by an elided copy from a
> temporary,
> the parameter needs to live until the end of the calling
> full-expression
> like the temporary would have?  If so, you either wouldn’t be able
> to
> use a callee-destroy ABI or you wouldn’t be allowed to elide copies
> into parameters, and the latter seems unacceptable.
>
> Even if it’s conforming, I’m sure there are bugs about e.g. the
> proper
> ordering with things like function-try-blocks and exception
> specifications.

If you were just thinking about [class.temporary]’s strict order
rules for non-extended temporaries, though, I don’t think parameters
are actually temporaries for this purpose.  There’s discussion of
temporaries for parameters and return values in [class.temporary],
but only for (a particular sense of) trivially-copyable types, and
it’s only a formalism for allowing the objects to be passed directly.

John.

>
> John.
>
>>
>>> John.
>>>
>>> Best,
>>>
>>> Zoe
>>>
>>> On Sat, Jun 6, 2020 at 2:07 PM John McCall <[hidden email]>
>>> wrote:
>>>
>>> On 6 Jun 2020, at 13:47, Zoe Carver wrote:
>>>
>>> John,
>>>
>>> Thanks, those are good points. I think we can still remove one of
>>> the
>>> destructors (which could also be done by a more powerful DSE+load
>>> propagation) but, you're right; one needs to stay.
>>>
>>> Can you explain in more detail which destructor you think you can
>>> eliminate?
>>>
>>> This can only be optimized with a more global, interprocedural
>>>
>>> optimization that shifts responsibility to owner to destroy its
>>> argument.
>>>
>>> I'll think about implementing something like this, but I suspect any
>>> possible optimizations will already happen with inlining and
>>> analysis.
>>>
>>> Yeah. For the narrow case of std::unique_ptr, since its operations
>>> are easily inlined and can be easily optimized after copy
>>> propagation,
>>> there’s not much more that can be done at a high level.
>>>
>>> Note that trivial_abi (if it could be adopted on std::unique_ptr)
>>> also changes the ABI to make the callee responsible for destruction.
>>> So as part of getting a more efficient low-level ABI, you also get a
>>> more optimizable high-level one.
>>>
>>> One idea I’ve personally been kicking around is some way to mark
>>> declarations as having an “unstable ABI”: basically, a guarantee
>>> that
>>> all the code that uses them will be compiled with a single
>>> toolchain,
>>> and therefore a license for the implementation to alter the ABI
>>> however
>>> it likes with any code that uses any of those declarations.
>>>
>>> A type would be unstable if it was composed even partially from a
>>> declaration marked unstable. So class Unstable would be unstable,
>>> but so would const Unstable * — and, crucially, so would
>>> std::unique_ptr<Unstable>. But for soundness reasons, this would
>>> need to ignore type sugar (so no marking typedefs), and it
>>> wouldn’t
>>> be able to automatically descend into fields.
>>>
>>> There are a few ways that we could use that directly in the
>>> compiler.
>>> The big restriction is that you’re not breaking ABI globally and
>>> so
>>> you always need an unstable “contaminant” that permits using the
>>> unstable ABI. For example, we can’t just change function ABIs
>>> for all unstable functions because function pointers have to remain
>>> compatible. On the other hand, programs aren’t allowed to call
>>> function pointers under the wrong type, so if the function type is
>>> unstable, we can change anything we want about its ABI.
>>>
>>> (For functions specifically, there’s another option: we could emit
>>> the functions with an unstable ABI and then introduce thunks that
>>> adapt the calling convention when the address is taken. But that’s
>>> a non-trivial code-size hit that we might have to do
>>> unconditionally.
>>> It also can’t adapt a callee-destroy ABI into a caller-destroy one
>>> without introducing an extra move, which isn’t necessarily
>>> semantically
>>> allowed.)
>>>
>>> Probably more importantly, though, we could allow unstable-ness to
>>> be detected with a type trait, and that would allow the standard
>>> library to take advantage of it. So std::unique_ptr<int> would
>>> be stuck with the stable ABI, but std::unique_ptr<Unstable> could
>>> switch to be trivial_abi.
>>>
>>> That does leave the problem of actually doing the annotation.
>>> Adding an attribute to every class is probably beyond what people
>>> would accept. There are several ways to do mass annotation. Pragmas
>>> are problematic because you don’t want to accidentally leave the
>>> pragma on when you exit a file and then have it cover a system
>>> include. We do have some pragmas that prevent file changes while
>>> the pragma is active, which is a decent solution for that problem.
>>> An alternative is to mark namespaces. That probably needs to be
>>> lexical: that is, you wouldn’t be able to mark the entire clang
>>> namespace, you would mark a specific namespace clang declaration
>>> in a single header. But that’s still much more manageable, and
>>> after all, the cost to missing an annotation is just a missed
>>> optimization.
>>>
>>> We could also implicitly make all anonymous-namespace declarations
>>> unstable.
>>>
>>> John.
>>>
>>> Thanks for the response,
>>> Zoe
>>>
>>> On Fri, Jun 5, 2020 at 1:09 PM John McCall <[hidden email]>
>>> wrote:
>>>
>>> On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:
>>>
>>> Hello all,
>>>
>>>
>>> I'm planning to do some work to add lifetime optimization passes for
>>> smart
>>> pointers and reference-counted objects. I'll use this email as a
>>> sort of
>>> proposal for what I hope to do.
>>>
>>>
>>> *Scope*
>>>
>>>
>>> As I'm developing the pass, I'm trying to keep it general and create
>>> utilities that could work across multiple smart pointers. But, right
>>> now,
>>> I'm focussing on unique_ptr and applying specific ownership
>>> optimizations
>>> to
>>> unique_ptr only.
>>>
>>>
>>> *unique_ptr Optimzations*
>>>
>>>
>>> The pass I'm currently developing adds a single, simple,
>>> optimization:
>>> constant fold the destructor based on ownership information.
>>> unique_ptr has
>>> a lot of ownership information communicated with reference
>>> semantics. When
>>> a
>>> unique_ptr is moved into another function, that function takes over
>>> ownership of the unique_ptr, and subsequent destructors can be
>>> eliminated
>>> (because they will be no-ops). Otherwise, branchless functions are
>>> often
>>> complicated after inlining unique_ptr's destructor so, this
>>> optimization
>>> should be fairly beneficial.
>>>
>>>
>>> unique_ptr's reset and release methods both complicate this
>>> optimization a
>>> bit. Because they are also able to transfer and remove ownership,
>>> all
>>> unknown instructions must be ignored. However, in the future,
>>> knowledge of
>>> those methods might be able to make the pass more robust.
>>>
>>>
>>> With unique_ptr, it's difficult to prove liveness. So, it is hard to
>>> constant fold the destructor call to always be there. Maybe in the
>>> future,
>>> this would be possible, though (with sufficient analysis).
>>>
>>>
>>> Last, an optimization that I hope to do is lowering the unique_ptr
>>> to a raw
>>> pointer if all lifetime paths are known. I think removing this layer
>>> of
>>> abstraction would make it easier for other optimization passes to be
>>> successful. Eventually, we may even be able to specialize functions
>>> that
>>> used to take a unique_ptr to now take a raw pointer, if the
>>> argument's
>>> lifetime was also able to be fully analyzed.
>>>
>>>
>>> *Lifetime Annotations*
>>>
>>>
>>> Right now, the pass relies on (pre-inlined) function calls to
>>> generate
>>> ownership information. Another approach would be to add ownership
>>> annotations, such as the lifetime intrinsics (i.e.
>>> llvm.lifetime.start).
>>>
>>>
>>> *ARC Optimizations*
>>>
>>>
>>> There are a huge number of large and small ARC optimizations already
>>> in
>>> LLVM. For unique_ptr specifically, I'm not sure these are of any
>>> benefit
>>> because unique_ptr doesn't actually do any reference counting. But,
>>> later
>>> on, when I start working on generalizing this pass to support more
>>> smart
>>> pointers (specifically shared_ptr) I think the ARC optimization
>>> pass, and
>>> especially the utilities it contains, could be very beneficial. If
>>> anyone
>>> has experience with ARC optimizations, I'd love to hear your
>>> thoughts on
>>> extending them to other reference counted objects.
>>>
>>>
>>> *trivial_abi and Hidden References*
>>>
>>>
>>> Arthur O'Dwyer made a good point, which is that a lot of these
>>> optimizations can be applied when with the trivial_abi attribute.
>>> However,
>>> given that's not a standard attribute and these optimizations only
>>> *happen*
>>> to work with trivial_abi (i.e., in a more complicated program, they
>>> may not
>>> continue to work). I think lifetime utilities and specific lifetime
>>> optimization passes are still beneficial (especially if they can be
>>> applied
>>> to other smart pointers in the future).
>>>
>>>
>>> Because all smart pointers have non-trivial destructors, they are
>>> always
>>> passed by hidden references. With unique_ptr, this is as simple as
>>> bit-casting the pointer member to unique_ptr, which would allow for
>>> it to
>>> be lowered to a single raw pointer instead of a stack-allocated
>>> object.
>>> Even without the trival_abi attribute, I think this is an
>>> optimization that
>>> could be done.
>>>
>>>
>>> *Results*
>>>
>>>
>>> Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt
>>> Smart
>>> pointer lifetime optimizations pass.
>>> <https://reviews.llvm.org/D81288>
>>>
>>> For reference, here are the before and after results:
>>>
>>> Clang trunk (four branches): Compiler Explorer
>>> <https://godbolt.org/z/bsJFty>
>>>
>>> With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru
>>>
>>> Unfortunately, these are not legal optimizations for your test case:
>>>
>>> -
>>>
>>> guaranteed is permitted to escape a reference (or pointer) to the
>>> object it was passed. Tat references and pointers remain valid
>>> until the object goes out of scope.
>>> -
>>>
>>> The object can be mutated through that reference because the
>>> underlying
>>> object is not const. Being passed a const reference is not a
>>> semantic contract in C++.
>>> -
>>>
>>> Through a combination of the above, the call to owner may change
>>> the value of p, and so the caller may not rely on it still being
>>> in a trivially-destructible state after that call.
>>> -
>>>
>>> owner may leave the value of its parameter object in a
>>> non-trivially-destructible state, and under the Itanium C++ ABI,
>>> cleaning
>>> up that object is the caller’s responsibility. I agree that this
>>> is a
>>> bad rule for optimization purposes, but it’s the rule. This can
>>> only be
>>> optimized with a more global, interprocedural optimization that
>>> shifts
>>> responsibility to owner to destroy its argument.
>>>
>>> John.
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> [hidden email]
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev

> Does that mean that if you have a parameter initialized by an elided copy from a temporary, the parameter needs to live until the end of the calling full-expression like the temporary would have? 


This is a very good point. Does the trivial_abi attribute force the compiler to pass by value or just allow it to pass by value? If the former, I don't think we actually have an issue. Because even if the copy is elided, it is still getting passed through registers to the callee. So it shouldn't have any effect on the elided copy's source.


On Mon, Jun 8, 2020 at 8:09 PM John McCall <[hidden email]> wrote:
On 8 Jun 2020, at 22:52, John McCall wrote:
> On 8 Jun 2020, at 21:13, Richard Smith wrote:
>> On Mon, 8 Jun 2020 at 00:22, John McCall via cfe-dev
>> <[hidden email]>
>> wrote:
>>> *> Probably more importantly, though, we could allow unstable-ness
>>> to be
>>> detected with a type trait, and that would allow the standard
>>> library to
>>> take advantage of it. *
>>>
>>>
>>> We could actually do this for trivial_abi types too. If we added a
>>> builtin
>>> type trait to check if a type has the trivial_abi attribute, libc++
>>> could
>>> conditionally give unique_ptr the trivial_abi attribute if its base
>>> type
>>> also had the attribute. Additionally, we could add a config macro
>>> that
>>> would do this globally when libc++ is in unstable ABI mode.
>>>
>>> Hmm. That doesn’t just fall out from any analysis I see.
>>> trivial_abi
>>> is an existing, ABI-stable attribute, so changing the ABI of
>>> std::unique_ptr
>>> for types that are already trivial_abi is just as much of an ABI
>>> break
>>> as changing it in general would be. You could try to justify it by
>>> saying
>>> that there just aren’t very many trivial_abi types yet, or that
>>> trivial_abi
>>> is a vendor-specific attribute that’s unlikely to be used on a
>>> type with a
>>> stable ABI because non-Clang implementations wouldn’t be able to
>>> compile
>>> it compatibly, but those aren’t terribly convincing arguments to
>>> me.
>>>
>> I guess I should finish https://reviews.llvm.org/D63748 at some
>> point.
>> (Though I think we probably shouldn't enable it in libc++ unstable
>> ABI
>> configurations by default, since it also changes observable program
>> semantics due to altering destruction order, and is arguably
>> non-conforming
>> for the same reason.)
>
> It definitely changes observable semantics, but it’s not *obviously*
> non-conforming; [expr.call]p7 gives us a lot of flexibility here:
>
>   It is implementation-defined whether the lifetime of a parameter
>   ends when the function in which it is defined returns or at the
>   end of the enclosing full-expression.
>
> And note that MSVC has traditionally destroyed parameters in the
> callee.
> IIRC the standard actually originally specified that parameters were
> always destroyed at the end of the call and only changed it due to
> Itanium doing otherwise.
>
> Now, it’s possible that the copy-elision rules have an unfortunate
> impact here.  IIRC an object initialized with an elided copy is
> supposed
> to take on the longer of the two natural lifetimes.  Does that mean
> that
> if you have a parameter initialized by an elided copy from a
> temporary,
> the parameter needs to live until the end of the calling
> full-expression
> like the temporary would have?  If so, you either wouldn’t be able
> to
> use a callee-destroy ABI or you wouldn’t be allowed to elide copies
> into parameters, and the latter seems unacceptable.
>
> Even if it’s conforming, I’m sure there are bugs about e.g. the
> proper
> ordering with things like function-try-blocks and exception
> specifications.

If you were just thinking about [class.temporary]’s strict order
rules for non-extended temporaries, though, I don’t think parameters
are actually temporaries for this purpose.  There’s discussion of
temporaries for parameters and return values in [class.temporary],
but only for (a particular sense of) trivially-copyable types, and
it’s only a formalism for allowing the objects to be passed directly.

John.

>
> John.
>
>>
>>> John.
>>>
>>> Best,
>>>
>>> Zoe
>>>
>>> On Sat, Jun 6, 2020 at 2:07 PM John McCall <[hidden email]>
>>> wrote:
>>>
>>> On 6 Jun 2020, at 13:47, Zoe Carver wrote:
>>>
>>> John,
>>>
>>> Thanks, those are good points. I think we can still remove one of
>>> the
>>> destructors (which could also be done by a more powerful DSE+load
>>> propagation) but, you're right; one needs to stay.
>>>
>>> Can you explain in more detail which destructor you think you can
>>> eliminate?
>>>
>>> This can only be optimized with a more global, interprocedural
>>>
>>> optimization that shifts responsibility to owner to destroy its
>>> argument.
>>>
>>> I'll think about implementing something like this, but I suspect any
>>> possible optimizations will already happen with inlining and
>>> analysis.
>>>
>>> Yeah. For the narrow case of std::unique_ptr, since its operations
>>> are easily inlined and can be easily optimized after copy
>>> propagation,
>>> there’s not much more that can be done at a high level.
>>>
>>> Note that trivial_abi (if it could be adopted on std::unique_ptr)
>>> also changes the ABI to make the callee responsible for destruction.
>>> So as part of getting a more efficient low-level ABI, you also get a
>>> more optimizable high-level one.
>>>
>>> One idea I’ve personally been kicking around is some way to mark
>>> declarations as having an “unstable ABI”: basically, a guarantee
>>> that
>>> all the code that uses them will be compiled with a single
>>> toolchain,
>>> and therefore a license for the implementation to alter the ABI
>>> however
>>> it likes with any code that uses any of those declarations.
>>>
>>> A type would be unstable if it was composed even partially from a
>>> declaration marked unstable. So class Unstable would be unstable,
>>> but so would const Unstable * — and, crucially, so would
>>> std::unique_ptr<Unstable>. But for soundness reasons, this would
>>> need to ignore type sugar (so no marking typedefs), and it
>>> wouldn’t
>>> be able to automatically descend into fields.
>>>
>>> There are a few ways that we could use that directly in the
>>> compiler.
>>> The big restriction is that you’re not breaking ABI globally and
>>> so
>>> you always need an unstable “contaminant” that permits using the
>>> unstable ABI. For example, we can’t just change function ABIs
>>> for all unstable functions because function pointers have to remain
>>> compatible. On the other hand, programs aren’t allowed to call
>>> function pointers under the wrong type, so if the function type is
>>> unstable, we can change anything we want about its ABI.
>>>
>>> (For functions specifically, there’s another option: we could emit
>>> the functions with an unstable ABI and then introduce thunks that
>>> adapt the calling convention when the address is taken. But that’s
>>> a non-trivial code-size hit that we might have to do
>>> unconditionally.
>>> It also can’t adapt a callee-destroy ABI into a caller-destroy one
>>> without introducing an extra move, which isn’t necessarily
>>> semantically
>>> allowed.)
>>>
>>> Probably more importantly, though, we could allow unstable-ness to
>>> be detected with a type trait, and that would allow the standard
>>> library to take advantage of it. So std::unique_ptr<int> would
>>> be stuck with the stable ABI, but std::unique_ptr<Unstable> could
>>> switch to be trivial_abi.
>>>
>>> That does leave the problem of actually doing the annotation.
>>> Adding an attribute to every class is probably beyond what people
>>> would accept. There are several ways to do mass annotation. Pragmas
>>> are problematic because you don’t want to accidentally leave the
>>> pragma on when you exit a file and then have it cover a system
>>> include. We do have some pragmas that prevent file changes while
>>> the pragma is active, which is a decent solution for that problem.
>>> An alternative is to mark namespaces. That probably needs to be
>>> lexical: that is, you wouldn’t be able to mark the entire clang
>>> namespace, you would mark a specific namespace clang declaration
>>> in a single header. But that’s still much more manageable, and
>>> after all, the cost to missing an annotation is just a missed
>>> optimization.
>>>
>>> We could also implicitly make all anonymous-namespace declarations
>>> unstable.
>>>
>>> John.
>>>
>>> Thanks for the response,
>>> Zoe
>>>
>>> On Fri, Jun 5, 2020 at 1:09 PM John McCall <[hidden email]>
>>> wrote:
>>>
>>> On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:
>>>
>>> Hello all,
>>>
>>>
>>> I'm planning to do some work to add lifetime optimization passes for
>>> smart
>>> pointers and reference-counted objects. I'll use this email as a
>>> sort of
>>> proposal for what I hope to do.
>>>
>>>
>>> *Scope*
>>>
>>>
>>> As I'm developing the pass, I'm trying to keep it general and create
>>> utilities that could work across multiple smart pointers. But, right
>>> now,
>>> I'm focussing on unique_ptr and applying specific ownership
>>> optimizations
>>> to
>>> unique_ptr only.
>>>
>>>
>>> *unique_ptr Optimzations*
>>>
>>>
>>> The pass I'm currently developing adds a single, simple,
>>> optimization:
>>> constant fold the destructor based on ownership information.
>>> unique_ptr has
>>> a lot of ownership information communicated with reference
>>> semantics. When
>>> a
>>> unique_ptr is moved into another function, that function takes over
>>> ownership of the unique_ptr, and subsequent destructors can be
>>> eliminated
>>> (because they will be no-ops). Otherwise, branchless functions are
>>> often
>>> complicated after inlining unique_ptr's destructor so, this
>>> optimization
>>> should be fairly beneficial.
>>>
>>>
>>> unique_ptr's reset and release methods both complicate this
>>> optimization a
>>> bit. Because they are also able to transfer and remove ownership,
>>> all
>>> unknown instructions must be ignored. However, in the future,
>>> knowledge of
>>> those methods might be able to make the pass more robust.
>>>
>>>
>>> With unique_ptr, it's difficult to prove liveness. So, it is hard to
>>> constant fold the destructor call to always be there. Maybe in the
>>> future,
>>> this would be possible, though (with sufficient analysis).
>>>
>>>
>>> Last, an optimization that I hope to do is lowering the unique_ptr
>>> to a raw
>>> pointer if all lifetime paths are known. I think removing this layer
>>> of
>>> abstraction would make it easier for other optimization passes to be
>>> successful. Eventually, we may even be able to specialize functions
>>> that
>>> used to take a unique_ptr to now take a raw pointer, if the
>>> argument's
>>> lifetime was also able to be fully analyzed.
>>>
>>>
>>> *Lifetime Annotations*
>>>
>>>
>>> Right now, the pass relies on (pre-inlined) function calls to
>>> generate
>>> ownership information. Another approach would be to add ownership
>>> annotations, such as the lifetime intrinsics (i.e.
>>> llvm.lifetime.start).
>>>
>>>
>>> *ARC Optimizations*
>>>
>>>
>>> There are a huge number of large and small ARC optimizations already
>>> in
>>> LLVM. For unique_ptr specifically, I'm not sure these are of any
>>> benefit
>>> because unique_ptr doesn't actually do any reference counting. But,
>>> later
>>> on, when I start working on generalizing this pass to support more
>>> smart
>>> pointers (specifically shared_ptr) I think the ARC optimization
>>> pass, and
>>> especially the utilities it contains, could be very beneficial. If
>>> anyone
>>> has experience with ARC optimizations, I'd love to hear your
>>> thoughts on
>>> extending them to other reference counted objects.
>>>
>>>
>>> *trivial_abi and Hidden References*
>>>
>>>
>>> Arthur O'Dwyer made a good point, which is that a lot of these
>>> optimizations can be applied when with the trivial_abi attribute.
>>> However,
>>> given that's not a standard attribute and these optimizations only
>>> *happen*
>>> to work with trivial_abi (i.e., in a more complicated program, they
>>> may not
>>> continue to work). I think lifetime utilities and specific lifetime
>>> optimization passes are still beneficial (especially if they can be
>>> applied
>>> to other smart pointers in the future).
>>>
>>>
>>> Because all smart pointers have non-trivial destructors, they are
>>> always
>>> passed by hidden references. With unique_ptr, this is as simple as
>>> bit-casting the pointer member to unique_ptr, which would allow for
>>> it to
>>> be lowered to a single raw pointer instead of a stack-allocated
>>> object.
>>> Even without the trival_abi attribute, I think this is an
>>> optimization that
>>> could be done.
>>>
>>>
>>> *Results*
>>>
>>>
>>> Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt
>>> Smart
>>> pointer lifetime optimizations pass.
>>> <https://reviews.llvm.org/D81288>
>>>
>>> For reference, here are the before and after results:
>>>
>>> Clang trunk (four branches): Compiler Explorer
>>> <https://godbolt.org/z/bsJFty>
>>>
>>> With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru
>>>
>>> Unfortunately, these are not legal optimizations for your test case:
>>>
>>> -
>>>
>>> guaranteed is permitted to escape a reference (or pointer) to the
>>> object it was passed. Tat references and pointers remain valid
>>> until the object goes out of scope.
>>> -
>>>
>>> The object can be mutated through that reference because the
>>> underlying
>>> object is not const. Being passed a const reference is not a
>>> semantic contract in C++.
>>> -
>>>
>>> Through a combination of the above, the call to owner may change
>>> the value of p, and so the caller may not rely on it still being
>>> in a trivially-destructible state after that call.
>>> -
>>>
>>> owner may leave the value of its parameter object in a
>>> non-trivially-destructible state, and under the Itanium C++ ABI,
>>> cleaning
>>> up that object is the caller’s responsibility. I agree that this
>>> is a
>>> bad rule for optimization purposes, but it’s the rule. This can
>>> only be
>>> optimized with a more global, interprocedural optimization that
>>> shifts
>>> responsibility to owner to destroy its argument.
>>>
>>> John.
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> [hidden email]
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>



_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
In reply to this post by Vassil Vassilev via cfe-dev

On 6/8/20 8:13 PM, Richard Smith via cfe-dev wrote

>> You wouldn’t be the first person to be surprised by the result of this sort
>> of analysis, but I’m afraid it’s what we’re working with.
>>
>> Unfortunately, there’s really no way to eliminate this one without either
>> interprocedural information or language changes. trivial_abi eliminates
>> the other one because it changes the convention for passing by value, but
>> to
>> pass an “immutably borrowed” value in C++ we have to pass by reference,
>> which
>> allows the reference to be escaped and accessed (and even mutated, if the
>> original object wasn’t declared const) as long as those accesses happen
>> before destruction.
>>
> Perhaps we should expose LLVM's nocapture attribute to the source level?

Yes please ;)


There was an RFC (I think) and a prototype patch to expose LLVM-IR

attributes via clang. We can limit the ones allowed if we want.

See https://c.wsmoses.com/presentations/HTO_Presentation.pdf for

another motivation :)


--

   Johannes

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
On Wed, Jun 10, 2020 at 11:14 AM Johannes Doerfert via cfe-dev <[hidden email]> wrote:
On 6/8/20 8:13 PM, Richard Smith via cfe-dev wrote
>> You wouldn’t be the first person to be surprised by the result of this sort
>> of analysis, but I’m afraid it’s what we’re working with.
>>
>> Unfortunately, there’s really no way to eliminate this one without either
>> interprocedural information or language changes. trivial_abi eliminates
>> the other one because it changes the convention for passing by value, but
>> to pass an “immutably borrowed” value in C++ we have to pass by reference,
>> which allows the reference to be escaped and accessed (and even mutated, if the
>> original object wasn’t declared const) as long as those accesses happen
>> before destruction.
>>
> Perhaps we should expose LLVM's nocapture attribute to the source level?

Yes please ;)

As someone (John?) has said in this thread, the nocapture attribute is already exposed to C/C++ via __attribute__((noescape)).

Here is an example of __attribute__((noescape)) doing its thing:
In function `one`, the call to `noargs` is assumed to trash the value of `i`.
In function `two`, the call to `noargs` is not assumed to trash the value of `i`.

However, notice that the store corresponding to the assignment `i = 42;` cannot be removed in either case!  We initialize `i` to 42, then pass it by const reference to `dont_escape`, and yet cannot assume that its value is unchanged by that call. This is required by C++'s loose rules around `const`. But it doesn't match up with human intuition very well.

I mentioned to Zoe offline that I'd like to see Clang add an opt-in non-conforming optimization mode, on the lines of `-ffast-math`, which would assume things like "things passed by const reference don't change" and "things passed by reference don't escape," and then measure to what degree codegen is improved on real codebases by that non-conforming optimization mode.

–Arthur

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev

On 6/10/20 10:31 AM, Arthur O'Dwyer wrote:

> On Wed, Jun 10, 2020 at 11:14 AM Johannes Doerfert via cfe-dev <
> [hidden email]> wrote:
>
>> On 6/8/20 8:13 PM, Richard Smith via cfe-dev wrote
>>>> You wouldn’t be the first person to be surprised by the result of this
>> sort
>>>> of analysis, but I’m afraid it’s what we’re working with.
>>>>
>>>> Unfortunately, there’s really no way to eliminate this one without
>> either
>>>> interprocedural information or language changes. trivial_abi eliminates
>>>> the other one because it changes the convention for passing by value,
>> but
>>>> to pass an “immutably borrowed” value in C++ we have to pass by
>> reference,
>>>> which allows the reference to be escaped and accessed (and even
>> mutated, if the
>>>> original object wasn’t declared const) as long as those accesses happen
>>>> before destruction.
>>>>
>>> Perhaps we should expose LLVM's nocapture attribute to the source level?
>> Yes please ;)
>>
> As someone (John?) has said in this thread, the nocapture attribute is
> already exposed to C/C++ via __attribute__((noescape)).
>
Yeah, I saw that later, thanks. I actually want way more attributes exposed,

e.g., the ones I mention below. But also `dereferenceable`, `nosync`,

`willreturn`, ...



> Here is an example of __attribute__((noescape)) doing its thing:
> https://godbolt.org/z/rJbEmZ
> In function `one`, the call to `noargs` is assumed to trash the value of
> `i`.
> In function `two`, the call to `noargs` is *not* assumed to trash the value
> of `i`.
>
> However, notice that the store corresponding to the assignment `i = 42;`
> cannot be removed in either case!  We initialize `i` to 42, then pass it by
> const reference to `dont_escape`, and yet *cannot assume that its value is
> unchanged by that call*. This is required by C++'s loose rules around
> `const`. But it doesn't match up with human intuition very well.
>
> I mentioned to Zoe offline that I'd like to see Clang add an opt-in
> non-conforming optimization mode, on the lines of `-ffast-math`, which
> would *assume* things like "things passed by const reference don't change"
> and "things passed by reference don't escape," and then measure to what
> degree codegen is improved on real codebases by that non-conforming
> optimization mode.

Interesting, especially to determine lost performance potential [0].

Should not even be hard, mark `const` as `nocapture`, `readonly`,

and `nofree` during IR-gen.


-- Johannes


[0] https://link.springer.com/chapter/10.1007/978-3-030-20656-7_13


> –Arthur
>
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
On Wed, Jun 10, 2020 at 11:41 AM Johannes Doerfert <[hidden email]> wrote:
On 6/10/20 10:31 AM, Arthur O'Dwyer wrote:
> I mentioned to Zoe offline that I'd like to see Clang add an opt-in
> non-conforming optimization mode, on the lines of `-ffast-math`, which
> would *assume* things like "things passed by const reference don't change"
> and "things passed by reference don't escape," and then measure to what
> degree codegen is improved on real codebases by that non-conforming
> optimization mode.

Interesting, especially to determine lost performance potential [0].
[0] https://link.springer.com/chapter/10.1007/978-3-030-20656-7_13

Exactly! :)  And I see you're the first author of that paper!
If I understand correctly, what you did there was have the compiler add annotations to every possible site, and then iteratively use an "acceptance test" to find out which annotations broke the program and must be rolled back, until you arrived at a local minimum. (Similar to what C-Reduce does.)

My idea of a "-ffast-math–style" opt-in mode couldn't depend on the existence of an "acceptance test" for the program it was compiling. It would blindly use heuristics invented by the Clang developer, such as "things marked const should always be annotated readonly" and "parameters of reference type should always be annotated noescape."

If your PETOSPA framework hasn't yet bit-rotted, you could use PETOSPA to sanity-check my heuristics! Suppose you did a special run of your compiler, applying just the readonly and noescape annotations, in exactly the cases outlined in my previous paragraph. And then you run your "acceptance tests" on the resulting binaries. My hypothesis is that the acceptance tests should all pass. That is: applying these heuristics should not break any of your programs.  If applying one of these heuristics does break one of your programs, I would be very interested to learn why.

Thoughts? Is there any chance you could run a test like that in the next month or two?
–Arthur

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev

On 6/10/20 11:17 AM, Arthur O'Dwyer wrote:

> On Wed, Jun 10, 2020 at 11:41 AM Johannes Doerfert <
> [hidden email]> wrote:
>
>> On 6/10/20 10:31 AM, Arthur O'Dwyer wrote:
>>> I mentioned to Zoe offline that I'd like to see Clang add an opt-in
>>> non-conforming optimization mode, on the lines of `-ffast-math`, which
>>> would *assume* things like "things passed by const reference don't
>> change"
>>> and "things passed by reference don't escape," and then measure to what
>>> degree codegen is improved on real codebases by that non-conforming
>>> optimization mode.
>> Interesting, especially to determine lost performance potential [0].
>> [0] https://link.springer.com/chapter/10.1007/978-3-030-20656-7_13
>>
> Exactly! :)  And I see you're the first author of that paper!
> Fulltext PDF: "Performance Exploration Through Optimistic Static Program
> Annotations" (Doerfert, Homerding, Finkel; 2019)
> <https://github.com/jdoerfert/PETOSPA/blob/master/ISC19.pdf>
> If I understand correctly, what you did there was have the compiler add
> annotations to every possible site, and then iteratively use an "acceptance
> test" to find out which annotations broke the program and must be rolled
> back, until you arrived at a local minimum. (Similar to what C-Reduce does.)
Pretty much, yes. (Slides and paper attached in case that helps.)


> My idea of a "-ffast-math–style" opt-in mode couldn't depend on the
> existence of an "acceptance test" for the program it was compiling. It
> would blindly use heuristics invented by the Clang developer, such as
> "things marked const should always be annotated readonly" and "parameters
> of reference type should always be annotated noescape."

Agreed. Totally. You would probably still run your tests (I mean the

user would). It is just a single manually triggered flag instead of

an automatic exploration. Not to say this is not actually more

useful in the short term ;)


> If your PETOSPA framework hasn't yet bit-rotted, *you could use PETOSPA to
> sanity-check my heuristics!* Suppose you did a special run of your
> compiler, applying just the readonly and noescape annotations, in exactly
> the cases outlined in my previous paragraph. And then you run your
> "acceptance tests" on the resulting binaries. My hypothesis is that the
> acceptance tests should all pass. That is: applying these heuristics should
> not break any of your programs.  If applying one of these heuristics does
> break one of your programs, I would be very interested to learn why.

In Figure 6 we stated that only 3 pointer arguments we annotated with

readnone/readonly/writeonly and none of the nocapture annotated pointers
caused

our verification to fail. To be fair, pointer capturing is not really

what you expect many functions in an HPC application to do ;)



> Thoughts? Is there any chance you *could *run a test like that in the next
> month or two?

I think we should "just" modify the clang codegen and run the test suite

and SPEC. That way we measure the actual thing with similar work overhead.


WDYT?


> –Arthur
>

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

paper.pdf (604K) Download Attachment
talk.pdf (979K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
On Wed, Jun 10, 2020 at 12:40 PM Johannes Doerfert <[hidden email]> wrote:
On 6/10/20 11:17 AM, Arthur O'Dwyer wrote:
> My idea of a "-ffast-math–style" opt-in mode couldn't depend on the
> existence of an "acceptance test" for the program it was compiling. [...]

Agreed. Totally. You would probably still run your tests (I mean the
user would).

Heh, yes. :) But if the user's tests failed, they'd have no recourse except "don't opt in anymore," similar to what would happen if their tests failed with `-ffast-math` (or with `-O3` or with `-fstrict-aliasing`, for that matter).

> If your PETOSPA framework hasn't yet bit-rotted, *you could use PETOSPA to
> sanity-check my heuristics!* Suppose you did a special run of your
> compiler, applying just the readonly and noescape annotations, in exactly
> the cases outlined in my previous paragraph. And then you run your
> "acceptance tests" on the resulting binaries. My hypothesis is that the
> acceptance tests should all pass. That is: applying these heuristics should
> not break any of your programs.  If applying one of these heuristics does
> break one of your programs, I would be very interested to learn why.

In Figure 6 we stated that only 3 pointer arguments we annotated with
readnone/readonly/writeonly and none of the nocapture annotated pointers caused
our verification to fail. To be fair, pointer capturing is not really
what you expect many functions in an HPC application to do ;)

I see. The paper is unclear to me (on a quick reading) in three ways:
(1) When you say "pointer argument" do you also include "by-reference argument"?
(2) Do you include the implicit `this` parameter?
(3) IIUC, your statement in Figure 6 says that all-but-3 pointer parameters were successfully annotated with some annotation, but it doesn't break them down into pass-by-pointer and pass-by-const-pointer, and it doesn't break the annotations down into `readnone`/`readonly`/`writeonly`. It is theoretically possible that your all-but-3 number includes some const pointers annotated with `writeonly`(!), or that the 3 includes some const pointers. Either of those outcomes would falsify my hypothesis.


I think we should "just" modify the clang codegen and run the test suite
and SPEC. That way we measure the actual thing with similar work overhead.
WDYT?

"Just" modifying Clang would also be awesome. :)
If you modify Clang, I expect Clang's own test suite to fail at least twice, because you'd be deliberately introducing non-conforming compiler behavior, and the test suite should definitely treat that as a "correctness regression."
Ah: you could use this mode to compile Clang itself and then use "time to run Clang's test suite" as your benchmark.  If that's what you meant by "run the test suite," then I agree!

–Arthur

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
In reply to this post by Vassil Vassilev via cfe-dev

On 10 Jun 2020, at 11:31, Arthur O'Dwyer wrote:

On Wed, Jun 10, 2020 at 11:14 AM Johannes Doerfert via cfe-dev <
[hidden email]> wrote:

On 6/8/20 8:13 PM, Richard Smith via cfe-dev wrote

You wouldn’t be the first person to be surprised by the result of this

sort

of analysis, but I’m afraid it’s what we’re working with.

Unfortunately, there’s really no way to eliminate this one without

either

interprocedural information or language changes. trivial_abi eliminates
the other one because it changes the convention for passing by value,

but

to pass an “immutably borrowed” value in C++ we have to pass by

reference,

which allows the reference to be escaped and accessed (and even

mutated, if the

original object wasn’t declared const) as long as those accesses happen
before destruction.

Perhaps we should expose LLVM's nocapture attribute to the source level?

Yes please ;)

As someone (John?) has said in this thread, the nocapture attribute is
already exposed to C/C++ via __attribute__((noescape)).

Here is an example of __attribute__((noescape)) doing its thing:
https://godbolt.org/z/rJbEmZ
In function `one`, the call to `noargs` is assumed to trash the value of
`i`.
In function `two`, the call to `noargs` is *not* assumed to trash the value
of `i`.

However, notice that the store corresponding to the assignment `i = 42;`
cannot be removed in either case! We initialize `i` to 42, then pass it by
const reference to `dont_escape`, and yet *cannot assume that its value is
unchanged by that call*. This is required by C++'s loose rules around
`const`. But it doesn't match up with human intuition very well.

I mentioned to Zoe offline that I'd like to see Clang add an opt-in
non-conforming optimization mode, on the lines of `-ffast-math`, which
would *assume* things like "things passed by const reference don't change"
and "things passed by reference don't escape," and then measure to what
degree codegen is improved on real codebases by that non-conforming
optimization mode.

This is quite an interesting idea. Having a mode that assumes that
const references don’t get cast back might be quite interesting. I
suspect that enforcing stricter escape rules would break some specific
idioms and types.

John.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
In reply to this post by Vassil Vassilev via cfe-dev

On 6/10/20 12:06 PM, Arthur O'Dwyer wrote:

> On Wed, Jun 10, 2020 at 12:40 PM Johannes Doerfert <
> [hidden email]> wrote:
>
>> On 6/10/20 11:17 AM, Arthur O'Dwyer wrote:
>>> My idea of a "-ffast-math–style" opt-in mode couldn't depend on the
>>> existence of an "acceptance test" for the program it was compiling. [...]
>> Agreed. Totally. You would probably still run your tests (I mean the
>> user would).
>
> Heh, yes. :) But if the user's tests failed, they'd have no recourse except
> "don't opt in anymore," similar to what would happen if their tests
> failed with `-ffast-math` (or with `-O3` or with `-fstrict-aliasing`, for
> that matter).

Agreed :)


>> If your PETOSPA framework hasn't yet bit-rotted, *you could use PETOSPA to
>>> sanity-check my heuristics!* Suppose you did a special run of your
>>> compiler, applying just the readonly and noescape annotations, in exactly
>>> the cases outlined in my previous paragraph. And then you run your
>>> "acceptance tests" on the resulting binaries. My hypothesis is that the
>>> acceptance tests should all pass. That is: applying these heuristics
>> should
>>> not break any of your programs.  If applying one of these heuristics does
>>> break one of your programs, I would be very interested to learn why.
>> In Figure 6 we stated that only 3 pointer arguments we annotated with
>> readnone/readonly/writeonly and none of the nocapture annotated
>> pointers caused
>> our verification to fail. To be fair, pointer capturing is not really
>> what you expect many functions in an HPC application to do ;)
>>
> I see. The paper is unclear to me (on a quick reading) in three ways:
> (1) When you say "pointer argument" do you also include "by-reference
> argument"?

Yes, it's implemented on IR, there is no(t much) difference.



> (2) Do you include the implicit `this` parameter?

Yes, it's implemented on IR. (Though the benchmarks were not C++ heavy,
I think.)


> (3) IIUC, your statement in Figure 6 says that all-but-3 pointer parameters
> were successfully annotated with *some* annotation, but it doesn't break
> them down into pass-by-pointer and pass-by-const-pointer, and it doesn't
> break the annotations down into `readnone`/`readonly`/`writeonly`. It is
> theoretically possible that your all-but-3 number includes some const
> pointers annotated with `writeonly`(!), or that the 3 includes some const
> pointers. Either of those outcomes would falsify my hypothesis.
>
It is possible, yes. Doubtful from what I remember, though I had to
check the data to make sure.


> I think we should "just" modify the clang codegen and run the test suite
>> and SPEC. That way we measure the actual thing with similar work overhead.
>> WDYT?
>>
> "Just" modifying Clang would also be awesome. :)
> If you modify Clang, I *expect* Clang's own test suite to fail at least
> twice, because you'd be deliberately introducing non-conforming compiler
> behavior, and the test suite should definitely treat that as a "correctness
> regression."

I am not even sure we'd break many tests. I am not aware of tests that

ensure either (1) const_cast + modify work properly, or, (2) let the pointer

escape and trigger some modification (store/free) to be moved prior to the

"hidden" read via the escaped pointer. That said, there are certainly
situations

of (1) and (2) in the wild. I would add the annotation, run it on
llvm-test-suite,

spec, ... and see what falls out. If there is no/little compile/runtime
improvement,

less incentive to go forth with this. Otherwise, we can take a closer
look at the

failure rate and determine if a user flag makes sense.


> Ah: you could use this mode to *compile Clang itself* and then use "time to
> run Clang's test suite" as your benchmark.  If that's what you meant by
> "run the test suite," then I agree!

Self hosting is certainly also a nice benchmark here.



>
> –Arthur
>
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Smart Pointer Lifetime Optimizations

Vassil Vassilev via cfe-dev
In reply to this post by Vassil Vassilev via cfe-dev

On 6/10/20 12:36 PM, John McCall wrote:

>
>
> On 10 Jun 2020, at 11:31, Arthur O'Dwyer wrote:
>
>> On Wed, Jun 10, 2020 at 11:14 AM Johannes Doerfert via cfe-dev <
>> [hidden email]> wrote:
>>
>>> On 6/8/20 8:13 PM, Richard Smith via cfe-dev wrote
>>>>> You wouldn’t be the first person to be surprised by the result of
>>>>> this
>>> sort
>>>>> of analysis, but I’m afraid it’s what we’re working with.
>>>>>
>>>>> Unfortunately, there’s really no way to eliminate this one without
>>> either
>>>>> interprocedural information or language changes. trivial_abi
>>>>> eliminates
>>>>> the other one because it changes the convention for passing by value,
>>> but
>>>>> to pass an “immutably borrowed” value in C++ we have to pass by
>>> reference,
>>>>> which allows the reference to be escaped and accessed (and even
>>> mutated, if the
>>>>> original object wasn’t declared const) as long as those accesses
>>>>> happen
>>>>> before destruction.
>>>>>
>>>> Perhaps we should expose LLVM's nocapture attribute to the source
>>>> level?
>>>
>>> Yes please ;)
>>>
>>
>> As someone (John?) has said in this thread, the nocapture attribute is
>> already exposed to C/C++ via __attribute__((noescape)).
>>
>> Here is an example of __attribute__((noescape)) doing its thing:
>> https://godbolt.org/z/rJbEmZ
>> In function `one`, the call to `noargs` is assumed to trash the value of
>> `i`.
>> In function `two`, the call to `noargs` is *not* assumed to trash the
>> value
>> of `i`.
>>
>> However, notice that the store corresponding to the assignment `i = 42;`
>> cannot be removed in either case!  We initialize `i` to 42, then pass
>> it by
>> const reference to `dont_escape`, and yet *cannot assume that its
>> value is
>> unchanged by that call*. This is required by C++'s loose rules around
>> `const`. But it doesn't match up with human intuition very well.
>>
>> I mentioned to Zoe offline that I'd like to see Clang add an opt-in
>> non-conforming optimization mode, on the lines of `-ffast-math`, which
>> would *assume* things like "things passed by const reference don't
>> change"
>> and "things passed by reference don't escape," and then measure to what
>> degree codegen is improved on real codebases by that non-conforming
>> optimization mode.
>
> This is quite an interesting idea.  Having a mode that assumes that
> `const` references don’t get cast back might be quite interesting.  I
> suspect that enforcing stricter escape rules would break some specific
> idioms and types.
>

Agreed. There are no doubt use cases though. I remember the one vendor
compiler

that offered a flag to make *all* arguments restrict. It for sure breaks
code,

but some seemed to really like it. There is also the option to only
"apply" these

to user code, not system includes (assuming we can reasonably
differentiate).


-- Johannes


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
12