Question about optimization of new and delete

Question about optimization of new and delete

Nicola Gigante
Hello.

This is probably an old question; if so, my apologies.

I was wondering which optimizations clang/llvm performs (or can perform, given the restrictions imposed by the standard) on uses of the default new/delete operators.

In particular, I'm wondering about a couple of interesting cases:

1) Can clang convert a call to the default operator new into a stack allocation, if it can prove that the lifetime of the allocated memory does not outlive the enclosing scope?
Would that be contrary to the standard, or otherwise undesirable?

Trivial example:

void func() {
    Class *obj = new Class(args);
    obj->stuff();
    delete obj;
}

When I compile something simple like this, I see that the pair of operator calls is elided, but in more complex cases in my code this does not happen, and I would like to know why (is it a tradeoff against the size of the allocated memory?)

2) Can clang/llvm identify a write to a piece of memory that is about to be deleted as a dead store?

void func(Class *obj) {
    obj->member = 42; // Is this store removed?
    delete obj;
}

Compiling this exact code with -O3, I see the store still there.

Thank you very much.

Nicola
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Re: Question about optimization of new and delete

Chandler Carruth
On Sun, Sep 29, 2013 at 12:04 PM, Nicola Gigante <[hidden email]> wrote:
Hello.

This is probably an old question; if so, my apologies.

Heh, not really. We're just now getting the C++ standard fixed: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3664.html
 

I was wondering which optimizations clang/llvm performs (or can perform, given the restrictions imposed by the standard) on uses of the default new/delete operators.

See the paper for the high level theory of what the standard allows (once it goes in). The current plan in Clang is to follow that paper's strategy of optimization even in older standard modes unless there is significant existing code broken by it.
 

In particular, I'm wondering about a couple of interesting cases:

1) Can clang convert a call to the default operator new into a stack allocation, if it can prove that the lifetime of the allocated memory does not outlive the enclosing scope?
Would that be contrary to the standard, or otherwise undesirable?

We can do this, but it may cause an unacceptable growth in stack usage. Hal Finkel has recently started work on an optimization pass to do precisely this, and I'm helping review and get it into the tree. I would expect Clang to start doing this soon for small, constant size allocations.

Currently, Clang only really does this when it can delete the allocation entirely, which is probably why you see confusing results from your testing.
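Conceptually, the lowering described here turns a heap allocation whose pointer never escapes its scope into an automatic object. A hand-written before/after sketch in C++ (the real transformation happens on LLVM IR, not source; `Widget` and the function names are invented for illustration):

```cpp
struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
    int stuff() const { return value * 2; }
};

// Before: heap allocation paired with delete in the same scope.
int func_heap() {
    Widget *obj = new Widget(21);
    int r = obj->stuff();
    delete obj;      // lifetime provably ends here
    return r;
}

// After: since the pointer never escapes, the optimizer may use an
// automatic (stack) object instead, eliding both operator calls.
int func_stack() {
    Widget obj(21);
    return obj.stuff();
}
```

Both functions compute the same result; the only difference is where the object lives, which is why applying the transformation is purely a question of stack-size policy.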


2) Can clang/llvm identify a write to a piece of memory that is about to be deleted as a dead store?

void func(Class *obj) {
    obj->member = 42; // Is this store removed?
    delete obj;
}

Compiling this exact code with -O3, I see the store still there.

Huh. No, I bet we miss this. This is almost certainly just a missed optimization that we should get.


However, note that all of Clang's optimizations of this form are mostly conducted by the LLVM optimization libraries Clang is built on, and so the fix to this missed optimization will likely involve mostly a change to LLVM.


Re: Question about optimization of new and delete

Nicola Gigante

On 30 Sep 2013, at 00:28, Chandler Carruth <[hidden email]> wrote:


Heh, not really. We're just now getting the C++ standard fixed: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3664.html

Uh, ok :)

See the paper for the high level theory of what the standard allows (once it goes in). The current plan in Clang is to follow that paper's strategy of optimization even in older standard modes unless there is significant existing code broken by it.
 

What could be broken? I think such source code would have to be quite perverse.

We can do this, but it may cause an unacceptable growth in stack usage. Hal Finkel has recently started work on an optimization pass to do precisely this, and I'm helping review and get it into the tree. I would expect Clang to start doing this soon for small, constant size allocations.


This applies to most common use cases, right? Most objects are small…
Anyway, what will the criterion be for deciding which allocations to lower to the stack and which not (e.g. what does "small" mean)?


2) Can clang/llvm identify a write to a piece of memory that is about to be deleted as a dead store?

void func(Class *obj) {
    obj->member = 42; // Is this store removed?
    delete obj;
}

Compiling this exact code with -O3, I see the store still there.

Huh. No, I bet we miss this. This is almost certainly just a missed optimization that we should get.


However, note that all of Clang's optimizations of this form are mostly conducted by the LLVM optimization libraries Clang is built on, and so the fix to this missed optimization will likely involve mostly a change to LLVM.

Yeah, of course. But what I don't know is how LLVM knows, at the IR level, that a call to _Znam allocates memory and is paired with a call to _Zdam that frees it. Is that special knowledge about these symbols, or an attribute placed on the external symbol declaration?
For example, for this dead store, can you model in the IR that the memory will be lost, or do you need a special case in the dead store elimination pass? And is it C++-specific, or is there some generic architecture for modeling dynamic allocation (I believe not)?

I ask because in another project I'm writing an LLVM-based compiler, and I'll want to apply those kinds of optimizations as well.

Maybe it would be worth using some intrinsics, lowered by a front-end-specific pass, that model dynamic allocation primitives.
What do you think?

Thanks,
Nicola


Re: Question about optimization of new and delete

Benjamin Kramer-2
In reply to this post by Nicola Gigante

On 29.09.2013, at 19:04, Nicola Gigante <[hidden email]> wrote:

> Hello.
>
> This is probably an old question; if so, my apologies.
>
> I was wondering which optimizations clang/llvm performs (or can
> perform, given the restrictions imposed by the standard) on uses of
> the default new/delete operators.
>
> In particular, I'm wondering about a couple of interesting cases:
>
> 1) Can clang convert a call to the default operator new into a stack
> allocation, if it can prove that the lifetime of the allocated
> memory does not outlive the enclosing scope?
> Would that be contrary to the standard, or otherwise undesirable?
>
> Trivial example:
>
> void func() {
>    Class *obj = new Class(args);
>    obj->stuff();
>    delete obj;
> }
>
> When I compile something simple like this, I see that the pair of
> operator calls is elided, but in more complex cases in my code this
> does not happen, and I would like to know why (is it a tradeoff
> against the size of the allocated memory?)
>
> 2) Can clang/llvm identify a write to a piece of memory that is
> about to be deleted as a dead store?
>
> void func(Class *obj) {
>    obj->member = 42; // Is this store removed?
>    delete obj;
> }
>
> Compiling this exact code with -O3, I see the store still there.

I've been fighting similar patterns recently. The IR looks like this:

define void @_Z4funcP5Class(%struct.Class* %obj) #0 {
entry:
  %member = getelementptr inbounds %struct.Class* %obj, i64 0, i32 0
  store i32 42, i32* %member, align 4, !tbaa !0
  %isnull = icmp eq %struct.Class* %obj, null
  br i1 %isnull, label %delete.end, label %delete.notnull

delete.notnull:                                   ; preds = %entry
  %0 = bitcast %struct.Class* %obj to i8*
  tail call void @_ZdlPv(i8* %0) #2
  br label %delete.end

delete.end:                                       ; preds = %delete.notnull, %entry
  ret void
}

The superfluous NULL check here is preventing the elimination of the store. In my opinion, we should optimize this based on the assumption that a pointer can't be NULL after a load/store through it. GCC does this and got some flak because it turned a crash into a security problem in the Linux kernel (-fno-delete-null-pointer-checks was introduced as a remedy). I'm not sure how best to implement that with our current VRP infrastructure.
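A source-level view of the IR above may make the obstacle clearer. This is a hand-expanded, illustrative rendering of what Clang emits for `obj->member = 42; delete obj;` (the `bool` return is added here only to make the taken path observable; the original function returns void):

```cpp
#include <new>

struct Class { int member; };

// Hand-expanded sketch of the lowered code. The store is dead, but the
// explicit null check emitted for `delete obj` sits between the store and
// the call to operator delete (_ZdlPv), blocking dead-store elimination.
bool func_lowered(Class *obj) {
    obj->member = 42;              // the store DSE ought to remove
    if (obj != nullptr) {          // null check emitted for `delete obj`
        ::operator delete(obj);    // corresponds to the _ZdlPv call
        return true;               // delete path taken
    }
    return false;                  // null pointer: delete is a no-op
}
```

If the optimizer assumed `obj` is non-null after the store, the branch would fold away, and the store could then be eliminated as dead.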

- Ben

Re: Question about optimization of new and delete

David Tweed-2
In reply to this post by Nicola Gigante

| Anyway, is it C++ specific or is there some generic architecture to model dynamic allocation (I believed not)?

 

Most of this is based in lib/Analysis/MemoryBuiltins.cpp, which basically splits things into “prototypical types of behaviour” for functions that LLVM knows how to do useful transformations with, and some plumbing to connect the function names (or mangled function names in C++) to the appropriate behaviour.
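A toy model of that plumbing might look like the following; the enum values and helper are invented for illustration (the real implementation in lib/Analysis/MemoryBuiltins.cpp is table-driven and far more detailed):

```cpp
#include <cstring>

// Invented, much-simplified sketch of mapping known symbol names
// (including Itanium-mangled C++ operator new/delete) to a prototypical
// allocation behaviour the optimizer can reason about.
enum AllocBehavior { NotAllocLike, AllocLike, FreeLike };

AllocBehavior classify(const char *symbol) {
    if (std::strcmp(symbol, "malloc") == 0) return AllocLike;
    if (std::strcmp(symbol, "_Znwm") == 0)  return AllocLike; // operator new(size_t)
    if (std::strcmp(symbol, "_Znam") == 0)  return AllocLike; // operator new[](size_t)
    if (std::strcmp(symbol, "free") == 0)   return FreeLike;
    if (std::strcmp(symbol, "_ZdlPv") == 0) return FreeLike;  // operator delete(void*)
    if (std::strcmp(symbol, "_ZdaPv") == 0) return FreeLike;  // operator delete[](void*)
    return NotAllocLike;                                      // unknown: assume nothing
}
```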

 

Cheers,

Dave




Re: Question about optimization of new and delete

Hal Finkel
In reply to this post by Nicola Gigante
----- Original Message -----

> On 30 Sep 2013, at 00:28, Chandler Carruth <[hidden email]> wrote:
>
> Heh, not really. We're just now getting the C++ standard fixed:
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3664.html
>
> Uh, ok :)
>
> See the paper for the high level theory of what the standard allows
> (once it goes in). The current plan in Clang is to follow that
> paper's strategy of optimization even in older standard modes unless
> there is significant existing code broken by it.
>
> What could be broken? I think such source code would have to be
> quite perverse.
>
> We can do this, but it may cause an unacceptable growth in stack
> usage. Hal Finkel has recently started work on an optimization pass
> to do precisely this, and I'm helping review and get it into the
> tree. I would expect Clang to start doing this soon for small,
> constant size allocations.
>
> This applies to most common use cases, right? Most objects are
> small…
> Anyway, what will the criterion be for deciding which allocations
> to lower to the stack and which not (e.g. what does "small" mean)?

The criteria have not yet been decided, and I think some experimentation will be necessary. In the current implementation, the conversion happens for all allocations provably less than 1024 bytes. However, we may want to cap the total size of converted mallocs in addition to, or instead of, the individual sizes. Maybe this cap should depend on how much stack memory is already being requested by the function. I'm certainly open to suggestion.
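The kind of policy described above can be sketched as a simple predicate; the thresholds below are invented placeholders, not values from the actual pass:

```cpp
#include <cstddef>

// Illustrative policy: convert a heap allocation to a stack slot only if
// the allocation itself is small and the function's total stack usage
// (existing frame plus converted allocations) stays under a cap. Both
// limits here are hypothetical.
const std::size_t kMaxAllocSize  = 1024;      // per-allocation cap (bytes)
const std::size_t kMaxTotalStack = 8 * 1024;  // per-function cap (bytes)

bool shouldConvertToStack(std::size_t allocSize,
                          std::size_t currentStackUsage) {
    return allocSize < kMaxAllocSize &&
           currentStackUsage + allocSize <= kMaxTotalStack;
}
```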

> 2) Can clang/llvm identify a write to a piece of memory that is
> about to be deleted as a dead store?
>
> void func(Class *obj) {
>     obj->member = 42; // Is this store removed?
>     delete obj;
> }
>
> Compiling this exact code with -O3, I see the store still there.
>
> Huh. No, I bet we miss this. This is almost certainly just a missed
> optimization that we should get.

Is there a call to the destructor after the store? If so, then we might not know that the destructor does not access the stored value.

> However, note that all of Clang's optimizations of this form are
> mostly conducted by the LLVM optimization libraries Clang is built
> on, and so the fix to this missed optimization will likely involve
> mostly a change to LLVM.
>
> Yeah, of course. But what I don't know is how LLVM knows, at the IR
> level, that a call to _Znam allocates memory and is paired with a
> call to _Zdam that frees it. Is that special knowledge about these
> symbols, or an attribute placed on the external symbol declaration?

Yes, LLVM has special knowledge of certain external symbols. In this case, the relevant code is in: lib/Analysis/MemoryBuiltins.cpp (in LLVM). Although perhaps not obvious from the code, the symbol recognition depends on both the name, and the presence of a special 'builtin' attribute. Clang will add this special attribute in cases where the compiler is free to make assumptions about the semantics of the call.
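That two-part test (recognized name plus front-end permission) can be modeled roughly as follows; the helper and its name are invented for illustration and do not mirror the actual LLVM API:

```cpp
#include <string>

// Sketch: a call is treated as a known allocation function only if the
// symbol name is recognized AND the call carries the 'builtin' marker,
// i.e. the front end has confirmed the compiler may assume standard
// semantics for it. (Hypothetical helper, not LLVM's real interface.)
bool isRecognizedAllocCall(const std::string &symbol, bool hasBuiltinAttr) {
    const bool knownName = symbol == "_Znwm" ||  // operator new(size_t)
                           symbol == "_Znam" ||  // operator new[](size_t)
                           symbol == "malloc";
    return knownName && hasBuiltinAttr;
}
```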

> For example, for this dead store, can you model in the IR that the
> memory will be lost, or do you need a special case in the dead store
> elimination pass? And is it C++-specific, or is there some generic
> architecture for modeling dynamic allocation (I believe not)?

It is not C++ specific, although there may be some C++-specific knowledge in some of the function categories.

> I ask because in another project I'm writing an LLVM-based compiler,
> and I'll want to apply those kinds of optimizations as well.
>
> Maybe it would be worth using some intrinsics, lowered by a
> front-end-specific pass, that model dynamic allocation primitives.
> What do you think?

We may end up with something like that at some point, but for now the builtin symbol recognition has been sufficient. FWIW, we also do this same thing with other 'libc' functions, see for example lib/Transforms/Utils/SimplifyLibCalls.cpp, and some functions get special CodeGen support as well.
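As a concrete instance of those libc simplifications: SimplifyLibCalls can rewrite a `printf` whose format string is constant, free of `%` specifiers, and newline-terminated into the cheaper `puts`. The predicate below only models the pattern match (function name invented), not the IR rewriting itself:

```cpp
#include <string>

// Hypothetical sketch of the eligibility check behind the classic
// printf("...\n") -> puts("...") simplification: the format string must be
// non-empty, contain no format specifiers, and end with a newline.
bool printfFoldsToPuts(const std::string &fmt) {
    return !fmt.empty() && fmt.back() == '\n' &&
           fmt.find('%') == std::string::npos;
}
```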

 -Hal

> Thanks,
> Nicola

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory


Re: Question about optimization of new and delete

Hal Finkel
In reply to this post by Benjamin Kramer-2
----- Original Message -----

> On 29.09.2013, at 19:04, Nicola Gigante <[hidden email]> wrote:
>
> > 2) Can clang/llvm identify a write to a piece of memory that is
> > about to be deleted as a dead store?
> >
> > void func(Class *obj) {
> >    obj->member = 42; // Is this store removed?
> >    delete obj;
> > }
> >
> > Compiling this exact code with -O3, I see the store still there.
>
> I've been fighting similar patterns recently. The IR looks like this:
>
> define void @_Z4funcP5Class(%struct.Class* %obj) #0 {
> entry:
>   %member = getelementptr inbounds %struct.Class* %obj, i64 0, i32 0
>   store i32 42, i32* %member, align 4, !tbaa !0
>   %isnull = icmp eq %struct.Class* %obj, null
>   br i1 %isnull, label %delete.end, label %delete.notnull
>
> delete.notnull:                                   ; preds = %entry
>   %0 = bitcast %struct.Class* %obj to i8*
>   tail call void @_ZdlPv(i8* %0) #2
>   br label %delete.end
>
> delete.end:                                       ; preds = %delete.notnull, %entry
>   ret void
> }
>
> The superfluous NULL check here is preventing the elimination of the
> store. In my opinion, we should optimize this based on the assumption
> that a pointer can't be NULL after a load/store through it. GCC does
> this and got some flak because it turned a crash into a security
> problem in the Linux kernel (-fno-delete-null-pointer-checks was
> introduced as a remedy). I'm not sure how best to implement that with
> our current VRP infrastructure.

Interesting. Do you mean that the kernel will not internally fault if it stores to some place near 0x0? Or was the offset so large that it was going someplace valid?

In any case, I guess that we should probably do the same.

 -Hal

> - Ben

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Re: Question about optimization of new and delete

Nicola Gigante
In reply to this post by Hal Finkel

On 30 Sep 2013, at 15:11, Hal Finkel <[hidden email]> wrote:
>
> The criteria have not yet been decided, and I think some experimentation will be necessary. In the current implementation, the conversion happens for all allocations provably less than 1024 bytes. However, we may want to cap the total size of converted mallocs in addition to, or instead of, the individual sizes. Maybe this cap should depend on how much stack memory is already being requested by the function. I'm certainly open to suggestion.
>

Capping the total size seems more reasonable to me but, as you said, experimentation is necessary.
Anyway, I was thinking about a tradeoff between stack usage and speed based on the "hotness" of the code (is that the right term?). For example, a relatively big allocation inside a tight loop might be lowered, while an allocation of the same size at function scope would not.
Also, I think whether the function is recursive, and whether it gets transformed by tail-call elimination, should be factors to take into account.

What do you think?

>
> Is there a call to the destructor after the store? If so, then we might not know that the destructor does not access the stored value.
>

No, there isn't. The issue shows up even with an empty class that has only that member.


Thanks,
Nicola

Re: Question about optimization of new and delete

Hal Finkel
----- Original Message -----

> On 30 Sep 2013, at 15:11, Hal Finkel <[hidden email]> wrote:
>
> > The criteria have not yet been decided, and I think some
> > experimentation will be necessary. In the current implementation,
> > the conversion happens for all allocations provably less than 1024
> > bytes. However, we may want to cap the total size of converted
> > mallocs in addition to, or instead of, the individual sizes. Maybe
> > this cap should depend on how much stack memory is already being
> > requested by the function. I'm certainly open to suggestion.
>
> Capping the total size seems more reasonable to me but, as you said,
> experimentation is necessary.
> Anyway, I was thinking about a tradeoff between stack usage and speed
> based on the "hotness" of the code. For example, a relatively big
> allocation inside a tight loop might be lowered, while an allocation
> of the same size at function scope would not.
> Also, I think whether the function is recursive, and whether it gets
> transformed by tail-call elimination, should be factors to take into
> account.
>
> What do you think?

These all seem like good ideas; once the initial support lands upstream, please play with it. I think that user feedback is really going to be the only way to set these various thresholds.

 -Hal

> > Is there a call to the destructor after the store? If so, then we
> > might not know that the destructor does not access the stored
> > value.
>
> No, there isn't. The issue shows up even with an empty class that
> has only that member.
>
> Thanks,
> Nicola

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Re: Question about optimization of new and delete

Nicola Gigante

On 1 Oct 2013, at 07:48, Hal Finkel <[hidden email]> wrote:
>
> These all seem like good ideas; once the initial support lands upstream, please play with it. I think that user feedback is really going to be the only way to set these various thresholds.
>

Of course I'll play with it! Please let us know on this thread when the commit lands.

> -Hal

Thank you,
Nicola

Re: Question about optimization of new and delete

Stephan Tolksdorf
In reply to this post by Hal Finkel
Hal Finkel wrote:

>> Capping the total size seems more reasonable to me but, as you said,
>> experimentation is necessary.
>> Anyway, I was thinking about a tradeoff between stack usage and speed
>> based on the "hotness" of the code. For example, a relatively big
>> allocation inside a tight loop might be lowered, while an allocation
>> of the same size at function scope would not.
>> Also, I think whether the function is recursive, and whether it gets
>> transformed by tail-call elimination, should be factors to take into
>> account.
>>
>> What do you think?
>
> These all seem like good ideas; once the initial support lands upstream, please play with it. I think that user feedback is really going to be the only way to set these various thresholds.

Stack space constraints can vary so greatly in practice that it might be
a good idea to make these thresholds user-adjustable through compiler
switches, pragmas or function attributes.

The default thresholds for optimizing allocations in non-leaf functions should probably be very low, as you otherwise risk blowing up programs such as deeply recursive parsers or network servers with a million lightweight threads.

To monitor the effects of such an optimization it would be nice if the
compiler could optionally output some information on the minimum and
maximum stack usage of functions.

I also suspect that the effectiveness of this optimization will depend a
lot on whether functions can be opportunistically inlined for it.

- Stephan
