Inlining temporary destructors in the analyzer

classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Inlining temporary destructors in the analyzer

Manuel Klimek
Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.
Thanks!
/Manuel

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Jordan Rose

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)

I am a little confused as to how your workaround actually changes anything, but regardless there is an issue here.

To fix this, we may need to treat temporaries as Loc values always, and figure out how to teach the rest of the analyzer to ignore the fact that the Expr says it's an rvalue. Alternately, we could update the compiler to actually treat these as xvalues (which are glvalues), and therefore we'd have a location we could deal with. I vaguely remember talking to Richard about this at one point and him thinking it to be a reasonable idea.

Or I could be totally off-base on this, because I'm coming up with this from memory rather than actually digging into the program trace.

Jordan

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Manuel Klimek
On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc,

Yes, it's a NonLoc, but I don't understand what that means yet :) (especially where that would make a difference how something that is written into the Region gets treated)
 
and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)

I am a little confused as to how your workaround actually changes anything, but regardless there is an issue here.

To fix this, we may need to treat temporaries as Loc values always, and figure out how to teach the rest of the analyzer to ignore the fact that the Expr says it's an rvalue. Alternately, we could update the compiler to actually treat these as xvalues (which are glvalues), and therefore we'd have a location we could deal with. I vaguely remember talking to Richard about this at one point and him thinking it to be a reasonable idea.

Or I could be totally off-base on this, because I'm coming up with this from memory rather than actually digging into the program trace.

Jordan


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Richard Smith
In reply to this post by Jordan Rose
On Wed, Jul 30, 2014 at 8:19 PM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)

I am a little confused as to how your workaround actually changes anything, but regardless there is an issue here.

To fix this, we may need to treat temporaries as Loc values always, and figure out how to teach the rest of the analyzer to ignore the fact that the Expr says it's an rvalue. Alternately, we could update the compiler to actually treat these as xvalues (which are glvalues), and therefore we'd have a location we could deal with. I vaguely remember talking to Richard about this at one point and him thinking it to be a reasonable idea.

C++ itself is very definitely moving in this direction. When you have:

  struct S { S(); int n; };

the expression S() is a prvalue but S().n is now an xvalue. Clang doesn't implement that yet, but the obvious way to do it is to create

  MemberExpr xvalue .n
  `- MaterializeTemporaryExpr xvalue
    `- CXXTemporaryObjectExpr prvalue S()

... that is, materialize a temporary any time member access is performed. (The same thing happens when array indexing is performed on an array temporary.) This naturally extends to also materializing a temporary when a member function is called on it.

FWIW, this representation would also fix the wrong lifetime extension in a case like:

  int &&r = S().n;

... where we currently fail to lifetime-extend the S object because we never materialized it.

Or I could be totally off-base on this, because I'm coming up with this from memory rather than actually digging into the program trace.

Jordan

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Manuel Klimek
On Thu, Jul 31, 2014 at 9:12 AM, Richard Smith <[hidden email]> wrote:
On Wed, Jul 30, 2014 at 8:19 PM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)

I am a little confused as to how your workaround actually changes anything, but regardless there is an issue here.

To fix this, we may need to treat temporaries as Loc values always, and figure out how to teach the rest of the analyzer to ignore the fact that the Expr says it's an rvalue. Alternately, we could update the compiler to actually treat these as xvalues (which are glvalues), and therefore we'd have a location we could deal with. I vaguely remember talking to Richard about this at one point and him thinking it to be a reasonable idea.

C++ itself is very definitely moving in this direction. When you have:

  struct S { S(); int n; };

the expression S() is a prvalue but S().n is now an xvalue. Clang doesn't implement that yet, but the obvious way to do it is to create

  MemberExpr xvalue .n
  `- MaterializeTemporaryExpr xvalue
    `- CXXTemporaryObjectExpr prvalue S()

... that is, materialize a temporary any time member access is performed. (The same thing happens when array indexing is performed on an array temporary.) This naturally extends to also materializing a temporary when a member function is called on it.

FWIW, this representation would also fix the wrong lifetime extension in a case like:

  int &&r = S().n;

... where we currently fail to lifetime-extend the S object because we never materialized it.

Not all temporaries have MaterializeTemporary though (some just have a CXXBindTemporaryExpr if they die after the full expression). How should we handle that case then?
Also, while we're at it - would it make sense to have the MaterializeTemporaryExpr somehow reference the CXXBindTemoprary it handles (iff Materialize actually materializes a C++ object). I imagine we'd run into problems with conditional operators, though, where the Materialize can reference one of multiple possible constructed objects (the CXXBindTemporary is inside the conditional operator branches).

Cheers,
/Manuel
 

Or I could be totally off-base on this, because I'm coming up with this from memory rather than actually digging into the program trace.

Jordan

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev




_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Manuel Klimek
In reply to this post by Jordan Rose
On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.
 
I am a little confused as to how your workaround actually changes anything, but regardless there is an issue here.

To fix this, we may need to treat temporaries as Loc values always, and figure out how to teach the rest of the analyzer to ignore the fact that the Expr says it's an rvalue. Alternately, we could update the compiler to actually treat these as xvalues (which are glvalues), and therefore we'd have a location we could deal with. I vaguely remember talking to Richard about this at one point and him thinking it to be a reasonable idea.

Or I could be totally off-base on this, because I'm coming up with this from memory rather than actually digging into the program trace.

Jordan


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Jordan Rose

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Yuck, right? Just making all temporaries into xvalues would most likely make 60% of these problems go away.
Jordan

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Manuel Klimek
On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

I'm not sure how "exists transiently" equals "doesn't need an address". At least in C++, this seems incorrect (?) for objects.
 
* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Since this is right at the end of the constructor call, this always creates and binds the value if we hit the constructor. See my patch - it seems to work fine :) What am I missing?
 

Yuck, right? Just making all temporaries into xvalues would most likely make 60% of these problems go away.
Jordan


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Jordan Rose

On Jul 31, 2014, at 12:46 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

I'm not sure how "exists transiently" equals "doesn't need an address". At least in C++, this seems incorrect (?) for objects.

IMHO it's a problem with the standard, and this extra invention of "modifiable" seems weird to me.


* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Since this is right at the end of the constructor call, this always creates and binds the value if we hit the constructor. See my patch - it seems to work fine :) What am I missing?

That breaks the invariant that a non-Loc-type prvalue is not a Loc. If you tried to assign this directly into another struct, you'd effectively be assigning an address instead of a set of bindings. IIRC there's some code in the analyzer that tries to handle this case anyway, so it may not break down immediately, but I'd rather not lose that assumption.

Jordan


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Manuel Klimek
On Thu, Jul 31, 2014 at 9:51 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:46 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

I'm not sure how "exists transiently" equals "doesn't need an address". At least in C++, this seems incorrect (?) for objects.

IMHO it's a problem with the standard, and this extra invention of "modifiable" seems weird to me.


* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Since this is right at the end of the constructor call, this always creates and binds the value if we hit the constructor. See my patch - it seems to work fine :) What am I missing?

That breaks the invariant that a non-Loc-type prvalue is not a Loc. If you tried to assign this directly into another struct, you'd effectively be assigning an address instead of a set of bindings. IIRC there's some code in the analyzer that tries to handle this case anyway, so it may not break down immediately, but I'd rather not lose that assumption.

I'm confused - isn't that already the case currently? Have you looked at http://reviews.llvm.org/D4740? It's very small, and I don't see where I regress an assumption that's not already violated (if I remove the isTemporaryPRValue section loads of tests break, so I assume changing the layout of a temporary section will make significant changes to the analyzer necessary).
 

Jordan



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Jordan Rose

On Jul 31, 2014, at 12:54 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:51 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:46 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

I'm not sure how "exists transiently" equals "doesn't need an address". At least in C++, this seems incorrect (?) for objects.

IMHO it's a problem with the standard, and this extra invention of "modifiable" seems weird to me.


* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Since this is right at the end of the constructor call, this always creates and binds the value if we hit the constructor. See my patch - it seems to work fine :) What am I missing?

That breaks the invariant that a non-Loc-type prvalue is not a Loc. If you tried to assign this directly into another struct, you'd effectively be assigning an address instead of a set of bindings. IIRC there's some code in the analyzer that tries to handle this case anyway, so it may not break down immediately, but I'd rather not lose that assumption.

I'm confused - isn't that already the case currently? Have you looked at http://reviews.llvm.org/D4740? It's very small, and I don't see where I regress an assumption that's not already violated (if I remove the isTemporaryPRValue section loads of tests break, so I assume changing the layout of a temporary section will make significant changes to the analyzer necessary).

Sorry, I meant removing the isTemporaryPRValue thing would break invariants. Enabling inlining doesn't break any existing invariants, but it makes things worse than they currently are. This is the test case I have in mind:

struct Test {
  bool flag;
  ~Test {
    if (flag)
      clang_analyzer_checkReached(true);
  }
}

void use(Test arg) {
arg.flag = true;
}

void topLevel() {
Test t;
use(t);
}

The call 'test(t)' involves a copy constructor, which the analyzer treats as a temporary construction. Maybe that's the problem—this is really a different sort of construction. In any case, though, who's responsible for destroying 'arg'? I think it has to be the caller, because the callee may not be inlined...but maybe I'm wrong about that. Still, the value of 'this' won't match between construction at the call site and use within the 'use' method.

Does this break assumptions about any other temporaries? I'm not sure. I just don't want to end up in a situation where we're inlining constructors and destructors, but they don't have a consistent view of the object.

Jordan

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Richard Smith
In reply to this post by Jordan Rose
On Thu, Jul 31, 2014 at 12:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point.

Here's how I think of this:

A glvalue expression is a function from the environment to a region.
A prvalue expression is a function of (environment, region) that produces no output.

That is, glvalue expressions tell you how to find a region, and prvalue expressions tell you how to initialize a region.

As a special case, you probably shouldn't bother to create a region when evaluating prvalues of non-class, non-array types (neither IRGen nor the constant evaluator do so). The standard blesses this with its "value that is not associated with an object" wording, but you can't actually observe the difference.

The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Yuck, right? Just making all temporaries into xvalues would most likely make 60% of these problems go away.
Jordan

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Manuel Klimek
In reply to this post by Jordan Rose
On Fri, Aug 1, 2014 at 1:02 AM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:54 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:51 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:46 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

I'm not sure how "exists transiently" equals "doesn't need an address". At least in C++, this seems incorrect (?) for objects.

IMHO it's a problem with the standard, and this extra invention of "modifiable" seems weird to me.


* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Since this is right at the end of the constructor call, this always creates and binds the value if we hit the constructor. See my patch - it seems to work fine :) What am I missing?

That breaks the invariant that a non-Loc-type prvalue is not a Loc. If you tried to assign this directly into another struct, you'd effectively be assigning an address instead of a set of bindings. IIRC there's some code in the analyzer that tries to handle this case anyway, so it may not break down immediately, but I'd rather not lose that assumption.

I'm confused - isn't that already the case currently? Have you looked at http://reviews.llvm.org/D4740? It's very small, and I don't see where I regress an assumption that's not already violated (if I remove the isTemporaryPRValue section loads of tests break, so I assume changing the layout of a temporary section will make significant changes to the analyzer necessary).

Sorry, I meant removing the isTemporaryPRValue thing would break invariants. Enabling inlining doesn't break any existing invariants, but it makes things worse than they currently are. This is the test case I have in mind:

struct Test {
  bool flag;
  ~Test {
    if (flag)
      clang_analyzer_checkReached(true);
  }
}

void use(Test arg) {
arg.flag = true;
}

void topLevel() {
Test t;
use(t);
}

The call 'test(t)' involves a copy constructor, which the analyzer treats as a temporary construction. Maybe that's the problem—this is really a different sort of construction. In any case, though, who's responsible for destroying 'arg'? I think it has to be the caller, because the callee may not be inlined...but maybe I'm wrong about that. Still, the value of 'this' won't match between construction at the call site and use within the 'use' method.

That makes sense, but it would seem to me like that needs to be fixed when passing a parameter by value - that seems to be incorrectly modeled then.

I think it'd be fine to model this as a copy constructor (Richard might know more), but I think in that case we'd need to call the destructors for the temporary outside *and* the destructor for the parameter inside. That is, instead of having 2 destructor calls (one for 't' and one for the temporary), we'd have three: One for 't', one for the temporary (both with flag == false) and one for the parameter, executed inside the function (with flag == true).
On the other hand the more correct thing to do is probably to somehow use the memory region of the temporary we hand in for 'arg'. I have no idea where in the static analyzer this is handled though - I dug through Store.cpp and RegionStore.cpp but couldn't figure out where the copy of the regions really happens...


Does this break assumptions about any other temporaries? I'm not sure. I just don't want to end up in a situation where we're inlining constructors and destructors, but they don't have a consistent view of the object.

Jordan


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Manuel Klimek
Ping.


On Fri, Aug 1, 2014 at 1:35 PM, Manuel Klimek <[hidden email]> wrote:
On Fri, Aug 1, 2014 at 1:02 AM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:54 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:51 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:46 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

I'm not sure how "exists transiently" equals "doesn't need an address". At least in C++, this seems incorrect (?) for objects.

IMHO it's a problem with the standard, and this extra invention of "modifiable" seems weird to me.


* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Since this is right at the end of the constructor call, this always creates and binds the value if we hit the constructor. See my patch - it seems to work fine :) What am I missing?

That breaks the invariant that a non-Loc-type prvalue is not a Loc. If you tried to assign this directly into another struct, you'd effectively be assigning an address instead of a set of bindings. IIRC there's some code in the analyzer that tries to handle this case anyway, so it may not break down immediately, but I'd rather not lose that assumption.

I'm confused - isn't that already the case currently? Have you looked at http://reviews.llvm.org/D4740? It's very small, and I don't see where I regress an assumption that's not already violated (if I remove the isTemporaryPRValue section loads of tests break, so I assume changing the layout of a temporary section will make significant changes to the analyzer necessary).

Sorry, I meant removing the isTemporaryPRValue thing would break invariants. Enabling inlining doesn't break any existing invariants, but it makes things worse than they currently are. This is the test case I have in mind:

struct Test {
  bool flag;
  ~Test {
    if (flag)
      clang_analyzer_checkReached(true);
  }
}

void use(Test arg) {
arg.flag = true;
}

void topLevel() {
Test t;
use(t);
}

The call 'test(t)' involves a copy constructor, which the analyzer treats as a temporary construction. Maybe that's the problem—this is really a different sort of construction. In any case, though, who's responsible for destroying 'arg'? I think it has to be the caller, because the callee may not be inlined...but maybe I'm wrong about that. Still, the value of 'this' won't match between construction at the call site and use within the 'use' method.

That makes sense, but it would seem to me like that needs to be fixed when passing a parameter by value - that seems to be incorrectly modeled then.

I think it'd be fine to model this as a copy constructor (Richard might know more), but I think in that case we'd need to call the destructors for the temporary outside *and* the destructor for the parameter inside. That is, instead of having 2 destructor calls (one for 't' and one for the temporary), we'd have three: One for 't', one for the temporary (both with flag == false) and one for the parameter, executed inside the function (with flag == true).
On the other hand the more correct thing to do is probably to somehow use the memory region of the temporary we hand in for 'arg'. I have no idea where in the static analyzer this is handled though - I dug through Store.cpp and RegionStore.cpp but couldn't figure out where the copy of the regions really happens...


Does this break assumptions about any other temporaries? I'm not sure. I just don't want to end up in a situation where we're inlining constructors and destructors, but they don't have a consistent view of the object.

Jordan



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Jordan Rose
In reply to this post by Manuel Klimek

On Aug 1, 2014, at 4:35 , Manuel Klimek <[hidden email]> wrote:

On Fri, Aug 1, 2014 at 1:02 AM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:54 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:51 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:46 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

I'm not sure how "exists transiently" equals "doesn't need an address". At least in C++, this seems incorrect (?) for objects.

IMHO it's a problem with the standard, and this extra invention of "modifiable" seems weird to me.


* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Since this is right at the end of the constructor call, this always creates and binds the value if we hit the constructor. See my patch - it seems to work fine :) What am I missing?

That breaks the invariant that a non-Loc-type prvalue is not a Loc. If you tried to assign this directly into another struct, you'd effectively be assigning an address instead of a set of bindings. IIRC there's some code in the analyzer that tries to handle this case anyway, so it may not break down immediately, but I'd rather not lose that assumption.

I'm confused - isn't that already the case currently? Have you looked at http://reviews.llvm.org/D4740? It's very small, and I don't see where I regress an assumption that's not already violated (if I remove the isTemporaryPRValue section loads of tests break, so I assume changing the layout of a temporary section will make significant changes to the analyzer necessary).

Sorry, I meant removing the isTemporaryPRValue thing would break invariants. Enabling inlining doesn't break any existing invariants, but it makes things worse than they currently are. This is the test case I have in mind:

struct Test {
  bool flag;
  ~Test {
    if (flag)
      clang_analyzer_checkReached(true);
  }
}

void use(Test arg) {
arg.flag = true;
}

void topLevel() {
Test t;
use(t);
}

The call 'test(t)' involves a copy constructor, which the analyzer treats as a temporary construction. Maybe that's the problem—this is really a different sort of construction. In any case, though, who's responsible for destroying 'arg'? I think it has to be the caller, because the callee may not be inlined...but maybe I'm wrong about that. Still, the value of 'this' won't match between construction at the call site and use within the 'use' method.

That makes sense, but it would seem to me like that needs to be fixed when passing a parameter by value - that seems to be incorrectly modeled then.

I think it'd be fine to model this as a copy constructor (Richard might know more), but I think in that case we'd need to call the destructors for the temporary outside *and* the destructor for the parameter inside. That is, instead of having 2 destructor calls (one for 't' and one for the temporary), we'd have three: One for 't', one for the temporary (both with flag == false) and one for the parameter, executed inside the function (with flag == true).
On the other hand the more correct thing to do is probably to somehow use the memory region of the temporary we hand in for 'arg'. I have no idea where in the static analyzer this is handled though - I dug through Store.cpp and RegionStore.cpp but couldn't figure out where the copy of the regions really happens...

I think you're looking for CallEvent::getInitialStackFrameContents (and its overrides).

I'd be concerned about modeling more constructors/destructors than actually exist at runtime—the constructor may have observable side effects. According to the standard, construction and destruction properly happens on the caller side. So the "best" way to deal with this might be to have the parameter regions somehow alias the temporary regions...or maybe for all class-typed parameters to be passed by reference.

Anyway, really I think we should just continue not inlining these particular destructors at the moment, which is conservative and probably not a dealbreaker.

Hm, none of that solves the original problem. I guess the choice there is to break the assumption that all prvalue sruct values are NonLocs (in which case I'd prefer that none of them were), or decide that we're okay with not being consistent about 'this' between the constructor and the destructor and making sure we update the original binding with the results of construction. (I'd expect that to already be working, though, so printing the graph of ProgramStates ought to help see where the binding in your original example is being dropped.)

Jordan


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Manuel Klimek
On Tue, Aug 5, 2014 at 7:13 PM, Jordan Rose <[hidden email]> wrote:

On Aug 1, 2014, at 4:35 , Manuel Klimek <[hidden email]> wrote:

On Fri, Aug 1, 2014 at 1:02 AM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:54 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:51 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:46 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

I'm not sure how "exists transiently" equals "doesn't need an address". At least in C++, this seems incorrect (?) for objects.

IMHO it's a problem with the standard, and this extra invention of "modifiable" seems weird to me.


* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Since this is right at the end of the constructor call, this always creates and binds the value if we hit the constructor. See my patch - it seems to work fine :) What am I missing?

That breaks the invariant that a non-Loc-type prvalue is not a Loc. If you tried to assign this directly into another struct, you'd effectively be assigning an address instead of a set of bindings. IIRC there's some code in the analyzer that tries to handle this case anyway, so it may not break down immediately, but I'd rather not lose that assumption.

I'm confused - isn't that already the case currently? Have you looked at http://reviews.llvm.org/D4740? It's very small, and I don't see where I regress an assumption that's not already violated (if I remove the isTemporaryPRValue section loads of tests break, so I assume changing the layout of a temporary section will make significant changes to the analyzer necessary).

Sorry, I meant removing the isTemporaryPRValue thing would break invariants. Enabling inlining doesn't break any existing invariants, but it makes things worse than they currently are. This is the test case I have in mind:

struct Test {
  bool flag;
  ~Test {
    if (flag)
      clang_analyzer_checkReached(true);
  }
}

void use(Test arg) {
arg.flag = true;
}

void topLevel() {
Test t;
use(t);
}

The call 'test(t)' involves a copy constructor, which the analyzer treats as a temporary construction. Maybe that's the problem—this is really a different sort of construction. In any case, though, who's responsible for destroying 'arg'? I think it has to be the caller, because the callee may not be inlined...but maybe I'm wrong about that. Still, the value of 'this' won't match between construction at the call site and use within the 'use' method.

That makes sense, but it would seem to me like that needs to be fixed when passing a parameter by value - that seems to be incorrectly modeled then.

I think it'd be fine to model this as a copy constructor (Richard might know more), but I think in that case we'd need to call the destructors for the temporary outside *and* the destructor for the parameter inside. That is, instead of having 2 destructor calls (one for 't' and one for the temporary), we'd have three: One for 't', one for the temporary (both with flag == false) and one for the parameter, executed inside the function (with flag == true).
On the other hand the more correct thing to do is probably to somehow use the memory region of the temporary we hand in for 'arg'. I have no idea where in the static analyzer this is handled though - I dug through Store.cpp and RegionStore.cpp but couldn't figure out where the copy of the regions really happens...

I think you're looking for CallEvent::getInitialStackFrameContents (and its overrides).

I'd be concerned about modeling more constructors/destructors than actually exist at runtime—the constructor may have observable side effects. According to the standard, construction and destruction properly happens on the caller side. So the "best" way to deal with this might be to have the parameter regions somehow alias the temporary regions...or maybe for all class-typed parameters to be passed by reference.

Yep, I assumed the idea to fix that would be to look at where we copy the region (currently without invoking the constructor) and instead creating an alias.
 
Anyway, really I think we should just continue not inlining these particular destructors at the moment, which is conservative and probably not a dealbreaker.

I'm not sure I understand what you mean by "not inlining these particular destructors". Because if we inline temporary destructors, we will inline these destructors. I also think inlining is benign from the call site point of view, because all that currently seems to happen is that we inline the destructor "as if" we didn't inline the function to which the object was passed as by-value argument, or am I missing something here?
 
Hm, none of that solves the original problem. I guess the choice there is to break the assumption that all prvalue sruct values are NonLocs (in which case I'd prefer that none of them were), or decide that we're okay with not being consistent about 'this' between the constructor and the destructor and making sure we update the original binding with the results of construction. (I'd expect that to already be working, though, so printing the graph of ProgramStates ought to help see where the binding in your original example is being dropped.)

Oh, my original example now works fine - the only thing I know of that currently doesn't work (if I combine all my open patches) is the pass-by-value calls. I have not seen a mismatch between this in constructor and destructor. (if you look at the patch, what I did was change the scan method for LazyCompoundValue - it seems to me like it didn't handle a couple of cases, but tried to do basically the same as scan of MemRegion anyway).
 

Jordan



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Manuel Klimek
On Tue, Aug 5, 2014 at 7:28 PM, Manuel Klimek <[hidden email]> wrote:
On Tue, Aug 5, 2014 at 7:13 PM, Jordan Rose <[hidden email]> wrote:

On Aug 1, 2014, at 4:35 , Manuel Klimek <[hidden email]> wrote:

On Fri, Aug 1, 2014 at 1:02 AM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:54 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:51 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:46 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

I'm not sure how "exists transiently" equals "doesn't need an address". At least in C++, this seems incorrect (?) for objects.

IMHO it's a problem with the standard, and this extra invention of "modifiable" seems weird to me.


* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Since this is right at the end of the constructor call, this always creates and binds the value if we hit the constructor. See my patch - it seems to work fine :) What am I missing?

That breaks the invariant that a non-Loc-type prvalue is not a Loc. If you tried to assign this directly into another struct, you'd effectively be assigning an address instead of a set of bindings. IIRC there's some code in the analyzer that tries to handle this case anyway, so it may not break down immediately, but I'd rather not lose that assumption.

I'm confused - isn't that already the case currently? Have you looked at http://reviews.llvm.org/D4740? It's very small, and I don't see where I regress an assumption that's not already violated (if I remove the isTemporaryPRValue section loads of tests break, so I assume changing the layout of a temporary section will make significant changes to the analyzer necessary).

Sorry, I meant removing the isTemporaryPRValue thing would break invariants. Enabling inlining doesn't break any existing invariants, but it makes things worse than they currently are. This is the test case I have in mind:

struct Test {
  bool flag;
  ~Test {
    if (flag)
      clang_analyzer_checkReached(true);
  }
}

void use(Test arg) {
arg.flag = true;
}

void topLevel() {
Test t;
use(t);
}

The call 'test(t)' involves a copy constructor, which the analyzer treats as a temporary construction. Maybe that's the problem—this is really a different sort of construction. In any case, though, who's responsible for destroying 'arg'? I think it has to be the caller, because the callee may not be inlined...but maybe I'm wrong about that. Still, the value of 'this' won't match between construction at the call site and use within the 'use' method.

That makes sense, but it would seem to me like that needs to be fixed when passing a parameter by value - that seems to be incorrectly modeled then.

I think it'd be fine to model this as a copy constructor (Richard might know more), but I think in that case we'd need to call the destructors for the temporary outside *and* the destructor for the parameter inside. That is, instead of having 2 destructor calls (one for 't' and one for the temporary), we'd have three: One for 't', one for the temporary (both with flag == false) and one for the parameter, executed inside the function (with flag == true).
On the other hand the more correct thing to do is probably to somehow use the memory region of the temporary we hand in for 'arg'. I have no idea where in the static analyzer this is handled though - I dug through Store.cpp and RegionStore.cpp but couldn't figure out where the copy of the regions really happens...

I think you're looking for CallEvent::getInitialStackFrameContents (and its overrides).

I'd be concerned about modeling more constructors/destructors than actually exist at runtime—the constructor may have observable side effects. According to the standard, construction and destruction properly happens on the caller side. So the "best" way to deal with this might be to have the parameter regions somehow alias the temporary regions...or maybe for all class-typed parameters to be passed by reference.

Yep, I assumed the idea to fix that would be to look at where we copy the region (currently without invoking the constructor) and instead creating an alias.
 
Anyway, really I think we should just continue not inlining these particular destructors at the moment, which is conservative and probably not a dealbreaker.

I'm not sure I understand what you mean by "not inlining these particular destructors". Because if we inline temporary destructors, we will inline these destructors. I also think inlining is benign from the call site point of view, because all that currently seems to happen is that we inline the destructor "as if" we didn't inline the function to which the object was passed as by-value argument, or am I missing something here?

In addition, after having looked at the standard, it seems to me like the standard does not require the temporary to be constructed in the slot for the by-value argument, but it can also copy-construct the by-value argument:
12.2-2:
"""
class X {
public:
X(int);
X(const X&);
X& operator=(const X&);
~X();
};
class Y {
public:
Y(int);
Y(Y&&);
~Y();
};
X f(X);
Y g(Y);
void h() {
X a(1);
X b = f(X(2));
Y c = g(Y(3));
a = f(a);
}
An implementation might use a temporary in which to construct X(2) before passing it to f() using X’s copy
constructor;
"""

Thus, it seems to me like in the example the author of the code must deal correctly with the destructor being called on the unchanged object anyway.

 
Hm, none of that solves the original problem. I guess the choice there is to break the assumption that all prvalue sruct values are NonLocs (in which case I'd prefer that none of them were), or decide that we're okay with not being consistent about 'this' between the constructor and the destructor and making sure we update the original binding with the results of construction. (I'd expect that to already be working, though, so printing the graph of ProgramStates ought to help see where the binding in your original example is being dropped.)

Oh, my original example now works fine - the only thing I know of that currently doesn't work (if I combine all my open patches) is the pass-by-value calls. I have not seen a mismatch between this in constructor and destructor. (if you look at the patch, what I did was change the scan method for LazyCompoundValue - it seems to me like it didn't handle a couple of cases, but tried to do basically the same as scan of MemRegion anyway).
 

Jordan




_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Tim Toomay
In reply to this post by Manuel Klimek

On 8/6/2014 12:51 PM, [hidden email] wrote:

> Hi Jordan,
> >>>>>>
> >>>>>>I made some progress on inlining temporary destructors. My main
> >>>>>>problem currently is that somehow the initializer values in my constructor
> >>>>>>get lost if I don't also reference 'this' via some other way.
> >>>>>>
> >>>>>>An examples shows that best. Consider:
> >>>>>>struct A {
> >>>>>>   A(int* y) : z(y) {}
> >>>>>>   ~A() { *z = 23; }
> >>>>>>   int *z;
> >>>>>>};
> >>>>>>void f() {
> >>>>>>   int x = 0;
> >>>>>>   {(A(&x));}
> >>>>>>}
> >>>>>>
> >>>>>>This leads to a "Dereference of undefined pointer value" warning on
> >>>>>>*z = 23.
> >>>>>>Funnily enough this warning goes away (and all values get correctly
> >>>>>>propagated) if change the constructor to be:
> >>>>>>A(int* y) : z(y) { (void)this; }
> >>>>>>

It sounds like Manuel is saying the warning is wrong.  The warning is actually quite helpful because the example code is unrealistic and the warning brings attention to that fact quite nicely.  Sorry if I am missing the boat :-) There is no guarantee the code won't crash on destruction of the object, so why worry about the initializer values?  It would help me understand what the problem is if the example made more sense.  Maybe like this:

struct A {
        A(int& y) : z(&y) {}
  ~A() { *z = 23; }
        int *z;
};

void f() {
int x = 0;
{(A(&x));}
}

Do the initializer values behave in this circumstance?

Cheers!
Tim


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Jordan Rose
In reply to this post by Manuel Klimek

On Aug 6, 2014, at 12:51 , Manuel Klimek <[hidden email]> wrote:

On Tue, Aug 5, 2014 at 7:28 PM, Manuel Klimek <[hidden email]> wrote:
On Tue, Aug 5, 2014 at 7:13 PM, Jordan Rose <[hidden email]> wrote:

On Aug 1, 2014, at 4:35 , Manuel Klimek <[hidden email]> wrote:

On Fri, Aug 1, 2014 at 1:02 AM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:54 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:51 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:46 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

I'm not sure how "exists transiently" equals "doesn't need an address". At least in C++, this seems incorrect (?) for objects.

IMHO it's a problem with the standard, and this extra invention of "modifiable" seems weird to me.


* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Since this is right at the end of the constructor call, this always creates and binds the value if we hit the constructor. See my patch - it seems to work fine :) What am I missing?

That breaks the invariant that a non-Loc-type prvalue is not a Loc. If you tried to assign this directly into another struct, you'd effectively be assigning an address instead of a set of bindings. IIRC there's some code in the analyzer that tries to handle this case anyway, so it may not break down immediately, but I'd rather not lose that assumption.

I'm confused - isn't that already the case currently? Have you looked at http://reviews.llvm.org/D4740? It's very small, and I don't see where I regress an assumption that's not already violated (if I remove the isTemporaryPRValue section loads of tests break, so I assume changing the layout of a temporary section will make significant changes to the analyzer necessary).

Sorry, I meant removing the isTemporaryPRValue thing would break invariants. Enabling inlining doesn't break any existing invariants, but it makes things worse than they currently are. This is the test case I have in mind:

struct Test {
  bool flag;
  ~Test {
    if (flag)
      clang_analyzer_checkReached(true);
  }
}

void use(Test arg) {
arg.flag = true;
}

void topLevel() {
Test t;
use(t);
}

The call 'test(t)' involves a copy constructor, which the analyzer treats as a temporary construction. Maybe that's the problem—this is really a different sort of construction. In any case, though, who's responsible for destroying 'arg'? I think it has to be the caller, because the callee may not be inlined...but maybe I'm wrong about that. Still, the value of 'this' won't match between construction at the call site and use within the 'use' method.

That makes sense, but it would seem to me like that needs to be fixed when passing a parameter by value - that seems to be incorrectly modeled then.

I think it'd be fine to model this as a copy constructor (Richard might know more), but I think in that case we'd need to call the destructors for the temporary outside *and* the destructor for the parameter inside. That is, instead of having 2 destructor calls (one for 't' and one for the temporary), we'd have three: One for 't', one for the temporary (both with flag == false) and one for the parameter, executed inside the function (with flag == true).
On the other hand the more correct thing to do is probably to somehow use the memory region of the temporary we hand in for 'arg'. I have no idea where in the static analyzer this is handled though - I dug through Store.cpp and RegionStore.cpp but couldn't figure out where the copy of the regions really happens...

I think you're looking for CallEvent::getInitialStackFrameContents (and its overrides).

I'd be concerned about modeling more constructors/destructors than actually exist at runtime—the constructor may have observable side effects. According to the standard, construction and destruction properly happens on the caller side. So the "best" way to deal with this might be to have the parameter regions somehow alias the temporary regions...or maybe for all class-typed parameters to be passed by reference.

Yep, I assumed the idea to fix that would be to look at where we copy the region (currently without invoking the constructor) and instead creating an alias.
 
Anyway, really I think we should just continue not inlining these particular destructors at the moment, which is conservative and probably not a dealbreaker.

I'm not sure I understand what you mean by "not inlining these particular destructors". Because if we inline temporary destructors, we will inline these destructors. I also think inlining is benign from the call site point of view, because all that currently seems to happen is that we inline the destructor "as if" we didn't inline the function to which the object was passed as by-value argument, or am I missing something here?

In addition, after having looked at the standard, it seems to me like the standard does not require the temporary to be constructed in the slot for the by-value argument, but it can also copy-construct the by-value argument:
12.2-2:
"""
class X {
public:
X(int);
X(const X&);
X& operator=(const X&);
~X();
};
class Y {
public:
Y(int);
Y(Y&&);
~Y();
};
X f(X);
Y g(Y);
void h() {
X a(1);
X b = f(X(2));
Y c = g(Y(3));
a = f(a);
}
An implementation might use a temporary in which to construct X(2) before passing it to f() using X’s copy
constructor;
"""

Thus, it seems to me like in the example the author of the code must deal correctly with the destructor being called on the unchanged object anyway.

That isn't the problem. It's the copy-constructed object that then ends up unchanged.

struct X {
void **ptr;
X() { static void *global; ptr = &global; }
~X() { *ptr = nullptr; }
}
void nullOut(X x) {
x.ptr = nullptr;
//
}
void test() {
X local;
nullOut(local); // pass-by-value param constructed and destroyed here
// 'local' destroyed here.
}

In this example, there is always one copy-construction (for the value passed to 'nullOut') and one temporary destructor (for the same). However, we should see a null pointer dereference here (because we can inline 'nullOut', and that modifies the parameter region) but we don't (because the parameter region isn't actually the region that gets destroyed).

Anyway, really I think we should just continue not inlining these particular destructors at the moment, which is conservative and probably not a dealbreaker.

I'm not sure I understand what you mean by "not inlining these particular destructors". Because if we inline temporary destructors, we will inline these destructors. I also think inlining is benign from the call site point of view, because all that currently seems to happen is that we inline the destructor "as if" we didn't inline the function to which the object was passed as by-value argument, or am I missing something here?

Anything we do or don't inline is up to us; we can have a new rule that says we don't inline dtors of temporaries copy-constructed only for use as parameters. (We've had plenty of rules like this along the way, and still have some: we don't inline constructors for 'new' expressions, we don't do array *tors properly, and we ban particular methods in the STL that do more harm than good.)

Jordan

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Inlining temporary destructors in the analyzer

Manuel Klimek
On Thu, Aug 7, 2014 at 6:57 PM, Jordan Rose <[hidden email]> wrote:

On Aug 6, 2014, at 12:51 , Manuel Klimek <[hidden email]> wrote:

On Tue, Aug 5, 2014 at 7:28 PM, Manuel Klimek <[hidden email]> wrote:
On Tue, Aug 5, 2014 at 7:13 PM, Jordan Rose <[hidden email]> wrote:

On Aug 1, 2014, at 4:35 , Manuel Klimek <[hidden email]> wrote:

On Fri, Aug 1, 2014 at 1:02 AM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:54 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:51 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 12:46 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <[hidden email]> wrote:

On Jul 31, 2014, at 1:20 , Manuel Klimek <[hidden email]> wrote:

On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <[hidden email]> wrote:

On Jul 30, 2014, at 8:14 , Manuel Klimek <[hidden email]> wrote:

Hi Jordan,

I made some progress on inlining temporary destructors. My main problem currently is that somehow the initializer values in my constructor get lost if I don't also reference 'this' via some other way.

An examples shows that best. Consider:
struct A {
  A(int* y) : z(y) {}
  ~A() { *z = 23; }
  int *z;
};
void f() {
  int x = 0;
  {(A(&x));}
}

This leads to a "Dereference of undefined pointer value" warning on *z = 23.
Funnily enough this warning goes away (and all values get correctly propagated) if change the constructor to be:
A(int* y) : z(y) { (void)this; }

Obviously this also works fine if I use a local variable instead of a temporary.

Any ideas what might be going wrong here would be greatly appreciated.

My guess is that the temporary is an rvalue, and in the analyzer a NonLoc, and that the only reason this even appears to work is because we needed to be able to call methods on temporaries and so will create a temporary region for them on demand to serve as the 'this' pointer. However, I don't think we expect that to happen for constructors, and so we don't keep the region around after the constructor has finished executing. (That is, when it's just a method being called—and neither the constructor nor destructor are inlined—it doesn't actually matter what you do to the contents of the region, because nothing else can access it.)


So, funnily enough if I remove this code:
  if (isTemporaryPRValue(CCE, ThisV))
  ThisV = state->getSVal(ThisV.castAs<Loc>());
which puts the temp region into a lazycompoundregion, the memory seems to be correctly handled. I have not yet looked what else this breaks though.

Okay, analyzer design time. :-)

C++ has a distinction between lvalues and rvalues (actually glvalues and prvalues). There are a number of things that go into this distinction, but the part the analyzer cares about is that glvalues refer to true objects and prvalues do not. (In C++ parlance, "object" refers to anything with storage, a type, and lifetime, not just class instances. In the analyzer, we call such things "regions".)

If you were to build a very literal C++ compiler, every glvalue would be represented by an address. PRValues*, on the other hand, don't really need addresses; they only exist to be used transiently.

I'm not sure how "exists transiently" equals "doesn't need an address". At least in C++, this seems incorrect (?) for objects.

IMHO it's a problem with the standard, and this extra invention of "modifiable" seems weird to me.


* How do you capitalize "prvalue"?

The analyzer pretty much always believes the compiler when it comes to glvalues vs. prvalues. That means that the value of a glvalue expression is always a Loc. A prvalue expression, on the other hand, will only be a Loc if it's a pointer type*. If you're talking about a struct prvalue, what you're really talking about is the bindings in its memory region—it has no identity. This is represented by a CompoundVal or LazyCompoundVal. This model worked great for C, but it's starting to show strain in C++.

* or one of the other types mentioned in Loc::isLocType. But not all of those can be expression values.

Besides "regular" temporaries, this is also a problem for prvalue function arguments. When the analyzer inlines a function, it just blithely binds the arguments to the proper VarRegions created for each ParmVarDecl in the current stack frame. That doesn't really fly if we're supposed to be calling the copy constructor. Worse, the copy constructor call happens outside of the function call, which means it happens before we've even decided whether or not to inline, which means there are no VarRegions yet to construct into. We've/I've been putting off dealing with this one for a while; we get around it mostly because we don't inline constructors for temporaries.

The general temporary problem goes to what Richard was saying: a temporary expression is an rvalue, but if we have to run a constructor then clearly there was a region at some point. The C++11 standard (at least the version I have) is not wonderful about classifying temporaries ([basic.lval]p1):

- An lvalue...designates a function or an object.
- An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime.
- A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
- An rvalue...is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
- A prvalue (“pure” rvalue) is an rvalue that is not an xvalue.

So temporary objects are prvalues. Even though they are modifiable (defined later in the same section). Even though with construction and destruction they end up needing a fixed region, or at least consistency in region contents.

I don't have a great answer here. My guess is we'll probably end up turning on some inlining for temporaries to start, but not all.

Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the few cases where the AST wants a prvalue but the analyzer really needs a valid Loc, and conjuring a region on the fly. That's important for things that need to have referential identity, like the 'this' pointer in a method called on a temporary, or (say) a temporary that's been lifetime-extended.

Since this is right at the end of the constructor call, this always creates and binds the value if we hit the constructor. See my patch - it seems to work fine :) What am I missing?

That breaks the invariant that a non-Loc-type prvalue is not a Loc. If you tried to assign this directly into another struct, you'd effectively be assigning an address instead of a set of bindings. IIRC there's some code in the analyzer that tries to handle this case anyway, so it may not break down immediately, but I'd rather not lose that assumption.

I'm confused - isn't that already the case currently? Have you looked at http://reviews.llvm.org/D4740? It's very small, and I don't see where I regress an assumption that's not already violated (if I remove the isTemporaryPRValue section loads of tests break, so I assume changing the layout of a temporary section will make significant changes to the analyzer necessary).

Sorry, I meant removing the isTemporaryPRValue thing would break invariants. Enabling inlining doesn't break any existing invariants, but it makes things worse than they currently are. This is the test case I have in mind:

struct Test {
  bool flag;
  ~Test {
    if (flag)
      clang_analyzer_checkReached(true);
  }
}

void use(Test arg) {
arg.flag = true;
}

void topLevel() {
Test t;
use(t);
}

The call 'test(t)' involves a copy constructor, which the analyzer treats as a temporary construction. Maybe that's the problem—this is really a different sort of construction. In any case, though, who's responsible for destroying 'arg'? I think it has to be the caller, because the callee may not be inlined...but maybe I'm wrong about that. Still, the value of 'this' won't match between construction at the call site and use within the 'use' method.

That makes sense, but it would seem to me like that needs to be fixed when passing a parameter by value - that seems to be incorrectly modeled then.

I think it'd be fine to model this as a copy constructor (Richard might know more), but I think in that case we'd need to call the destructors for the temporary outside *and* the destructor for the parameter inside. That is, instead of having 2 destructor calls (one for 't' and one for the temporary), we'd have three: One for 't', one for the temporary (both with flag == false) and one for the parameter, executed inside the function (with flag == true).
On the other hand the more correct thing to do is probably to somehow use the memory region of the temporary we hand in for 'arg'. I have no idea where in the static analyzer this is handled though - I dug through Store.cpp and RegionStore.cpp but couldn't figure out where the copy of the regions really happens...

I think you're looking for CallEvent::getInitialStackFrameContents (and its overrides).

I'd be concerned about modeling more constructors/destructors than actually exist at runtime—the constructor may have observable side effects. According to the standard, construction and destruction properly happens on the caller side. So the "best" way to deal with this might be to have the parameter regions somehow alias the temporary regions...or maybe for all class-typed parameters to be passed by reference.

Yep, I assumed the idea to fix that would be to look at where we copy the region (currently without invoking the constructor) and instead creating an alias.
 
Anyway, really I think we should just continue not inlining these particular destructors at the moment, which is conservative and probably not a dealbreaker.

I'm not sure I understand what you mean by "not inlining these particular destructors". Because if we inline temporary destructors, we will inline these destructors. I also think inlining is benign from the call site point of view, because all that currently seems to happen is that we inline the destructor "as if" we didn't inline the function to which the object was passed as by-value argument, or am I missing something here?

In addition, after having looked at the standard, it seems to me like the standard does not require the temporary to be constructed in the slot for the by-value argument, but it can also copy-construct the by-value argument:
12.2-2:
"""
class X {
public:
X(int);
X(const X&);
X& operator=(const X&);
~X();
};
class Y {
public:
Y(int);
Y(Y&&);
~Y();
};
X f(X);
Y g(Y);
void h() {
X a(1);
X b = f(X(2));
Y c = g(Y(3));
a = f(a);
}
An implementation might use a temporary in which to construct X(2) before passing it to f() using X’s copy
constructor;
"""

Thus, it seems to me like in the example the author of the code must deal correctly with the destructor being called on the unchanged object anyway.

That isn't the problem. It's the copy-constructed object that then ends up unchanged.

struct X {
void **ptr;
X() { static void *global; ptr = &global; }
~X() { *ptr = nullptr; }
}
void nullOut(X x) {
x.ptr = nullptr;
//
}
void test() {
X local;
nullOut(local); // pass-by-value param constructed and destroyed here
// 'local' destroyed here.
}

In this example, there is always one copy-construction (for the value passed to 'nullOut') and one temporary destructor (for the same). However, we should see a null pointer dereference here (because we can inline 'nullOut', and that modifies the parameter region) but we don't (because the parameter region isn't actually the region that gets destroyed).

If we for some reason decide not to inline the nullOut, it seems to me like we won't see the null pointer dereference either.
Given that the static analyzer has arbitrary limits on inlining etc, I don't see why some remaining false negatives are a problem (false positives in C++ code are already a huge problem, and implementing inlining for temp dtors resolves a couple (I can get more accurate numbers)).
I haven't yet found cases where this would lead to more false positives - examples of false positives could easily convince me that this indeed needs better handling.
 
Anyway, really I think we should just continue not inlining these particular destructors at the moment, which is conservative and probably not a dealbreaker.

I'm not sure I understand what you mean by "not inlining these particular destructors". Because if we inline temporary destructors, we will inline these destructors. I also think inlining is benign from the call site point of view, because all that currently seems to happen is that we inline the destructor "as if" we didn't inline the function to which the object was passed as by-value argument, or am I missing something here?

Anything we do or don't inline is up to us; we can have a new rule that says we don't inline dtors of temporaries copy-constructed only for use as parameters. (We've had plenty of rules like this along the way, and still have some: we don't inline constructors for 'new' expressions, we don't do array *tors properly, and we ban particular methods in the STL that do more harm than good.)

That sounds fine; I'd say we would want to keep the ctor/dtor pair; either inline both, or inline none (as a ctor being inlined but a dtor not being inlined is what I've seen to lead to confusion). If that's what it takes I can take a stab at it (although I'd also be interested in implementing the memory region aliasing - my main problem is that I'm not really sure where that needs to go / how that needs to work).

Cheers,
/Manuel


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
12