[analyzer] Inlining, operator new(), checker callbacks.

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

[analyzer] Inlining, operator new(), checker callbacks.

Eric Fiselier via cfe-dev
All right, so how do we want to equip C++ operator new() with checker
callbacks?

~~~

First of all, a quick note about how do we evaluate call-expressions.
Typical code to do that in the analyzer (eg. ExprEngine::VisitCallExpr)
kinda looks like this:

```
   // Take node `Pred` from the CoreEngine's worklist.
   NodeSet1 = { Pred };

   NodeSet2;
   for (N in NodeSet1) {
     // Checkers evaluate check::PreStmt<CallExpr> event, put their
transitions to NodeSet2.
     runCheckersForPreStmt(N, &NodeSet2);
   }

   NodeSet3;
   for (N in NodeSet2) {
     // Checkers evaluate check::PreCall event, put their transitions to
NodeSet3.
     runCheckersForPreCall(N, &NodeSet3);
   }

   NodeSet4;
   for (N in NodeSet3) {
     // Either eval::Call event in Checkers, or inlineCall, or
conservativeEvalCall.
     evalCall(N, &NodeSet4);
   }

   NodeSet5;
   for (N in NodeSet4) {
     // Checkers evaluate check::PostCall event, put their transitions
to NodeSet5.
     runCheckersForPostCall(N, &NodeSet5);
   }

   NodeSet6;
   for (N in NodeSet5) {
     // Checkers evaluate check::PostStmt<CallExpr> event, put their
transitions to NodeSet6.
     runCheckersForPreStmt(N, &NodeSet6);
   }

   // Put nodes from NodeSet6 back to the worklist.
```

During evalCall(), if any checker's eval::Call succeeds, that checker
simply puts their transitions to NodeSet4. During conservativeEvalCall,
the core puts exactly one new node to NodeSet4.

The interesting part here is that in case of inlineCall(), *the call is
not actually evaluated* (!). The only thing that happens here is that
inlineCall enters the stack frame and puts the node with the new stack
frame *directly to the worklist* (!!), completely bypassing NodeSet4.
Because of that, in inlineCall case, NodeSet{4,5,6} are all empty, and
the remaining code does nothing, so no PostCall callbacks get called
here until the call is actually evaluated. However, when the call is
fully inlined, at the CallExit program point
(ExprEngine::processCallExit()), PostCall and PostStmt callbacks are
called manually. So we get the correct sequence of callbacks regardless.

~~~

Now, suppose we have operator new:

   new (args1...) C(args2...)

The semantics of this code can be expressed in the following
"statement-expression" pseudo-code:

```
   01  C *_this = (C *) operator new(sizeof(C), args1...);
   02  if (_this) _this->C(args2...);
   03  return _this;
```

Note that the constructor is simply not called when the operator returns
nullptr.
Note the cast on the first line - needs to be modeled, because operator
new returns `void *`.
Note that i did ask George if he thinks that expressing this piece of
code as a body-farm-thing is a good idea, and he didn't think so :)

By looking at this construct, it should be obvious that the relationship
between CXXNewExpr and its respective CXXConstructExpr is more
complicated than that of a "normal" expression and its sub-expression.
They are kinda computed in the counter-intuitive order, and the easiest
way to explain that would be to announce that there are actually three
"expressions" involved: operator new call fake expression, constructor
call expression, and the "big new-expression" of which both of these are
children, in that order.

This is how our CFG currently sees it, in case of `-analyzer-config
c++-allocator-inlining=true`:

   1. CFGNewAllocator  // evaluate `operator new(sizeof(C), args1...)`
   2. CXXConstructExpr  // evaluate `_this->C(args2...)`
   3. CXXNewExpr  // bind `_this` to the expression

So i guess it's so far so good.

At 1., we do defaultEvalCall for operator new. FIXME: do a regular
evalCall, i.e. allow checkers to evaluate operator new. Also in
https://reviews.llvm.org/D40560 we add a new program state trait to hold
the artificial variable "_this", since there's no room for it in the
Environment, since it's not an expression but something we made up. Also
we perform the cast from `void *` to `C *`.

At 2., we evalCall for the constructor with the help of our fake
variable `_this`. FIXME: we should also do the if() part of it, i.e.
don't evaluate the constructor when the operator returns null. It
shouldn't be a state split though, i guess we should just suppress the
branch on which the return value is null, unless we inlined the operator
new and sure it's null.

At 3., we take `_this` and declare that the value stored in it would
from now on be the value of the "big new-expression" that unites them
all, regardless of whether it's null or not.

~~~

Now when it comes to checker callbacks, it's a bit messy right now. On
CFGNewAllocator call evaluation, the call site is CXXNewExpr. It means
that if the operator is inlined, and we hit processCallExit(), as
explained above, we'd be triggering PostStmt<CXXNewExpr>, even though we
didn't ever trigger PreStmt<CXXNewExpr>. Then at 3., we'd have
PreStmt<CXXNewExpr> and then another(!!!) PostStmt<CXXNewExpr>, which
drives MallocChecker crazy. This, of course, can be trivially fixed. But
i wanted to write this long explanation to specifically highlight this
bug, with the hope that in the future it would be less likely to get
reintroduced.

And then, now that we realize that we have three kinda-statements here,
the question is, do we want all three equipped with checker callbacks?
We could plan to make a new callback that'd be surrounding
CFGNewAllocator similarly to how PreStmt/PostStmt surrounds
CXXConstructExpr and CXXNewExpr.

It may look like this:

(A)

   -> check::PreCXXAllocator  // new callback
     -> check::PreCall
       1. CFGNewAllocator
     <- check::PostCall
   <- check::PostCXXAllocator  // new callback

   -> check::PreStmt<CXXConstructExpr>
     -> check::PreCall
       2. CXXConstructExpr
     <- check::PostCall
   <- check::PostStmt<CXXConstructExpr>

   -> check::PreStmt<CXXNewExpr>
     3. CXXNewExpr
   <- check::PostStmt<CXXNewExpr>

Or like this:

(B)

   -> check::PreStmt<CXXNewExpr>

     -> check::PreCXXAllocator  // new callback
       -> check::PreCall
         1. CFGNewAllocator
       <- check::PostCall
     <- check::PostCXXAllocator  // new callback

     -> check::PreStmt<CXXConstructExpr>
       -> check::PreCall
         2. CXXConstructExpr
       <- check::PostCall
     <- check::PostStmt<CXXConstructExpr>

     3. CXXNewExpr
   <- check::PostStmt<CXXNewExpr>

Or like this:

(C)

   -> check::PreStmt<CXXNewExpr>
     -> check::PreCall
       1. CFGNewAllocator
     <- check::PostCall
   <- check::PostStmt<CXXNewExpr>

   -> check::PreStmt<CXXConstructExpr>
     -> check::PreCall
       2. CXXConstructExpr
     <- check::PostCall
   <- check::PostStmt<CXXConstructExpr>

   -> check::PreBigNewExpr  // new callback (needs better name)
     3. CXXNewExpr
   <- check::PostBigNewExpr  // new callback

Or even like this:

(D)

   -> check::PreWholeNewThing  // new callback

     -> check::PreStmt<CXXNewExpr>
       -> check::PreCall
         1. CFGNewAllocator
       <- check::PostCall
     <- check::PostStmt<CXXNewExpr>

     -> check::PreStmt<CXXConstructExpr>
       -> check::PreCall
         2. CXXConstructExpr
       <- check::PostCall
     <- check::PostStmt<CXXConstructExpr>

     3. CXXNewExpr
   <- check::PostWholeNewThing  // new callback

Or maybe even like this if we want:

(E)

   -> check::PreWholeNewThing  // new callback (needs better name)

     -> check::PreCXXAllocator  // another new callback
       -> check::PreCall
         1. CFGNewAllocator
       <- check::PostCall
     <- check::PostCXXAllocator  // new callback

     -> check::PreStmt<CXXConstructExpr>
       -> check::PreCall
         2. CXXConstructExpr
       <- check::PostCall
     <- check::PostStmt<CXXConstructExpr>

     -> check::PreStmt<CXXNewExpr>
       3. CXXNewExpr
     <- check::PostStmt<CXXNewExpr>

   -> check::PostWholeNewThing  // new callback

Variant (A) is what we commonly do for all other expressions. For
instance, if we have an expression `a + b`, we only do
PreStmt<BinaryOperator> *after* evaluation of both `a` and `b`. It kind
of represents the semantics exactly. However, it would be completely
counter-intuitive for checker authors to subscribe to
PreStmt<CXXNewExpr> and find out that by the time their callback fires,
both operator new() and the constructor(!) have already been called. So
i expect confusion between CXXNewExpr in our sense (the big
new-expression) and CXXNewExpr in a common person's sense (the allocator
call). In particular, it would be nice to move MallocChecker to the new
callback - eg. we're already done with region extent when we're
constructing.

Variant (B) tries to go around this confusion while keeping the
"user-facing meaning" of CXXNewExpr as the big new-expression, but it
goes against what we normally do, which would anyway be confusing.

In variants (C) and (D) we have the "user-facing meaning" of CXXNewExpr
changed into "here's where the allocation occurs". This is how
MallocChecker currently understands things. Like (A), variant (C) tries
to do what we usually do by only doing Pre-thingy after sub-thingies
were evaluated (except evaluating CXXNewExpr before its child
CXXConstructExpr, but we kinda agreed that they're not actually
parent-child-related). However, because this time Pre-thingy is not a
PreStmt-thingy, i find variant (D) fancier: the new callback is saying
"we begin unleashing the operator new hell" and "we're done with the
operator new hell nice and clean". Still, both (C) and (D) have an
unpleasant downside of being unable to reliably read the value of
CXXNewExpr at PostStmt<CXXNewExpr>, as the expression is not yet
evaluated. This can probably be fixed, but it brings us back to the
problem of whether we want to keep the value of the fake `_this`
variable in the Environment as the value of CXXNewExpr this whole time.
We probably do.

Variant (E) is kinda a combination of all approaches, as it provides the
"big" callback like (B) and (D), while still saying that CXXNewExpr is
the big new-expression, while not messing up the usual PreStmt/PostStmt
order like (B) did. It still has the problem of
check::PreStmt<CXXNewExpr> being confusingly late, but otherwise seems
usable.

We can delay the choices between (A) and (E) and also between (C) and
(D) until we actually want to introduce the new callbacks, but i guess
it's reasonable to try to agree on what do we want to mean by CXXNewExpr.

I personally feel that (D) is the friendliest approach. Checkers
subscribe for new - they get operator new. Checkers subscribe for
constructor - they get their constructor. Checkers want the whole thing
- sure, here's our fancy custom callback. They may probably never even
think about this whole hassle.

My second favorite is (E), which has the strength-slash-weakness of
consistency with the AST's CXXNewExpr being a parent of
CXXConstructExpr. I believe that the reorder in the analysis order
compared to the AST, even if it's not an actual reorder, would be quite
intuitively acceptable. While the positioning of the new operator after
the constructor wouldn't be.

Then (A) is simply a trimmed-down variant of (E), (B) is just weird, and
(C) doesn't seem to be anyhow better than (D).

We can also combine (C) and (D), which is even better:

(F)

   -> check::PreWholeNewThing  // new callback (needs better name)

     -> check::PreStmt<CXXNewExpr>
       -> check::PreCall
         1. CFGNewAllocator
       <- check::PostCall
     <- check::PostStmt<CXXNewExpr>

     -> check::PreStmt<CXXConstructExpr>
       -> check::PreCall
         2. CXXConstructExpr
       <- check::PostCall
     <- check::PostStmt<CXXConstructExpr>

     -> check::PreBigNewExpr  // another new callback (needs better name)
       3. CXXNewExpr
     <- check::PostBigNewExpr  // new callback

   <- check::PostWholeNewThing  // new callback

Which by far seems to be the best (not necessarily good) idea i could
have come up with on this subject.

Thoughts are welcome!~
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] Inlining, operator new(), checker callbacks.

Eric Fiselier via cfe-dev
Hello Artem,

Thank you for sharing this.

09.12.2017 02:46, Artem Dergachev via cfe-dev пишет:

> All right, so how do we want to equip C++ operator new() with checker
> callbacks?
>
> ~~~
>
> First of all, a quick note about how do we evaluate call-expressions.
> Typical code to do that in the analyzer (eg.
> ExprEngine::VisitCallExpr) kinda looks like this:
>
> ```
>   // Take node `Pred` from the CoreEngine's worklist.
>   NodeSet1 = { Pred };
>
>   NodeSet2;
>   for (N in NodeSet1) {
>     // Checkers evaluate check::PreStmt<CallExpr> event, put their
> transitions to NodeSet2.
>     runCheckersForPreStmt(N, &NodeSet2);
>   }
>
>   NodeSet3;
>   for (N in NodeSet2) {
>     // Checkers evaluate check::PreCall event, put their transitions
> to NodeSet3.
>     runCheckersForPreCall(N, &NodeSet3);
>   }
>
>   NodeSet4;
>   for (N in NodeSet3) {
>     // Either eval::Call event in Checkers, or inlineCall, or
> conservativeEvalCall.
>     evalCall(N, &NodeSet4);
>   }
>
>   NodeSet5;
>   for (N in NodeSet4) {
>     // Checkers evaluate check::PostCall event, put their transitions
> to NodeSet5.
>     runCheckersForPostCall(N, &NodeSet5);
>   }
>
>   NodeSet6;
>   for (N in NodeSet5) {
>     // Checkers evaluate check::PostStmt<CallExpr> event, put their
> transitions to NodeSet6.
>     runCheckersForPreStmt(N, &NodeSet6);
>   }
>
>   // Put nodes from NodeSet6 back to the worklist.
> ```
>
> During evalCall(), if any checker's eval::Call succeeds, that checker
> simply puts their transitions to NodeSet4. During
> conservativeEvalCall, the core puts exactly one new node to NodeSet4.
>
> The interesting part here is that in case of inlineCall(), *the call
> is not actually evaluated* (!). The only thing that happens here is
> that inlineCall enters the stack frame and puts the node with the new
> stack frame *directly to the worklist* (!!), completely bypassing
> NodeSet4. Because of that, in inlineCall case, NodeSet{4,5,6} are all
> empty, and the remaining code does nothing, so no PostCall callbacks
> get called here until the call is actually evaluated. However, when
> the call is fully inlined, at the CallExit program point
> (ExprEngine::processCallExit()), PostCall and PostStmt callbacks are
> called manually. So we get the correct sequence of callbacks regardless.
>
> ~~~
>
> Now, suppose we have operator new:
>
>   new (args1...) C(args2...)
>
> The semantics of this code can be expressed in the following
> "statement-expression" pseudo-code:
>
> ```
>   01  C *_this = (C *) operator new(sizeof(C), args1...);
>   02  if (_this) _this->C(args2...);
>   03  return _this;
> ```
>
> Note that the constructor is simply not called when the operator
> returns nullptr.
> Note the cast on the first line - needs to be modeled, because
> operator new returns `void *`.
> Note that i did ask George if he thinks that expressing this piece of
> code as a body-farm-thing is a good idea, and he didn't think so :)
>
> By looking at this construct, it should be obvious that the
> relationship between CXXNewExpr and its respective CXXConstructExpr is
> more complicated than that of a "normal" expression and its
> sub-expression. They are kinda computed in the counter-intuitive
> order, and the easiest way to explain that would be to announce that
> there are actually three "expressions" involved: operator new call
> fake expression, constructor call expression, and the "big
> new-expression" of which both of these are children, in that order.
>
> This is how our CFG currently sees it, in case of `-analyzer-config
> c++-allocator-inlining=true`:
>
>   1. CFGNewAllocator  // evaluate `operator new(sizeof(C), args1...)`
>   2. CXXConstructExpr  // evaluate `_this->C(args2...)`
>   3. CXXNewExpr  // bind `_this` to the expression
>
> So i guess it's so far so good.
>
> At 1., we do defaultEvalCall for operator new. FIXME: do a regular
> evalCall, i.e. allow checkers to evaluate operator new. Also in
> https://reviews.llvm.org/D40560 we add a new program state trait to
> hold the artificial variable "_this", since there's no room for it in
> the Environment, since it's not an expression but something we made
> up. Also we perform the cast from `void *` to `C *`.
>
> At 2., we evalCall for the constructor with the help of our fake
> variable `_this`. FIXME: we should also do the if() part of it, i.e.
> don't evaluate the constructor when the operator returns null. It
> shouldn't be a state split though, i guess we should just suppress the
> branch on which the return value is null, unless we inlined the
> operator new and sure it's null.
>
> At 3., we take `_this` and declare that the value stored in it would
> from now on be the value of the "big new-expression" that unites them
> all, regardless of whether it's null or not.
>
> ~~~
>
> Now when it comes to checker callbacks, it's a bit messy right now. On
> CFGNewAllocator call evaluation, the call site is CXXNewExpr. It means
> that if the operator is inlined, and we hit processCallExit(), as
> explained above, we'd be triggering PostStmt<CXXNewExpr>, even though
> we didn't ever trigger PreStmt<CXXNewExpr>. Then at 3., we'd have
> PreStmt<CXXNewExpr> and then another(!!!) PostStmt<CXXNewExpr>, which
> drives MallocChecker crazy. This, of course, can be trivially fixed.
> But i wanted to write this long explanation to specifically highlight
> this bug, with the hope that in the future it would be less likely to
> get reintroduced.
>
> And then, now that we realize that we have three kinda-statements
> here, the question is, do we want all three equipped with checker
> callbacks? We could plan to make a new callback that'd be surrounding
> CFGNewAllocator similarly to how PreStmt/PostStmt surrounds
> CXXConstructExpr and CXXNewExpr.
>
> It may look like this:
>
> (A)
>
>   -> check::PreCXXAllocator  // new callback
>     -> check::PreCall
>       1. CFGNewAllocator
>     <- check::PostCall
>   <- check::PostCXXAllocator  // new callback
>
>   -> check::PreStmt<CXXConstructExpr>
>     -> check::PreCall
>       2. CXXConstructExpr
>     <- check::PostCall
>   <- check::PostStmt<CXXConstructExpr>
>
>   -> check::PreStmt<CXXNewExpr>
>     3. CXXNewExpr
>   <- check::PostStmt<CXXNewExpr>
>
> Or like this:
>
> (B)
>
>   -> check::PreStmt<CXXNewExpr>
>
>     -> check::PreCXXAllocator  // new callback
>       -> check::PreCall
>         1. CFGNewAllocator
>       <- check::PostCall
>     <- check::PostCXXAllocator  // new callback
>
>     -> check::PreStmt<CXXConstructExpr>
>       -> check::PreCall
>         2. CXXConstructExpr
>       <- check::PostCall
>     <- check::PostStmt<CXXConstructExpr>
>
>     3. CXXNewExpr
>   <- check::PostStmt<CXXNewExpr>
>
> Or like this:
>
> (C)
>
>   -> check::PreStmt<CXXNewExpr>
>     -> check::PreCall
>       1. CFGNewAllocator
>     <- check::PostCall
>   <- check::PostStmt<CXXNewExpr>
>
>   -> check::PreStmt<CXXConstructExpr>
>     -> check::PreCall
>       2. CXXConstructExpr
>     <- check::PostCall
>   <- check::PostStmt<CXXConstructExpr>
>
>   -> check::PreBigNewExpr  // new callback (needs better name)
>     3. CXXNewExpr
>   <- check::PostBigNewExpr  // new callback
>
> Or even like this:
>
> (D)
>
>   -> check::PreWholeNewThing  // new callback
>
>     -> check::PreStmt<CXXNewExpr>
>       -> check::PreCall
>         1. CFGNewAllocator
>       <- check::PostCall
>     <- check::PostStmt<CXXNewExpr>
>
>     -> check::PreStmt<CXXConstructExpr>
>       -> check::PreCall
>         2. CXXConstructExpr
>       <- check::PostCall
>     <- check::PostStmt<CXXConstructExpr>
>
>     3. CXXNewExpr
>   <- check::PostWholeNewThing  // new callback
>
> Or maybe even like this if we want:
>
> (E)
>
>   -> check::PreWholeNewThing  // new callback (needs better name)
>
>     -> check::PreCXXAllocator  // another new callback
>       -> check::PreCall
>         1. CFGNewAllocator
>       <- check::PostCall
>     <- check::PostCXXAllocator  // new callback
>
>     -> check::PreStmt<CXXConstructExpr>
>       -> check::PreCall
>         2. CXXConstructExpr
>       <- check::PostCall
>     <- check::PostStmt<CXXConstructExpr>
>
>     -> check::PreStmt<CXXNewExpr>
>       3. CXXNewExpr
>     <- check::PostStmt<CXXNewExpr>
>
>   -> check::PostWholeNewThing  // new callback
>
> Variant (A) is what we commonly do for all other expressions. For
> instance, if we have an expression `a + b`, we only do
> PreStmt<BinaryOperator> *after* evaluation of both `a` and `b`. It
> kind of represents the semantics exactly. However, it would be
> completely counter-intuitive for checker authors to subscribe to
> PreStmt<CXXNewExpr> and find out that by the time their callback
> fires, both operator new() and the constructor(!) have already been
> called. So i expect confusion between CXXNewExpr in our sense (the big
> new-expression) and CXXNewExpr in a common person's sense (the
> allocator call). In particular, it would be nice to move MallocChecker
> to the new callback - eg. we're already done with region extent when
> we're constructing.

(A) is my favourite. It just follows the common logic of the analyzer.
If we don't like the fact that PreStmt<Stmt> is called only after all
sub-statements are evaluated, we should change it all across the
analyzer. (E) os OK too since we are going to have some similar
"envelop" cases like ScopeEnter/ScopeExit; we can call it
OperatorNewEnter/Exit (similar to Scope/Call Enter/Exit we already
have). However, I'd rather wait for developers to share their needs: I
don't think we can predict what developers expect exactly.

>
> Variant (B) tries to go around this confusion while keeping the
> "user-facing meaning" of CXXNewExpr as the big new-expression, but it
> goes against what we normally do, which would anyway be confusing.
>
> In variants (C) and (D) we have the "user-facing meaning" of
> CXXNewExpr changed into "here's where the allocation occurs". This is
> how MallocChecker currently understands things. Like (A), variant (C)
> tries to do what we usually do by only doing Pre-thingy after
> sub-thingies were evaluated (except evaluating CXXNewExpr before its
> child CXXConstructExpr, but we kinda agreed that they're not actually
> parent-child-related). However, because this time Pre-thingy is not a
> PreStmt-thingy, i find variant (D) fancier: the new callback is saying
> "we begin unleashing the operator new hell" and "we're done with the
> operator new hell nice and clean". Still, both (C) and (D) have an
> unpleasant downside of being unable to reliably read the value of
> CXXNewExpr at PostStmt<CXXNewExpr>, as the expression is not yet
> evaluated. This can probably be fixed, but it brings us back to the
> problem of whether we want to keep the value of the fake `_this`
> variable in the Environment as the value of CXXNewExpr this whole
> time. We probably do.
>
> Variant (E) is kinda a combination of all approaches, as it provides
> the "big" callback like (B) and (D), while still saying that
> CXXNewExpr is the big new-expression, while not messing up the usual
> PreStmt/PostStmt order like (B) did. It still has the problem of
> check::PreStmt<CXXNewExpr> being confusingly late, but otherwise seems
> usable.
>
> We can delay the choices between (A) and (E) and also between (C) and
> (D) until we actually want to introduce the new callbacks, but i guess
> it's reasonable to try to agree on what do we want to mean by CXXNewExpr.
>
> I personally feel that (D) is the friendliest approach. Checkers
> subscribe for new - they get operator new. Checkers subscribe for
> constructor - they get their constructor. Checkers want the whole
> thing - sure, here's our fancy custom callback. They may probably
> never even think about this whole hassle.
>
> My second favorite is (E), which has the strength-slash-weakness of
> consistency with the AST's CXXNewExpr being a parent of
> CXXConstructExpr. I believe that the reorder in the analysis order
> compared to the AST, even if it's not an actual reorder, would be
> quite intuitively acceptable. While the positioning of the new
> operator after the constructor wouldn't be.
>
> Then (A) is simply a trimmed-down variant of (E), (B) is just weird,
> and (C) doesn't seem to be anyhow better than (D).
>
> We can also combine (C) and (D), which is even better:
>
> (F)
>
>   -> check::PreWholeNewThing  // new callback (needs better name)
>
>     -> check::PreStmt<CXXNewExpr>
>       -> check::PreCall
>         1. CFGNewAllocator
>       <- check::PostCall
>     <- check::PostStmt<CXXNewExpr>
>
>     -> check::PreStmt<CXXConstructExpr>
>       -> check::PreCall
>         2. CXXConstructExpr
>       <- check::PostCall
>     <- check::PostStmt<CXXConstructExpr>
>
>     -> check::PreBigNewExpr  // another new callback (needs better name)
>       3. CXXNewExpr
>     <- check::PostBigNewExpr  // new callback
>
>   <- check::PostWholeNewThing  // new callback
>
> Which by far seems to be the best (not necessarily good) idea i could
> have come up with on this subject.
>
> Thoughts are welcome!~
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


--
Best regards,
Aleksei Sidorin,
SRR, Samsung Electronics

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] Inlining, operator new(), checker callbacks.

Eric Fiselier via cfe-dev
In reply to this post by Eric Fiselier via cfe-dev
Hi Artem!

My favourite is B or D. At least, I feel like those are the most intuitive for checker writers. The reason why I like B because if someone looks up how new expression works in the language, this callback sequence will match exactly what is writtem there.

For this reason, I think using new expr's callback only for the allocation is unexpected for those who already have good knowledge about how the language works.  (And the difference between new expression and operator new.)

I already feel like there are too much divergence between the language wording and the analyzer one (eg loc and nonloc vs rvalue/lvalue). 

And while I agree that for a+b it is good to have the pre binary operator callback after both of the operands are evaluated, the same logic feels counterintuitive to me in case of new expression. In case of a+b it is very onlikely that a check want to store something to the state in pre operator+ callback that is relevant to the evaluation of a or b. In case of new expression, however, I think a check might make a change to the state that is relevant to the allocation. This could be worked around by making the checks use another callback of course, so it is not the strongest argument. 

Regards,
Gábor

2017. dec. 8. 15:46 ezt írta ("Artem Dergachev via cfe-dev" <[hidden email]>):
All right, so how do we want to equip C++ operator new() with checker callbacks?

~~~

First of all, a quick note about how do we evaluate call-expressions. Typical code to do that in the analyzer (eg. ExprEngine::VisitCallExpr) kinda looks like this:

```
  // Take node `Pred` from the CoreEngine's worklist.
  NodeSet1 = { Pred };

  NodeSet2;
  for (N in NodeSet1) {
    // Checkers evaluate check::PreStmt<CallExpr> event, put their transitions to NodeSet2.
    runCheckersForPreStmt(N, &NodeSet2);
  }

  NodeSet3;
  for (N in NodeSet2) {
    // Checkers evaluate check::PreCall event, put their transitions to NodeSet3.
    runCheckersForPreCall(N, &NodeSet3);
  }

  NodeSet4;
  for (N in NodeSet3) {
    // Either eval::Call event in Checkers, or inlineCall, or conservativeEvalCall.
    evalCall(N, &NodeSet4);
  }

  NodeSet5;
  for (N in NodeSet4) {
    // Checkers evaluate check::PostCall event, put their transitions to NodeSet5.
    runCheckersForPostCall(N, &NodeSet5);
  }

  NodeSet6;
  for (N in NodeSet5) {
    // Checkers evaluate check::PostStmt<CallExpr> event, put their transitions to NodeSet6.
    runCheckersForPreStmt(N, &NodeSet6);
  }

  // Put nodes from NodeSet6 back to the worklist.
```

During evalCall(), if any checker's eval::Call succeeds, that checker simply puts their transitions to NodeSet4. During conservativeEvalCall, the core puts exactly one new node to NodeSet4.

The interesting part here is that in case of inlineCall(), *the call is not actually evaluated* (!). The only thing that happens here is that inlineCall enters the stack frame and puts the node with the new stack frame *directly to the worklist* (!!), completely bypassing NodeSet4. Because of that, in inlineCall case, NodeSet{4,5,6} are all empty, and the remaining code does nothing, so no PostCall callbacks get called here until the call is actually evaluated. However, when the call is fully inlined, at the CallExit program point (ExprEngine::processCallExit()), PostCall and PostStmt callbacks are called manually. So we get the correct sequence of callbacks regardless.

~~~

Now, suppose we have operator new:

  new (args1...) C(args2...)

The semantics of this code can be expressed in the following "statement-expression" pseudo-code:

```
  01  C *_this = (C *) operator new(sizeof(C), args1...);
  02  if (_this) _this->C(args2...);
  03  return _this;
```

Note that the constructor is simply not called when the operator returns nullptr.
Note the cast on the first line - needs to be modeled, because operator new returns `void *`.
Note that i did ask George if he thinks that expressing this piece of code as a body-farm-thing is a good idea, and he didn't think so :)

By looking at this construct, it should be obvious that the relationship between CXXNewExpr and its respective CXXConstructExpr is more complicated than that of a "normal" expression and its sub-expression. They are kinda computed in the counter-intuitive order, and the easiest way to explain that would be to announce that there are actually three "expressions" involved: operator new call fake expression, constructor call expression, and the "big new-expression" of which both of these are children, in that order.

This is how our CFG currently sees it, in case of `-analyzer-config c++-allocator-inlining=true`:

  1. CFGNewAllocator  // evaluate `operator new(sizeof(C), args1...)`
  2. CXXConstructExpr  // evaluate `_this->C(args2...)`
  3. CXXNewExpr  // bind `_this` to the expression

So i guess it's so far so good.

At 1., we do defaultEvalCall for operator new. FIXME: do a regular evalCall, i.e. allow checkers to evaluate operator new. Also in https://reviews.llvm.org/D40560 we add a new program state trait to hold the artificial variable "_this", since there's no room for it in the Environment, since it's not an expression but something we made up. Also we perform the cast from `void *` to `C *`.

At 2., we evalCall for the constructor with the help of our fake variable `_this`. FIXME: we should also do the if() part of it, i.e. don't evaluate the constructor when the operator returns null. It shouldn't be a state split though, i guess we should just suppress the branch on which the return value is null, unless we inlined the operator new and sure it's null.

At 3., we take `_this` and declare that the value stored in it would from now on be the value of the "big new-expression" that unites them all, regardless of whether it's null or not.

~~~

Now when it comes to checker callbacks, it's a bit messy right now. On CFGNewAllocator call evaluation, the call site is CXXNewExpr. It means that if the operator is inlined, and we hit processCallExit(), as explained above, we'd be triggering PostStmt<CXXNewExpr>, even though we didn't ever trigger PreStmt<CXXNewExpr>. Then at 3., we'd have PreStmt<CXXNewExpr> and then another(!!!) PostStmt<CXXNewExpr>, which drives MallocChecker crazy. This, of course, can be trivially fixed. But i wanted to write this long explanation to specifically highlight this bug, with the hope that in the future it would be less likely to get reintroduced.

And then, now that we realize that we have three kinda-statements here, the question is, do we want all three equipped with checker callbacks? We could plan to make a new callback that'd be surrounding CFGNewAllocator similarly to how PreStmt/PostStmt surrounds CXXConstructExpr and CXXNewExpr.

It may look like this:

(A)

  -> check::PreCXXAllocator  // new callback
    -> check::PreCall
      1. CFGNewAllocator
    <- check::PostCall
  <- check::PostCXXAllocator  // new callback

  -> check::PreStmt<CXXConstructExpr>
    -> check::PreCall
      2. CXXConstructExpr
    <- check::PostCall
  <- check::PostStmt<CXXConstructExpr>

  -> check::PreStmt<CXXNewExpr>
    3. CXXNewExpr
  <- check::PostStmt<CXXNewExpr>

Or like this:

(B)

  -> check::PreStmt<CXXNewExpr>

    -> check::PreCXXAllocator  // new callback
      -> check::PreCall
        1. CFGNewAllocator
      <- check::PostCall
    <- check::PostCXXAllocator  // new callback

    -> check::PreStmt<CXXConstructExpr>
      -> check::PreCall
        2. CXXConstructExpr
      <- check::PostCall
    <- check::PostStmt<CXXConstructExpr>

    3. CXXNewExpr
  <- check::PostStmt<CXXNewExpr>

Or like this:

(C)

  -> check::PreStmt<CXXNewExpr>
    -> check::PreCall
      1. CFGNewAllocator
    <- check::PostCall
  <- check::PostStmt<CXXNewExpr>

  -> check::PreStmt<CXXConstructExpr>
    -> check::PreCall
      2. CXXConstructExpr
    <- check::PostCall
  <- check::PostStmt<CXXConstructExpr>

  -> check::PreBigNewExpr  // new callback (needs better name)
    3. CXXNewExpr
  <- check::PostBigNewExpr  // new callback

Or even like this:

(D)

  -> check::PreWholeNewThing  // new callback

    -> check::PreStmt<CXXNewExpr>
      -> check::PreCall
        1. CFGNewAllocator
      <- check::PostCall
    <- check::PostStmt<CXXNewExpr>

    -> check::PreStmt<CXXConstructExpr>
      -> check::PreCall
        2. CXXConstructExpr
      <- check::PostCall
    <- check::PostStmt<CXXConstructExpr>

    3. CXXNewExpr
  <- check::PostWholeNewThing  // new callback

Or maybe even like this if we want:

(E)

  -> check::PreWholeNewThing  // new callback (needs better name)

    -> check::PreCXXAllocator  // another new callback
      -> check::PreCall
        1. CFGNewAllocator
      <- check::PostCall
    <- check::PostCXXAllocator  // new callback

    -> check::PreStmt<CXXConstructExpr>
      -> check::PreCall
        2. CXXConstructExpr
      <- check::PostCall
    <- check::PostStmt<CXXConstructExpr>

    -> check::PreStmt<CXXNewExpr>
      3. CXXNewExpr
    <- check::PostStmt<CXXNewExpr>

  -> check::PostWholeNewThing  // new callback

Variant (A) is what we commonly do for all other expressions. For instance, if we have an expression `a + b`, we only do PreStmt<BinaryOperator> *after* evaluation of both `a` and `b`. It kind of represents the semantics exactly. However, it would be completely counter-intuitive for checker authors to subscribe to PreStmt<CXXNewExpr> and find out that by the time their callback fires, both operator new() and the constructor(!) have already been called. So i expect confusion between CXXNewExpr in our sense (the big new-expression) and CXXNewExpr in a common person's sense (the allocator call). In particular, it would be nice to move MallocChecker to the new callback - eg. we're already done with region extent when we're constructing.

Variant (B) tries to go around this confusion while keeping the "user-facing meaning" of CXXNewExpr as the big new-expression, but it goes against what we normally do, which would anyway be confusing.

In variants (C) and (D) we have the "user-facing meaning" of CXXNewExpr changed into "here's where the allocation occurs". This is how MallocChecker currently understands things. Like (A), variant (C) tries to do what we usually do by only doing Pre-thingy after sub-thingies were evaluated (except evaluating CXXNewExpr before its child CXXConstructExpr, but we kinda agreed that they're not actually parent-child-related). However, because this time Pre-thingy is not a PreStmt-thingy, i find variant (D) fancier: the new callback is saying "we begin unleashing the operator new hell" and "we're done with the operator new hell nice and clean". Still, both (C) and (D) have an unpleasant downside of being unable to reliably read the value of CXXNewExpr at PostStmt<CXXNewExpr>, as the expression is not yet evaluated. This can probably be fixed, but it brings us back to the problem of whether we want to keep the value of the fake `_this` variable in the Environment as the value of CXXNewExpr this whole time. We probably do.

Variant (E) is kinda a combination of all approaches, as it provides the "big" callback like (B) and (D), while still saying that CXXNewExpr is the big new-expression, while not messing up the usual PreStmt/PostStmt order like (B) did. It still has the problem of check::PreStmt<CXXNewExpr> being confusingly late, but otherwise seems usable.

We can delay the choices between (A) and (E) and also between (C) and (D) until we actually want to introduce the new callbacks, but i guess it's reasonable to try to agree on what do we want to mean by CXXNewExpr.

I personally feel that (D) is the friendliest approach. Checkers subscribe for new - they get operator new. Checkers subscribe for constructor - they get their constructor. Checkers want the whole thing - sure, here's our fancy custom callback. They may probably never even think about this whole hassle.

My second favorite is (E), which has the strength-slash-weakness of consistency with the AST's CXXNewExpr being a parent of CXXConstructExpr. I believe that the reorder in the analysis order compared to the AST, even if it's not an actual reorder, would be quite intuitively acceptable. While the positioning of the new operator after the constructor wouldn't be.

Then (A) is simply a trimmed-down variant of (E), (B) is just weird, and (C) doesn't seem to be anyhow better than (D).

We can also combine (C) and (D), which is even better:

(F)

  -> check::PreWholeNewThing  // new callback (needs better name)

    -> check::PreStmt<CXXNewExpr>
      -> check::PreCall
        1. CFGNewAllocator
      <- check::PostCall
    <- check::PostStmt<CXXNewExpr>

    -> check::PreStmt<CXXConstructExpr>
      -> check::PreCall
        2. CXXConstructExpr
      <- check::PostCall
    <- check::PostStmt<CXXConstructExpr>

    -> check::PreBigNewExpr  // another new callback (needs better name)
      3. CXXNewExpr
    <- check::PostBigNewExpr  // new callback

  <- check::PostWholeNewThing  // new callback

Which by far seems to be the best (not necessarily good) idea i could have come up with on this subject.

Thoughts are welcome!~
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] Inlining, operator new(), checker callbacks.

Eric Fiselier via cfe-dev
Hmm. Maybe we can also disable check{Pre|Post}CXXNewExpr for now,
similarly to how check{Pre|Post}IfStmt is not a thing (it's actually
kinda complicated control flow in CXXNewExpr as well, even if in most
cases we'd prefer to behave as if there isn't) and decide later if we
want to introduce it back in a particular manner, or do custom callbacks
for all parts of it instead. It'd still be a bit frustrating, but
arguably causes less surprise, and it'd be at least discoverable through
CheckerDocumentation, where we can provide all the information we want
to explain.

We'd still immediately need some sort of checkPostAllocator for
MallocChecker to use.

There's a separate issue with PreStmt<CXXConstructExpr> which could be
made better if it had a "trigger statement" for the constructor (is it
operator new, or a temporary, or an initializer list like in D40841 even
if it's not callback here, or variable declaration statement, or
whatever). This info could probably be squeezed into CallEvent but
there's no room for it there at the moment. Otherwise we could make a
separate callback for that as well.

On 11/12/2017 11:05 AM, Gábor Horváth wrote:

> Hi Artem!
>
> My favourite is B or D. At least, I feel like those are the most
> intuitive for checker writers. The reason why I like B because if
> someone looks up how new expression works in the language, this
> callback sequence will match exactly what is writtem there.
>
> For this reason, I think using new expr's callback only for the
> allocation is unexpected for those who already have good knowledge
> about how the language works.  (And the difference between new
> expression and operator new.)
>
> I already feel like there are too much divergence between the language
> wording and the analyzer one (eg loc and nonloc vs rvalue/lvalue).
>
> And while I agree that for a+b it is good to have the pre binary
> operator callback after both of the operands are evaluated, the same
> logic feels counterintuitive to me in case of new expression. In case
> of a+b it is very onlikely that a check want to store something to the
> state in pre operator+ callback that is relevant to the evaluation of
> a or b. In case of new expression, however, I think a check might make
> a change to the state that is relevant to the allocation. This could
> be worked around by making the checks use another callback of course,
> so it is not the strongest argument.
>
> Regards,
> Gábor
>
> 2017. dec. 8. 15:46 ezt írta ("Artem Dergachev via cfe-dev"
> <[hidden email] <mailto:[hidden email]>>):
>
>     All right, so how do we want to equip C++ operator new() with
>     checker callbacks?
>
>     ~~~
>
>     First of all, a quick note about how do we evaluate
>     call-expressions. Typical code to do that in the analyzer (eg.
>     ExprEngine::VisitCallExpr) kinda looks like this:
>
>     ```
>       // Take node `Pred` from the CoreEngine's worklist.
>       NodeSet1 = { Pred };
>
>       NodeSet2;
>       for (N in NodeSet1) {
>         // Checkers evaluate check::PreStmt<CallExpr> event, put their
>     transitions to NodeSet2.
>         runCheckersForPreStmt(N, &NodeSet2);
>       }
>
>       NodeSet3;
>       for (N in NodeSet2) {
>         // Checkers evaluate check::PreCall event, put their
>     transitions to NodeSet3.
>         runCheckersForPreCall(N, &NodeSet3);
>       }
>
>       NodeSet4;
>       for (N in NodeSet3) {
>         // Either eval::Call event in Checkers, or inlineCall, or
>     conservativeEvalCall.
>         evalCall(N, &NodeSet4);
>       }
>
>       NodeSet5;
>       for (N in NodeSet4) {
>         // Checkers evaluate check::PostCall event, put their
>     transitions to NodeSet5.
>         runCheckersForPostCall(N, &NodeSet5);
>       }
>
>       NodeSet6;
>       for (N in NodeSet5) {
>         // Checkers evaluate check::PostStmt<CallExpr> event, put
>     their transitions to NodeSet6.
>         runCheckersForPreStmt(N, &NodeSet6);
>       }
>
>       // Put nodes from NodeSet6 back to the worklist.
>     ```
>
>     During evalCall(), if any checker's eval::Call succeeds, that
>     checker simply puts their transitions to NodeSet4. During
>     conservativeEvalCall, the core puts exactly one new node to NodeSet4.
>
>     The interesting part here is that in case of inlineCall(), *the
>     call is not actually evaluated* (!). The only thing that happens
>     here is that inlineCall enters the stack frame and puts the node
>     with the new stack frame *directly to the worklist* (!!),
>     completely bypassing NodeSet4. Because of that, in inlineCall
>     case, NodeSet{4,5,6} are all empty, and the remaining code does
>     nothing, so no PostCall callbacks get called here until the call
>     is actually evaluated. However, when the call is fully inlined, at
>     the CallExit program point (ExprEngine::processCallExit()),
>     PostCall and PostStmt callbacks are called manually. So we get the
>     correct sequence of callbacks regardless.
>
>     ~~~
>
>     Now, suppose we have operator new:
>
>       new (args1...) C(args2...)
>
>     The semantics of this code can be expressed in the following
>     "statement-expression" pseudo-code:
>
>     ```
>       01  C *_this = (C *) operator new(sizeof(C), args1...);
>       02  if (_this) _this->C(args2...);
>       03  return _this;
>     ```
>
>     Note that the constructor is simply not called when the operator
>     returns nullptr.
>     Note the cast on the first line - needs to be modeled, because
>     operator new returns `void *`.
>     Note that i did ask George if he thinks that expressing this piece
>     of code as a body-farm-thing is a good idea, and he didn't think so :)
>
>     By looking at this construct, it should be obvious that the
>     relationship between CXXNewExpr and its respective
>     CXXConstructExpr is more complicated than that of a "normal"
>     expression and its sub-expression. They are kinda computed in the
>     counter-intuitive order, and the easiest way to explain that would
>     be to announce that there are actually three "expressions"
>     involved: operator new call fake expression, constructor call
>     expression, and the "big new-expression" of which both of these
>     are children, in that order.
>
>     This is how our CFG currently sees it, in case of
>     `-analyzer-config c++-allocator-inlining=true`:
>
>       1. CFGNewAllocator  // evaluate `operator new(sizeof(C), args1...)`
>       2. CXXConstructExpr  // evaluate `_this->C(args2...)`
>       3. CXXNewExpr  // bind `_this` to the expression
>
>     So i guess it's so far so good.
>
>     At 1., we do defaultEvalCall for operator new. FIXME: do a regular
>     evalCall, i.e. allow checkers to evaluate operator new. Also in
>     https://reviews.llvm.org/D40560 <https://reviews.llvm.org/D40560>
>     we add a new program state trait to hold the artificial variable
>     "_this", since there's no room for it in the Environment, since
>     it's not an expression but something we made up. Also we perform
>     the cast from `void *` to `C *`.
>
>     At 2., we evalCall for the constructor with the help of our fake
>     variable `_this`. FIXME: we should also do the if() part of it,
>     i.e. don't evaluate the constructor when the operator returns
>     null. It shouldn't be a state split though, i guess we should just
>     suppress the branch on which the return value is null, unless we
>     inlined the operator new and sure it's null.
>
>     At 3., we take `_this` and declare that the value stored in it
>     would from now on be the value of the "big new-expression" that
>     unites them all, regardless of whether it's null or not.
>
>     ~~~
>
>     Now when it comes to checker callbacks, it's a bit messy right
>     now. On CFGNewAllocator call evaluation, the call site is
>     CXXNewExpr. It means that if the operator is inlined, and we hit
>     processCallExit(), as explained above, we'd be triggering
>     PostStmt<CXXNewExpr>, even though we didn't ever trigger
>     PreStmt<CXXNewExpr>. Then at 3., we'd have PreStmt<CXXNewExpr> and
>     then another(!!!) PostStmt<CXXNewExpr>, which drives MallocChecker
>     crazy. This, of course, can be trivially fixed. But i wanted to
>     write this long explanation to specifically highlight this bug,
>     with the hope that in the future it would be less likely to get
>     reintroduced.
>
>     And then, now that we realize that we have three kinda-statements
>     here, the question is, do we want all three equipped with checker
>     callbacks? We could plan to make a new callback that'd be
>     surrounding CFGNewAllocator similarly to how PreStmt/PostStmt
>     surrounds CXXConstructExpr and CXXNewExpr.
>
>     It may look like this:
>
>     (A)
>
>       -> check::PreCXXAllocator  // new callback
>         -> check::PreCall
>           1. CFGNewAllocator
>         <- check::PostCall
>       <- check::PostCXXAllocator  // new callback
>
>       -> check::PreStmt<CXXConstructExpr>
>         -> check::PreCall
>           2. CXXConstructExpr
>         <- check::PostCall
>       <- check::PostStmt<CXXConstructExpr>
>
>       -> check::PreStmt<CXXNewExpr>
>         3. CXXNewExpr
>       <- check::PostStmt<CXXNewExpr>
>
>     Or like this:
>
>     (B)
>
>       -> check::PreStmt<CXXNewExpr>
>
>         -> check::PreCXXAllocator  // new callback
>           -> check::PreCall
>             1. CFGNewAllocator
>           <- check::PostCall
>         <- check::PostCXXAllocator  // new callback
>
>         -> check::PreStmt<CXXConstructExpr>
>           -> check::PreCall
>             2. CXXConstructExpr
>           <- check::PostCall
>         <- check::PostStmt<CXXConstructExpr>
>
>         3. CXXNewExpr
>       <- check::PostStmt<CXXNewExpr>
>
>     Or like this:
>
>     (C)
>
>       -> check::PreStmt<CXXNewExpr>
>         -> check::PreCall
>           1. CFGNewAllocator
>         <- check::PostCall
>       <- check::PostStmt<CXXNewExpr>
>
>       -> check::PreStmt<CXXConstructExpr>
>         -> check::PreCall
>           2. CXXConstructExpr
>         <- check::PostCall
>       <- check::PostStmt<CXXConstructExpr>
>
>       -> check::PreBigNewExpr  // new callback (needs better name)
>         3. CXXNewExpr
>       <- check::PostBigNewExpr  // new callback
>
>     Or even like this:
>
>     (D)
>
>       -> check::PreWholeNewThing  // new callback
>
>         -> check::PreStmt<CXXNewExpr>
>           -> check::PreCall
>             1. CFGNewAllocator
>           <- check::PostCall
>         <- check::PostStmt<CXXNewExpr>
>
>         -> check::PreStmt<CXXConstructExpr>
>           -> check::PreCall
>             2. CXXConstructExpr
>           <- check::PostCall
>         <- check::PostStmt<CXXConstructExpr>
>
>         3. CXXNewExpr
>       <- check::PostWholeNewThing  // new callback
>
>     Or maybe even like this if we want:
>
>     (E)
>
>       -> check::PreWholeNewThing  // new callback (needs better name)
>
>         -> check::PreCXXAllocator  // another new callback
>           -> check::PreCall
>             1. CFGNewAllocator
>           <- check::PostCall
>         <- check::PostCXXAllocator  // new callback
>
>         -> check::PreStmt<CXXConstructExpr>
>           -> check::PreCall
>             2. CXXConstructExpr
>           <- check::PostCall
>         <- check::PostStmt<CXXConstructExpr>
>
>         -> check::PreStmt<CXXNewExpr>
>           3. CXXNewExpr
>         <- check::PostStmt<CXXNewExpr>
>
>       -> check::PostWholeNewThing  // new callback
>
>     Variant (A) is what we commonly do for all other expressions. For
>     instance, if we have an expression `a + b`, we only do
>     PreStmt<BinaryOperator> *after* evaluation of both `a` and `b`. It
>     kind of represents the semantics exactly. However, it would be
>     completely counter-intuitive for checker authors to subscribe to
>     PreStmt<CXXNewExpr> and find out that by the time their callback
>     fires, both operator new() and the constructor(!) have already
>     been called. So i expect confusion between CXXNewExpr in our sense
>     (the big new-expression) and CXXNewExpr in a common person's sense
>     (the allocator call). In particular, it would be nice to move
>     MallocChecker to the new callback - eg. we're already done with
>     region extent when we're constructing.
>
>     Variant (B) tries to go around this confusion while keeping the
>     "user-facing meaning" of CXXNewExpr as the big new-expression, but
>     it goes against what we normally do, which would anyway be confusing.
>
>     In variants (C) and (D) we have the "user-facing meaning" of
>     CXXNewExpr changed into "here's where the allocation occurs". This
>     is how MallocChecker currently understands things. Like (A),
>     variant (C) tries to do what we usually do by only doing
>     Pre-thingy after sub-thingies were evaluated (except evaluating
>     CXXNewExpr before its child CXXConstructExpr, but we kinda agreed
>     that they're not actually parent-child-related). However, because
>     this time Pre-thingy is not a PreStmt-thingy, i find variant (D)
>     fancier: the new callback is saying "we begin unleashing the
>     operator new hell" and "we're done with the operator new hell nice
>     and clean". Still, both (C) and (D) have an unpleasant downside of
>     being unable to reliably read the value of CXXNewExpr at
>     PostStmt<CXXNewExpr>, as the expression is not yet evaluated. This
>     can probably be fixed, but it brings us back to the problem of
>     whether we want to keep the value of the fake `_this` variable in
>     the Environment as the value of CXXNewExpr this whole time. We
>     probably do.
>
>     Variant (E) is kinda a combination of all approaches, as it
>     provides the "big" callback like (B) and (D), while still saying
>     that CXXNewExpr is the big new-expression, while not messing up
>     the usual PreStmt/PostStmt order like (B) did. It still has the
>     problem of check::PreStmt<CXXNewExpr> being confusingly late, but
>     otherwise seems usable.
>
>     We can delay the choices between (A) and (E) and also between (C)
>     and (D) until we actually want to introduce the new callbacks, but
>     i guess it's reasonable to try to agree on what do we want to mean
>     by CXXNewExpr.
>
>     I personally feel that (D) is the friendliest approach. Checkers
>     subscribe for new - they get operator new. Checkers subscribe for
>     constructor - they get their constructor. Checkers want the whole
>     thing - sure, here's our fancy custom callback. They may probably
>     never even think about this whole hassle.
>
>     My second favorite is (E), which has the strength-slash-weakness
>     of consistency with the AST's CXXNewExpr being a parent of
>     CXXConstructExpr. I believe that the reorder in the analysis order
>     compared to the AST, even if it's not an actual reorder, would be
>     quite intuitively acceptable. While the positioning of the new
>     operator after the constructor wouldn't be.
>
>     Then (A) is simply a trimmed-down variant of (E), (B) is just
>     weird, and (C) doesn't seem to be anyhow better than (D).
>
>     We can also combine (C) and (D), which is even better:
>
>     (F)
>
>       -> check::PreWholeNewThing  // new callback (needs better name)
>
>         -> check::PreStmt<CXXNewExpr>
>           -> check::PreCall
>             1. CFGNewAllocator
>           <- check::PostCall
>         <- check::PostStmt<CXXNewExpr>
>
>         -> check::PreStmt<CXXConstructExpr>
>           -> check::PreCall
>             2. CXXConstructExpr
>           <- check::PostCall
>         <- check::PostStmt<CXXConstructExpr>
>
>         -> check::PreBigNewExpr  // another new callback (needs better
>     name)
>           3. CXXNewExpr
>         <- check::PostBigNewExpr  // new callback
>
>       <- check::PostWholeNewThing  // new callback
>
>     Which by far seems to be the best (not necessarily good) idea i
>     could have come up with on this subject.
>
>     Thoughts are welcome!~
>     _______________________________________________
>     cfe-dev mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev