[analyzer] C++ and inlining.

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[analyzer] C++ and inlining.

Robinson, Paul via cfe-dev
I've observed a lot of false positives on WebKit that resulted from our
inability to inline relatively simple functions in C++, specifically
constructors and operator new(), due to unimplemented modeling. So i
wanted to document, and eventually fix, a variety of language feature
modeling we're currently lacking. Note that WebKit, like LLVM, is not a
typical C++ codebase; it uses many language constructs that regular
projects don't use, and uses very little standard library headers.

---

Normally we expect the analyzer to work even without any inlining, by
conservatively invalidating the program state, which eliminates the
clearly false assumptions but sometimes causes infeasible paths when the
invalidation was too aggressive and we start denoting the same value
with a different symbol after invalidation and assume contradictory
stuff on the old and the new symbol. The false positives resulting from
aggressive invalidation are usually treated as less scary than the ones
suppressed by invalidation, because they can be easily suppressed (with
assertions or const qualifiers etc.) without loss of coverage. However,
in C++ there are a lot of implicit function calls which cause massive
frustration when evaluated conservatively. For example,

   class C {
     bool haveX;
     class D d;

   public:
     C(int *x): haveX(x != 0) {
       if (haveX)
         *x = 1; // null dereference??
     }
   };

In this case, 'haveX' should assume that x is non-null. The code eagerly
splits states into {x == 0} and {x != 0}, which seems reasonable.
However, after 'haveX' have been initialized, the default constructor
for field 'd' kicks in. If only this constructor is not inlined, or any
of its callees are not inlined, value stored in 'haveX' would be
invalidated on both paths. In particular, the path on which the original
'haveX' is false but the invalidated 'haveX' is true have now opened up.

---

Inlining of the constructor itself is disabled in many cases for many
reasons. In particular, we are currently only trying to inline the
constructor when the target this-region is a DeclRegion (variable,
member variable, Objective-C++ instance var), and the destructor is
non-trivial. This cuts away a lot of cases:

* Constructing into temporaries is disabled when destructor is
non-trivial. At least, we should be able to inline those when the
destructor is present at all, so that it would be evaluated
conservatively. One thing to note here is that our CFG has been recently
fixed, so we're much closer to fixing this properly nowadays. However,
CFG alone is not enough to figure out which destructor needs to be
called; for instance, if a lifetime-extended temporary is initialized
with an operator?: expression, we'd need path-sensitive information to
figure out which object to destroy.

* Temporaries are also special because our pass-by-value is not always
working correctly. In particular, when we inline 'foo(c)', where
variable 'c' is of 'class C', we first copy 'c' into a temporary region,
and then trivial-copy it into the stack variable-region of the function
parameter, while we should be constructing directly into the parameter
region. We cannot construct directly into the parameter region because
the stack frame has not yet been constructed, because arguments are not
yet computed. More reading on the subject, even if a bit outdated, is at
http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html 
This is a hard problem, but i wasn't noticing many instances of it yet.

* Base object region constructors are disabled when destructor is
non-trivial. This sounds like an accidental omission.

* Construction into ElementRegion is disabled regardless of destructors.
In particular, mass array constructions are disabled. There is a special
AST for this case, which emulates the loop through the array (or return
value of operator new[]) with a loop counter variable and all. We have
no support for this whole language construct. Note, however, that
ElementRegion is much more than array element; it is often used for
representing casts, and in particular it appears in return values of a
conservatively evaluated operator new() (i.e. element{SymRegion}) and is
likely to appear in placement-new() return values for the same reason.
So we should discriminate between these two cases.

* Constructing into return value of operator new() is disabled
completely anyway, because there's a modeling problem that causes us to
be unable to connect the constructor with the correct this-region. The
CFG part of this problem was fixed by Jordan in 2014 by adding the
CFGNewAllocator element, so we now actually call the operator and the
constructor in the correct order, but we still need to pass the operator
new's return value to the constructor. Note how pointless it is to
enable it, or even inline a custom operator new, until construction into
ElementRegion is fixed.

---

Speaking of inlining operator new():

* For correct modeling in the core, it seems that the only thing that
remains is to actually use the return value of operator new() and
construct the new object *into* it. Sounds easy.

* I’m also immediately noticing the unimplemented path diagnostics
within the inlined operator new(), which crash with
llvm_unreachable(“not yet implemented”) whenever a visitor fires within
it (eg. to indicate that memory allocated there leaks).

* MallocChecker keeps being surprised with various non-symbolic stuff
that may come out from operator new(). https://reviews.llvm.org/D37189 
doesn’t cover all cases; in particular, void *operator new(size_t) {
static int x; return &x; } seems to instantly crash after inlining.

---

 From now on i'd like to experiment with, first of all, disabling the
DeclRegion bailout when possible. Then i'd try to inline operator new,
pass its return value to the respective constructor, and inline said
constructor. I'm not sure if i'd be able to dig into the temporaries and
pass-by-copy problems soon.

And i believe that when it comes to C++ pure language constructs, i
listed most of the problems i'm aware of, with the exception of, well,
*exceptions* (which also need CFG work). Of course we could do better
modeling the standard library as well.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] C++ and inlining.

Robinson, Paul via cfe-dev
Hi Artem!

On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev <[hidden email]> wrote:
I've observed a lot of false positives on WebKit that resulted from our inability to inline relatively simple functions in C++, specifically constructors and operator new(), due to unimplemented modeling. So i wanted to document, and eventually fix, a variety of language feature modeling we're currently lacking. Note that WebKit, like LLVM, is not a typical C++ codebase; it uses many language constructs that regular projects don't use, and uses very little standard library headers.

Great news!
 

---

Normally we expect the analyzer to work even without any inlining, by conservatively invalidating the program state, which eliminates the clearly false assumptions but sometimes causes infeasible paths when the invalidation was too aggressive and we start denoting the same value with a different symbol after invalidation and assume contradictory stuff on the old and the new symbol. The false positives resulting from aggressive invalidation are usually treated as less scary than the ones suppressed by invalidation, because they can be easily suppressed (with assertions or const qualifiers etc.) without loss of coverage. However, in C++ there are a lot of implicit function calls which cause massive frustration when evaluated conservatively. For example,

  class C {
    bool haveX;
    class D d;

  public:
    C(int *x): haveX(x != 0) {
      if (haveX)
        *x = 1; // null dereference??
    }
  };

In this case, 'haveX' should assume that x is non-null. The code eagerly splits states into {x == 0} and {x != 0}, which seems reasonable. However, after 'haveX' have been initialized, the default constructor for field 'd' kicks in. If only this constructor is not inlined, or any of its callees are not inlined, value stored in 'haveX' would be invalidated on both paths. In particular, the path on which the original 'haveX' is false but the invalidated 'haveX' is true have now opened up.

I observed the very same pattern on Ericsson internal codebases, so sorting this out is definitely a huge win for many projects (not just LLVM and WebKit).
 

---

Inlining of the constructor itself is disabled in many cases for many reasons. In particular, we are currently only trying to inline the constructor when the target this-region is a DeclRegion (variable, member variable, Objective-C++ instance var), and the destructor is non-trivial. This cuts away a lot of cases:

It is not clear for my why the non-trivial destructor is a requirement here. Do yo have any info on that?
 

* Constructing into temporaries is disabled when destructor is non-trivial. At least, we should be able to inline those when the destructor is present at all, so that it would be evaluated conservatively. One thing to note here is that our CFG has been recently fixed, so we're much closer to fixing this properly nowadays. However, CFG alone is not enough to figure out which destructor needs to be called; for instance, if a lifetime-extended temporary is initialized with an operator?: expression, we'd need path-sensitive information to figure out which object to destroy.

* Temporaries are also special because our pass-by-value is not always working correctly. In particular, when we inline 'foo(c)', where variable 'c' is of 'class C', we first copy 'c' into a temporary region, and then trivial-copy it into the stack variable-region of the function parameter, while we should be constructing directly into the parameter region. We cannot construct directly into the parameter region because the stack frame has not yet been constructed, because arguments are not yet computed. More reading on the subject, even if a bit outdated, is at http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html This is a hard problem, but i wasn't noticing many instances of it yet.

I am wondering what the analyzer does with copy elision? Do we model that somehow?
 

* Base object region constructors are disabled when destructor is non-trivial. This sounds like an accidental omission.

Nice catch.
 

* Construction into ElementRegion is disabled regardless of destructors. In particular, mass array constructions are disabled. There is a special AST for this case, which emulates the loop through the array (or return value of operator new[]) with a loop counter variable and all. We have no support for this whole language construct. Note, however, that ElementRegion is much more than array element; it is often used for representing casts, and in particular it appears in return values of a conservatively evaluated operator new() (i.e. element{SymRegion}) and is likely to appear in placement-new() return values for the same reason. So we should discriminate between these two cases.

Do you know why casts are represented this way? Is this something that we might want to change in the future? I think one of the reasons why construction into ElementRegion is disabled because arrays are populated lazily by the analyzer. And this does not work well with code that relies on the fact that the number of constructor/destructor invocations is the same as the number of elements in the array.
 

* Constructing into return value of operator new() is disabled completely anyway, because there's a modeling problem that causes us to be unable to connect the constructor with the correct this-region. The CFG part of this problem was fixed by Jordan in 2014 by adding the CFGNewAllocator element, so we now actually call the operator and the constructor in the correct order, but we still need to pass the operator new's return value to the constructor. Note how pointless it is to enable it, or even inline a custom operator new, until construction into ElementRegion is fixed.

---

Speaking of inlining operator new():

* For correct modeling in the core, it seems that the only thing that remains is to actually use the return value of operator new() and construct the new object *into* it. Sounds easy.

* I’m also immediately noticing the unimplemented path diagnostics within the inlined operator new(), which crash with llvm_unreachable(“not yet implemented”) whenever a visitor fires within it (eg. to indicate that memory allocated there leaks).

* MallocChecker keeps being surprised with various non-symbolic stuff that may come out from operator new(). https://reviews.llvm.org/D37189 doesn’t cover all cases; in particular, void *operator new(size_t) { static int x; return &x; } seems to instantly crash after inlining.

---

From now on i'd like to experiment with, first of all, disabling the DeclRegion bailout when possible. Then i'd try to inline operator new, pass its return value to the respective constructor, and inline said constructor. I'm not sure if i'd be able to dig into the temporaries and pass-by-copy problems soon.

And i believe that when it comes to C++ pure language constructs, i listed most of the problems i'm aware of, with the exception of, well, *exceptions* (which also need CFG work). Of course we could do better modeling the standard library as well.

Thank you for collecting these problems!

 
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] C++ and inlining.

Robinson, Paul via cfe-dev


On 11/26/17 3:17 PM, Gábor Horváth wrote:

> Hi Artem!
>
> On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     I've observed a lot of false positives on WebKit that resulted
>     from our inability to inline relatively simple functions in C++,
>     specifically constructors and operator new(), due to unimplemented
>     modeling. So i wanted to document, and eventually fix, a variety
>     of language feature modeling we're currently lacking. Note that
>     WebKit, like LLVM, is not a typical C++ codebase; it uses many
>     language constructs that regular projects don't use, and uses very
>     little standard library headers.
>
>
> Great news!
>
>
>     ---
>
>     Normally we expect the analyzer to work even without any inlining,
>     by conservatively invalidating the program state, which eliminates
>     the clearly false assumptions but sometimes causes infeasible
>     paths when the invalidation was too aggressive and we start
>     denoting the same value with a different symbol after invalidation
>     and assume contradictory stuff on the old and the new symbol. The
>     false positives resulting from aggressive invalidation are usually
>     treated as less scary than the ones suppressed by invalidation,
>     because they can be easily suppressed (with assertions or const
>     qualifiers etc.) without loss of coverage. However, in C++ there
>     are a lot of implicit function calls which cause massive
>     frustration when evaluated conservatively. For example,
>
>       class C {
>         bool haveX;
>         class D d;
>
>       public:
>         C(int *x): haveX(x != 0) {
>           if (haveX)
>             *x = 1; // null dereference??
>         }
>       };
>
>     In this case, 'haveX' should assume that x is non-null. The code
>     eagerly splits states into {x == 0} and {x != 0}, which seems
>     reasonable. However, after 'haveX' have been initialized, the
>     default constructor for field 'd' kicks in. If only this
>     constructor is not inlined, or any of its callees are not inlined,
>     value stored in 'haveX' would be invalidated on both paths. In
>     particular, the path on which the original 'haveX' is false but
>     the invalidated 'haveX' is true have now opened up.
>
>
> I observed the very same pattern on Ericsson internal codebases, so
> sorting this out is definitely a huge win for many projects (not just
> LLVM and WebKit).
>
>
>     ---
>
>     Inlining of the constructor itself is disabled in many cases for
>     many reasons. In particular, we are currently only trying to
>     inline the constructor when the target this-region is a DeclRegion
>     (variable, member variable, Objective-C++ instance var), and the
>     destructor is non-trivial. This cuts away a lot of cases:
>
>
> It is not clear for my why the non-trivial destructor is a requirement
> here. Do yo have any info on that?

I believe that if the destructor is trivial, it's irrelevant whether the
destructor would be evaluated at all during analysis, or it is missing
completely due to improper modeling. When the destructor is non-trivial
and doesn't get evaluated at all after we inlined a constructor, we'd
get all sorts of leak FPs and such.

Btw, one thing that i forgot to mention is that we have r290140
(https://bugs.llvm.org/show_bug.cgi?id=15599) - we currently generate a
sink whenever we construct an object into a temporary and its destructor
is *no-return*, losing coverage between the constructor and the destructor.

>
>     * Constructing into temporaries is disabled when destructor is
>     non-trivial. At least, we should be able to inline those when the
>     destructor is present at all, so that it would be evaluated
>     conservatively. One thing to note here is that our CFG has been
>     recently fixed, so we're much closer to fixing this properly
>     nowadays. However, CFG alone is not enough to figure out which
>     destructor needs to be called; for instance, if a
>     lifetime-extended temporary is initialized with an operator?:
>     expression, we'd need path-sensitive information to figure out
>     which object to destroy.
>
>     * Temporaries are also special because our pass-by-value is not
>     always working correctly. In particular, when we inline 'foo(c)',
>     where variable 'c' is of 'class C', we first copy 'c' into a
>     temporary region, and then trivial-copy it into the stack
>     variable-region of the function parameter, while we should be
>     constructing directly into the parameter region. We cannot
>     construct directly into the parameter region because the stack
>     frame has not yet been constructed, because arguments are not yet
>     computed. More reading on the subject, even if a bit outdated, is
>     at
>     http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html
>     <http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html>
>     This is a hard problem, but i wasn't noticing many instances of it
>     yet.
>
>
> I am wondering what the analyzer does with copy elision? Do we model
> that somehow?

Not much, i think, and not sure if we should. When the standard is not
explicitly forcing us to copy (or to avoid copying), i think that out of
the two, we'd rather do the thing that takes less code to implement - it
doesn't sound like a huge performance win if we elide a copy (though i
might be wrong).

>
>     * Base object region constructors are disabled when destructor is
>     non-trivial. This sounds like an accidental omission.
>
>
> Nice catch.
>
>
>     * Construction into ElementRegion is disabled regardless of
>     destructors. In particular, mass array constructions are disabled.
>     There is a special AST for this case, which emulates the loop
>     through the array (or return value of operator new[]) with a loop
>     counter variable and all. We have no support for this whole
>     language construct. Note, however, that ElementRegion is much more
>     than array element; it is often used for representing casts, and
>     in particular it appears in return values of a conservatively
>     evaluated operator new() (i.e. element{SymRegion}) and is likely
>     to appear in placement-new() return values for the same reason. So
>     we should discriminate between these two cases.
>
>
> Do you know why casts are represented this way? Is this something that
> we might want to change in the future? I think one of the reasons why
> construction into ElementRegion is disabled because arrays are
> populated lazily by the analyzer. And this does not work well with
> code that relies on the fact that the number of constructor/destructor
> invocations is the same as the number of elements in the array.

I wish to come up with a better representation of pointer casts for
years, but i still have no better idea. On one hand, i don't see why the
cast should at all be a part of the SVal which merely represents the
pointer value (as opposed to MemRegion which represents a segment), on
the other hand we need to do pointer arithmetic in SValBuilder somehow,
and we often have no other type to rely on. Suggestions are very welcome.

For now when we construct arrays, we model the first constructor and
invalidate the rest of the array. The first constructor is, of course,
not inlined, because, well, it's into an ElementRegion.

>
>
>     * Constructing into return value of operator new() is disabled
>     completely anyway, because there's a modeling problem that causes
>     us to be unable to connect the constructor with the correct
>     this-region. The CFG part of this problem was fixed by Jordan in
>     2014 by adding the CFGNewAllocator element, so we now actually
>     call the operator and the constructor in the correct order, but we
>     still need to pass the operator new's return value to the
>     constructor. Note how pointless it is to enable it, or even inline
>     a custom operator new, until construction into ElementRegion is fixed.
>
>     ---
>
>     Speaking of inlining operator new():
>
>     * For correct modeling in the core, it seems that the only thing
>     that remains is to actually use the return value of operator new()
>     and construct the new object *into* it. Sounds easy.
>
>     * I’m also immediately noticing the unimplemented path diagnostics
>     within the inlined operator new(), which crash with
>     llvm_unreachable(“not yet implemented”) whenever a visitor fires
>     within it (eg. to indicate that memory allocated there leaks).
>
>     * MallocChecker keeps being surprised with various non-symbolic
>     stuff that may come out from operator new().
>     https://reviews.llvm.org/D37189 <https://reviews.llvm.org/D37189>
>     doesn’t cover all cases; in particular, void *operator new(size_t)
>     { static int x; return &x; } seems to instantly crash after inlining.
>
>     ---
>
>     From now on i'd like to experiment with, first of all, disabling
>     the DeclRegion bailout when possible. Then i'd try to inline
>     operator new, pass its return value to the respective constructor,
>     and inline said constructor. I'm not sure if i'd be able to dig
>     into the temporaries and pass-by-copy problems soon.
>
>     And i believe that when it comes to C++ pure language constructs,
>     i listed most of the problems i'm aware of, with the exception of,
>     well, *exceptions* (which also need CFG work). Of course we could
>     do better modeling the standard library as well.
>
>
> Thank you for collecting these problems!
>
>     _______________________________________________
>     cfe-dev mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] C++ and inlining.

Robinson, Paul via cfe-dev
In reply to this post by Robinson, Paul via cfe-dev


On 11/26/17 3:17 PM, Gábor Horváth wrote:

> Hi Artem!
>
> On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     I've observed a lot of false positives on WebKit that resulted
>     from our inability to inline relatively simple functions in C++,
>     specifically constructors and operator new(), due to unimplemented
>     modeling. So i wanted to document, and eventually fix, a variety
>     of language feature modeling we're currently lacking. Note that
>     WebKit, like LLVM, is not a typical C++ codebase; it uses many
>     language constructs that regular projects don't use, and uses very
>     little standard library headers.
>
>
> Great news!
>
>
>     ---
>
>     Normally we expect the analyzer to work even without any inlining,
>     by conservatively invalidating the program state, which eliminates
>     the clearly false assumptions but sometimes causes infeasible
>     paths when the invalidation was too aggressive and we start
>     denoting the same value with a different symbol after invalidation
>     and assume contradictory stuff on the old and the new symbol. The
>     false positives resulting from aggressive invalidation are usually
>     treated as less scary than the ones suppressed by invalidation,
>     because they can be easily suppressed (with assertions or const
>     qualifiers etc.) without loss of coverage. However, in C++ there
>     are a lot of implicit function calls which cause massive
>     frustration when evaluated conservatively. For example,
>
>       class C {
>         bool haveX;
>         class D d;
>
>       public:
>         C(int *x): haveX(x != 0) {
>           if (haveX)
>             *x = 1; // null dereference??
>         }
>       };
>
>     In this case, 'haveX' should assume that x is non-null. The code
>     eagerly splits states into {x == 0} and {x != 0}, which seems
>     reasonable. However, after 'haveX' have been initialized, the
>     default constructor for field 'd' kicks in. If only this
>     constructor is not inlined, or any of its callees are not inlined,
>     value stored in 'haveX' would be invalidated on both paths. In
>     particular, the path on which the original 'haveX' is false but
>     the invalidated 'haveX' is true have now opened up.
>
>
> I observed the very same pattern on Ericsson internal codebases, so
> sorting this out is definitely a huge win for many projects (not just
> LLVM and WebKit).
>
>
>     ---
>
>     Inlining of the constructor itself is disabled in many cases for
>     many reasons. In particular, we are currently only trying to
>     inline the constructor when the target this-region is a DeclRegion
>     (variable, member variable, Objective-C++ instance var), and the
>     destructor is non-trivial. This cuts away a lot of cases:
>
>
> It is not clear for my why the non-trivial destructor is a requirement
> here. Do yo have any info on that?
>
>
>     * Constructing into temporaries is disabled when destructor is
>     non-trivial. At least, we should be able to inline those when the
>     destructor is present at all, so that it would be evaluated
>     conservatively. One thing to note here is that our CFG has been
>     recently fixed, so we're much closer to fixing this properly
>     nowadays. However, CFG alone is not enough to figure out which
>     destructor needs to be called; for instance, if a
>     lifetime-extended temporary is initialized with an operator?:
>     expression, we'd need path-sensitive information to figure out
>     which object to destroy.
>
>     * Temporaries are also special because our pass-by-value is not
>     always working correctly. In particular, when we inline 'foo(c)',
>     where variable 'c' is of 'class C', we first copy 'c' into a
>     temporary region, and then trivial-copy it into the stack
>     variable-region of the function parameter, while we should be
>     constructing directly into the parameter region. We cannot
>     construct directly into the parameter region because the stack
>     frame has not yet been constructed, because arguments are not yet
>     computed. More reading on the subject, even if a bit outdated, is
>     at
>     http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html
>     <http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html>
>     This is a hard problem, but i wasn't noticing many instances of it
>     yet.
>
>
> I am wondering what the analyzer does with copy elision? Do we model
> that somehow?
>
>
>     * Base object region constructors are disabled when destructor is
>     non-trivial. This sounds like an accidental omission.
>
>
> Nice catch.

Actually not so much. This bailout only fires for CK_Complete
constructor kinds, and base-object constructors aren't "complete" in
this sense, so they work fine.

This might still affect a complete constructor that follows
placement-new into a base-object region (i'd be shocked to see code that
actually does it).

>
>     * Construction into ElementRegion is disabled regardless of
>     destructors. In particular, mass array constructions are disabled.
>     There is a special AST for this case, which emulates the loop
>     through the array (or return value of operator new[]) with a loop
>     counter variable and all. We have no support for this whole
>     language construct. Note, however, that ElementRegion is much more
>     than array element; it is often used for representing casts, and
>     in particular it appears in return values of a conservatively
>     evaluated operator new() (i.e. element{SymRegion}) and is likely
>     to appear in placement-new() return values for the same reason. So
>     we should discriminate between these two cases.
>
>
> Do you know why casts are represented this way? Is this something that
> we might want to change in the future? I think one of the reasons why
> construction into ElementRegion is disabled because arrays are
> populated lazily by the analyzer. And this does not work well with
> code that relies on the fact that the number of constructor/destructor
> invocations is the same as the number of elements in the array.
>
>
>     * Constructing into return value of operator new() is disabled
>     completely anyway, because there's a modeling problem that causes
>     us to be unable to connect the constructor with the correct
>     this-region. The CFG part of this problem was fixed by Jordan in
>     2014 by adding the CFGNewAllocator element, so we now actually
>     call the operator and the constructor in the correct order, but we
>     still need to pass the operator new's return value to the
>     constructor. Note how pointless it is to enable it, or even inline
>     a custom operator new, until construction into ElementRegion is fixed.
>
>     ---
>
>     Speaking of inlining operator new():
>
>     * For correct modeling in the core, it seems that the only thing
>     that remains is to actually use the return value of operator new()
>     and construct the new object *into* it. Sounds easy.
>
>     * I’m also immediately noticing the unimplemented path diagnostics
>     within the inlined operator new(), which crash with
>     llvm_unreachable(“not yet implemented”) whenever a visitor fires
>     within it (eg. to indicate that memory allocated there leaks).
>
>     * MallocChecker keeps being surprised with various non-symbolic
>     stuff that may come out from operator new().
>     https://reviews.llvm.org/D37189 <https://reviews.llvm.org/D37189>
>     doesn’t cover all cases; in particular, void *operator new(size_t)
>     { static int x; return &x; } seems to instantly crash after inlining.
>
>     ---
>
>     From now on i'd like to experiment with, first of all, disabling
>     the DeclRegion bailout when possible. Then i'd try to inline
>     operator new, pass its return value to the respective constructor,
>     and inline said constructor. I'm not sure if i'd be able to dig
>     into the temporaries and pass-by-copy problems soon.
>
>     And i believe that when it comes to C++ pure language constructs,
>     i listed most of the problems i'm aware of, with the exception of,
>     well, *exceptions* (which also need CFG work). Of course we could
>     do better modeling the standard library as well.
>
>
> Thank you for collecting these problems!
>
>     _______________________________________________
>     cfe-dev mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] C++ and inlining.

Robinson, Paul via cfe-dev
Speaking of the code sample from the first e-mail: does anyone has an
explanation why it is the non-inlined constructor being a special
case? If you replace with an arbitrary non-inlined function call
between the haveX initialization and the "if" condition the analysis
seem to work fine. Compare assembly in [1] with the "D d;" line
commented out and active.

[1] https://godbolt.org/g/jmSKT9

Alexey

On Tue, Nov 28, 2017 at 6:19 PM, Artem Dergachev via cfe-dev
<[hidden email]> wrote:

>
>
> On 11/26/17 3:17 PM, Gábor Horváth wrote:
>>
>> Hi Artem!
>>
>>
>> On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev
>> <[hidden email] <mailto:[hidden email]>> wrote:
>>
>>     I've observed a lot of false positives on WebKit that resulted
>>     from our inability to inline relatively simple functions in C++,
>>     specifically constructors and operator new(), due to unimplemented
>>     modeling. So i wanted to document, and eventually fix, a variety
>>     of language feature modeling we're currently lacking. Note that
>>     WebKit, like LLVM, is not a typical C++ codebase; it uses many
>>     language constructs that regular projects don't use, and uses very
>>     little standard library headers.
>>
>>
>> Great news!
>>
>>
>>     ---
>>
>>     Normally we expect the analyzer to work even without any inlining,
>>     by conservatively invalidating the program state, which eliminates
>>     the clearly false assumptions but sometimes causes infeasible
>>     paths when the invalidation was too aggressive and we start
>>     denoting the same value with a different symbol after invalidation
>>     and assume contradictory stuff on the old and the new symbol. The
>>     false positives resulting from aggressive invalidation are usually
>>     treated as less scary than the ones suppressed by invalidation,
>>     because they can be easily suppressed (with assertions or const
>>     qualifiers etc.) without loss of coverage. However, in C++ there
>>     are a lot of implicit function calls which cause massive
>>     frustration when evaluated conservatively. For example,
>>
>>       class C {
>>         bool haveX;
>>         class D d;
>>
>>       public:
>>         C(int *x): haveX(x != 0) {
>>           if (haveX)
>>             *x = 1; // null dereference??
>>         }
>>       };
>>
>>     In this case, 'haveX' should assume that x is non-null. The code
>>     eagerly splits states into {x == 0} and {x != 0}, which seems
>>     reasonable. However, after 'haveX' have been initialized, the
>>     default constructor for field 'd' kicks in. If only this
>>     constructor is not inlined, or any of its callees are not inlined,
>>     value stored in 'haveX' would be invalidated on both paths. In
>>     particular, the path on which the original 'haveX' is false but
>>     the invalidated 'haveX' is true have now opened up.
>>
>>
>> I observed the very same pattern on Ericsson internal codebases, so
>> sorting this out is definitely a huge win for many projects (not just LLVM
>> and WebKit).
>>
>>
>>     ---
>>
>>     Inlining of the constructor itself is disabled in many cases for
>>     many reasons. In particular, we are currently only trying to
>>     inline the constructor when the target this-region is a DeclRegion
>>     (variable, member variable, Objective-C++ instance var), and the
>>     destructor is non-trivial. This cuts away a lot of cases:
>>
>>
>> It is not clear for my why the non-trivial destructor is a requirement
>> here. Do yo have any info on that?
>>
>>
>>     * Constructing into temporaries is disabled when destructor is
>>     non-trivial. At least, we should be able to inline those when the
>>     destructor is present at all, so that it would be evaluated
>>     conservatively. One thing to note here is that our CFG has been
>>     recently fixed, so we're much closer to fixing this properly
>>     nowadays. However, CFG alone is not enough to figure out which
>>     destructor needs to be called; for instance, if a
>>     lifetime-extended temporary is initialized with an operator?:
>>     expression, we'd need path-sensitive information to figure out
>>     which object to destroy.
>>
>>     * Temporaries are also special because our pass-by-value is not
>>     always working correctly. In particular, when we inline 'foo(c)',
>>     where variable 'c' is of 'class C', we first copy 'c' into a
>>     temporary region, and then trivial-copy it into the stack
>>     variable-region of the function parameter, while we should be
>>     constructing directly into the parameter region. We cannot
>>     construct directly into the parameter region because the stack
>>     frame has not yet been constructed, because arguments are not yet
>>     computed. More reading on the subject, even if a bit outdated, is
>>     at
>>
>> http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html
>>
>> <http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html>
>>     This is a hard problem, but i wasn't noticing many instances of it
>>     yet.
>>
>>
>> I am wondering what the analyzer does with copy elision? Do we model that
>> somehow?
>>
>>
>>     * Base object region constructors are disabled when destructor is
>>     non-trivial. This sounds like an accidental omission.
>>
>>
>> Nice catch.
>
>
> Actually not so much. This bailout only fires for CK_Complete constructor
> kinds, and base-object constructors aren't "complete" in this sense, so they
> work fine.
>
> This might still affect a complete constructor that follows placement-new
> into a base-object region (i'd be shocked to see code that actually does
> it).
>
>>
>>     * Construction into ElementRegion is disabled regardless of
>>     destructors. In particular, mass array constructions are disabled.
>>     There is a special AST for this case, which emulates the loop
>>     through the array (or return value of operator new[]) with a loop
>>     counter variable and all. We have no support for this whole
>>     language construct. Note, however, that ElementRegion is much more
>>     than array element; it is often used for representing casts, and
>>     in particular it appears in return values of a conservatively
>>     evaluated operator new() (i.e. element{SymRegion}) and is likely
>>     to appear in placement-new() return values for the same reason. So
>>     we should discriminate between these two cases.
>>
>>
>> Do you know why casts are represented this way? Is this something that we
>> might want to change in the future? I think one of the reasons why
>> construction into ElementRegion is disabled because arrays are populated
>> lazily by the analyzer. And this does not work well with code that relies on
>> the fact that the number of constructor/destructor invocations is the same
>> as the number of elements in the array.
>>
>>
>>     * Constructing into return value of operator new() is disabled
>>     completely anyway, because there's a modeling problem that causes
>>     us to be unable to connect the constructor with the correct
>>     this-region. The CFG part of this problem was fixed by Jordan in
>>     2014 by adding the CFGNewAllocator element, so we now actually
>>     call the operator and the constructor in the correct order, but we
>>     still need to pass the operator new's return value to the
>>     constructor. Note how pointless it is to enable it, or even inline
>>     a custom operator new, until construction into ElementRegion is fixed.
>>
>>     ---
>>
>>     Speaking of inlining operator new():
>>
>>     * For correct modeling in the core, it seems that the only thing
>>     that remains is to actually use the return value of operator new()
>>     and construct the new object *into* it. Sounds easy.
>>
>>     * I’m also immediately noticing the unimplemented path diagnostics
>>     within the inlined operator new(), which crash with
>>     llvm_unreachable(“not yet implemented”) whenever a visitor fires
>>     within it (eg. to indicate that memory allocated there leaks).
>>
>>     * MallocChecker keeps being surprised with various non-symbolic
>>     stuff that may come out from operator new().
>>     https://reviews.llvm.org/D37189 <https://reviews.llvm.org/D37189>
>>     doesn’t cover all cases; in particular, void *operator new(size_t)
>>     { static int x; return &x; } seems to instantly crash after inlining.
>>
>>     ---
>>
>>     From now on i'd like to experiment with, first of all, disabling
>>     the DeclRegion bailout when possible. Then i'd try to inline
>>     operator new, pass its return value to the respective constructor,
>>     and inline said constructor. I'm not sure if i'd be able to dig
>>     into the temporaries and pass-by-copy problems soon.
>>
>>     And i believe that when it comes to C++ pure language constructs,
>>     i listed most of the problems i'm aware of, with the exception of,
>>     well, *exceptions* (which also need CFG work). Of course we could
>>     do better modeling the standard library as well.
>>
>>
>> Thank you for collecting these problems!
>>
>>     _______________________________________________
>>     cfe-dev mailing list
>>     [hidden email] <mailto:[hidden email]>
>>     http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>
>>
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] C++ and inlining.

Robinson, Paul via cfe-dev
In this particular example this happens because we may potentially have
the constructor of D defined as

D::D() {
   bool *haveXptr = ((void *)this) - (offsetof(C, d) - offsetof(C, x));
   *haveXptr = !*haveXptr;
}

which is valid code that changes haveX between initialization and
assignment.

However, an external function is unlikely to contain the 'this' pointer
(pointer to variable 'c'). At least in your code generation example,
where 'c' is a local variable and the pointer to it never escapes to the
global scope.

Generally, in the analyzer when we don't know if something can happen,
we often prefer to assume the worst, much like in codegen. In this
particular example i'd be for toning down the invalidation (as in
http://lists.llvm.org/pipermail/cfe-dev/2017-November/056103.html), but
there may be more reasons for invalidating stuff, some of which sound
much more convincing.

On 12/3/17 12:56 PM, Alexey Salmin wrote:

> Speaking of the code sample from the first e-mail: does anyone has an
> explanation why it is the non-inlined constructor being a special
> case? If you replace with an arbitrary non-inlined function call
> between the haveX initialization and the "if" condition the analysis
> seem to work fine. Compare assembly in [1] with the "D d;" line
> commented out and active.
>
> [1] https://godbolt.org/g/jmSKT9
>
> Alexey
>
> On Tue, Nov 28, 2017 at 6:19 PM, Artem Dergachev via cfe-dev
> <[hidden email]> wrote:
>>
>> On 11/26/17 3:17 PM, Gábor Horváth wrote:
>>> Hi Artem!
>>>
>>>
>>> On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev
>>> <[hidden email] <mailto:[hidden email]>> wrote:
>>>
>>>      I've observed a lot of false positives on WebKit that resulted
>>>      from our inability to inline relatively simple functions in C++,
>>>      specifically constructors and operator new(), due to unimplemented
>>>      modeling. So i wanted to document, and eventually fix, a variety
>>>      of language feature modeling we're currently lacking. Note that
>>>      WebKit, like LLVM, is not a typical C++ codebase; it uses many
>>>      language constructs that regular projects don't use, and uses very
>>>      little standard library headers.
>>>
>>>
>>> Great news!
>>>
>>>
>>>      ---
>>>
>>>      Normally we expect the analyzer to work even without any inlining,
>>>      by conservatively invalidating the program state, which eliminates
>>>      the clearly false assumptions but sometimes causes infeasible
>>>      paths when the invalidation was too aggressive and we start
>>>      denoting the same value with a different symbol after invalidation
>>>      and assume contradictory stuff on the old and the new symbol. The
>>>      false positives resulting from aggressive invalidation are usually
>>>      treated as less scary than the ones suppressed by invalidation,
>>>      because they can be easily suppressed (with assertions or const
>>>      qualifiers etc.) without loss of coverage. However, in C++ there
>>>      are a lot of implicit function calls which cause massive
>>>      frustration when evaluated conservatively. For example,
>>>
>>>        class C {
>>>          bool haveX;
>>>          class D d;
>>>
>>>        public:
>>>          C(int *x): haveX(x != 0) {
>>>            if (haveX)
>>>              *x = 1; // null dereference??
>>>          }
>>>        };
>>>
>>>      In this case, 'haveX' should assume that x is non-null. The code
>>>      eagerly splits states into {x == 0} and {x != 0}, which seems
>>>      reasonable. However, after 'haveX' have been initialized, the
>>>      default constructor for field 'd' kicks in. If only this
>>>      constructor is not inlined, or any of its callees are not inlined,
>>>      value stored in 'haveX' would be invalidated on both paths. In
>>>      particular, the path on which the original 'haveX' is false but
>>>      the invalidated 'haveX' is true have now opened up.
>>>
>>>
>>> I observed the very same pattern on Ericsson internal codebases, so
>>> sorting this out is definitely a huge win for many projects (not just LLVM
>>> and WebKit).
>>>
>>>
>>>      ---
>>>
>>>      Inlining of the constructor itself is disabled in many cases for
>>>      many reasons. In particular, we are currently only trying to
>>>      inline the constructor when the target this-region is a DeclRegion
>>>      (variable, member variable, Objective-C++ instance var), and the
>>>      destructor is non-trivial. This cuts away a lot of cases:
>>>
>>>
>>> It is not clear for my why the non-trivial destructor is a requirement
>>> here. Do yo have any info on that?
>>>
>>>
>>>      * Constructing into temporaries is disabled when destructor is
>>>      non-trivial. At least, we should be able to inline those when the
>>>      destructor is present at all, so that it would be evaluated
>>>      conservatively. One thing to note here is that our CFG has been
>>>      recently fixed, so we're much closer to fixing this properly
>>>      nowadays. However, CFG alone is not enough to figure out which
>>>      destructor needs to be called; for instance, if a
>>>      lifetime-extended temporary is initialized with an operator?:
>>>      expression, we'd need path-sensitive information to figure out
>>>      which object to destroy.
>>>
>>>      * Temporaries are also special because our pass-by-value is not
>>>      always working correctly. In particular, when we inline 'foo(c)',
>>>      where variable 'c' is of 'class C', we first copy 'c' into a
>>>      temporary region, and then trivial-copy it into the stack
>>>      variable-region of the function parameter, while we should be
>>>      constructing directly into the parameter region. We cannot
>>>      construct directly into the parameter region because the stack
>>>      frame has not yet been constructed, because arguments are not yet
>>>      computed. More reading on the subject, even if a bit outdated, is
>>>      at
>>>
>>> http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html
>>>
>>> <http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html>
>>>      This is a hard problem, but i wasn't noticing many instances of it
>>>      yet.
>>>
>>>
>>> I am wondering what the analyzer does with copy elision? Do we model that
>>> somehow?
>>>
>>>
>>>      * Base object region constructors are disabled when destructor is
>>>      non-trivial. This sounds like an accidental omission.
>>>
>>>
>>> Nice catch.
>>
>> Actually not so much. This bailout only fires for CK_Complete constructor
>> kinds, and base-object constructors aren't "complete" in this sense, so they
>> work fine.
>>
>> This might still affect a complete constructor that follows placement-new
>> into a base-object region (i'd be shocked to see code that actually does
>> it).
>>
>>>      * Construction into ElementRegion is disabled regardless of
>>>      destructors. In particular, mass array constructions are disabled.
>>>      There is a special AST for this case, which emulates the loop
>>>      through the array (or return value of operator new[]) with a loop
>>>      counter variable and all. We have no support for this whole
>>>      language construct. Note, however, that ElementRegion is much more
>>>      than array element; it is often used for representing casts, and
>>>      in particular it appears in return values of a conservatively
>>>      evaluated operator new() (i.e. element{SymRegion}) and is likely
>>>      to appear in placement-new() return values for the same reason. So
>>>      we should discriminate between these two cases.
>>>
>>>
>>> Do you know why casts are represented this way? Is this something that we
>>> might want to change in the future? I think one of the reasons why
>>> construction into ElementRegion is disabled because arrays are populated
>>> lazily by the analyzer. And this does not work well with code that relies on
>>> the fact that the number of constructor/destructor invocations is the same
>>> as the number of elements in the array.
>>>
>>>
>>>      * Constructing into return value of operator new() is disabled
>>>      completely anyway, because there's a modeling problem that causes
>>>      us to be unable to connect the constructor with the correct
>>>      this-region. The CFG part of this problem was fixed by Jordan in
>>>      2014 by adding the CFGNewAllocator element, so we now actually
>>>      call the operator and the constructor in the correct order, but we
>>>      still need to pass the operator new's return value to the
>>>      constructor. Note how pointless it is to enable it, or even inline
>>>      a custom operator new, until construction into ElementRegion is fixed.
>>>
>>>      ---
>>>
>>>      Speaking of inlining operator new():
>>>
>>>      * For correct modeling in the core, it seems that the only thing
>>>      that remains is to actually use the return value of operator new()
>>>      and construct the new object *into* it. Sounds easy.
>>>
>>>      * I’m also immediately noticing the unimplemented path diagnostics
>>>      within the inlined operator new(), which crash with
>>>      llvm_unreachable(“not yet implemented”) whenever a visitor fires
>>>      within it (eg. to indicate that memory allocated there leaks).
>>>
>>>      * MallocChecker keeps being surprised with various non-symbolic
>>>      stuff that may come out from operator new().
>>>      https://reviews.llvm.org/D37189 <https://reviews.llvm.org/D37189>
>>>      doesn’t cover all cases; in particular, void *operator new(size_t)
>>>      { static int x; return &x; } seems to instantly crash after inlining.
>>>
>>>      ---
>>>
>>>      From now on i'd like to experiment with, first of all, disabling
>>>      the DeclRegion bailout when possible. Then i'd try to inline
>>>      operator new, pass its return value to the respective constructor,
>>>      and inline said constructor. I'm not sure if i'd be able to dig
>>>      into the temporaries and pass-by-copy problems soon.
>>>
>>>      And i believe that when it comes to C++ pure language constructs,
>>>      i listed most of the problems i'm aware of, with the exception of,
>>>      well, *exceptions* (which also need CFG work). Of course we could
>>>      do better modeling the standard library as well.
>>>
>>>
>>> Thank you for collecting these problems!
>>>
>>>      _______________________________________________
>>>      cfe-dev mailing list
>>>      [hidden email] <mailto:[hidden email]>
>>>      http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>      <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>>
>>>
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] C++ and inlining.

Robinson, Paul via cfe-dev
Thanks!

I had the "this" pointer leaking to D::D in mind but thought that the
offsetof trick would involve UB at some point. It seems that as long
there's is an actual boolean where D::D looks for it this is probably
fine, even though I'm having a hard time with proving it from the
standard wording.

Alexey

On Mon, Dec 4, 2017 at 1:16 AM, Artem Dergachev <[hidden email]> wrote:

> In this particular example this happens because we may potentially have the
> constructor of D defined as
>
> D::D() {
>   bool *haveXptr = ((void *)this) - (offsetof(C, d) - offsetof(C, x));
>   *haveXptr = !*haveXptr;
> }
>
> which is valid code that changes haveX between initialization and
> assignment.
>
> However, an external function is unlikely to contain the 'this' pointer
> (pointer to variable 'c'). At least in your code generation example, where
> 'c' is a local variable and the pointer to it never escapes to the global
> scope.
>
> Generally, in the analyzer when we don't know if something can happen, we
> often prefer to assume the worst, much like in codegen. In this particular
> example i'd be for toning down the invalidation (as in
> http://lists.llvm.org/pipermail/cfe-dev/2017-November/056103.html), but
> there may be more reasons for invalidating stuff, some of which sound much
> more convincing.
>
>
> On 12/3/17 12:56 PM, Alexey Salmin wrote:
>>
>> Speaking of the code sample from the first e-mail: does anyone has an
>> explanation why it is the non-inlined constructor being a special
>> case? If you replace with an arbitrary non-inlined function call
>> between the haveX initialization and the "if" condition the analysis
>> seem to work fine. Compare assembly in [1] with the "D d;" line
>> commented out and active.
>>
>> [1] https://godbolt.org/g/jmSKT9
>>
>> Alexey
>>
>> On Tue, Nov 28, 2017 at 6:19 PM, Artem Dergachev via cfe-dev
>> <[hidden email]> wrote:
>>>
>>>
>>> On 11/26/17 3:17 PM, Gábor Horváth wrote:
>>>>
>>>> Hi Artem!
>>>>
>>>>
>>>> On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev
>>>> <[hidden email] <mailto:[hidden email]>> wrote:
>>>>
>>>>      I've observed a lot of false positives on WebKit that resulted
>>>>      from our inability to inline relatively simple functions in C++,
>>>>      specifically constructors and operator new(), due to unimplemented
>>>>      modeling. So i wanted to document, and eventually fix, a variety
>>>>      of language feature modeling we're currently lacking. Note that
>>>>      WebKit, like LLVM, is not a typical C++ codebase; it uses many
>>>>      language constructs that regular projects don't use, and uses very
>>>>      little standard library headers.
>>>>
>>>>
>>>> Great news!
>>>>
>>>>
>>>>      ---
>>>>
>>>>      Normally we expect the analyzer to work even without any inlining,
>>>>      by conservatively invalidating the program state, which eliminates
>>>>      the clearly false assumptions but sometimes causes infeasible
>>>>      paths when the invalidation was too aggressive and we start
>>>>      denoting the same value with a different symbol after invalidation
>>>>      and assume contradictory stuff on the old and the new symbol. The
>>>>      false positives resulting from aggressive invalidation are usually
>>>>      treated as less scary than the ones suppressed by invalidation,
>>>>      because they can be easily suppressed (with assertions or const
>>>>      qualifiers etc.) without loss of coverage. However, in C++ there
>>>>      are a lot of implicit function calls which cause massive
>>>>      frustration when evaluated conservatively. For example,
>>>>
>>>>        class C {
>>>>          bool haveX;
>>>>          class D d;
>>>>
>>>>        public:
>>>>          C(int *x): haveX(x != 0) {
>>>>            if (haveX)
>>>>              *x = 1; // null dereference??
>>>>          }
>>>>        };
>>>>
>>>>      In this case, 'haveX' should assume that x is non-null. The code
>>>>      eagerly splits states into {x == 0} and {x != 0}, which seems
>>>>      reasonable. However, after 'haveX' have been initialized, the
>>>>      default constructor for field 'd' kicks in. If only this
>>>>      constructor is not inlined, or any of its callees are not inlined,
>>>>      value stored in 'haveX' would be invalidated on both paths. In
>>>>      particular, the path on which the original 'haveX' is false but
>>>>      the invalidated 'haveX' is true have now opened up.
>>>>
>>>>
>>>> I observed the very same pattern on Ericsson internal codebases, so
>>>> sorting this out is definitely a huge win for many projects (not just
>>>> LLVM
>>>> and WebKit).
>>>>
>>>>
>>>>      ---
>>>>
>>>>      Inlining of the constructor itself is disabled in many cases for
>>>>      many reasons. In particular, we are currently only trying to
>>>>      inline the constructor when the target this-region is a DeclRegion
>>>>      (variable, member variable, Objective-C++ instance var), and the
>>>>      destructor is non-trivial. This cuts away a lot of cases:
>>>>
>>>>
>>>> It is not clear for my why the non-trivial destructor is a requirement
>>>> here. Do yo have any info on that?
>>>>
>>>>
>>>>      * Constructing into temporaries is disabled when destructor is
>>>>      non-trivial. At least, we should be able to inline those when the
>>>>      destructor is present at all, so that it would be evaluated
>>>>      conservatively. One thing to note here is that our CFG has been
>>>>      recently fixed, so we're much closer to fixing this properly
>>>>      nowadays. However, CFG alone is not enough to figure out which
>>>>      destructor needs to be called; for instance, if a
>>>>      lifetime-extended temporary is initialized with an operator?:
>>>>      expression, we'd need path-sensitive information to figure out
>>>>      which object to destroy.
>>>>
>>>>      * Temporaries are also special because our pass-by-value is not
>>>>      always working correctly. In particular, when we inline 'foo(c)',
>>>>      where variable 'c' is of 'class C', we first copy 'c' into a
>>>>      temporary region, and then trivial-copy it into the stack
>>>>      variable-region of the function parameter, while we should be
>>>>      constructing directly into the parameter region. We cannot
>>>>      construct directly into the parameter region because the stack
>>>>      frame has not yet been constructed, because arguments are not yet
>>>>      computed. More reading on the subject, even if a bit outdated, is
>>>>      at
>>>>
>>>>
>>>> http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html
>>>>
>>>>
>>>> <http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html>
>>>>      This is a hard problem, but i wasn't noticing many instances of it
>>>>      yet.
>>>>
>>>>
>>>> I am wondering what the analyzer does with copy elision? Do we model
>>>> that
>>>> somehow?
>>>>
>>>>
>>>>      * Base object region constructors are disabled when destructor is
>>>>      non-trivial. This sounds like an accidental omission.
>>>>
>>>>
>>>> Nice catch.
>>>
>>>
>>> Actually not so much. This bailout only fires for CK_Complete constructor
>>> kinds, and base-object constructors aren't "complete" in this sense, so
>>> they
>>> work fine.
>>>
>>> This might still affect a complete constructor that follows placement-new
>>> into a base-object region (i'd be shocked to see code that actually does
>>> it).
>>>
>>>>      * Construction into ElementRegion is disabled regardless of
>>>>      destructors. In particular, mass array constructions are disabled.
>>>>      There is a special AST for this case, which emulates the loop
>>>>      through the array (or return value of operator new[]) with a loop
>>>>      counter variable and all. We have no support for this whole
>>>>      language construct. Note, however, that ElementRegion is much more
>>>>      than array element; it is often used for representing casts, and
>>>>      in particular it appears in return values of a conservatively
>>>>      evaluated operator new() (i.e. element{SymRegion}) and is likely
>>>>      to appear in placement-new() return values for the same reason. So
>>>>      we should discriminate between these two cases.
>>>>
>>>>
>>>> Do you know why casts are represented this way? Is this something that
>>>> we
>>>> might want to change in the future? I think one of the reasons why
>>>> construction into ElementRegion is disabled because arrays are populated
>>>> lazily by the analyzer. And this does not work well with code that
>>>> relies on
>>>> the fact that the number of constructor/destructor invocations is the
>>>> same
>>>> as the number of elements in the array.
>>>>
>>>>
>>>>      * Constructing into return value of operator new() is disabled
>>>>      completely anyway, because there's a modeling problem that causes
>>>>      us to be unable to connect the constructor with the correct
>>>>      this-region. The CFG part of this problem was fixed by Jordan in
>>>>      2014 by adding the CFGNewAllocator element, so we now actually
>>>>      call the operator and the constructor in the correct order, but we
>>>>      still need to pass the operator new's return value to the
>>>>      constructor. Note how pointless it is to enable it, or even inline
>>>>      a custom operator new, until construction into ElementRegion is
>>>> fixed.
>>>>
>>>>      ---
>>>>
>>>>      Speaking of inlining operator new():
>>>>
>>>>      * For correct modeling in the core, it seems that the only thing
>>>>      that remains is to actually use the return value of operator new()
>>>>      and construct the new object *into* it. Sounds easy.
>>>>
>>>>      * I’m also immediately noticing the unimplemented path diagnostics
>>>>      within the inlined operator new(), which crash with
>>>>      llvm_unreachable(“not yet implemented”) whenever a visitor fires
>>>>      within it (eg. to indicate that memory allocated there leaks).
>>>>
>>>>      * MallocChecker keeps being surprised with various non-symbolic
>>>>      stuff that may come out from operator new().
>>>>      https://reviews.llvm.org/D37189 <https://reviews.llvm.org/D37189>
>>>>      doesn’t cover all cases; in particular, void *operator new(size_t)
>>>>      { static int x; return &x; } seems to instantly crash after
>>>> inlining.
>>>>
>>>>      ---
>>>>
>>>>      From now on i'd like to experiment with, first of all, disabling
>>>>      the DeclRegion bailout when possible. Then i'd try to inline
>>>>      operator new, pass its return value to the respective constructor,
>>>>      and inline said constructor. I'm not sure if i'd be able to dig
>>>>      into the temporaries and pass-by-copy problems soon.
>>>>
>>>>      And i believe that when it comes to C++ pure language constructs,
>>>>      i listed most of the problems i'm aware of, with the exception of,
>>>>      well, *exceptions* (which also need CFG work). Of course we could
>>>>      do better modeling the standard library as well.
>>>>
>>>>
>>>> Thank you for collecting these problems!
>>>>
>>>>      _______________________________________________
>>>>      cfe-dev mailing list
>>>>      [hidden email] <mailto:[hidden email]>
>>>>      http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>      <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>>>
>>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> [hidden email]
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] C++ and inlining.

Robinson, Paul via cfe-dev
In reply to this post by Robinson, Paul via cfe-dev


On 11/26/17 3:10 AM, Artem Dergachev wrote:

> I've observed a lot of false positives on WebKit that resulted from
> our inability to inline relatively simple functions in C++,
> specifically constructors and operator new(), due to unimplemented
> modeling. So i wanted to document, and eventually fix, a variety of
> language feature modeling we're currently lacking. Note that WebKit,
> like LLVM, is not a typical C++ codebase; it uses many language
> constructs that regular projects don't use, and uses very little
> standard library headers.
>
> ---
>
> Normally we expect the analyzer to work even without any inlining, by
> conservatively invalidating the program state, which eliminates the
> clearly false assumptions but sometimes causes infeasible paths when
> the invalidation was too aggressive and we start denoting the same
> value with a different symbol after invalidation and assume
> contradictory stuff on the old and the new symbol. The false positives
> resulting from aggressive invalidation are usually treated as less
> scary than the ones suppressed by invalidation, because they can be
> easily suppressed (with assertions or const qualifiers etc.) without
> loss of coverage. However, in C++ there are a lot of implicit function
> calls which cause massive frustration when evaluated conservatively.
> For example,
>
>   class C {
>     bool haveX;
>     class D d;
>
>   public:
>     C(int *x): haveX(x != 0) {
>       if (haveX)
>         *x = 1; // null dereference??
>     }
>   };
>
> In this case, 'haveX' should assume that x is non-null. The code
> eagerly splits states into {x == 0} and {x != 0}, which seems
> reasonable. However, after 'haveX' have been initialized, the default
> constructor for field 'd' kicks in. If only this constructor is not
> inlined, or any of its callees are not inlined, value stored in
> 'haveX' would be invalidated on both paths. In particular, the path on
> which the original 'haveX' is false but the invalidated 'haveX' is
> true have now opened up.
>
> ---
>
> Inlining of the constructor itself is disabled in many cases for many
> reasons. In particular, we are currently only trying to inline the
> constructor when the target this-region is a DeclRegion (variable,
> member variable, Objective-C++ instance var), and the destructor is
> non-trivial. This cuts away a lot of cases:
>
> * Constructing into temporaries is disabled when destructor is
> non-trivial. At least, we should be able to inline those when the
> destructor is present at all, so that it would be evaluated
> conservatively. One thing to note here is that our CFG has been
> recently fixed, so we're much closer to fixing this properly nowadays.
> However, CFG alone is not enough to figure out which destructor needs
> to be called; for instance, if a lifetime-extended temporary is
> initialized with an operator?: expression, we'd need path-sensitive
> information to figure out which object to destroy.
>
> * Temporaries are also special because our pass-by-value is not always
> working correctly. In particular, when we inline 'foo(c)', where
> variable 'c' is of 'class C', we first copy 'c' into a temporary
> region, and then trivial-copy it into the stack variable-region of the
> function parameter, while we should be constructing directly into the
> parameter region. We cannot construct directly into the parameter
> region because the stack frame has not yet been constructed, because
> arguments are not yet computed. More reading on the subject, even if a
> bit outdated, is at
> http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html 
> This is a hard problem, but i wasn't noticing many instances of it yet.
>
> * Base object region constructors are disabled when destructor is
> non-trivial. This sounds like an accidental omission.
>
> * Construction into ElementRegion is disabled regardless of
> destructors. In particular, mass array constructions are disabled.
> There is a special AST for this case, which emulates the loop through
> the array (or return value of operator new[]) with a loop counter
> variable and all. We have no support for this whole language
> construct. Note, however, that ElementRegion is much more than array
> element; it is often used for representing casts, and in particular it
> appears in return values of a conservatively evaluated operator new()
> (i.e. element{SymRegion}) and is likely to appear in placement-new()
> return values for the same reason. So we should discriminate between
> these two cases.
>
> * Constructing into return value of operator new() is disabled
> completely anyway, because there's a modeling problem that causes us
> to be unable to connect the constructor with the correct this-region.
> The CFG part of this problem was fixed by Jordan in 2014 by adding the
> CFGNewAllocator element, so we now actually call the operator and the
> constructor in the correct order, but we still need to pass the
> operator new's return value to the constructor. Note how pointless it
> is to enable it, or even inline a custom operator new, until
> construction into ElementRegion is fixed.

One more syntax pattern that we don't support was unveiled in
https://reviews.llvm.org/D40841 : construction as part of a bigger
aggregate initialization would be never inlined, neither it is performed
with a correct this-region. This includes cases like

   struct A { A(); }; // not an aggregate, has constructor.
   struct B { A a; }; // aggregate, no constructor needed.
   B b = {}; // A() is not handled correctly here.

Or, since C++17, also

   struct B: public A {};

works similarly.

> ---
>
> Speaking of inlining operator new():
>
> * For correct modeling in the core, it seems that the only thing that
> remains is to actually use the return value of operator new() and
> construct the new object *into* it. Sounds easy.
>
> * I’m also immediately noticing the unimplemented path diagnostics
> within the inlined operator new(), which crash with
> llvm_unreachable(“not yet implemented”) whenever a visitor fires
> within it (eg. to indicate that memory allocated there leaks).
>
> * MallocChecker keeps being surprised with various non-symbolic stuff
> that may come out from operator new(). https://reviews.llvm.org/D37189 
> doesn’t cover all cases; in particular, void *operator new(size_t) {
> static int x; return &x; } seems to instantly crash after inlining.
>
> ---
>
> From now on i'd like to experiment with, first of all, disabling the
> DeclRegion bailout when possible. Then i'd try to inline operator new,
> pass its return value to the respective constructor, and inline said
> constructor. I'm not sure if i'd be able to dig into the temporaries
> and pass-by-copy problems soon.
>
> And i believe that when it comes to C++ pure language constructs, i
> listed most of the problems i'm aware of, with the exception of, well,
> *exceptions* (which also need CFG work). Of course we could do better
> modeling the standard library as well.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev