[analyzer] Toning down invalidations?


[analyzer] Toning down invalidations?

Artem Dergachev via cfe-dev
So far we've been preferring aggressive invalidation of large chunks of
the program state whenever we encounter anything we cannot model, be it
a function without source code or a language construct we don't support.
As I've mentioned recently, this both eliminates certain kinds of false
positives (when we're uncertain, we assume fewer incorrect things) and
introduces other false positives (when we're uncertain, we may think
that something is possible when in fact it isn't). Hence, invalidation
is a trade-off. Hence the question: is everybody happy with the current
trade-off? I'm particularly thinking about two parts of it:

* Invalidating the base region when a field is invalidated (as the whole
base region is reachable through safe pointer arithmetic) - see the
first sketch below.
* Invalidating heap regions passed to us by const pointers when the heap
is invalidated (as a non-const pointer may exist elsewhere) - see the
second sketch below.
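
(Two quick sketches of these cases, with made-up names, just to make them
concrete. First the field case; opaque() stands for any function whose
source code isn't available:)

struct S { int a; int b; };
void opaque(int *p);      // no source code available

void field_escape(S s) {
  s.b = 1;
  opaque(&s.a);           // only &s.a escapes, but all of "s" gets
                          // invalidated, because the callee could in
                          // principle reach s.b from &s.a
  if (s.b == 1) {         // no longer known to be true after the call
    // ...
  }
}

(And the const-pointer heap case; the pointee is invalidated anyway,
because a non-const alias may exist somewhere we can't see:)

void opaque2();           // no source code available

int const_heap(const int *p) {  // p points to a heap object
  int before = *p;
  opaque2();              // may modify the pointee through an unseen
                          // non-const alias, so *p gets invalidated
  return before == *p;    // not known to be true afterwards
}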

Please let us know if you have other cases you'd like to think about :)

---

A bit of an explanation/motivation here.

Invalidation doesn't cause false positives all by itself; it only
amplifies false positives that result from our eager approach to state
splits, which sometimes causes us to produce more state splits than
necessary. For instance, in

   // example 1
   if (x) { ... }
   invalidate(&x);
   if (x) { ... }

we may assume that the first if() takes the true branch, and the second
if() takes the false branch. If the invalidation is overly aggressive,
and in fact "x" is untouched, then this path is infeasible. However, the
same problem arises in the following code that doesn't contain any
invalidation:

   // example 2
   if (x) { ... }
   if (y) { ... }

By looking at this code, we can guarantee that both "x is true" and "x
is false" are feasible, otherwise if(x) is dead code. We can also
guarantee that both "y is true" and "y is false" are feasible, otherwise
if(y) is dead code. But we cannot guarantee that "x is true and y is
false" is a feasible branch just by looking at the code, hence this path
might be a false positive, and as such can be suppressed by asserting
that "x" is false in the "y is false" branch.

Example 1 is a variant of example 2 in which symbol "y" is replaced with
symbol "invalidated x". Similarly, we may run into an infeasible branch
"original x is true, invalidated x is false". However, depending on how
obvious it is to the reader that this branch is infeasible (e.g.
invalidate() certainly doesn't touch "x", or the invalidation may not
even be visible in the code), it may be very frustrating to have to
suppress the false positive with an assertion.

Hence the call for heuristics to see if some of those frustrating
invalidations can be toned down.
Re: [analyzer] Toning down invalidations?

David Blaikie via cfe-dev


On 27 November 2017 at 16:24, Artem Dergachev via cfe-dev <[hidden email]> wrote:
> So far we've been preferring aggressive invalidation of large chunks of the program state whenever we encounter anything we cannot model, be it a function without source code or a language construct we don't support. As I've mentioned recently, this both eliminates certain kinds of false positives (when we're uncertain, we assume fewer incorrect things) and introduces other false positives (when we're uncertain, we may think that something is possible when in fact it isn't). Hence, invalidation is a trade-off. Hence the question: is everybody happy with the current trade-off? I'm particularly thinking about two parts of it:

> * Invalidating the base region when a field is invalidated (as the whole base region is reachable through safe pointer arithmetic).

I think even if we want to be overly conservative, this can be mitigated once the code is modular, since this should only be possible if the definition of the class is available in the translation unit where the called function is defined. But since most functions are well behaved, in the sense that they won't touch anything other than the field, I think it would be a good default not to invalidate the whole region, and to introduce an annotation to mark functions that do such pointer arithmetic, to suppress false positives resulting from the lack of invalidation. But I would expect those cases to be very rare (although this is only based on intuition; I did not do any research yet).
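
For illustration, the kind of (presumably rare) callee that such an annotation would have to mark is a container_of-style helper along these lines (hypothetical code):

#include <cstddef>

struct S { int a; int b; };

// Recovers the enclosing object from a pointer to one of its fields and
// then touches a sibling field; this is the "safe pointer arithmetic"
// that justifies invalidating the whole base region today.
void touches_sibling(int *pb) {
  S *obj = reinterpret_cast<S *>(
      reinterpret_cast<char *>(pb) - offsetof(S, b));
  obj->a = 0;   // modifies a field that was never passed in directly
}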
 
> * Invalidating heap regions passed to us by const pointers when the heap is invalidated (as a non-const pointer may exist elsewhere).

What events trigger heap invalidation? Can this effect be mitigated by a conservative points-to analysis?
 

Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] Toning down invalidations?

Gábor Horváth via cfe-dev
Hi!

What is the status of this?

On Mon, 27 Nov 2017 at 16:24, Artem Dergachev via cfe-dev <[hidden email]> wrote:

> * Invalidating the base region when a field is invalidated (as the whole
> base region is reachable through safe pointer arithmetic).

I was doing a little experiment, and for the projects I tested the number of results either stayed the same or increased. Sometimes when the number of results increased I also lost some results, which could be due either to a different use of the time budget or to ruling out more infeasible paths. The interesting part is that when I turn on refutation, for some projects we can also refute bugs that we could not refute earlier. See the exact numbers [1] and exact bugs [2].

After looking at the new results I found only one FP whose root cause was the reduced invalidation. But it was an implementation artifact (we should still invalidate the super region of a field region when the super region is a union).
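
A sketch of that union case (illustrative code): the members share storage, so invalidating only the member that escapes is not enough.

union U { int i; float f; };
void opaque(int *p);      // no source code available

void union_case(U u) {
  u.f = 1.0f;
  opaque(&u.i);           // a write through u.i also changes u.f, so the
                          // super region (the whole union) must be invalidated
  if (u.f == 1.0f) {
    // keeping this assumption would be wrong if only u.i were invalidated
  }
}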

So while having more reports could be annoying, if the root cause of the FPs is unlikely to be the reduced invalidation, I think it would be worth actually doing less invalidation and having an annotation for the cases where we really need more. In my opinion, currently, when aggressive invalidation hides some FPs whose root cause is something else, we are only giving ourselves a false sense of quality.

What do you think? Should we pursue this?




Re: [analyzer] Toning down invalidations?

David Blaikie via cfe-dev
I was distracted and never got to actually do this, but I still think it's a good idea to try out. Your results look very promising, yay. I totally agree that systems of mutually-canceling bugs are worth untangling even if the number of false positives temporarily increases.

P.S. A related issue - if I go for this, I'd probably start with relaxing the C++ container inlining heuristic, i.e. replacing it with visitor-based suppressions, so that we still enjoy the benefits of inlining.

P.P.S. Mildly related - I noticed that it shouldn't be all that hard to model extents of bindings within RegionStore, so that bindings to sub-structures wouldn't overwrite bindings to super-structures simply because they have the same base region and the same offset. The only problem here is modeling extents of *integers*, because we don't represent casts as part of SymbolRefs. All other sorts of SVals have well-defined extents (including, say, lazy compound values).
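
Roughly the source-level situation in question (an illustrative sketch of the overlap, not of the RegionStore internals themselves): the whole object and its first sub-object start at the same base region and the same offset, and only their extents tell the two bindings apart.

struct Inner { int a; };
struct Outer { Inner in; int b; };   // "in" sits at offset 0 of Outer

void bindings(Outer o) {
  o = Outer{{1}, 2};      // a binding that covers the whole of "o" (offset 0)
  o.in = Inner{3};        // a binding to the sub-structure, also at offset 0;
                          // without extent information it can shadow the
                          // previous binding even though o.b was not touched
}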

P.P.P.S. Not really related - just wanted to share an example of a curious false positive due to *lack* of invalidation that I've seen recently:

int test(int **x) {
  int *y = *x;          // y aliases the pointer stored in *x

  if (*y == 0)
    invalidate(x);      // opaque call: may overwrite **x (which y also
                        // points to), so *y may no longer be zero here

  // should not warn
  return 1 / *y;
}

