[analyzer] Toning down invalidations?


[analyzer] Toning down invalidations?

Artem Dergachev via cfe-dev
So far we've been preferring aggressive invalidation of large chunks of
the program state whenever we encounter anything we cannot model, be it
a function without source code or a language construct we don't support.
As I've mentioned recently, this both eliminates certain kinds of false
positives (when we're uncertain, we assume fewer incorrect things) and
introduces other false positives (when we're uncertain, we may think
that something is possible when in fact it isn't). Invalidation is
therefore a trade-off, hence the question: does everybody like the
current trade-off? I'm particularly thinking about two parts of it
(both sketched in code below):

* Invalidating the base region when a field is invalidated (as the whole
base region is reachable from the field through safe pointer arithmetic).
* Invalidating heap regions passed to us by const pointers whenever the
heap is invalidated (as a non-const pointer to the same region may exist
elsewhere).
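
For concreteness, here's a minimal sketch of both cases (all names are
made up); in each function the analyzer discards knowledge that,
arguably, nothing in the code threatens:

   // sketch with hypothetical names
   struct S { int a; int b; };

   void opaque(int *p);   // no source available; modeled conservatively
   void unrelated(void);  // no source available; may touch the heap

   int f(S &s) {
     s.b = 0;
     opaque(&s.a);  // invalidates all of s, not just s.a: the callee
                    // could reach s.b from &s.a via pointer arithmetic
                    // on the base region
     return s.b;    // the analyzer no longer knows this is 0
   }

   int g(const S *heapObj) {
     int before = heapObj->a;
     unrelated();   // heap invalidation also clobbers *heapObj, because
                    // a non-const alias may exist elsewhere
     return heapObj->a - before;  // cannot be proven to be 0
   }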

Please let us know if you have other cases you'd like us to think about :)

---

A bit of an explanation/motivation here.

Invalidation doesn't cause false positives by itself; it only amplifies
false positives that result from our eager approach to state splits,
which sometimes produces more state splits than necessary. For
instance, in

   // example 1
   if (x) { ... }
   invalidate(&x);
   if (x) { ... }

we may assume that the first if() takes the true branch and the second
if() takes the false branch. If the invalidation is overly aggressive
and "x" is in fact untouched, then this path is infeasible. However, the
same problem arises in the following code, which contains no
invalidation at all:

   // example 2
   if (x) { ... }
   if (y) { ... }

By looking at this code, we can guarantee that both "x is true" and "x
is false" are feasible; otherwise if (x) is dead code. We can likewise
guarantee that both "y is true" and "y is false" are feasible; otherwise
if (y) is dead code. But we cannot guarantee that "x is true and y is
false" is a feasible path just by looking at the code, so a report on
this path might be a false positive, and as such it can be suppressed by
asserting that "x" is false in the "y is false" branch.

Example 1 is a variant of example 2 in which the symbol "y" is replaced
with the symbol "invalidated x". Similarly, we may run into an
infeasible path "original x is true, invalidated x is false". However,
depending on how obvious it is to the reader that this path is
infeasible (e.g. invalidate() certainly doesn't touch "x", or the
invalidation may not even be visible in the code), it may be very
frustrating to have to suppress the false positive with an assertion.
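
Concretely, for example 1 the suppression might look as follows (a
hypothetical sketch), and this is precisely the frustrating part: the
assertion merely restates something the reader already finds obvious:

   #include <cassert>

   void invalidate(int *p);  // opaque; in reality never modifies *p,
                             // but the analyzer assumes it might

   void frustrating(int x) {
     int old_x = x;
     if (x) { /* ... */ }
     invalidate(&x);
     assert(x == old_x);  // restates the "obvious" fact that x is
                          // unchanged, purely to kill the infeasible
                          // "x was true, x is now false" path
     if (x) { /* ... */ }
   }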

Hence the call for heuristics to see if some of those frustrating
invalidations can be toned down.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: [analyzer] Toning down invalidations?

Boris Kolpackov via cfe-dev


On 27 November 2017 at 16:24, Artem Dergachev via cfe-dev <[hidden email]> wrote:
> [...] Hence the question: does everybody like the current trade-off? I'm particularly thinking about two parts of it:

> * Invalidating the base region when a field is invalidated (as the whole base region is reachable from the field through safe pointer arithmetic).

I think even if we want to be overly conservative, this can be
mitigated once the code is modular, since this should only be possible
if the definition of the class is available in the translation unit
where the called function is defined. But since most functions are well
behaved, in the sense that they won't touch anything other than the
field, I think a good default would be to not invalidate the whole
region, and to introduce an annotation for marking functions that do
such pointer arithmetic, to suppress false positives resulting from the
lack of invalidation. I would expect those cases to be very rare
(although this is only based on intuition; I haven't done any research
yet). A sketch of the idea follows.
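
A rough sketch of what that default-plus-annotation scheme could look
like (the attribute spelling is invented here; Clang's generic annotate
attribute stands in for a real, dedicated one):

   struct S { int a; int b; };

   // Hypothetical opt-in marker: this function may reach sibling
   // fields of its argument through pointer arithmetic, so the whole
   // base region must still be invalidated for it.
   __attribute__((annotate("escapes_base_region")))
   void walks_siblings(int *field);

   void well_behaved(int *field);  // unannotated default: invalidate
                                   // only *field

   void caller(S &s) {
     s.b = 1;
     well_behaved(&s.a);    // under this proposal, s.b stays known
     walks_siblings(&s.a);  // annotated: all of s is invalidated,
                            // as today
   }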
 
> * Invalidating heap regions passed to us by const pointers whenever the heap is invalidated (as a non-const pointer to the same region may exist elsewhere).

What events trigger heap invalidation? Could this effect be mitigated by a conservative points-to analysis?
 

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev