[analyzer] Equivalent paths.

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[analyzer] Equivalent paths.

David Blaikie via cfe-dev
Just wanted to share/document one idea that was kinda in the air during
the dev meeting.

____________

First, well-known stuff to warm up. Suppose we have code:

     void foo(int x) {
         if (x > 100)
             crash();
     }

Static Analyzer's usual bug report will say something like "assuming x
is from 101 to 2147483647, foo(x) crashes". Would it be better to report
it as "assuming x is equal to 2018, this function crashes"?

   * On one hand, it's tempting because we've come up with a concrete
unit test that the user can add to his program: "foo(2018);". This
doesn't sound like much in this case, but it might be quite helpful as
the system of equations becomes more complicated (eg., x > 0, y > 0, x^3
+ y^3 = z^3 - this might have a non-trivial solution in finite-width
ints!) (unsigned ints, to prevent overflow UB).

   * On the other hand, what if the developer says "it is impossible to
call foo(2018) because it is only intended to be called with ascii
characters"? In this case the developer would not know that x='x' is
another valid unit test, because we only told him about x=2018, so he'll
think that it's a false positive and miss a real bug in his code.

This last reason kinda outweights the first reason, which is why we're
trying not to avoid reducing Static Analyzer reports to concrete
counterexamples: it's better to provide the developer with all the
information that we have. This might work for concolic testing, but it's
not a good idea for a purely static tool.

____________

Now, apply the same reasoning to bug report de-duplication. If we aren't
comfortable with reducing "[101, 2147483647]" to "2018", then why are we
comfortable with reducing the whole set of equivalent reports (that
represent the same bug found on different execution paths) to a single
"example report" (even the shortest one)? It might be that the shortest
warning is false due to a hidden contract within the program, but the
equivalence class contains other warnings that have feasible paths.

For example:

     void bar(bool x, bool y) {
         if (x) { ... }
         if (y) { ... }
         crash();
     }

A typical report produced by Static Analyzer would look like this:
"assuming x is false, assuming y is false, bar(x, y) crashes". This
allows the developer to dismiss the report as a false positive because
bar() is only intended to be called with at least one of x and y being true.

However, if bar() is small enough, Static Analyzer is likely to find the
crash on all four paths through bar(). And if this is the case, then
when we're picking a specific "example report" out of these four
equivalent reports, we're throwing away an important piece of knowledge
that we already have: knowledge that it is in fact *irrelevant* whether
x and y are true or false.

For that reason, adding "assuming"/"taking" pieces here and even showing
path notes within any of the two if()s is bad. We should be pruning them
much like we're pruning path notes within irrelevant inlined functions,
or maybe replace with a specific note message that it is irrelevant
which branch is taken (though i don't immediately see how it may be useful).

____________

So i believe that the future of the BugReporter and custom
BugReporterVisitors is to display path notes by exploring the whole
equivalence class of bug reports rather than picking a single example
report.

The equivalence class might be incomplete because StaticAnalyzer doesn't
give any guarantees of coverage. But this is fine: in the worst case,
when only one path of all the equivalence paths is found, we'll just get
the old behavior.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev