[StaticAnalysis] Determine dereference values

[StaticAnalysis] Determine dereference values

Sumner, Brian via cfe-dev
Hello

We are looking into using the clang front-end for static analysis.

The goal is to find memory accesses on the source code level whose
addresses can be statically determined or constrained. This should work
across functions and even translation units.

Example:
main.c:
     int main() {
       for (int i = 0; i < 4; i++)
         access(((int*)0x1234) + i);  // pass 0x1234, 0x1238, 0x123c, 0x1240
       access(*(int**)0x4444);  // pass statically unknown value
     }
other.c:
     void access(int* p) {
       // Want output: read at addr
       // (0x1634|0x1638|0x163c|0x1640|unknown) from clang::Expr*.
       ((volatile int*)p)[0x100];
     }
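
To spell out where the addresses in the comment come from, here is a minimal sketch of the arithmetic. The helper names are hypothetical and sizeof(int) is assumed to be 4 on the target; this only illustrates the expected result, not how the analyzer computes it.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte address reached from `base` after stepping `index` elements of
   `elem_size` bytes each. Hypothetical helper, for illustration only. */
uintptr_t accessed_address(uintptr_t base, size_t index, size_t elem_size) {
    return base + (uintptr_t)index * elem_size;
}

/* Fill out[0..3] with the addresses that access() reads for the four
   loop iterations in main(). */
void expected_reads(uintptr_t out[4]) {
    const size_t elem_size = 4; /* assumption: sizeof(int) == 4 on the target */
    for (size_t i = 0; i < 4; i++) {
        /* ((int*)0x1234) + i */
        uintptr_t passed = accessed_address(0x1234, i, elem_size);
        /* ((volatile int*)p)[0x100] */
        out[i] = accessed_address(passed, 0x100, elem_size);
    }
}
```

The index 0x100 adds 0x400 bytes, so 0x1234 becomes 0x1634 and so on for the remaining iterations.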

The clang StaticAnalysis library does a lot of the work we are
interested in. That is, determining what values an expression is
constrained to, while understanding stores, loads and running a symbolic
execution engine.

How scalable is this approach? Even though we would require inter-TU
analysis, the problem could be reduced by only looking at accesses that
have the volatile qualifier since we are looking at hardware accesses of
a bare-metal program. Some retries without inlining are fine, because we
assume the constant and the access are not separated by code of
significant complexity.

Will this be decently reliable? We are interested in cases where a
constant is dragged across a couple of low bounded loops with a bit of
arithmetic. What are typical cases where the engine gives up because of
exploding complexity? I have found that loops are explored in a very
limited scope. Is there an easy way to relax these limits a bit at the
cost of much higher execution time?

I noticed the engine does not take the value of a file scoped constant
pointer "T* const" into account. Is there a technical limitation that
prevents doing this?
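
For concreteness, the kind of code I mean looks roughly like the sketch below. The register name and address are made up for illustration; the point is only that the pointer value is a compile-time constant at file scope.

```c
#include <stdint.h>

/* Hypothetical register address, chosen only for illustration. */
#define STATUS_REG_ADDR 0x40001000u

/* File-scoped constant pointer ("T* const"): the pointer itself can never
   change, so its value is known statically. */
static volatile uint32_t *const status_reg =
    (volatile uint32_t *)(uintptr_t)STATUS_REG_ADDR;

/* The dereference whose address we would like the analyzer to report.
   Only meaningful on hardware that actually maps this address. */
uint32_t read_status(void) {
    return *status_reg;
}

/* Returns the address as an integer, without touching memory. */
uintptr_t status_reg_address(void) {
    return (uintptr_t)status_reg;
}
```

Here the desired result for read_status() would be a read at 0x40001000.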

I also tried to hack a bit on the DereferenceChecker and DivZeroChecker
to get the symbolic or even concrete value of a Loc, but I only got the
initial value and not the value it should have at the dereference. When
plotting a graph from a source that does basic arithmetic on a pointer,
the expression value never changes. It seems to me that symbolic values
of Locs are not fully tracked. Is this true, and is there a way to fully
track them?

A backwards data-flow analysis on IR level is probably a more reasonable
approach in general, but getting the exact clang::Expr that does the
access is valuable to us.

Overall, is this problem reasonably solvable with clang static analysis?
Any feedback is greatly appreciated!

Best Regards
Rafael

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

[StaticAnalysis] Determine dereference values

Sumner, Brian via cfe-dev
Thank you for the reply. I will check those out.

The FixedAddress checker is definitely a good start too, but we are more
interested in the actual dereferences and in comparing the deduced SVals
against known address ranges.

What I meant is that, for example, the analyzer is expected to report a
null dereference bug in the following:

int main()
{
     int* p = (int*)sizeof(int);
     p -= 1;
     return *p;
}

I just finished debugging the issue and found that the implementation of
pointer arithmetic in SimpleSValBuilder::evalBinOpLN was missing some
logic. The "Multiplicand" was never set and was therefore always zero.
The following fix works for me. Should I send this to the commits list,
or can someone take a look at it here?

diff --git a/SimpleSValBuilder.cpp b/SimpleSValBuilder_fix.cpp
index f09f969..da31fc0 100644
--- a/SimpleSValBuilder.cpp
+++ b/SimpleSValBuilder_fix.cpp
@@ -927,6 +927,8 @@ SVal SimpleSValBuilder::evalBinOpLN(ProgramStateRef state,

        // Offset the increment by the pointer size.
        llvm::APSInt Multiplicand(rightI.getBitWidth(), /* isUnsigned */ true);
+      QualType PteeTy = resultTy.getTypePtr()->castAs<PointerType>()->getPointeeType();
+      Multiplicand = getContext().getTypeSizeInChars(PteeTy).getQuantity();
        rightI *= Multiplicand;

        // Compute the adjusted pointer.
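
To illustrate what the missing scaling does to the example above, here is a small stand-alone model in C. It is not the analyzer's code; the function name is hypothetical, and it only mimics "offset = n * pointee size" on plain integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of concrete pointer arithmetic: move `base` by `n` elements of
   `pointee_size` bytes each. Passing pointee_size == 0 mimics the bug,
   where the multiplicand stayed zero and the offset was lost. */
uintptr_t offset_pointer(uintptr_t base, intptr_t n, size_t pointee_size) {
    intptr_t byte_offset = n * (intptr_t)pointee_size;
    /* Unsigned addition wraps, so negative offsets subtract as expected. */
    return base + (uintptr_t)byte_offset;
}
```

With the real pointee size, offset_pointer(sizeof(int), -1, sizeof(int)) yields 0, which is the null pointer the analyzer should see for "p -= 1". With a multiplicand of zero the result stays at sizeof(int), which matches the observation that the value never changed.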


On 28/07/17 16:08, Artem Dergachev wrote:

> First of all, there's already an experimental alpha.core.FixedAddress
> checker that seems to be doing what you want, in a relatively easy
> manner. I cannot guarantee that it actually works well, but it did seem
> to work somehow last time I looked at it.
>
> Regarding execution path coverage:
>
> - In general, yeah, the engine does not guarantee it wouldn't give up
> when it sees a lot of loops or other execution path splits. Analysis
> across translation units is being worked on in
> https://reviews.llvm.org/D30691 but it makes the engine give up even
> more often, so it would reveal cross-module issues at the cost of
> randomly hiding intra-module issues that would otherwise be seen (so if
> you're all about coverage, you'd probably want to use both analyses in
> two runs).
>
> - If you expect many loops with relatively small fixed numbers of
> iterations, have a look at the ongoing GSoC project which, in
> particular, introduces an option to unroll such loops completely even
> if otherwise the analyzer would have given up
> (https://reviews.llvm.org/D34260 is already available in master; see
> other patches by Peter as well); other tweaks to give up in a less
> fatal manner when the loop is infinite ("loop widening") are also
> planned.
>
> - There are many existing options to control the "giving up" behavior
> under -analyzer-config, you may have to read AnalyzerOptions.cpp to
> learn them, they aren't very well-documented, unfortunately. They
> control loops and inter-procedural analysis.
>
> > I noticed the engine does not take the value of a file scoped
> constant pointer "T* const" into account.
>
> Reproduced. Unimplemented, I guess, but shouldn't be hard.
>
> > It seems to me that symbolic values of Locs are not fully tracked.
> Is this true and is there a way to fully track them?
>
> I don't think I fully understand, could you give an example?
>
>
> On 7/27/17 6:51 PM, Rafael Stahl via cfe-dev wrote:
>> Hello
>>
>> We are looking into using the clang front-end for static analysis.
>>
>> The goal is to find memory accesses on the source code level whose
>> addresses can be statically determined or constrained. This should
>> work across functions and even translation units.
>>
>> Example:
>> main.c:
>>     int main() {
>>       for (int i = 0; i < 4; i++)
>>         access(((int*)0x1234) + i);  // pass 0x1234, 0x1238, 0x123c, 0x1240
>>       access(*(int**)0x4444);  // pass statically unknown value
>>     }
>> other.c:
>>     void access(int* p) {
>>       // Want output: read at addr
>>       // (0x1634|0x1638|0x163c|0x1640|unknown) from clang::Expr*.
>>       ((volatile int*)p)[0x100];
>>     }
>>
>> The clang StaticAnalysis library does a lot of the work we are
>> interested in. That is, determining what values an expression is
>> constrained to, while understanding stores, loads and running a
>> symbolic execution engine.
>>
>> How scalable is this approach? Even though we would require inter-TU
>> analysis, the problem could be reduced by only looking at accesses
>> that have the volatile qualifier since we are looking at hardware
>> accesses of a bare-metal program. Some retries without inlining are
>> fine, because we assume the accesses are not separated by the
>> constant with significant complexity in between.
>>
>> Will this be decently reliable? We are interested in cases where a
>> constant is dragged across a couple of low bounded loops with a bit
>> of arithmetic. What are typical cases where the engine gives up
>> because of exploding complexity? I have found that loops are explored
>> in a very limited scope. Is there an easy way to relax these limits a
>> bit at the cost of much higher execution time?
>>
>> I noticed the engine does not take the value of a file scoped
>> constant pointer "T* const" into account. Is there a technical
>> limitation that prevents doing this?
>>
>> I also tried to hack a bit on the DereferenceChecker and
>> DivZeroChecker to try and get the symbolic or even concrete value of
>> a Loc, but only got the initialized value and not the value it should
>> be at the dereference. When plotting a graph from a source that does
>> basic arithmetic on a pointer, the expression value never changes. It
>> seems to me that symbolic values of Locs are not fully tracked. Is
>> this true and is there a way to fully track them?
>>
>> A backwards data-flow analysis on IR level is probably a more
>> reasonable approach in general, but getting the exact clang::Expr
>> that does the access is valuable to us.
>>
>> Overall, is this problem reasonably solvable with clang static
>> analysis? Any feedback is greatly appreciated!
>>
>> Best Regards
>> Rafael
>>
>
