Handling of loops in the Clang Static Analyzer

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Handling of loops in the Clang Static Analyzer

Eric Fiselier via cfe-dev
Hi,

I am re-sending the question I asked under a different thread so that the subject is more relevant to the topic.

I did not quite realize it earlier but it seems that the static analyzer unrolls a loop up to a certain number of times and then stops exploring paths beyond that. In the checker I have written, I get the message "Block count exceeded" and then state exploration stops. As a result, my checker give false positives and does not achieve what it sets out to do.

I understand that handling loops is a difficult problem, but is there a work-around available, perhaps, even some source annotation from the user?

Also, would it not be possible to re-start exploration after the loop if there is some code independent of the loop computation. I understand that this independent flow would eventually merge with the computation in the loop in some manner (otherwise the loop computation would probably have been "dead"), but, even then, it may be possible to so some useful analysis with the "independent" code.

Finally, I understand that the checkers have been used with "realistic" test cases. I am curious, how did they work if the test cases had loops in them?

Thanks.

Regards,
Venugopal Raghavan.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Handling of loops in the Clang Static Analyzer

Eric Fiselier via cfe-dev

Hello!

> I did not quite realize it earlier but it seems that the static analyzer unrolls a loop up to a certain number of times and then stops exploring paths beyond that.

Yes that is how it is working. It is not perfect.

> I understand that handling loops is a difficult problem, but is there a work-around available, perhaps, even some source annotation from the user?

Perhaps there are work-arounds, I know that people have looked at this.

Maybe -analyzer-max-loop is interesting. It tells Clang analyzer how many times you want to go through loops. The bigger value the better analysis, but slower analysis.

    clang -cc1 -analyze -analyzer-checker=alpha,core -analyzer-max-loop 10000 test1.c

Best regards,
Daniel Marjamäki

..................................................................................................................

Daniel Marjamäki Senior Engineer

Evidente ES East AB  Warfvinges väg 34  SE-112 51 Stockholm  Sweden

 

Mobile:                 +46 (0)709 12 42 62

E-mail:                 [hidden email][hidden email]                      

 

www.evidente.se


From: cfe-dev [[hidden email]] on behalf of Venugopal Raghavan via cfe-dev [[hidden email]]
Sent: 24 February 2017 04:07
To: [hidden email]
Subject: [cfe-dev] Handling of loops in the Clang Static Analyzer

Hi,

I am re-sending the question I asked under a different thread so that the subject is more relevant to the topic.

I did not quite realize it earlier but it seems that the static analyzer unrolls a loop up to a certain number of times and then stops exploring paths beyond that. In the checker I have written, I get the message "Block count exceeded" and then state exploration stops. As a result, my checker give false positives and does not achieve what it sets out to do.

I understand that handling loops is a difficult problem, but is there a work-around available, perhaps, even some source annotation from the user?

Also, would it not be possible to re-start exploration after the loop if there is some code independent of the loop computation. I understand that this independent flow would eventually merge with the computation in the loop in some manner (otherwise the loop computation would probably have been "dead"), but, even then, it may be possible to so some useful analysis with the "independent" code.

Finally, I understand that the checkers have been used with "realistic" test cases. I am curious, how did they work if the test cases had loops in them?

Thanks.

Regards,
Venugopal Raghavan.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Handling of loops in the Clang Static Analyzer

Eric Fiselier via cfe-dev
In reply to this post by Eric Fiselier via cfe-dev
Hi Venugopal,

> On Feb 23, 2017, at 7:07 PM, Venugopal Raghavan via cfe-dev <[hidden email]> wrote:
>
> Hi,
>
> I am re-sending the question I asked under a different thread so that the subject is more relevant to the topic.
>
> I did not quite realize it earlier but it seems that the static analyzer unrolls a loop up to a certain number of times and then stops exploring paths beyond that. In the checker I have written, I get the message "Block count exceeded" and then state exploration stops. As a result, my checker give false positives and does not achieve what it sets out to do.

The analyzer makes no promises about exhaustivity and in many cases will drop flows and stop path exploration. If your analysis depends on full path exploration to prevent false positives (as compared to false negatives) then it is going to be an uphill battle to eliminate false positives.

As you noted, for concrete-bound loops stopping path exploration can be particularly pernicious because after unrolling the loop N times the analyze simply stops. Any code dominated by the loop exit will simply not be explored, leading to false negatives.

Sean Eveson (cc’d) did some initial work on loop widening to mitigate this problem. The basic idea there was rather than simply stopping, the analyzer would “forget” any specific information about a particular iteration through the loop and proceed analyzing after the loop without any assumptions about how many times the loop was unrolled. This would lose some precision but gain coverage for code after the loop. This feature not complete and is off by default, but you can see the beginnings of it at <https://reviews.llvm.org/D12358>

Devin

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Handling of loops in the Clang Static Analyzer

Eric Fiselier via cfe-dev
Hi Venugopal,

Sean Eveson (cc’d) did some initial work on loop widening to mitigate this problem.

I started to work on this, but have unfortunately not had time to take the next steps. There is a mode which does 'loop widening' which is off by default as the state after the loop is completely cleared which can lead to false positives. The idea is to improve on the widening, by changing it to only clear state that might be affected by the loop.

To enable the loop widening pass: "-analyzer-config widen-loops=true".

Maybe -analyzer-max-loop is interesting. It tells Clang analyzer how many times you want to go through loops. The bigger value the better analysis, but slower analysis.

An issue you will run in to when changing analyzer-max-loop is that it also affects variable-bound loops. If you set the max loop higher to work around your problem you will see some slow down for the concrete-bound loops, but you will see a lot more for the variable-bound ones.

Example:
// Where the value of j is unknown.
for (i = 0; i < j; ++i) {
  // ...
}
foo(i);

As far as I remember, with a max loop of 1000 and no other limits on the analyzer, `foo` will be called for all the values between 0..~1000. This is because the analyzer will branch each time it reaches the loop condition.

Regards,
Sean Eveson

Sean Eveson
SN Systems - Sony Computer Entertainment Group

On Fri, Feb 24, 2017 at 7:54 PM, Devin Coughlin <[hidden email]> wrote:
Hi Venugopal,

> On Feb 23, 2017, at 7:07 PM, Venugopal Raghavan via cfe-dev <[hidden email]> wrote:
>
> Hi,
>
> I am re-sending the question I asked under a different thread so that the subject is more relevant to the topic.
>
> I did not quite realize it earlier but it seems that the static analyzer unrolls a loop up to a certain number of times and then stops exploring paths beyond that. In the checker I have written, I get the message "Block count exceeded" and then state exploration stops. As a result, my checker give false positives and does not achieve what it sets out to do.

The analyzer makes no promises about exhaustivity and in many cases will drop flows and stop path exploration. If your analysis depends on full path exploration to prevent false positives (as compared to false negatives) then it is going to be an uphill battle to eliminate false positives.

As you noted, for concrete-bound loops stopping path exploration can be particularly pernicious because after unrolling the loop N times the analyze simply stops. Any code dominated by the loop exit will simply not be explored, leading to false negatives.

Sean Eveson (cc’d) did some initial work on loop widening to mitigate this problem. The basic idea there was rather than simply stopping, the analyzer would “forget” any specific information about a particular iteration through the loop and proceed analyzing after the loop without any assumptions about how many times the loop was unrolled. This would lose some precision but gain coverage for code after the loop. This feature not complete and is off by default, but you can see the beginnings of it at <https://reviews.llvm.org/D12358>

Devin



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Handling of loops in the Clang Static Analyzer

Eric Fiselier via cfe-dev
Hi,

Many thanks for all your replies. I am trying to see how best I can change my checker/or the way I run my checker taking into account your suggestions. However, I am afraid, I have one more question. Consider the following test case:

test(int q, int z) {
   int b;
   int x;
   int p;

   if (z > 0) {        // S1
     p = q + 1;
     x = p + 1;
   }

   if (x <= (q + 1)) {   // S2
     b = 0;                 // S3
   }
   else {
     b = 1;
   }
   printf("%d\n", b);
}

If (z > 0) is true in statement S1 above, x gets the value q + 2 (propagating through p). In the exploded graph, I do see this being the case. Hence, shouldn't the constaint solver recognize the fact that the condition (x <= (q + 1)) in statement S2 is always false? However, in the exploded graph, I do not see this condition and the assignment of 0 to b in S3 getting pruned away when I follow the path in which (z > 0) is true. Can't the constraint solver handle this case? Or have I goofed up in my reasoning with the example (hopefully, my reasoning about the mutual exclusivity of the statements holds true irrespective of the signedness etc.)

Thanks.

Regards,
Venu.


On Mon, Feb 27, 2017 at 4:48 PM, Sean Eveson <[hidden email]> wrote:
Hi Venugopal,

Sean Eveson (cc’d) did some initial work on loop widening to mitigate this problem.

I started to work on this, but have unfortunately not had time to take the next steps. There is a mode which does 'loop widening' which is off by default as the state after the loop is completely cleared which can lead to false positives. The idea is to improve on the widening, by changing it to only clear state that might be affected by the loop.

To enable the loop widening pass: "-analyzer-config widen-loops=true".

Maybe -analyzer-max-loop is interesting. It tells Clang analyzer how many times you want to go through loops. The bigger value the better analysis, but slower analysis.

An issue you will run in to when changing analyzer-max-loop is that it also affects variable-bound loops. If you set the max loop higher to work around your problem you will see some slow down for the concrete-bound loops, but you will see a lot more for the variable-bound ones.

Example:
// Where the value of j is unknown.
for (i = 0; i < j; ++i) {
  // ...
}
foo(i);

As far as I remember, with a max loop of 1000 and no other limits on the analyzer, `foo` will be called for all the values between 0..~1000. This is because the analyzer will branch each time it reaches the loop condition.

Regards,
Sean Eveson

Sean Eveson
SN Systems - Sony Computer Entertainment Group

On Fri, Feb 24, 2017 at 7:54 PM, Devin Coughlin <[hidden email]> wrote:
Hi Venugopal,

> On Feb 23, 2017, at 7:07 PM, Venugopal Raghavan via cfe-dev <[hidden email]> wrote:
>
> Hi,
>
> I am re-sending the question I asked under a different thread so that the subject is more relevant to the topic.
>
> I did not quite realize it earlier but it seems that the static analyzer unrolls a loop up to a certain number of times and then stops exploring paths beyond that. In the checker I have written, I get the message "Block count exceeded" and then state exploration stops. As a result, my checker give false positives and does not achieve what it sets out to do.

The analyzer makes no promises about exhaustivity and in many cases will drop flows and stop path exploration. If your analysis depends on full path exploration to prevent false positives (as compared to false negatives) then it is going to be an uphill battle to eliminate false positives.

As you noted, for concrete-bound loops stopping path exploration can be particularly pernicious because after unrolling the loop N times the analyze simply stops. Any code dominated by the loop exit will simply not be explored, leading to false negatives.

Sean Eveson (cc’d) did some initial work on loop widening to mitigate this problem. The basic idea there was rather than simply stopping, the analyzer would “forget” any specific information about a particular iteration through the loop and proceed analyzing after the loop without any assumptions about how many times the loop was unrolled. This would lose some precision but gain coverage for code after the loop. This feature not complete and is off by default, but you can see the beginnings of it at <https://reviews.llvm.org/D12358>

Devin




_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Handling of loops in the Clang Static Analyzer

Eric Fiselier via cfe-dev


On 2/28/17 9:44 AM, Venugopal Raghavan via cfe-dev wrote:
> Hi,
>
> Many thanks for all your replies. I am trying to see how best I can
> change my checker/or the way I run my checker taking into account your
> suggestions.
I'd add to this that ideally, the current approach to loops shouldn't
lead to false positives, because all paths explored this way are still
valid, assuming everything else works fine. When something else goes
wrong as well, such as loop condition being improperly modeled (eg. we
loop from 0 to collection.size() and don't realize that the size is the
same in two subsequent loops), then it gets worse. But it'd get worse
without loops for the same reason.

If you're trying to solve an all-path problem and rely on full coverage,
then you'd have more luck with regular data flow analysis (see
DeadStoresChecker), or if you still want to use symbolic execution for
that, you'd need to detect lost coverage in the checkEndAnalysis
callback (eg. if ExprEngine has work remaining) and suppress all
warnings obtained during this analysis if the coverage was lost (i don't
think i have examples, this trick wasn't explored much, but i suspect it
could work).

> However, I am afraid, I have one more question. Consider the following
> test case:
>
> test(int q, int z) {
>    int b;
>    int x;
>    int p;
>
>    if (z > 0) {        // S1
>      p = q + 1;
>      x = p + 1;
>    }
>
>    if (x <= (q + 1)) {   // S2
>      b = 0;                 // S3
>    }
>    else {
>      b = 1;
>    }
>    printf("%d\n", b);
> }
>
> If (z > 0) is true in statement S1 above, x gets the value q + 2
> (propagating through p). In the exploded graph, I do see this being
> the case. Hence, shouldn't the constaint solver recognize the fact
> that the condition (x <= (q + 1)) in statement S2 is always false?
Well, technically not, because ((q + 2) <= (q + 1)) is true when q ==
INT_MAX - 1. However, i also suspect that the constraint solver may fail
to recognize both your and my point: our constraint solver is too simple
(for the sake of being fast) to simplify even that much (see also the
recent work on adding Z3 to us - the reviews around
https://reviews.llvm.org/D28952 ).

> However, in the exploded graph, I do not see this condition and the
> assignment of 0 to b in S3 getting pruned away when I follow the path
> in which (z > 0) is true. Can't the constraint solver handle this
> case? Or have I goofed up in my reasoning with the example (hopefully,
> my reasoning about the mutual exclusivity of the statements holds true
> irrespective of the signedness etc.)
>
> Thanks.
>
> Regards,
> Venu.
>
>
> On Mon, Feb 27, 2017 at 4:48 PM, Sean Eveson <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hi Venugopal,
>
>     > Sean Eveson (cc’d) did some initial work on loop widening to
>     mitigate this problem.
>
>     I started to work on this, but have unfortunately not had time to
>     take the next steps. There is a mode which does 'loop widening'
>     which is off by default as the state after the loop is
>     completely cleared which can lead to false positives. The idea is
>     to improve on the widening, by changing it to only clear state
>     that might be affected by the loop.
>
>     To enable the loop widening pass: "-analyzer-config widen-loops=true".
>
>     >Maybe -analyzer-max-loop is interesting. It tells Clang analyzer
>     how many times you want to go through loops. The bigger value the
>     better analysis, but slower analysis.
>
>     An issue you will run in to when changing analyzer-max-loop is
>     that it also affects variable-bound loops. If you set the max loop
>     higher to work around your problem you will see some slow down for
>     the concrete-bound loops, but you will see a lot more for the
>     variable-bound ones.
>
>     Example:
>     // Where the value of j is unknown.
>     for (i = 0; i < j; ++i) {
>       // ...
>     }
>     foo(i);
>
>     As far as I remember, with a max loop of 1000 and no other limits
>     on the analyzer, `foo` will be called for all the values between
>     0..~1000. This is because the analyzer will branch each time it
>     reaches the loop condition.
>
>     Regards,
>     Sean Eveson
>
>     Sean Eveson
>     SN Systems - Sony Computer Entertainment Group
>
>     On Fri, Feb 24, 2017 at 7:54 PM, Devin Coughlin
>     <[hidden email] <mailto:[hidden email]>> wrote:
>
>         Hi Venugopal,
>
>         > On Feb 23, 2017, at 7:07 PM, Venugopal Raghavan via cfe-dev
>         <[hidden email] <mailto:[hidden email]>> wrote:
>         >
>         > Hi,
>         >
>         > I am re-sending the question I asked under a different
>         thread so that the subject is more relevant to the topic.
>         >
>         > I did not quite realize it earlier but it seems that the
>         static analyzer unrolls a loop up to a certain number of times
>         and then stops exploring paths beyond that. In the checker I
>         have written, I get the message "Block count exceeded" and
>         then state exploration stops. As a result, my checker give
>         false positives and does not achieve what it sets out to do.
>
>         The analyzer makes no promises about exhaustivity and in many
>         cases will drop flows and stop path exploration. If your
>         analysis depends on full path exploration to prevent false
>         positives (as compared to false negatives) then it is going to
>         be an uphill battle to eliminate false positives.
>
>         As you noted, for concrete-bound loops stopping path
>         exploration can be particularly pernicious because after
>         unrolling the loop N times the analyze simply stops. Any code
>         dominated by the loop exit will simply not be explored,
>         leading to false negatives.
>
>         Sean Eveson (cc’d) did some initial work on loop widening to
>         mitigate this problem. The basic idea there was rather than
>         simply stopping, the analyzer would “forget” any specific
>         information about a particular iteration through the loop and
>         proceed analyzing after the loop without any assumptions about
>         how many times the loop was unrolled. This would lose some
>         precision but gain coverage for code after the loop. This
>         feature not complete and is off by default, but you can see
>         the beginnings of it at <https://reviews.llvm.org/D12358
>         <https://reviews.llvm.org/D12358>>
>
>         Devin
>
>
>
>
>
> _______________________________________________ cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev