Iterator Checkers: Understanding Bindings

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Iterator Checkers: Understanding Bindings

Neil Nelson via cfe-dev

Hi,

 

I think I am on the good path to create iterator checkers for the Clang Static Analyzer. I already executed some experiments successfully, however to do that I needed to “hack” the infrastructure. Now the next step is to create a proper solution instead of the hacking which is unacceptable and would probably brake the system.

 

My first iterator checker is a very simple checker that tries to recognize cases where an iterator is dereferenced or incremented past its end. To do that I first catch the symbolic value of calls to end() functions that return an iterator type. At this point (checkPostCall) the return value is symbolic which is good. The next step is to post-check comparisons but here I get a temporary instead. The good news is that this temporary is default bound to the same symbol I got from the return value of end(). The bad news is that this default binding is buried and irretrievable in the current Static Analyzer core.

 

The key function to retrieve binding is RegionStoreManager::getBinding() in RegionStore.cpp. However, this function only retrieves direct bindings. I could not find too much documentation about the two types of bindings, I could only find out the default binding is used for structure members where the members are not direct bound but the structure itself has a default binding. I found nothing about what should happen when retrieving the binding for the structure itself where it only has a default binding.

 

The situation is even worse here. The name of function getBinding() is a bit misleading because it does not even retrieve a direct binding even if it exists. In case of structures or class types it creates a lazy compound value even if the binding exists. Why is this behavior? The lazy compound value then can be used to get back the original structure/class, thus the temporary. This is just a round trip and does not help to retrieve the binding, even if this would be a direct binding. After an assignment of the return value to a variable, checking the variable (e.g. as an operand of a pre- or postCall checker) a symbol is of course bound to the variable as well. However in this case the binding could be retrieved if it would be a direct binding, because in this case it is not considered a struct/class anymore but as a variable region.

 

Could somebody help me to understand why bindings behave this way and what to change to be able to retrieve these default bindings? The are essential for all iterator checkers which are needed by many projects.

 

Thank you for your cooperation in advance!

 

Regards,

 

Ádám

 


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Iterator Checkers: Understanding Bindings

Neil Nelson via cfe-dev
Hello Adam.

The main problem with iterators is that they may be a result of complex
inline-analyzed functions. Relying on the internal structure of returned
SVal doesn't seem to be a good idea. According to my experience, the
best approach here is just to consider it as something that has the
required type.

Note that working with iterators in CSA is complicated enough and I'm
not even pretend for the good knowledge. Just sharing some experience.
> I think I am on the good path to create iterator checkers for the Clang Static Analyzer. I already executed some experiments successfully, however to do that I needed to "hack" the infrastructure. Now the next step is to create a proper solution instead of the hacking which is unacceptable and would probably brake the system.
>
> My first iterator checker is a very simple checker that tries to recognize cases where an iterator is dereferenced or incremented past its end. To do that I first catch the symbolic value of calls to end() functions that return an iterator type. At this point (checkPostCall) the return value is symbolic which is good. The next step is to post-check comparisons but here I get a temporary instead. The good news is that this temporary is default bound to the same symbol I got from the return value of end(). The bad news is that this default binding is buried and irretrievable in the current Static Analyzer core.
You may need something like checkTemporaryValue() callback which is
implemented but not currently contributed to upstream (sorry for that).
It may be used to transmit information related to temporary values while
doing analysis in your checker before the related information is removed
from the state. You can check our github project and
BasicStringBoundChecker to see the implementation of this callback and
its usage. If it is what you need, we'll hurry with upstreaming.
https://github.com/haoNoQ/clang/blob/summary-ipa-draft/lib/StaticAnalyzer/Checkers/BasicStringBoundChecker.cpp
> The key function to retrieve binding is RegionStoreManager::getBinding() in RegionStore.cpp. However, this function only retrieves direct bindings. I could not find too much documentation about the two types of bindings, I could only find out the default binding is used for structure members where the members are not direct bound but the structure itself has a default binding. I found nothing about what should happen when retrieving the binding for the structure itself where it only has a default binding.
I'm not sure that getting a binding is something that you really need. I
have tried to operate with regions directly instead. Could you share
your work and give some test case? I think I don't fully understand what
is your question about.
>
> The situation is even worse here. The name of function getBinding() is a bit misleading because it does not even retrieve a direct binding even if it exists.
May be the binding _is_ LazyCompoundVal?
>   In case of structures or class types it creates a lazy compound value even if the binding exists. Why is this behavior? The lazy compound value then can be used to get back the original structure/class, thus the temporary. This is just a round trip and does not help to retrieve the binding, even if this would be a direct binding. After an assignment of the return value to a variable, checking the variable (e.g. as an operand of a pre- or postCall checker) a symbol is of course bound to the variable as well. However in this case the binding could be retrieved if it would be a direct binding, because in this case it is not considered a struct/class anymore but as a variable region.
>
> Could somebody help me to understand why bindings behave this way and what to change to be able to retrieve these default bindings? The are essential for all iterator checkers which are needed by many projects.
> Thank you for your cooperation in advance!
>
> Regards,
>
> Ádám

--
Best regards,
Aleksei Sidorin
Software Engineer,
IMSWL-IMCG, SRR, Samsung Electronics

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Iterator Checkers: Understanding Bindings

Neil Nelson via cfe-dev
RegionStore, as one of the possible implementations of the Store, does
not explicitly store every binding it can provide. For example, at the
beginning of the analysis, every variable's value is SymbolRegionValue
of this variable's region, however this value is not stored as a direct
binding - it is simply assumed to be there since there's no other
binding to override this assumption.

Same thing with LazyCompoundVal's - once you ask for an SVal to
represent a value of a structure-type region R, you get a
LazyCompoundVal, even if it's not bound to R directly. The original
symbol (returned by eg. in your case begin() or end()) represents the
original contents of the structure. However, later the structure may (or
may not) have its fields changed, that is, other values may (or may not)
be bound to sub-regions of R. Which means that the original symbol does
not necessarily describe the value bound to the structure-type region R.
LazyCompoundVal, on the other hand, always (regardless of other
bindings) describes the structural value correctly. Moreover, it can be
constructed without figuring out if there were sub-region bindings - it
just copies the whole store (which is immutable, and therefore extremely
cheap to copy). So the analyzer just goes ahead and always creates a
LazyCompoundVal, without looking if there are any new bindings.

I've seen cases when retrieving the original default symbol made sense
to me, so i'd agree that an API for that would be useful. However, from
above it should be obvious that it's not how the store should normally
operate; you start digging into implementation details here. This symbol
can be used for identifying the instance of the object (and all copies
of it), but the object could have completely changed since this symbol
was born. Because iterators don't seem to change through bindings, this
might be a way to go.

Hope this helps.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Iterator Checkers: Understanding Bindings

Neil Nelson via cfe-dev
Hello,

Thank you for your quick reply (both Artem and Aleksei). It was very useful. Now I understand that there are two very different needs. The first need is what most checkers require and the Static Analyzer currently satisfies is that structures are analyzed deeply. However, what we need here is to handle some structures in our checker as opaque objects. I think iterator checkers are not the only one with such need, some checkers, e.g. STL, Boost or some project-specific checkers to be developed in the future may have similar needs.

If I understand it correctly, we need to implement a new function in RegionStoreManager that retrieves the raw binding for an SVal. This function must also be declared in StoreManager class as a pure virtual function. This function must be able to retrieve default bindings as well. Should it try to retrieve a direct binding first? And how to call that function? getRawBinding()? Or maybe getBindingForOpaqVal()?

Regards,

Ádám

-----Original Message-----
From: cfe-dev [mailto:[hidden email]] On Behalf Of Artem Dergachev via cfe-dev
Sent: 2016. május 12. csütörtök 13:09
To: [hidden email]
Subject: Re: [cfe-dev] Iterator Checkers: Understanding Bindings

RegionStore, as one of the possible implementations of the Store, does not explicitly store every binding it can provide. For example, at the beginning of the analysis, every variable's value is SymbolRegionValue of this variable's region, however this value is not stored as a direct binding - it is simply assumed to be there since there's no other binding to override this assumption.

Same thing with LazyCompoundVal's - once you ask for an SVal to represent a value of a structure-type region R, you get a LazyCompoundVal, even if it's not bound to R directly. The original symbol (returned by eg. in your case begin() or end()) represents the original contents of the structure. However, later the structure may (or may not) have its fields changed, that is, other values may (or may not) be bound to sub-regions of R. Which means that the original symbol does not necessarily describe the value bound to the structure-type region R.
LazyCompoundVal, on the other hand, always (regardless of other
bindings) describes the structural value correctly. Moreover, it can be constructed without figuring out if there were sub-region bindings - it just copies the whole store (which is immutable, and therefore extremely cheap to copy). So the analyzer just goes ahead and always creates a LazyCompoundVal, without looking if there are any new bindings.

I've seen cases when retrieving the original default symbol made sense to me, so i'd agree that an API for that would be useful. However, from above it should be obvious that it's not how the store should normally operate; you start digging into implementation details here. This symbol can be used for identifying the instance of the object (and all copies of it), but the object could have completely changed since this symbol was born. Because iterators don't seem to change through bindings, this might be a way to go.

Hope this helps.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Iterator Checkers: Understanding Bindings

Neil Nelson via cfe-dev
 > If I understand it correctly, we need to implement a new function in
 > RegionStoreManager that retrieves the raw binding for an SVal.
 > This function must also be declared in StoreManager class as a pure
 > virtual function. This function must be able to retrieve default bindings
 > as well. Should it try to retrieve a direct binding first? And how to
call
 > that function? getRawBinding()? Or maybe getBindingForOpaqVal()?

Hmm. As long as the contents of the structure remain unchanged,
LazyCompoundVal retains its original region (it is simply copied around
as an value). Have a look at this quick debug.ExprInspection -based test:


   struct S {
     int z;
   };

   S conjure_S();

   void test_6() {
     S s1 = conjure_S();
     S s2 = s1;
     // lazily frozen compound value of temporary object constructed at
statement 'conjure_S()'
     clang_analyzer_explain(s2);
     s2.z = 3;
     // lazily frozen compound value of local variable 's2'
     clang_analyzer_explain(s2);
   }


Maybe it'd be a good idea to try to identify iterators by that region.
You'd probably need to able to evalCall() their methods (and probably a
few other functions) in order to avoid corrupting their contents through
invalidation. On methods like operator++, which are bound to invalidate
contents, you can still transfer your id to the new region.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Iterator Checkers: Understanding Bindings

Neil Nelson via cfe-dev
Hello,

Contents of an iterator (a class or structure) do change. However, we are not interested in the internal changes, for us an iterator is a black box. We track some of their operations and record their state (which is checker dependent) in the execution state. This is almost exactly the same as SimpleStreamChecker does with streams, the only difference is that streams are pointers and iterators are structures. I am sure that iterator checkers are not the only ones that require an API for tracking structures/classes as black boxes. There could be some other STL, Boost or project specific library checkers as well.

evalCall() is not an option since it forces all iterator checking into one single checker which is inconvenient. Furthermore, instead of unreliable hacks with regions we need a proper solution. I wonder what the core guys would say about such a new function what I proposed.

Regards,

Ádám


-----Original Message-----
From: cfe-dev [mailto:[hidden email]] On Behalf Of Artem Dergachev via cfe-dev
Sent: 2016. május 12. csütörtök 17:35
To: [hidden email]
Subject: Re: [cfe-dev] Iterator Checkers: Understanding Bindings

 > If I understand it correctly, we need to implement a new function in  > RegionStoreManager that retrieves the raw binding for an SVal.
 > This function must also be declared in StoreManager class as a pure  > virtual function. This function must be able to retrieve default bindings  > as well. Should it try to retrieve a direct binding first? And how to call  > that function? getRawBinding()? Or maybe getBindingForOpaqVal()?

Hmm. As long as the contents of the structure remain unchanged, LazyCompoundVal retains its original region (it is simply copied around as an value). Have a look at this quick debug.ExprInspection -based test:


   struct S {
     int z;
   };

   S conjure_S();

   void test_6() {
     S s1 = conjure_S();
     S s2 = s1;
     // lazily frozen compound value of temporary object constructed at statement 'conjure_S()'
     clang_analyzer_explain(s2);
     s2.z = 3;
     // lazily frozen compound value of local variable 's2'
     clang_analyzer_explain(s2);
   }


Maybe it'd be a good idea to try to identify iterators by that region.
You'd probably need to able to evalCall() their methods (and probably a
few other functions) in order to avoid corrupting their contents through
invalidation. On methods like operator++, which are bound to invalidate
contents, you can still transfer your id to the new region.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Iterator Checkers: Understanding Bindings

Neil Nelson via cfe-dev
I agree with pretty much all of your points.

I think a proper solution would also include cases when the object was
not conjured by a function. Eg., how do we identify `c' in the following
code?:

   class C {...};
   void foo() {
     C c;
   }

In this case `c' doesn't have a default-bound symbol. The region
approach works though; but it has other problems.

I wish we had a way to assign identifier-like symbols to object
instances (perhaps in an on-demand manner), and probably construct
symbolic-ID expressions for identifiers of objects produced from other
objects (eg. SymbolID, SymbolCopiedID<SymbolID>, etc.).

I'm still not instantly sure how to implement that, despite thinking
about it quite a bit. I'd also ask more clever guys to share ideas (:

P.S. Imho, evalCall problem is actually about sharing checker state.
There's no problem with having evalCall() staying in only one checker if
we share its results with other checkers.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev