Symbolic Extents

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Symbolic Extents

Jordy Rose
The start of work on symbolic extents. I wanted to ask what the best way
to handle this was. It seems we should have a general EvaluateSymExpr
method somewhere that simplifies a SymExpr as much as possible, which is
what the getKnownExtent() method (see patch) would eventually be doing. Or
perhaps EvaluateSVal, to handle ConcreteInts as well.

It's also going to soon become helpful to ask for the minimum or maximum
value of a SymExpr -- knowing the upper bound on an array size would
certainly be better than no information at all. What should the API for
that be?

(One goal of mine is to be able to model that the length of argv is argc.)

Jordy
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

symbolic-extents.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Zhongxing Xu
On Fri, Jun 25, 2010 at 5:18 AM, Jordy Rose <[hidden email]> wrote:
> The start of work on symbolic extents. I wanted to ask what the best way
> to handle this was. It seems we should have a general EvaluateSymExpr
> method somewhere that simplifies a SymExpr as much as possible, which is
> what the getKnownExtent() method (see patch) would eventually be doing. Or
> perhaps EvaluateSVal, to handle ConcreteInts as well.

I think this patch is a good small improvement over current implementation.

>
> It's also going to soon become helpful to ask for the minimum or maximum
> value of a SymExpr -- knowing the upper bound on an array size would
> certainly be better than no information at all. What should the API for
> that be?

Symbolic extents could go really wild. It's hard to design the
interface without concrete test cases that we want to solve.

>
> (One goal of mine is to be able to model that the length of argv is argc.)
>
> Jordy
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek
In reply to this post by Jordy Rose
On Jun 24, 2010, at 2:18 PM, Jordy Rose wrote:

> The start of work on symbolic extents. I wanted to ask what the best way
> to handle this was. It seems we should have a general EvaluateSymExpr
> method somewhere that simplifies a SymExpr as much as possible, which is
> what the getKnownExtent() method (see patch) would eventually be doing. Or
> perhaps EvaluateSVal, to handle ConcreteInts as well.
>
> It's also going to soon become helpful to ask for the minimum or maximum
> value of a SymExpr -- knowing the upper bound on an array size would
> certainly be better than no information at all. What should the API for
> that be?
>
> (One goal of mine is to be able to model that the length of argv is argc.)

Hi Jordy,

I've made comments in some other emails, but here are some specific comments on your patch.  Comments inline:

Index: include/clang/Checker/PathSensitive/Store.h
===================================================================
--- include/clang/Checker/PathSensitive/Store.h (revision 106771)
+++ include/clang/Checker/PathSensitive/Store.h (working copy)
@@ -181,6 +181,11 @@
     return llvm::Optional<SVal>();
   }
 
+  virtual const llvm::APSInt *getKnownExtent(const GRState *state,
+                                             const MemRegion *R) {
+    return NULL;
+  }
+


Following my comment in another email, I think this should be a method in SValuator.   E.g:

  virtual SVal getKnownExtent(const GRState *state, const MemRegion *R);

Note that by using an SVal, extents just become like any other (possibly symbolic) expression.  That means we can establish constraints for them, as use all our existing SVal logic.  I actually don't think the StoreManager needs to be involved at all here.



Index: lib/Checker/RegionStore.cpp
===================================================================
--- lib/Checker/RegionStore.cpp (revision 106771)
+++ lib/Checker/RegionStore.cpp (working copy)
@@ -384,6 +384,21 @@
       return Optional<SVal>();
   }
 
+  const llvm::APSInt *getKnownExtent(const GRState *state, const MemRegion *R) {
+    Optional<SVal> V = getExtent(state, R);
+    if (!V)
+      return NULL;
+
+    if (nonloc::ConcreteInt *Int = dyn_cast<nonloc::ConcreteInt>(&V))
+      return &Int->getValue();
+
+    if (SymbolRef Sym = V->getAsSymbol())
+      return state->getSymVal(Sym);
+
+    // FIXME: This should eventually try to evaluate symbolic expressions.
+    return NULL;
+  }
+

I think all of this could easily go into SValuator, allowing us to return something far more generic.  I also think the 'getExtent' method only needs to operate on SymbolicRegions, and could possibly be a method in SymbolManager.  For all other cases where the extent is no symbolic, we should be able to evaluate it up front.

 
Index: lib/Checker/CastSizeChecker.cpp
===================================================================
--- lib/Checker/CastSizeChecker.cpp (revision 106771)
+++ lib/Checker/CastSizeChecker.cpp (working copy)
@@ -44,7 +44,8 @@
 
   QualType ToPointeeTy = ToPTy->getPointeeType();
 
-  const MemRegion *R = C.getState()->getSVal(E).getAsRegion();
+  const GRState *state = C.getState();
+  const MemRegion *R = state->getSVal(E).getAsRegion();
   if (R == 0)
     return;
 
@@ -52,17 +53,13 @@
   if (SR == 0)
     return;
 
-  llvm::Optional<SVal> V =
-                    C.getEngine().getStoreManager().getExtent(C.getState(), SR);
-  if (!V)
+  StoreManager &StoreMgr = C.getEngine().getStoreManager();
+  const llvm::APSInt *Extent = StoreMgr.getKnownExtent(state, SR);
+  if (!Extent)
     return;
 
-  const nonloc::ConcreteInt *CI = dyn_cast<nonloc::ConcreteInt>(V);
-  if (!CI)
-    return;
-
-  CharUnits RegionSize = CharUnits::fromQuantity(CI->getValue().getSExtValue());
-  CharUnits TypeSize = C.getASTContext().getTypeSizeInChars(ToPointeeTy);
+  CharUnits RegionSize = CharUnits::fromQuantity(Extent->getSExtValue());
+  CharUnits TypeSize = Ctx.getTypeSizeInChars(ToPointeeTy);


I think that extents should remain as SVals, leaving most of this code unchanged.

Fundamentally, I don't think this is quite the right design for extents.  This actually seems to be making the assumption that extents can be reduced to specific APSInt values.  In the interesting cases, that won't be possible at all.  Instead of we treat extents just like any other SVal (expression), then we get arbitrary flexibility in what they represent and how they are represented.  It allows us then to establish symbolic constraints between extents and indices, which is critical for buffer overflow detection.

I don't have a complete design for extents, but here is my rough idea.  Maybe it is the right design and maybe it's not; I'm curious to see what you and Zhongxing think.  The goals I have in my mind are:

(1) Treat extents just like any other symbolic expression, allowing extents to capture formulations such as "(n+1) * sizeof(int)" and the like.

(2) Make reasoning about extents completely lazy, so we only incur a cost when we reason about them.

For constant-size arrays and scalars, the SVal representing the extent should be something we can compute up-front.  Thus SValuator::getExtent() should return a simple NonLoc SVal (such as a constant) for such extents.  We can add supporting methods to StoreManager or MemRegions if necessary to make computing such values easier.

Now let's consider symbolic extents.  For calls to function like foo():

  void *p = foo(...)

we have the analyzer return a SymbolicRegion.  That SymbolicRegion has an associated SymbolID.  RegionStore uses that SymbolID to create "derived symbols" that represent the initial values of that SymbolicRegion when the code has not created an explicit binding.  This approach allows RegionStore to lazily abstract memory as the code interrogates memory values.

My thought about extents is why not treat them the same way?  We can create a new type of derived symbol, e.g. SymbolicExtent, that subclasses SymbolDerived.  For all intensive purposes it is just another symbol, and we just use the same existing infrastructure to query that symbol and perform operations on it.

For example, consider:

  p = malloc(n * sizeof(int))

Let's say we represent the return value of malloc() with a SymbolicRegion whose SymbolID is $1.  We can provide an API to SymbolManager that queries the SymbolExtent* for a given SymbolID.  This would be lazily generated using a FoldingSet, just like we do with other derived symbols.  Now let's assume we have a SymbolExtent* for that extent, and we want to add the constraint that the symbol is equal to 'n * sizeof(int)'.  All we do construct a symbolic expression that equates that derived symbol with 'n * sizeof(int)'.  We could do this using SValuator to build the symbolic expression, and then use the Assume method in the ConstraintManager to assume that the constraint was true.  All of this could be done in a PostVistCallExpr method of some checker that explicitly understood the malloc() function.  Other checkers could then add constraints modeling other domain-specific knowledge about extents.

The nice thing about this approach is that as long as we make the ConstraintManager and SValuator smart enough, we should be able to handle extents (and their relation to other symbols) just like any other symbolic value.  This also allows us to remove the current GDM-based approach of 'RegionExtents' from RegionStore.  This not only simplifies RegionStore, but also gets the StoreManager out of the business of managing extents itself.  It also gives us a scalable approach to handling extents, and gradually making the analyzer smarter about reasoning about them.

That said, I don't claim that this is perfect, or even the absolute right answer.  I just don't think we should be moving in the direction where we add APIs that have using returning APSInt values explicitly.  If we are going to try and model extents well, I think we should be aiming for a more general design.

What do you guys think?
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Zhongxing Xu
Hi Ted,

I really like the design. It's clean and general enough to leverage
the smart classes SValuator and ConstraintManger.

One premise I want to point out is that this requires the constraint
manager to reason about constraints involving two symbols. It's not
enough to keep relations between symbols and integer ranges, but also
the relations between symbols, like $x == $y. So the current
constraint manager is not smart enough for this.

On Tue, Jun 29, 2010 at 7:34 AM, Ted Kremenek <[hidden email]> wrote:

> On Jun 24, 2010, at 2:18 PM, Jordy Rose wrote:
>
>> The start of work on symbolic extents. I wanted to ask what the best way
>> to handle this was. It seems we should have a general EvaluateSymExpr
>> method somewhere that simplifies a SymExpr as much as possible, which is
>> what the getKnownExtent() method (see patch) would eventually be doing. Or
>> perhaps EvaluateSVal, to handle ConcreteInts as well.
>>
>> It's also going to soon become helpful to ask for the minimum or maximum
>> value of a SymExpr -- knowing the upper bound on an array size would
>> certainly be better than no information at all. What should the API for
>> that be?
>>
>> (One goal of mine is to be able to model that the length of argv is argc.)
>
> Hi Jordy,
>
> I've made comments in some other emails, but here are some specific comments on your patch.  Comments inline:
>
> Index: include/clang/Checker/PathSensitive/Store.h
> ===================================================================
> --- include/clang/Checker/PathSensitive/Store.h (revision 106771)
> +++ include/clang/Checker/PathSensitive/Store.h (working copy)
> @@ -181,6 +181,11 @@
>     return llvm::Optional<SVal>();
>   }
>
> +  virtual const llvm::APSInt *getKnownExtent(const GRState *state,
> +                                             const MemRegion *R) {
> +    return NULL;
> +  }
> +
>
>
> Following my comment in another email, I think this should be a method in SValuator.   E.g:
>
>  virtual SVal getKnownExtent(const GRState *state, const MemRegion *R);
>
> Note that by using an SVal, extents just become like any other (possibly symbolic) expression.  That means we can establish constraints for them, as use all our existing SVal logic.  I actually don't think the StoreManager needs to be involved at all here.
>
>
>
> Index: lib/Checker/RegionStore.cpp
> ===================================================================
> --- lib/Checker/RegionStore.cpp (revision 106771)
> +++ lib/Checker/RegionStore.cpp (working copy)
> @@ -384,6 +384,21 @@
>       return Optional<SVal>();
>   }
>
> +  const llvm::APSInt *getKnownExtent(const GRState *state, const MemRegion *R) {
> +    Optional<SVal> V = getExtent(state, R);
> +    if (!V)
> +      return NULL;
> +
> +    if (nonloc::ConcreteInt *Int = dyn_cast<nonloc::ConcreteInt>(&V))
> +      return &Int->getValue();
> +
> +    if (SymbolRef Sym = V->getAsSymbol())
> +      return state->getSymVal(Sym);
> +
> +    // FIXME: This should eventually try to evaluate symbolic expressions.
> +    return NULL;
> +  }
> +
>
> I think all of this could easily go into SValuator, allowing us to return something far more generic.  I also think the 'getExtent' method only needs to operate on SymbolicRegions, and could possibly be a method in SymbolManager.  For all other cases where the extent is no symbolic, we should be able to evaluate it up front.
>
>
> Index: lib/Checker/CastSizeChecker.cpp
> ===================================================================
> --- lib/Checker/CastSizeChecker.cpp     (revision 106771)
> +++ lib/Checker/CastSizeChecker.cpp     (working copy)
> @@ -44,7 +44,8 @@
>
>   QualType ToPointeeTy = ToPTy->getPointeeType();
>
> -  const MemRegion *R = C.getState()->getSVal(E).getAsRegion();
> +  const GRState *state = C.getState();
> +  const MemRegion *R = state->getSVal(E).getAsRegion();
>   if (R == 0)
>     return;
>
> @@ -52,17 +53,13 @@
>   if (SR == 0)
>     return;
>
> -  llvm::Optional<SVal> V =
> -                    C.getEngine().getStoreManager().getExtent(C.getState(), SR);
> -  if (!V)
> +  StoreManager &StoreMgr = C.getEngine().getStoreManager();
> +  const llvm::APSInt *Extent = StoreMgr.getKnownExtent(state, SR);
> +  if (!Extent)
>     return;
>
> -  const nonloc::ConcreteInt *CI = dyn_cast<nonloc::ConcreteInt>(V);
> -  if (!CI)
> -    return;
> -
> -  CharUnits RegionSize = CharUnits::fromQuantity(CI->getValue().getSExtValue());
> -  CharUnits TypeSize = C.getASTContext().getTypeSizeInChars(ToPointeeTy);
> +  CharUnits RegionSize = CharUnits::fromQuantity(Extent->getSExtValue());
> +  CharUnits TypeSize = Ctx.getTypeSizeInChars(ToPointeeTy);
>
>
> I think that extents should remain as SVals, leaving most of this code unchanged.
>
> Fundamentally, I don't think this is quite the right design for extents.  This actually seems to be making the assumption that extents can be reduced to specific APSInt values.  In the interesting cases, that won't be possible at all.  Instead of we treat extents just like any other SVal (expression), then we get arbitrary flexibility in what they represent and how they are represented.  It allows us then to establish symbolic constraints between extents and indices, which is critical for buffer overflow detection.
>
> I don't have a complete design for extents, but here is my rough idea.  Maybe it is the right design and maybe it's not; I'm curious to see what you and Zhongxing think.  The goals I have in my mind are:
>
> (1) Treat extents just like any other symbolic expression, allowing extents to capture formulations such as "(n+1) * sizeof(int)" and the like.
>
> (2) Make reasoning about extents completely lazy, so we only incur a cost when we reason about them.
>
> For constant-size arrays and scalars, the SVal representing the extent should be something we can compute up-front.  Thus SValuator::getExtent() should return a simple NonLoc SVal (such as a constant) for such extents.  We can add supporting methods to StoreManager or MemRegions if necessary to make computing such values easier.
>
> Now let's consider symbolic extents.  For calls to function like foo():
>
>  void *p = foo(...)
>
> we have the analyzer return a SymbolicRegion.  That SymbolicRegion has an associated SymbolID.  RegionStore uses that SymbolID to create "derived symbols" that represent the initial values of that SymbolicRegion when the code has not created an explicit binding.  This approach allows RegionStore to lazily abstract memory as the code interrogates memory values.
>
> My thought about extents is why not treat them the same way?  We can create a new type of derived symbol, e.g. SymbolicExtent, that subclasses SymbolDerived.  For all intensive purposes it is just another symbol, and we just use the same existing infrastructure to query that symbol and perform operations on it.
>
> For example, consider:
>
>  p = malloc(n * sizeof(int))
>
> Let's say we represent the return value of malloc() with a SymbolicRegion whose SymbolID is $1.  We can provide an API to SymbolManager that queries the SymbolExtent* for a given SymbolID.  This would be lazily generated using a FoldingSet, just like we do with other derived symbols.  Now let's assume we have a SymbolExtent* for that extent, and we want to add the constraint that the symbol is equal to 'n * sizeof(int)'.  All we do construct a symbolic expression that equates that derived symbol with 'n * sizeof(int)'.  We could do this using SValuator to build the symbolic expression, and then use the Assume method in the ConstraintManager to assume that the constraint was true.  All of this could be done in a PostVistCallExpr method of some checker that explicitly understood the malloc() function.  Other checkers could then add constraints modeling other domain-specific knowledge about extents.
>
> The nice thing about this approach is that as long as we make the ConstraintManager and SValuator smart enough, we should be able to handle extents (and their relation to other symbols) just like any other symbolic value.  This also allows us to remove the current GDM-based approach of 'RegionExtents' from RegionStore.  This not only simplifies RegionStore, but also gets the StoreManager out of the business of managing extents itself.  It also gives us a scalable approach to handling extents, and gradually making the analyzer smarter about reasoning about them.
>
> That said, I don't claim that this is perfect, or even the absolute right answer.  I just don't think we should be moving in the direction where we add APIs that have using returning APSInt values explicitly.  If we are going to try and model extents well, I think we should be aiming for a more general design.
>
> What do you guys think?

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek

On Jun 28, 2010, at 7:00 PM, Zhongxing Xu wrote:

> Hi Ted,
>
> I really like the design. It's clean and general enough to leverage
> the smart classes SValuator and ConstraintManger.
>
> One premise I want to point out is that this requires the constraint
> manager to reason about constraints involving two symbols. It's not
> enough to keep relations between symbols and integer ranges,

It should be able to handle tracking whether an extent is within some constant range, which is no less powerful than what we have now (actually it's more powerful).

> but also
> the relations between symbols, like $x == $y. So the current
> constraint manager is not smart enough for this.

Exactly, but it's no less powerful than what we already had for extents, and opens the door to do it correctly once the ConstraintManager can handle constraints between symbols.  
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Jordy Rose

All right, time to go back to the drawing board with this one. So the
direction we're going is this (pseudocode):

(MallocChecker::MallocMemAux)
SVal Result = ValMgr.getConjuredSymbolVal(...)
SVal Extent = ValMgr.getExtent(Result) // or SVator.getExtent(Result)?
state->AssumeEQ(Extent, SizeArg)


The reason I wasn't doing this before is because the constraint manager
wouldn't (currently) handle this case:

void symbolic1(size_t a) {
  char *buf = (char*) malloc(a);
  if (a == 1) {
    buf[1] = 'b'; // expected-warning {{out-of-bound}}
  } else {
    buf[1] = 'b'; // no-warning
  }
  free(buf);
}

Here, the value of the symbolic extent is not known until /after/ it is
set. Our current constraint manager doesn't allow us to "alias" symbols in
this way, though of course it should soon. But all right, let's not worry
about that for now.

As for the extent symbols themselves, I'm not quite sure what to do.
Variable-length arrays aren't symbolic regions, so there's no "parent
symbol" for a SymbolDerived to attach to, but we do need to bound them
symbolically. So we'd at least need a new kind of symbol that attached
itself to regions, rather than other symbols.

Perhaps a MetadataRegion might be more appropriate? The advantage of a new
region over a new symbol is that it could be used for metadata that changes
as well. (I'm looking ahead to taking a stab at modeling C string length,
at least in simple cases.) Of course, this could wreak havoc with the
mythical FlatRegionStore, which would have no place to put this new
subregion.

As for tying SymExprs closer with SVals, it doesn't really make sense,
since SymExprs have more than one pointers' worth of data and thus don't
fit in SVals. Unless we wanted to make a new folding set manager, it's
probably not worth it to extract them from SymbolManager. (Unless we do end
up with a new base in klee.)

Thoughts on MetadataRegion?
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek

On Jun 28, 2010, at 11:35 PM, Jordy Rose wrote:

> So the
> direction we're going is this (pseudocode):
>
> (MallocChecker::MallocMemAux)
> SVal Result = ValMgr.getConjuredSymbolVal(...)
> SVal Extent = ValMgr.getExtent(Result) // or SVator.getExtent(Result)?
> state->AssumeEQ(Extent, SizeArg)

Essentially, except that I would merge the two first lines into an API wrapped by SValuator.  e.g.:

  const MemRegion *R = ...
  SVal Extent = SVator.getExtent(R);
  state = state->AssumeEQ(Extent, SizeArg);

The client should never care whether or not the extent is backed by a symbol.  If the Extent is a constant (e.g., nonloc::ConcreteInt), then the AssumeEQ should be trivially decidable if SizeArg is a constant.  If it isn't, it will likely be a linear combination of a symbol and a constant, and the current constraint solver logic (which you added) should be able to handle this.





_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek
In reply to this post by Jordy Rose

On Jun 28, 2010, at 11:35 PM, Jordy Rose wrote:

> As for the extent symbols themselves, I'm not quite sure what to do.
> Variable-length arrays aren't symbolic regions, so there's no "parent
> symbol" for a SymbolDerived to attach to, but we do need to bound them
> symbolically. So we'd at least need a new kind of symbol that attached
> itself to regions, rather than other symbols.

The extent of a variable length array is just the value of the size expression when it was declared.  e.g.:

  int buf[10 * n];

Here the extent is '10 * n' when at the point of the DeclStmt for 'buf'.  We already evaluate '10 * n', and should just capture it as part of the region definition for the VLA.  This would require creating a specialized VarRegion that had an SVal entry for the extent.  SValuator::getExtent() would just need to know for that type of region that it can obtain the extent by querying the region.



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek
In reply to this post by Jordy Rose

On Jun 28, 2010, at 11:35 PM, Jordy Rose wrote:

> Perhaps a MetadataRegion might be more appropriate? The advantage of a new
> region over a new symbol is that it could be used for metadata that changes
> as well. (I'm looking ahead to taking a stab at modeling C string length,
> at least in simple cases.) Of course, this could wreak havoc with the
> mythical FlatRegionStore, which would have no place to put this new
> subregion.

Would precisely would a MetadataRegion represent?  Regions are suppose to represent chunks of memory, subsets of chunks of memory, with annotations on that chunk.  Aside from VLAs (where we need to capture more information anyway), I don't know what a MetadataRegion would represent.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek
In reply to this post by Jordy Rose

On Jun 28, 2010, at 11:35 PM, Jordy Rose wrote:

> As for tying SymExprs closer with SVals, it doesn't really make sense,
> since SymExprs have more than one pointers' worth of data and thus don't
> fit in SVals. Unless we wanted to make a new folding set manager, it's
> probably not worth it to extract them from SymbolManager. (Unless we do end
> up with a new base in klee.)


Tying SymExprs with SVals is actually an intrinsic part of their design.  Hopefully I can make that little clearer.

SVals and SymExprs are really part of a single feature.  The are used to represent the "expression semantics" of evaluating expressions.  When we see '(exp1) + (exp2)', we compute an SVal that represents the semantics of evaluating that expression.  In the beginning we only represented a very simple set of semantics which you can think of as a lattice/set of values: { Unknown, Undefined, Constant, Locations }.  SVals are just *handles* to values in this lattice.  This is really important.  We pass SVals around everywhere, but anytime they have real data associated with them we have a FoldingSet to unique that data.  For example, in the case of nonloc::ConcreteInt the SVal indicates that we have a Constant and the associated data is represented by an APSInt.

SymExprs are meant to just be values in this set of values.  They allow us to represent more complicated values, and they allow us to reason about computation more lazily by making the semantic expressions more symbolic.  SymExprs are intrinsically part of how SVals are intended to be used.

Another way to look at it is that we are defining a language to represent "semantic expressions".  That language consists of a grammar, with SVal just being a handle into an expression in that grammar.  More complicated expressions need to reference other expressions, but since SVals are just a single pointer they can't represent that directly.  Instead SVals are used as a generic handle to symbolic expressions, and SymExprs can be built (when needed) to represent the recursive nature of more complicated expressions.  Actually tying SVals and SymExprs closer together makes a lot of sense, because they are just meant to represent that grammar of expressions.

My suggestion for using parts from Klee was that Klee has its own expression language, similar to the notion of SymExprs.  Klee's expression language, however, is far more developed, although it has a different memory model.  We could bring in parts of Klee's expression language by potentially wrapping them in SVals.  The end goal, however, is that we define an expression language for defining the semantics of expressions.  That design is really clear to me, but it is represented very clearly in the code.


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek
In reply to this post by Jordy Rose

On Jun 28, 2010, at 11:35 PM, Jordy Rose wrote:

> The reason I wasn't doing this before is because the constraint manager
> wouldn't (currently) handle this case:
>
> void symbolic1(size_t a) {
>  char *buf = (char*) malloc(a);
>  if (a == 1) {
>    buf[1] = 'b'; // expected-warning {{out-of-bound}}
>  } else {
>    buf[1] = 'b'; // no-warning
>  }
>  free(buf);
> }
>
> Here, the value of the symbolic extent is not known until /after/ it is
> set. Our current constraint manager doesn't allow us to "alias" symbols in
> this way, though of course it should soon. But all right, let's not worry
> about that for now.

Yes, I completely understand how this limitation guided you to the other approach, but this is a fundamental problem that we need to solve anyway.  This is a limitation that I've really to address for a long time, but I also think it wouldn't be that hard to support symbol aliases (at least for the simple cases).  I know it's tempting to not try to think about this problem now (as we've deferred solving it for a long time), but unless we try to take the general approach with handling extents using SVals (including symbol aliasing), we are just going to hit the fundamental limitation of what extents can represent (e.g., simple constants, basic symbols) by the end of the summer.  At that point we'll need to solve this problem anyway.

Symbol aliasing is basically alpha-renaming.  We need to store on the side (within GRState) the set of alpha-renamed symbols, and when we assume that two symbols are aliased we need a callback into the Checkers so that they can decide if their current set of meta-data associated with a symbol is compatible with assuming that two symbols are equal.  The SValuator can be clever to always use an alpha-renamed symbol when building new expressions, thus folding the renaming into newly constructed SVals.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek
In reply to this post by Jordy Rose

On Jun 28, 2010, at 11:35 PM, Jordy Rose wrote:

> Thoughts on MetadataRegion?

I don't know what function MetadataRegion would serve and how it would fit into the design.  What specific problem would it solve, and would it be used?
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Jordy Rose
In reply to this post by Ted Kremenek

On Tue, 29 Jun 2010 10:24:57 -0700, Ted Kremenek <[hidden email]>
wrote:
> On Jun 28, 2010, at 11:35 PM, Jordy Rose wrote:
>
>> Perhaps a MetadataRegion might be more appropriate? The advantage of a
>> new
>> region over a new symbol is that it could be used for metadata that
>> changes
>> as well. (I'm looking ahead to taking a stab at modeling C string
length,
>> at least in simple cases.) Of course, this could wreak havoc with the
>> mythical FlatRegionStore, which would have no place to put this new
>> subregion.
>
> Would precisely would a MetadataRegion represent?  Regions are suppose
to
> represent chunks of memory, subsets of chunks of memory, with
annotations
> on that chunk.  Aside from VLAs (where we need to capture more
information
> anyway), I don't know what a MetadataRegion would represent.

I was trying to think of a way to leverage the existing symbol/region
liveness analysis -- subregions remain alive with the superregion, and
region values remain alive with their regions. This shouldn't be a
constraint on us, though -- a symbolic extent is just a (possibly conjured)
value that remains live with the region.

I'll stop trying to plan ahead for C string length -- maybe that's
solvable with the GDM anyway.

So, at this point I would picture extents for SymbolicRegions (and
AllocaRegions, and maybe VLAs) as a cross between derived symbols and
conjured symbols with tags, except for one possible issue -- they might be
different on different paths through the CFG. With conjured symbols now we
have a Count parameter to avoid this. Does this mean we can only
getExtent() (or getExtentVal()) from a function with access to
GRStmtNodeBuilder?

Also, it might be nice if getExtent() lived on MemRegion, instead of
ValueManager or SValuator. Maybe?
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Jordy Rose
In reply to this post by Ted Kremenek

On Tue, 29 Jun 2010 10:44:47 -0700, Ted Kremenek <[hidden email]>
wrote:
> Symbol aliasing is basically alpha-renaming.  We need to store on the
side
> (within GRState) the set of alpha-renamed symbols, and when we assume
that
> two symbols are aliased we need a callback into the Checkers so that
they
> can decide if their current set of meta-data associated with a symbol is
> compatible with assuming that two symbols are equal.  The SValuator can
be
> clever to always use an alpha-renamed symbol when building new
expressions,
> thus folding the renaming into newly constructed SVals.

Aliasing isn't so bad; it's the other constraints that would be harder to
model with sym-sym relations. But maybe we just support aliasing for a
while, or aliasing+linear adjustment, and ignore constraints like "a < b"
or "a != b".
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Jordy Rose
In reply to this post by Ted Kremenek

Okay. I think my misunderstanding came from wondering when to pass around
a SymExpr vs. an SVal, since not all SVals map into SymExprs. (In
particular, you can't pass a SymExpr and an APSInt in the same variable.)
The answer seems to be "never" -- you always want an SVal for API that
faces the checkers -- because SymExpr is almost an implementation detail.
Does that sound right?

(usual qualification about exceptions to every rules)

I was also confused by the fact that SymExprs don't deal in SVals, but I
guess they don't need to. The nonloc types are either symbolic, constant,
or compound, and you can't do a symbolic operation to a compound value.


On Tue, 29 Jun 2010 10:36:43 -0700, Ted Kremenek <[hidden email]>
wrote:
> On Jun 28, 2010, at 11:35 PM, Jordy Rose wrote:
>
>> As for tying SymExprs closer with SVals, it doesn't really make sense,
>> since SymExprs have more than one pointers' worth of data and thus
don't
>> fit in SVals. Unless we wanted to make a new folding set manager, it's
>> probably not worth it to extract them from SymbolManager. (Unless we do
>> end
>> up with a new base in klee.)
>
>
> Tying SymExprs with SVals is actually an intrinsic part of their design.

> Hopefully I can make that little clearer.
>
> SVals and SymExprs are really part of a single feature.  The are used to
> represent the "expression semantics" of evaluating expressions.  When we
> see '(exp1) + (exp2)', we compute an SVal that represents the semantics
of
> evaluating that expression.  In the beginning we only represented a very
> simple set of semantics which you can think of as a lattice/set of
values:
> { Unknown, Undefined, Constant, Locations }.  SVals are just *handles*
to
> values in this lattice.  This is really important.  We pass SVals around
> everywhere, but anytime they have real data associated with them we have
a
> FoldingSet to unique that data.  For example, in the case of
> nonloc::ConcreteInt the SVal indicates that we have a Constant and the
> associated data is represented by an APSInt.
>
> SymExprs are meant to just be values in this set of values.  They allow
us
> to represent more complicated values, and they allow us to reason about
> computation more lazily by making the semantic expressions more
symbolic.
> SymExprs are intrinsically part of how SVals are intended to be used.
>
> Another way to look at it is that we are defining a language to
represent
> "semantic expressions".  That language consists of a grammar, with SVal
> just being a handle into an expression in that grammar.  More
complicated
> expressions need to reference other expressions, but since SVals are
just a
> single pointer they can't represent that directly.  Instead SVals are
used
> as a generic handle to symbolic expressions, and SymExprs can be built
> (when needed) to represent the recursive nature of more complicated
> expressions.  Actually tying SVals and SymExprs closer together makes a
lot
> of sense, because they are just meant to represent that grammar of
> expressions.
>
> My suggestion for using parts from Klee was that Klee has its own
> expression language, similar to the notion of SymExprs.  Klee's
expression
> language, however, is far more developed, although it has a different
> memory model.  We could bring in parts of Klee's expression language by
> potentially wrapping them in SVals.  The end goal, however, is that we
> define an expression language for defining the semantics of expressions.

> That design is really clear to me, but it is represented very clearly in
> the code.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek
In reply to this post by Jordy Rose

On Jun 29, 2010, at 1:53 PM, Jordy Rose wrote:

> I was trying to think of a way to leverage the existing symbol/region
> liveness analysis -- subregions remain alive with the superregion, and
> region values remain alive with their regions. This shouldn't be a
> constraint on us, though -- a symbolic extent is just a (possibly conjured)
> value that remains live with the region.

Yes, symbols are also pruned from the state when they are no longer live.  This includes symbols derived from regions and symbols derived from other symbols.

>
> I'll stop trying to plan ahead for C string length -- maybe that's
> solvable with the GDM anyway.
>
> So, at this point I would picture extents for SymbolicRegions (and
> AllocaRegions, and maybe VLAs) as a cross between derived symbols and
> conjured symbols with tags, except for one possible issue -- they might be
> different on different paths through the CFG. With conjured symbols now we
> have a Count parameter to avoid this. Does this mean we can only
> getExtent() (or getExtentVal()) from a function with access to
> GRStmtNodeBuilder?

No, I think having access to SymbolManager would be enough, which is accessible from ValueManager.

> Also, it might be nice if getExtent() lived on MemRegion, instead of
> ValueManager or SValuator. Maybe?

I think having it in MemRegion would be fine, except you would need to pass a SymbolManager& argument for the case of generating derived symbols on demand.  More complicated extents would involve expressions that were already constructed by the SValuator and then equated (via ConstraintManager) to a symbol.


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek
In reply to this post by Jordy Rose

On Jun 29, 2010, at 1:56 PM, Jordy Rose wrote:


On Tue, 29 Jun 2010 10:44:47 -0700, Ted Kremenek <[hidden email]>
wrote:
Symbol aliasing is basically alpha-renaming.  We need to store on the
side
(within GRState) the set of alpha-renamed symbols, and when we assume
that
two symbols are aliased we need a callback into the Checkers so that
they
can decide if their current set of meta-data associated with a symbol is
compatible with assuming that two symbols are equal.  The SValuator can
be
clever to always use an alpha-renamed symbol when building new
expressions,
thus folding the renaming into newly constructed SVals.

Aliasing isn't so bad; it's the other constraints that would be harder to
model with sym-sym relations. But maybe we just support aliasing for a
while, or aliasing+linear adjustment, and ignore constraints like "a < b"
or "a != b".

For "a < b" and such, isn't it a matter of unifying the constraints that already exist on 'a' and 'b'?



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek
In reply to this post by Jordy Rose

On Jun 29, 2010, at 2:12 PM, Jordy Rose wrote:

>
> Okay. I think my misunderstanding came from wondering when to pass around
> a SymExpr vs. an SVal, since not all SVals map into SymExprs. (In
> particular, you can't pass a SymExpr and an APSInt in the same variable.)
> The answer seems to be "never" -- you always want an SVal for API that
> faces the checkers -- because SymExpr is almost an implementation detail.
> Does that sound right?
>
> (usual qualification about exceptions to every rules)
>
> I was also confused by the fact that SymExprs don't deal in SVals, but I
> guess they don't need to. The nonloc types are either symbolic, constant,
> or compound, and you can't do a symbolic operation to a compound value.
>

Yes, that's basically it!  That said, we could probably benefit from making the recursive nature of the expression language grammar clearer.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Jordy Rose
In reply to this post by Ted Kremenek

On Tue, 29 Jun 2010 17:23:20 -0700, Ted Kremenek <[hidden email]>
wrote:

> On Jun 29, 2010, at 1:56 PM, Jordy Rose wrote:
>
>>
>> On Tue, 29 Jun 2010 10:44:47 -0700, Ted Kremenek <[hidden email]>
>> wrote:
>>> Symbol aliasing is basically alpha-renaming.  We need to store on the
>> side
>>> (within GRState) the set of alpha-renamed symbols, and when we assume
>> that
>>> two symbols are aliased we need a callback into the Checkers so that
>> they
>>> can decide if their current set of meta-data associated with a symbol
is
>>> compatible with assuming that two symbols are equal.  The SValuator
can
>> be
>>> clever to always use an alpha-renamed symbol when building new
>> expressions,
>>> thus folding the renaming into newly constructed SVals.
>>
>> Aliasing isn't so bad; it's the other constraints that would be harder
to
>> model with sym-sym relations. But maybe we just support aliasing for a
>> while, or aliasing+linear adjustment, and ignore constraints like "a <
b"
>> or "a != b".
>
>
> For "a < b" and such, isn't it a matter of unifying the constraints that
> already exist on 'a' and 'b'?


Something like that, but fully modeling it can get complicated very
quickly:

a < b
a < c
// a != MAX, but other than that we know nothing
b == 11
// at this point a < 11
c == 10
// now a < 10

In the long-term I think we're going to want to arbitrarily impose an
order on symbols, so that higher symbols depend on lower ones and not the
other way around. Alternately (or additionally?), we could have a
notification system of some kind: when a min or max value changes, ping any
dependent symbols.

Constraints can form arbitrarily complex graphs, in theory, and we do have
to limit that. Unifying the current constraints is certainly an improvement
over our current state.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Symbolic Extents

Ted Kremenek

On Jun 29, 2010, at 5:52 PM, Jordy Rose wrote:

> Something like that, but fully modeling it can get complicated very
> quickly:
>
> a < b
> a < c
> // a != MAX, but other than that we know nothing
> b == 11
> // at this point a < 11
> c == 10
> // now a < 10
>
> In the long-term I think we're going to want to arbitrarily impose an
> order on symbols, so that higher symbols depend on lower ones and not the
> other way around.

An ordering like this is essentially what is done in the Archer paper:

http://portal.acm.org/citation.cfm?id=940115

(PDF: http://theory.stanford.edu/~yxie/archer.pdf)

> Alternately (or additionally?), we could have a
> notification system of some kind: when a min or max value changes, ping any
> dependent symbols.

Yes this is basically a gradually relaxation of the constraints.

>
> Constraints can form arbitrarily complex graphs, in theory, and we do have
> to limit that.

Absolutely.  What constraints can be handled should be dictated by the ConstraintManager.

> Unifying the current constraints is certainly an improvement
> over our current state.


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
12