[Analyzer] Obtain MemRegion corresponding to a pointer expression that has been cast to a different type

Scott Constable via cfe-dev
Hi All,

I'm analyzing something like the following code:

#include <stdint.h>

struct S {
  int a;
  char b;
  int c;
};

void bar(uint8_t *p);

void foo() {
  struct S x;
  bar((uint8_t *)&x);
}

When I reach the CallEvent corresponding to the call to bar(), I would like to extract the MemRegion corresponding to x, i.e. by ignoring the (uint8_t *) cast. My code looks something like this:

const Expr *arg = Call.getArgExpr(0);
SVal addrVal = State->getSVal(arg, LCtx);
Optional<Loc> l = addrVal.getAs<Loc>();
if (!l) // not a location (e.g. unknown or undefined)
  return nullptr;

QualType T = getPointedToType(arg);
return State->getSVal(*l, T).getAsRegion();

where getPointedToType() is defined as

QualType getPointedToType(const Expr *E) {
  assert(E);
  if (!isPointer(E))
    return QualType();
  // Look through casts to find the type the pointer had originally.
  if (const CastExpr *cast = dyn_cast<CastExpr>(E))
    return getPointedToType(cast->getSubExpr());

  const PointerType *Ty =
      dyn_cast<PointerType>(E->getType().getCanonicalType().getTypePtr());
  if (Ty)
    return Ty->getPointeeType();
  return QualType();
}

Everything seems to work just fine until the call to State->getSVal(*l, T), which returns a NonLoc. If I instead call State->getSVal(*l) without the pointed-to type, then I do get a MemRegion, but it's an element region of type uint8_t, NOT what I want.

Am I doing something wrong? Is there a much easier way to do this?

~Scott Constable

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: [Analyzer] Obtain MemRegion corresponding to a pointer expression that has been cast to a different type

Ted Kremenek via cfe-dev
Hi Scott,

I don’t actually see why you need to look at the structure of the AST here. The analyzer does a full symbolic execution, so there is a powerful separation between syntax and semantics right at your fingertips.

I would approach this from a different angle. Once you have the location, in this case ‘l’, its region should be an ElementRegion. That ElementRegion represents the cast from the original MemRegion (a VarRegion) to uint8_t*. Then just strip off the ElementRegion to get back the original region. The MemRegion design captures how the casts were used to change the interpretation of a piece of memory. It’s all right there in the MemRegion hierarchy.
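
To make the layering concrete, here is a minimal standalone sketch of the idea in plain C++. The Region struct and stripElementCast() below are hypothetical stand-ins invented for illustration; they are not Clang's MemRegion classes and only mimic the relationship described above, where a cast is recorded as an element layer on top of the variable's region.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the analyzer's region hierarchy (NOT Clang's types).
struct Region {
  enum Kind { Var, Element };
  Kind kind;
  const Region *super; // region this one is layered on; nullptr for a variable
  std::string name;    // variable name, or element type for an Element layer
};

// Returns the region underneath an Element layer, or the region itself
// if there is no such layer to remove.
inline const Region *stripElementCast(const Region *r) {
  assert(r && "expected a non-null region");
  if (r->kind == Region::Element)
    return r->super;
  return r;
}
```

In this model, the region for (uint8_t *)&x would be an Element layer named "uint8_t" whose super is the Var region for x, and stripElementCast() hands back the region for x no matter how the pointer was spelled in the source.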

AST-based approaches like this are fundamentally very brittle.  For example, you would need to do something different if the code was instead written like this:

  void foo() {
    struct S x;
    uint8_t *y = (uint8_t *)&x;
    bar(y);
  }

If you just use the MemRegions directly, these syntactic differences are irrelevant.  The MemRegions capture the actual semantics of the value you are working with.  In this case, the analyzer knows that the original memory address is for the VarRegion for ‘x’.

Typically, if you find yourself going to the AST itself to do this kind of operation, the approach is inherently wrong. Syntactic approaches work reasonably well for the compiler, where cheap local analysis is all you have. For the static analyzer, so much semantic information is captured in the ProgramState that you can go far beyond the reasoning power of syntactic checks like this.

Cheers,
Ted

> On Aug 19, 2015, at 8:44 AM, scott constable via cfe-dev <[hidden email]> wrote:
> [snip]

Re: [Analyzer] Obtain MemRegion corresponding to a pointer expression that has been cast to a different type

Scott Constable via cfe-dev
Thanks Ted,

The solution was to write the "dereference" function like this:

const MemRegion *
Util::getPointedToRegion(SVal addrVal, bool ignoreElemCast) {
  Optional<Loc> l = addrVal.getAs<Loc>();
  if (!l) // not a location (e.g. unknown or undefined)
    return nullptr;
  const MemRegion *MR = l->getAsRegion();
  if (!MR) // a Loc without a region, e.g. a null pointer constant
    return nullptr;
  // A pointer cast shows up as an ElementRegion layered on the original region.
  const ElementRegion *ER = dyn_cast<ElementRegion>(MR);
  if (ER && ignoreElemCast)
    MR = ER->getSuperRegion();

  return MR;
}

It's essentially just stripping off the ElementRegion, just like you suggested.
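
One caveat that may be worth checking: an ElementRegion can also denote a genuine array element or a nonzero offset (for example, a pointer like ((uint8_t *)&x) + 4), in which case taking the super-region silently discards the offset. The standalone sketch below illustrates a guarded variant that only peels layers whose index is zero. Region and stripZeroIndexElements() are hypothetical names for a simplified model, not Clang's classes. Clang's MemRegion::StripCasts() appears to apply a similar zero-index rule, but verify that against your Clang version.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified region model; NOT Clang's MemRegion classes.
struct Region {
  const Region *super; // nullptr for a base (variable) region
  bool isElement;      // true for an element layer
  std::int64_t index;  // element index; meaningful only when isElement
};

// Peels element layers whose index is zero, which is how a pointer cast
// shows up in this model. Stops at a nonzero index, because that layer
// carries a real offset that should not be thrown away.
inline const Region *stripZeroIndexElements(const Region *r) {
  assert(r && "expected a non-null region");
  while (r->isElement && r->index == 0 && r->super)
    r = r->super;
  return r;
}
```

With this rule, a zero-index layer over x resolves to x, while a layer with index 3 is returned unchanged so the caller can decide what to do with the offset.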

~Scott Constable

On Wed, Aug 19, 2015 at 11:57 AM, Ted Kremenek via cfe-dev <[hidden email]> wrote:
> [snip]