Quantcast

pointer values, taint propogation

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

pointer values, taint propogation

Philip Reames via cfe-dev

I would like to write a checker to make sure that pointers from untrusted sources are not dereferenced.  So I am playing around with the alpha.security.taint checker to try to understand how taint propagation works.  I put together some simple test cases:

 

 

The indicated calls to printf should be flagged because an untrusted value is passed as the format string (the first argument).  When I run the alpha.security.taint checker on this, it only flags the last two.  Looking into this, I found that the static analyzer often assigns an unknown value to the result of casting from an integer to a pointer.  Taint can’t be attached to an unknown value, so I need a known value for the pointer.  Although I’m not sure if this is really what we want to do, for now I added a post-check call-back on cast expressions to my checker that casts the integer to unsigned and then assigns the result as the pointer value.

 

 

When I run alpha.security.taint together with this on the test file, it flags the calls to printf where (char *)l and (char *)u are passed, but not the calls where sv1, sp1, (char *)NULL+l, sv2, or sp2 are passed, even though these are just different expressions with the same value.  I tried looking into this using the debugger, and found that when the analyzer calls ExprEngine::evalStore for the assignment statement (say “sv1 = (char *)ll”), it gives the pointer the appropriate value, but when the analyzer calls GenericTaintChecker::checkUncontrolledFormatString from the pre-check call-back for the call to printf on the following line of the test file, the pointer has an UnknownVal.  (See excepts from the debugging session below.)

 

Can anyone help me understand what is going on here?

 

Thanks,

Ray

 

 

 

 


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: pointer values, taint propogation

Philip Reames via cfe-dev

I didn’t realize the screen captures would make things difficult.  Here’s a version of the e-mail with text instead.

 

I would like to write a checker to make sure that pointers from untrusted sources are not dereferenced.  So I am playing around with the alpha.security.taint checker to try to understand how taint propagation works.  I put together some simple test cases:

 

#include <stdio.h>

#include <string.h>

 

void test(float f, char *sp1, char *sp2, char *sp3) {

  char *sv1, *sv2, *sv3;

 

  long long ll;

  scanf("%lld", &ll);    // ll is read from stdin, so is untrusted

 

  if (ll) {

    printf((char *)ll, f);    // flag: ll is untrusted

    sv1 = (char *)ll;

    printf(sv1, f);    // flag: sv1 = ll is untrusted

    sp1 = (char *)ll;

    printf(sp1, f);    // flag: sp1 = ll is untrusted

 

    printf((char *)NULL+ll, f);    // flag: ll is untrusted

    unsigned long u = 0;

    u += ll;

    printf((char *)u, f);    // flag: u = 0+ll is untrusted

    sv2 = NULL;

    sv2 += ll;

    printf(sv2, f);    // flag: sv2 = NULL+ll is untrusted

    sp2 = NULL;

    sp2 += ll;

    printf(sp2, f);    // flag: sp2 = NULL+ll is untrusted

 

    printf(sp3, f);

    sv3 = sp3;

    sv3 += ll;

    printf(sv3, f);    // flag: sv3 = sp3+ll is untrusted

    sp3 += ll;

    printf(sp3, f);    // flag: sp3 = sp3+ll is untrusted

  }

}

 

The indicated calls to printf should be flagged because an untrusted value is passed as the format string (the first argument).  When I run the alpha.security.taint checker on this, it only flags the last two.  Looking into this, I found that the static analyzer often assigns an unknown value to the result of casting from an integer to a pointer.  Taint can’t be attached to an unknown value, so I need a known value for the pointer.  Although I’m not sure if this is really what we want to do, for now I added a post-check call-back on cast expressions to my checker that casts the integer to unsigned and then assigns the result as the pointer value.

 

void UntrustedDerefChecker::checkPostStmt(const CastExpr *CE, CheckerContext &ChCtx) const {

  ProgramStateRef St = ChCtx.getState();

  const LocationContext *LCtx = ChCtx.getLocationContext();

  if (CE->getCastKind() == CK_IntegralToPointer && St->getSVal(CE, LCtx).isUnknown()) {

    // if cast to pointer resulted in an unknown value, try casting to unsigned instead

    const Expr *SubExpr = CE->getSubExpr();

    QualType CastTy = ChCtx.getASTContext().UnsignedLongTy;

    QualType SubExprTy = SubExpr->getType();

    SVal Val = St->getSVal(SubExpr, LCtx);

    Val = St->getStateManager().getSValBuilder().evalIntegralCast(St, Val, CastTy, SubExprTy);

    ChCtx.addTransition(St->BindExpr(CE, LCtx, Val));

  }

}

 

When I run alpha.security.taint together with this on the test file, it flags the calls to printf where (char *)l and (char *)u are passed, but not the calls where sv1, sp1, (char *)NULL+l, sv2, or sp2 are passed, even though these are just different expressions with the same value.  I tried looking into this using the debugger, and found that when the analyzer calls ExprEngine::evalStore for the assignment statement (say “sv1 = (char *)ll”), it gives the pointer the appropriate value, but when the analyzer calls GenericTaintChecker::checkUncontrolledFormatString from the pre-check call-back for the call to printf on the following line of the test file, the pointer has an UnknownVal.  (See excepts from the debugging session below.)

 

Can anyone help me understand what is going on here?

 

Thanks,

Ray

 

(These debugger session excepts are inherently screen captures, but the gist of them is described above.)

 

 

 

 


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: pointer values, taint propogation

Philip Reames via cfe-dev
Hello,

While the analyzer tries to cover all sides of the C's "everything is
everything" attitude (pointers are integers, integers are pointers,
pointer types do not matter), it seems that casting integers to pointers
is the least covered part of it.


--= IntegerAsLoc values =--

First, there is currently no accepted way to represent a cast from a
*symbolic* integer to a pointer in the SVal hierarchy. For the opposite
cast (from a pointer to an integer) there's nonloc::LocAsInteger SVal
sub-class to represent the result. However, nobody made a
"loc::IntegerAsLoc" for us yet.

I'd probably suggest to represent those as SymbolicRegion of an
integer-type symbol. In fact, the following patch fixes all of your
cases except sv2/sp2:

```
diff --git a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
index 28b43dd..6e894f3 100644
--- a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
+++ b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
@@ -97,7 +97,9 @@ SVal SimpleSValBuilder::evalCastFromNonLoc(NonLoc val,
QualType castTy) {

      if (!isLocType)
        return makeNonLoc(se, T, castTy);
-    return UnknownVal();
+
+    const SymbolicRegion *SR = MemMgr.getSymbolicRegion(se);
+    return loc::MemRegionVal(SR);
    }
```

This produces "&SymRegion{conj_$0{long long}}", which was never done
before and needs much more testing. Surprisingly, it doesn't even fail
on tests (though i'd still suspect that it'd introduce a bunch of
crashes on real codebases). I believe this is eventually the way to go,
but i'd not be ready to accept this change right away without proper
evaluation. I'm not sure when i'd be able to do this work.

I also think that this trick is better than representing pointers with
unsigned integers the way you did it, because consistently representing
pointer values with Loc-type SVal's is, in my experience, a very good
contract to follow; there are a lot of useful assertions around that.


--= Why sv3/sp3 works? =--

This case is easier than the others because it doesn't produce any
casts, but works through pointer arithmetic, which is well-supported.
Essentially, "sv3 += ll" is treated similarly to "sv3[ll]", which
produces an ElementRegion inside SymbolicRegion. Because ElementRegion's
index ll = conj_$0<long long> is tainted, the whole SVal is considered
tainted.


--= What's so much wronger with sv2/sp2? =--

The difference is that it involves adding an integral symbol to a 0
(Loc); this operation itself yields UnknownVal. I've no idea why is this
simple operation unsupported (UnknownVal essentially always means
unsupported), but i suspect that it's because we'd need a pointer type
to conduct it correctly (multiply the shift by pointee type size), which
is not stored in 0 (Loc) itself. Adding pointer type here would take
some work.


--= Why so much inconsistency? =--

Because static symbolic execution is so much harder than compilation or
"concrete" interpretation, but on the other hand it doesn't require
absolute correctness, it's natural that some operations will be
simplified out until somebody really needs them to be supported. There
are just so many cases to cover, and a lot of them require extending the
SVal algebra with new operations and new value kinds. So we're bound to
see UnknownVal's from time to time.

Another layer of inconsistency - not sure if it affected you - is that
often UnknownVal's are replaced with SymbolConjured values, which
happens whenever somebody finds out that he badly needs to recover
path-sensitivity in his specific example. There's no consensus if
symbolicating unknowns is generally a good thing.

Once you notice symbolication, you can hack upon it by propagating taint
towards the newly conjured symbol in a checker callback. This would
work, but it'd be harder to propagate taint removal ("the symbol is
already verified, no longer to keep it tainted") back to the original
symbol.


On 2/15/17 3:55 PM, McDowell, Raymond C. via cfe-dev wrote:

>
> I didn’t realize the screen captures would make things difficult.
> Here’s a version of the e-mail with text instead.
>
> I would like to write a checker to make sure that pointers from
> untrusted sources are not dereferenced.  So I am playing around with
> the alpha.security.taint checker to try to understand how taint
> propagation works.  I put together some simple test cases:
>
> #include <stdio.h>
>
> #include <string.h>
>
> void test(float f, char *sp1, char *sp2, char *sp3) {
>
> char *sv1, *sv2, *sv3;
>
> long long ll;
>
> scanf("%lld", &ll);    // ll is read from stdin, so is untrusted
>
> if (ll) {
>
> printf((char *)ll, f);    // flag: ll is untrusted
>
> sv1 = (char *)ll;
>
> printf(sv1, f);    // flag: sv1 = ll is untrusted
>
> sp1 = (char *)ll;
>
> printf(sp1, f);    // flag: sp1 = ll is untrusted
>
> printf((char *)NULL+ll, f);    // flag: ll is untrusted
>
> unsigned long u = 0;
>
> u += ll;
>
> printf((char *)u, f);    // flag: u = 0+ll is untrusted
>
> sv2 = NULL;
>
> sv2 += ll;
>
> printf(sv2, f);    // flag: sv2 = NULL+ll is untrusted
>
> sp2 = NULL;
>
> sp2 += ll;
>
> printf(sp2, f);    // flag: sp2 = NULL+ll is untrusted
>
> printf(sp3, f);
>
> sv3 = sp3;
>
> sv3 += ll;
>
> printf(sv3, f);    // flag: sv3 = sp3+ll is untrusted
>
> sp3 += ll;
>
> printf(sp3, f);    // flag: sp3 = sp3+ll is untrusted
>
> }
>
> }
>
> The indicated calls to printf should be flagged because an untrusted
> value is passed as the format string (the first argument).  When I run
> the alpha.security.taint checker on this, it only flags the last two.  
> Looking into this, I found that the static analyzer often assigns an
> unknown value to the result of casting from an integer to a pointer.  
> Taint can’t be attached to an unknown value, so I need a known value
> for the pointer. Although I’m not sure if this is really what we want
> to do, for now I added a post-check call-back on cast expressions to
> my checker that casts the integer to unsigned and then assigns the
> result as the pointer value.
>
> void UntrustedDerefChecker::checkPostStmt(const CastExpr *CE,
> CheckerContext &ChCtx) const {
>
> ProgramStateRef St = ChCtx.getState();
>
> const LocationContext *LCtx = ChCtx.getLocationContext();
>
> if (CE->getCastKind() == CK_IntegralToPointer && St->getSVal(CE,
> LCtx).isUnknown()) {
>
> // if cast to pointer resulted in an unknown value, try casting to
> unsigned instead
>
> const Expr *SubExpr = CE->getSubExpr();
>
> QualType CastTy = ChCtx.getASTContext().UnsignedLongTy;
>
> QualType SubExprTy = SubExpr->getType();
>
> SVal Val = St->getSVal(SubExpr, LCtx);
>
> Val = St->getStateManager().getSValBuilder().evalIntegralCast(St, Val,
> CastTy, SubExprTy);
>
> ChCtx.addTransition(St->BindExpr(CE, LCtx, Val));
>
> }
>
> }
>
> When I run alpha.security.taint together with this on the test file,
> it flags the calls to printf where (char *)l and (char *)u are passed,
> but not the calls where sv1, sp1, (char *)NULL+l, sv2, or sp2 are
> passed, even though these are just different expressions with the same
> value.  I tried looking into this using the debugger, and found that
> when the analyzer calls ExprEngine::evalStore for the assignment
> statement (say “sv1 = (char *)ll”), it gives the pointer the
> appropriate value, but when the analyzer calls
> GenericTaintChecker::checkUncontrolledFormatString from the pre-check
> call-back for the call to printf on the following line of the test
> file, the pointer has an UnknownVal.  (See excepts from the debugging
> session below.)
>
> Can anyone help me understand what is going on here?
>
> Thanks,
>
> Ray
>
> (These debugger session excepts are inherently screen captures, but
> the gist of them is described above.)
>
>
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: pointer values, taint propogation

Philip Reames via cfe-dev
Artem,

Thank you for your detailed answer -- it was illuminating and very helpful.  Representing a pointer as an unsigned integer was causing problems, but creating a new symbolic region seems to be working.  I am now wondering if we can do something similar for the sv2/sp2 cases: when we add a symbolic integer value to a null pointer, make the value a new symbolic region based on that integer value.  I tried to write a pre-check/post-check pair to do this, but am running into trouble even with the pre-check to catch this situation.  Here's what I wrote:

void UntrustedDerefChecker::checkPreStmt(const BinaryOperator *Bop, CheckerContext &ChCtx) const {
   ProgramStateRef St  = ChCtx.getState();
   const LocationContext *LCtx = ChCtx.getLocationContext();
   BinaryOperator::Opcode Op = Bop->getOpcode();
   if (Op != BO_Add && Op != BO_AddAssign
       && Op != BO_Sub && p != BO_SubAssign)
      return;
   Expr *LHS = Bop->getLHS();
   if (LHS->getType()->isPointerType()) {
      SVal ValL = St->getSVal(LHS, LCtx);
      ProgramStateRef LIsNullSt, LIsNotNullSt;
      std::tie(LIsNotNullSt, LIsNullSt) = St->assume(ValL.castAs<DefinedOrUnknownSVal>());
      if (LIsNullSt & !LIsNotNullSt) {
         ChCtx.addTransition(LIsNullSt->set<NullPtrArith>((unsigned)LeftArg));
         return;
      }
   }
  if (Po != BO_Add && Op != BO_Sub)
      return;
   ... // handle pointer on right symmetrically to pointer on the left case above
}

But when this call-back is called for the sv2 += ll statement, LIsNuLLSt is null and LIsNotNullSt is a valid state, so it thinks sv2 must be non-null even though the state has it constrained to be null.  I traced it in the debugger and it looks like the issue is in SimpleSValBuilder::evalCastFromLoc, where the non-symbolic case falls through to the GotoLabel case, making any ConcreteInt-as-Loc value (even NULL) evaluate to true.

Is this a bug, or am I misunderstanding what is going on?  Any suggestions for a work-around?

Thanks,
Ray

-----Original Message-----
From: Artem Dergachev [mailto:[hidden email]]
Sent: Monday, February 20, 2017 5:45 AM
To: McDowell, Raymond C.; [hidden email]
Subject: Re: [cfe-dev] pointer values, taint propogation

Hello,

While the analyzer tries to cover all sides of the C's "everything is everything" attitude (pointers are integers, integers are pointers, pointer types do not matter), it seems that casting integers to pointers is the least covered part of it.


--= IntegerAsLoc values =--

First, there is currently no accepted way to represent a cast from a
*symbolic* integer to a pointer in the SVal hierarchy. For the opposite cast (from a pointer to an integer) there's nonloc::LocAsInteger SVal sub-class to represent the result. However, nobody made a "loc::IntegerAsLoc" for us yet.

I'd probably suggest to represent those as SymbolicRegion of an integer-type symbol. In fact, the following patch fixes all of your cases except sv2/sp2:

```
diff --git a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
index 28b43dd..6e894f3 100644
--- a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
+++ b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
@@ -97,7 +97,9 @@ SVal SimpleSValBuilder::evalCastFromNonLoc(NonLoc val, QualType castTy) {

      if (!isLocType)
        return makeNonLoc(se, T, castTy);
-    return UnknownVal();
+
+    const SymbolicRegion *SR = MemMgr.getSymbolicRegion(se);
+    return loc::MemRegionVal(SR);
    }
```

This produces "&SymRegion{conj_$0{long long}}", which was never done before and needs much more testing. Surprisingly, it doesn't even fail on tests (though i'd still suspect that it'd introduce a bunch of crashes on real codebases). I believe this is eventually the way to go, but i'd not be ready to accept this change right away without proper evaluation. I'm not sure when i'd be able to do this work.

I also think that this trick is better than representing pointers with unsigned integers the way you did it, because consistently representing pointer values with Loc-type SVal's is, in my experience, a very good contract to follow; there are a lot of useful assertions around that.


--= Why sv3/sp3 works? =--

This case is easier than the others because it doesn't produce any casts, but works through pointer arithmetic, which is well-supported.
Essentially, "sv3 += ll" is treated similarly to "sv3[ll]", which produces an ElementRegion inside SymbolicRegion. Because ElementRegion's index ll = conj_$0<long long> is tainted, the whole SVal is considered tainted.


--= What's so much wronger with sv2/sp2? =--

The difference is that it involves adding an integral symbol to a 0 (Loc); this operation itself yields UnknownVal. I've no idea why is this simple operation unsupported (UnknownVal essentially always means unsupported), but i suspect that it's because we'd need a pointer type to conduct it correctly (multiply the shift by pointee type size), which is not stored in 0 (Loc) itself. Adding pointer type here would take some work.


--= Why so much inconsistency? =--

Because static symbolic execution is so much harder than compilation or "concrete" interpretation, but on the other hand it doesn't require absolute correctness, it's natural that some operations will be simplified out until somebody really needs them to be supported. There are just so many cases to cover, and a lot of them require extending the SVal algebra with new operations and new value kinds. So we're bound to see UnknownVal's from time to time.

Another layer of inconsistency - not sure if it affected you - is that often UnknownVal's are replaced with SymbolConjured values, which happens whenever somebody finds out that he badly needs to recover path-sensitivity in his specific example. There's no consensus if symbolicating unknowns is generally a good thing.

Once you notice symbolication, you can hack upon it by propagating taint towards the newly conjured symbol in a checker callback. This would work, but it'd be harder to propagate taint removal ("the symbol is already verified, no longer to keep it tainted") back to the original symbol.


On 2/15/17 3:55 PM, McDowell, Raymond C. via cfe-dev wrote:

>
> I didn’t realize the screen captures would make things difficult.
> Here’s a version of the e-mail with text instead.
>
> I would like to write a checker to make sure that pointers from
> untrusted sources are not dereferenced.  So I am playing around with
> the alpha.security.taint checker to try to understand how taint
> propagation works.  I put together some simple test cases:
>
> #include <stdio.h>
>
> #include <string.h>
>
> void test(float f, char *sp1, char *sp2, char *sp3) {
>
> char *sv1, *sv2, *sv3;
>
> long long ll;
>
> scanf("%lld", &ll);    // ll is read from stdin, so is untrusted
>
> if (ll) {
>
> printf((char *)ll, f);    // flag: ll is untrusted
>
> sv1 = (char *)ll;
>
> printf(sv1, f);    // flag: sv1 = ll is untrusted
>
> sp1 = (char *)ll;
>
> printf(sp1, f);    // flag: sp1 = ll is untrusted
>
> printf((char *)NULL+ll, f);    // flag: ll is untrusted
>
> unsigned long u = 0;
>
> u += ll;
>
> printf((char *)u, f);    // flag: u = 0+ll is untrusted
>
> sv2 = NULL;
>
> sv2 += ll;
>
> printf(sv2, f);    // flag: sv2 = NULL+ll is untrusted
>
> sp2 = NULL;
>
> sp2 += ll;
>
> printf(sp2, f);    // flag: sp2 = NULL+ll is untrusted
>
> printf(sp3, f);
>
> sv3 = sp3;
>
> sv3 += ll;
>
> printf(sv3, f);    // flag: sv3 = sp3+ll is untrusted
>
> sp3 += ll;
>
> printf(sp3, f);    // flag: sp3 = sp3+ll is untrusted
>
> }
>
> }
>
> The indicated calls to printf should be flagged because an untrusted
> value is passed as the format string (the first argument).  When I run
> the alpha.security.taint checker on this, it only flags the last two.
> Looking into this, I found that the static analyzer often assigns an
> unknown value to the result of casting from an integer to a pointer.
> Taint can’t be attached to an unknown value, so I need a known value
> for the pointer. Although I’m not sure if this is really what we want
> to do, for now I added a post-check call-back on cast expressions to
> my checker that casts the integer to unsigned and then assigns the
> result as the pointer value.
>
> void UntrustedDerefChecker::checkPostStmt(const CastExpr *CE,
> CheckerContext &ChCtx) const {
>
> ProgramStateRef St = ChCtx.getState();
>
> const LocationContext *LCtx = ChCtx.getLocationContext();
>
> if (CE->getCastKind() == CK_IntegralToPointer && St->getSVal(CE,
> LCtx).isUnknown()) {
>
> // if cast to pointer resulted in an unknown value, try casting to
> unsigned instead
>
> const Expr *SubExpr = CE->getSubExpr();
>
> QualType CastTy = ChCtx.getASTContext().UnsignedLongTy;
>
> QualType SubExprTy = SubExpr->getType();
>
> SVal Val = St->getSVal(SubExpr, LCtx);
>
> Val = St->getStateManager().getSValBuilder().evalIntegralCast(St, Val,
> CastTy, SubExprTy);
>
> ChCtx.addTransition(St->BindExpr(CE, LCtx, Val));
>
> }
>
> }
>
> When I run alpha.security.taint together with this on the test file,
> it flags the calls to printf where (char *)l and (char *)u are passed,
> but not the calls where sv1, sp1, (char *)NULL+l, sv2, or sp2 are
> passed, even though these are just different expressions with the same
> value.  I tried looking into this using the debugger, and found that
> when the analyzer calls ExprEngine::evalStore for the assignment
> statement (say “sv1 = (char *)ll”), it gives the pointer the
> appropriate value, but when the analyzer calls
> GenericTaintChecker::checkUncontrolledFormatString from the pre-check
> call-back for the call to printf on the following line of the test
> file, the pointer has an UnknownVal.  (See excepts from the debugging
> session below.)
>
> Can anyone help me understand what is going on here?
>
> Thanks,
>
> Ray
>
> (These debugger session excepts are inherently screen captures, but
> the gist of them is described above.)
>
>
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: pointer values, taint propogation

Philip Reames via cfe-dev
Hello,

Sorry, it took me some time to get to reproducing this issue.

I think everything works correctly, because left-hand side of operator
+= (unlike operator +) is actually an lvalue - the variable sv2 (sp2)
itself, not its contents. The relevant SVal is .dump()'ed as &sv2, and
"means" a pointer to the variable sv2. Null pointer usually doesn't
point to stack variables, so the null-state is modeled to be impossible.

To get the rvalue of the left-hand side expression, you'd need to
perform an extra getSVal(), this time from the Store - which essentially
dereferences the pointer. Something like this:

```
   if (LHS->isLValue()) // or just check the opcode
     if (auto ValLAsLoc = ValL.getAs<Loc>()) // auto resolves to
Optional<Loc>
       ValL = St->getSVal(*ValLAsLoc);
     else
       // Handle what went wrong. Perhaps an UndefinedVal or an UnknownVal.
```

Generally, it's a very good idea to .dump() the SVal's you encounter,
there's a lot of useful info. There's also this SValExplainer class
which may help with understanding the SVal hierarchy.

On 2/23/17 11:10 PM, McDowell, Raymond C. wrote:

> Artem,
>
> Thank you for your detailed answer -- it was illuminating and very helpful.  Representing a pointer as an unsigned integer was causing problems, but creating a new symbolic region seems to be working.  I am now wondering if we can do something similar for the sv2/sp2 cases: when we add a symbolic integer value to a null pointer, make the value a new symbolic region based on that integer value.  I tried to write a pre-check/post-check pair to do this, but am running into trouble even with the pre-check to catch this situation.  Here's what I wrote:
>
> void UntrustedDerefChecker::checkPreStmt(const BinaryOperator *Bop, CheckerContext &ChCtx) const {
>     ProgramStateRef St  = ChCtx.getState();
>     const LocationContext *LCtx = ChCtx.getLocationContext();
>     BinaryOperator::Opcode Op = Bop->getOpcode();
>     if (Op != BO_Add && Op != BO_AddAssign
>         && Op != BO_Sub && p != BO_SubAssign)
>        return;
>     Expr *LHS = Bop->getLHS();
>     if (LHS->getType()->isPointerType()) {
>        SVal ValL = St->getSVal(LHS, LCtx);
>        ProgramStateRef LIsNullSt, LIsNotNullSt;
>        std::tie(LIsNotNullSt, LIsNullSt) = St->assume(ValL.castAs<DefinedOrUnknownSVal>());
>        if (LIsNullSt & !LIsNotNullSt) {
>           ChCtx.addTransition(LIsNullSt->set<NullPtrArith>((unsigned)LeftArg));
>           return;
>        }
>     }
>    if (Po != BO_Add && Op != BO_Sub)
>        return;
>     ... // handle pointer on right symmetrically to pointer on the left case above
> }
>
> But when this call-back is called for the sv2 += ll statement, LIsNuLLSt is null and LIsNotNullSt is a valid state, so it thinks sv2 must be non-null even though the state has it constrained to be null.  I traced it in the debugger and it looks like the issue is in SimpleSValBuilder::evalCastFromLoc, where the non-symbolic case falls through to the GotoLabel case, making any ConcreteInt-as-Loc value (even NULL) evaluate to true.
>
> Is this a bug, or am I misunderstanding what is going on?  Any suggestions for a work-around?
>
> Thanks,
> Ray
>
> -----Original Message-----
> From: Artem Dergachev [mailto:[hidden email]]
> Sent: Monday, February 20, 2017 5:45 AM
> To: McDowell, Raymond C.; [hidden email]
> Subject: Re: [cfe-dev] pointer values, taint propogation
>
> Hello,
>
> While the analyzer tries to cover all sides of the C's "everything is everything" attitude (pointers are integers, integers are pointers, pointer types do not matter), it seems that casting integers to pointers is the least covered part of it.
>
>
> --= IntegerAsLoc values =--
>
> First, there is currently no accepted way to represent a cast from a
> *symbolic* integer to a pointer in the SVal hierarchy. For the opposite cast (from a pointer to an integer) there's nonloc::LocAsInteger SVal sub-class to represent the result. However, nobody made a "loc::IntegerAsLoc" for us yet.
>
> I'd probably suggest to represent those as SymbolicRegion of an integer-type symbol. In fact, the following patch fixes all of your cases except sv2/sp2:
>
> ```
> diff --git a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
> b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
> index 28b43dd..6e894f3 100644
> --- a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
> +++ b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
> @@ -97,7 +97,9 @@ SVal SimpleSValBuilder::evalCastFromNonLoc(NonLoc val, QualType castTy) {
>
>        if (!isLocType)
>          return makeNonLoc(se, T, castTy);
> -    return UnknownVal();
> +
> +    const SymbolicRegion *SR = MemMgr.getSymbolicRegion(se);
> +    return loc::MemRegionVal(SR);
>      }
> ```
>
> This produces "&SymRegion{conj_$0{long long}}", which was never done before and needs much more testing. Surprisingly, it doesn't even fail on tests (though i'd still suspect that it'd introduce a bunch of crashes on real codebases). I believe this is eventually the way to go, but i'd not be ready to accept this change right away without proper evaluation. I'm not sure when i'd be able to do this work.
>
> I also think that this trick is better than representing pointers with unsigned integers the way you did it, because consistently representing pointer values with Loc-type SVal's is, in my experience, a very good contract to follow; there are a lot of useful assertions around that.
>
>
> --= Why sv3/sp3 works? =--
>
> This case is easier than the others because it doesn't produce any casts, but works through pointer arithmetic, which is well-supported.
> Essentially, "sv3 += ll" is treated similarly to "sv3[ll]", which produces an ElementRegion inside SymbolicRegion. Because ElementRegion's index ll = conj_$0<long long> is tainted, the whole SVal is considered tainted.
>
>
> --= What's so much wronger with sv2/sp2? =--
>
> The difference is that it involves adding an integral symbol to a 0 (Loc); this operation itself yields UnknownVal. I've no idea why is this simple operation unsupported (UnknownVal essentially always means unsupported), but i suspect that it's because we'd need a pointer type to conduct it correctly (multiply the shift by pointee type size), which is not stored in 0 (Loc) itself. Adding pointer type here would take some work.
>
>
> --= Why so much inconsistency? =--
>
> Because static symbolic execution is so much harder than compilation or "concrete" interpretation, but on the other hand it doesn't require absolute correctness, it's natural that some operations will be simplified out until somebody really needs them to be supported. There are just so many cases to cover, and a lot of them require extending the SVal algebra with new operations and new value kinds. So we're bound to see UnknownVal's from time to time.
>
> Another layer of inconsistency - not sure if it affected you - is that often UnknownVal's are replaced with SymbolConjured values, which happens whenever somebody finds out that he badly needs to recover path-sensitivity in his specific example. There's no consensus if symbolicating unknowns is generally a good thing.
>
> Once you notice symbolication, you can hack upon it by propagating taint towards the newly conjured symbol in a checker callback. This would work, but it'd be harder to propagate taint removal ("the symbol is already verified, no longer to keep it tainted") back to the original symbol.
>
>
> On 2/15/17 3:55 PM, McDowell, Raymond C. via cfe-dev wrote:
>> I didn’t realize the screen captures would make things difficult.
>> Here’s a version of the e-mail with text instead.
>>
>> I would like to write a checker to make sure that pointers from
>> untrusted sources are not dereferenced.  So I am playing around with
>> the alpha.security.taint checker to try to understand how taint
>> propagation works.  I put together some simple test cases:
>>
>> #include <stdio.h>
>>
>> #include <string.h>
>>
>> void test(float f, char *sp1, char *sp2, char *sp3) {
>>
>> char *sv1, *sv2, *sv3;
>>
>> long long ll;
>>
>> scanf("%lld", &ll);    // ll is read from stdin, so is untrusted
>>
>> if (ll) {
>>
>> printf((char *)ll, f);    // flag: ll is untrusted
>>
>> sv1 = (char *)ll;
>>
>> printf(sv1, f);    // flag: sv1 = ll is untrusted
>>
>> sp1 = (char *)ll;
>>
>> printf(sp1, f);    // flag: sp1 = ll is untrusted
>>
>> printf((char *)NULL+ll, f);    // flag: ll is untrusted
>>
>> unsigned long u = 0;
>>
>> u += ll;
>>
>> printf((char *)u, f);    // flag: u = 0+ll is untrusted
>>
>> sv2 = NULL;
>>
>> sv2 += ll;
>>
>> printf(sv2, f);    // flag: sv2 = NULL+ll is untrusted
>>
>> sp2 = NULL;
>>
>> sp2 += ll;
>>
>> printf(sp2, f);    // flag: sp2 = NULL+ll is untrusted
>>
>> printf(sp3, f);
>>
>> sv3 = sp3;
>>
>> sv3 += ll;
>>
>> printf(sv3, f);    // flag: sv3 = sp3+ll is untrusted
>>
>> sp3 += ll;
>>
>> printf(sp3, f);    // flag: sp3 = sp3+ll is untrusted
>>
>> }
>>
>> }
>>
>> The indicated calls to printf should be flagged because an untrusted
>> value is passed as the format string (the first argument).  When I run
>> the alpha.security.taint checker on this, it only flags the last two.
>> Looking into this, I found that the static analyzer often assigns an
>> unknown value to the result of casting from an integer to a pointer.
>> Taint can’t be attached to an unknown value, so I need a known value
>> for the pointer. Although I’m not sure if this is really what we want
>> to do, for now I added a post-check call-back on cast expressions to
>> my checker that casts the integer to unsigned and then assigns the
>> result as the pointer value.
>>
>> void UntrustedDerefChecker::checkPostStmt(const CastExpr *CE,
>> CheckerContext &ChCtx) const {
>>
>> ProgramStateRef St = ChCtx.getState();
>>
>> const LocationContext *LCtx = ChCtx.getLocationContext();
>>
>> if (CE->getCastKind() == CK_IntegralToPointer && St->getSVal(CE,
>> LCtx).isUnknown()) {
>>
>> // if cast to pointer resulted in an unknown value, try casting to
>> unsigned instead
>>
>> const Expr *SubExpr = CE->getSubExpr();
>>
>> QualType CastTy = ChCtx.getASTContext().UnsignedLongTy;
>>
>> QualType SubExprTy = SubExpr->getType();
>>
>> SVal Val = St->getSVal(SubExpr, LCtx);
>>
>> Val = St->getStateManager().getSValBuilder().evalIntegralCast(St, Val,
>> CastTy, SubExprTy);
>>
>> ChCtx.addTransition(St->BindExpr(CE, LCtx, Val));
>>
>> }
>>
>> }
>>
>> When I run alpha.security.taint together with this on the test file,
>> it flags the calls to printf where (char *)l and (char *)u are passed,
>> but not the calls where sv1, sp1, (char *)NULL+l, sv2, or sp2 are
>> passed, even though these are just different expressions with the same
>> value.  I tried looking into this using the debugger, and found that
>> when the analyzer calls ExprEngine::evalStore for the assignment
>> statement (say “sv1 = (char *)ll”), it gives the pointer the
>> appropriate value, but when the analyzer calls
>> GenericTaintChecker::checkUncontrolledFormatString from the pre-check
>> call-back for the call to printf on the following line of the test
>> file, the pointer has an UnknownVal.  (See excepts from the debugging
>> session below.)
>>
>> Can anyone help me understand what is going on here?
>>
>> Thanks,
>>
>> Ray
>>
>> (These debugger session excepts are inherently screen captures, but
>> the gist of them is described above.)
>>
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Loading...