Static Analyzer and signed 64bit constraints

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Static Analyzer and signed 64bit constraints

Louis Dionne via cfe-dev
Hello,

I'm attempting to write a checker to look for certain signedness issues. I have code similar to the following:

int32_t z = -1;
read (STDIN_FILENO, &z, sizeof(z));

if (z > 20) exit (EXIT_FAILURE);

memcpy (x, y, z);

Inside the checker I am printing the constraints on the SVal 'z' with a PreCall, filtering on memcpy:

ProgramStateRef State = C.getState();
State->getConstraintManager().print(State, llvm::outs(), "\n", " ");

I have two questions related to this:

1. The constraint on the above value is printed as 'conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }'. What happened to zero? My best guess is the static analyzer realizing that 'memcpy(x, y, 0)' is a nop and pruning that program state?

2. The above code, aside from the zero exception, makes sense when z is an int8_t, int16_t, or an int32_t:

 conj_$0{int8_t} : { [-128, -1], [1, 20] }
 conj_$0{int16_t} : { [-32768, -1], [1, 20] }
 conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }

However, if I switch this to an int64_t I get:

 conj_$0{int64_t} : { [1, 20] }

What happened to all the negative numbers?

Apologies if I've missed something obvious.

Cheers!
-yrp

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Static Analyzer and signed 64bit constraints

Louis Dionne via cfe-dev
 > The constraint on the above value is printed as 'conj_$0{int32_t} : {
[-2147483648, -1], [1, 20] }'. What happened to zero? My best guess is
the static analyzer realizing that 'memcpy(x, y, 0)' is a nop and
pruning that program state?

Yes, indeed, it is an effect of how we model 'memcpy'. I don't think
it's the desired behavior though. In particular, we don't display a
warning on 'memcpy(NULL, y, 0)', even though it's UB. So, yeah, sounds
like either a minor bug. It's not obvious for me how often paths with
zero-size memcpys are actually feasible in real-world programs.



 > However, if I switch this to an int64_t I get:
 > conj_$0{int64_t} : { [1, 20] }

Hmm, weird, i'm not able to reproduce it. Could you provide more info?
I'm trying:


$ cat test.c


   #include <stdlib.h>
   #include <stdio.h>

   void foo(void *x, void *y) {
     int64_t z = -1;
     read(1, &z, sizeof(z));
     if (z > 20)
       exit(-1);
     memcpy(x, y, z);
     // triggers state dump via ExprInspection
     clang_analyzer_printState();
     // don't garbage-collect 'z' until now
     z;
   }


$ clang --analyze test.c -Xclang -analyzer-checker -Xclang
debug.ExprInspection


Store (direct and default bindings), 0x7fcf3114a5e8 :
  (z,0,direct) : conj_$0{int64_t}
  (GlobalInternalSpaceRegion,0,default) : conj_$1{int}
  (GlobalSystemSpaceRegion,0,default) : conj_$2{int}

Expressions:
  (0x7fcf30579920,0x7fcf3113c0c0) clang_analyzer_printState :
&code{clang_analyzer_printState}

Ranges of symbol values:
  conj_$0{int64_t} : { [-9223372036854775808, -1], [1, 20] }



On 29/01/2018 12:25 AM, - yrp via cfe-dev wrote:

> Hello,
>
> I'm attempting to write a checker to look for certain signedness
> issues. I have code similar to the following:
>
> int32_t z = -1;
> read (STDIN_FILENO, &z, sizeof(z));
>
> if (z > 20) exit (EXIT_FAILURE);
>
> memcpy (x, y, z);
>
> Inside the checker I am printing the constraints on the SVal 'z' with
> a PreCall, filtering on memcpy:
>
> ProgramStateRef State = C.getState();
> State->getConstraintManager().print(State, llvm::outs(), "\n", " ");
>
> I have two questions related to this:
>
> 1. The constraint on the above value is printed as 'conj_$0{int32_t} :
> { [-2147483648, -1], [1, 20] }'. What happened to zero? My best guess
> is the static analyzer realizing that 'memcpy(x, y, 0)' is a nop and
> pruning that program state?
>
> 2. The above code, aside from the zero exception, makes sense when z
> is an int8_t, int16_t, or an int32_t:
>
>  conj_$0{int8_t} : { [-128, -1], [1, 20] }
>  conj_$0{int16_t} : { [-32768, -1], [1, 20] }
>  conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }
>
> However, if I switch this to an int64_t I get:
>
>  conj_$0{int64_t} : { [1, 20] }
>
> What happened to all the negative numbers?
>
> Apologies if I've missed something obvious.
>
> Cheers!
> -yrp
>
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Static Analyzer and signed 64bit constraints

Louis Dionne via cfe-dev


Hi Artem,

Thanks for the reply and info. After minimizing it appears this is related to sizeof + 64bit perhaps? I had to modify your testcase slightly to demonstrate the behavior Im seeing:


#include <stdlib.h>
#include <stdio.h>

void foo(void) {
  int64_t z = -1;
  read(1, &z, sizeof(z));

  char x[20] = { 0 };
  char y[20] = { 0 };

  // if (z > 20)
  if (z > sizeof x)
    exit(-1);
  memcpy(x, y, z);
  // triggers state dump via ExprInspection
  clang_analyzer_printState();
  // don't garbage-collect 'z' until now
  z;
}


$ clang --analyze test.c -Xclang -analyzer-checker -Xclang debug.ExprInspection
...
 conj_$0{int64_t} : { [1, 20] }

If I change the type of 'z' to int32_t I get:

 conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }

If I change the if guard on 'z' to the literal '20' and keep the type as int64_t I get the expected range:

 conj_$0{int64_t} : { [-9223372036854775808, -1], [1, 20] }


Have I misunderstood some C behavior regarding sizeof and this is expected? I'll feel pretty silly if this is the case...

In case it's relevant: clang version 6.0.0 (trunk 318002) running on linux

Cheers!
yrp




On Monday, January 29, 2018, 6:18:51 PM PST, Artem Dergachev <[hidden email]> wrote:





> The constraint on the above value is printed as 'conj_$0{int32_t} : {
[-2147483648, -1], [1, 20] }'. What happened to zero? My best guess is
the static analyzer realizing that 'memcpy(x, y, 0)' is a nop and
pruning that program state?

Yes, indeed, it is an effect of how we model 'memcpy'. I don't think
it's the desired behavior though. In particular, we don't display a
warning on 'memcpy(NULL, y, 0)', even though it's UB. So, yeah, sounds
like either a minor bug. It's not obvious for me how often paths with
zero-size memcpys are actually feasible in real-world programs.



> However, if I switch this to an int64_t I get:
> conj_$0{int64_t} : { [1, 20] }

Hmm, weird, i'm not able to reproduce it. Could you provide more info?
I'm trying:


$ cat test.c


  #include <stdlib.h>
  #include <stdio.h>

  void foo(void *x, void *y) {
    int64_t z = -1;
    read(1, &z, sizeof(z));
    if (z > 20)
      exit(-1);
    memcpy(x, y, z);
    // triggers state dump via ExprInspection
    clang_analyzer_printState();
    // don't garbage-collect 'z' until now
    z;
  }


$ clang --analyze test.c -Xclang -analyzer-checker -Xclang
debug.ExprInspection


Store (direct and default bindings), 0x7fcf3114a5e8 :
 (z,0,direct) : conj_$0{int64_t}
 (GlobalInternalSpaceRegion,0,default) : conj_$1{int}
 (GlobalSystemSpaceRegion,0,default) : conj_$2{int}

Expressions:
 (0x7fcf30579920,0x7fcf3113c0c0) clang_analyzer_printState :
&code{clang_analyzer_printState}

Ranges of symbol values:
 conj_$0{int64_t} : { [-9223372036854775808, -1], [1, 20] }



On 29/01/2018 12:25 AM, - yrp via cfe-dev wrote:

> Hello,
>
> I'm attempting to write a checker to look for certain signedness
> issues. I have code similar to the following:
>
> int32_t z = -1;
> read (STDIN_FILENO, &z, sizeof(z));
>
> if (z > 20) exit (EXIT_FAILURE);
>
> memcpy (x, y, z);
>
> Inside the checker I am printing the constraints on the SVal 'z' with
> a PreCall, filtering on memcpy:
>
> ProgramStateRef State = C.getState();
> State->getConstraintManager().print(State, llvm::outs(), "\n", " ");
>
> I have two questions related to this:
>
> 1. The constraint on the above value is printed as 'conj_$0{int32_t} :
> { [-2147483648, -1], [1, 20] }'. What happened to zero? My best guess
> is the static analyzer realizing that 'memcpy(x, y, 0)' is a nop and
> pruning that program state?
>
> 2. The above code, aside from the zero exception, makes sense when z
> is an int8_t, int16_t, or an int32_t:
>
>  conj_$0{int8_t} : { [-128, -1], [1, 20] }
>  conj_$0{int16_t} : { [-32768, -1], [1, 20] }
>  conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }
>
> However, if I switch this to an int64_t I get:
>
>  conj_$0{int64_t} : { [1, 20] }
>
> What happened to all the negative numbers?
>
> Apologies if I've missed something obvious.
>
> Cheers!
> -yrp

>
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Static Analyzer and signed 64bit constraints

Louis Dionne via cfe-dev
Hmm, this looks correct. Since typeof(sizeof(...)) is size_t, which is
uint64_t on that architecture, and C promotion rules for comparison of
int64_t and uint64_t would require conversion of both operands into
uint64_t, all negative values of 'z' would turn out to be larger than
20. This is different from comparison of, say, int16_t and uint16_t,
which would be converted to int for the purposes of comparison.

Like, the analyzer doesn't always do a good job around integral casts,
but this case looks correct.

On 29/01/2018 9:28 PM, - yrp wrote:

>
> Hi Artem,
>
> Thanks for the reply and info. After minimizing it appears this is related to sizeof + 64bit perhaps? I had to modify your testcase slightly to demonstrate the behavior Im seeing:
>
>
> #include <stdlib.h>
> #include <stdio.h>
>
> void foo(void) {
>    int64_t z = -1;
>    read(1, &z, sizeof(z));
>
>    char x[20] = { 0 };
>    char y[20] = { 0 };
>
>    // if (z > 20)
>    if (z > sizeof x)
>      exit(-1);
>    memcpy(x, y, z);
>    // triggers state dump via ExprInspection
>    clang_analyzer_printState();
>    // don't garbage-collect 'z' until now
>    z;
> }
>
>
> $ clang --analyze test.c -Xclang -analyzer-checker -Xclang debug.ExprInspection
> ...
>   conj_$0{int64_t} : { [1, 20] }
>
> If I change the type of 'z' to int32_t I get:
>
>   conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }
>
> If I change the if guard on 'z' to the literal '20' and keep the type as int64_t I get the expected range:
>
>   conj_$0{int64_t} : { [-9223372036854775808, -1], [1, 20] }
>
>
> Have I misunderstood some C behavior regarding sizeof and this is expected? I'll feel pretty silly if this is the case...
>
> In case it's relevant: clang version 6.0.0 (trunk 318002) running on linux
>
> Cheers!
> yrp
>
>
>
>
> On Monday, January 29, 2018, 6:18:51 PM PST, Artem Dergachev <[hidden email]> wrote:
>
>
>
>
>
>> The constraint on the above value is printed as 'conj_$0{int32_t} : {
> [-2147483648, -1], [1, 20] }'. What happened to zero? My best guess is
> the static analyzer realizing that 'memcpy(x, y, 0)' is a nop and
> pruning that program state?
>
> Yes, indeed, it is an effect of how we model 'memcpy'. I don't think
> it's the desired behavior though. In particular, we don't display a
> warning on 'memcpy(NULL, y, 0)', even though it's UB. So, yeah, sounds
> like either a minor bug. It's not obvious for me how often paths with
> zero-size memcpys are actually feasible in real-world programs.
>
>
>
>> However, if I switch this to an int64_t I get:
>> conj_$0{int64_t} : { [1, 20] }
> Hmm, weird, i'm not able to reproduce it. Could you provide more info?
> I'm trying:
>
>
> $ cat test.c
>
>
>    #include <stdlib.h>
>    #include <stdio.h>
>
>    void foo(void *x, void *y) {
>      int64_t z = -1;
>      read(1, &z, sizeof(z));
>      if (z > 20)
>        exit(-1);
>      memcpy(x, y, z);
>      // triggers state dump via ExprInspection
>      clang_analyzer_printState();
>      // don't garbage-collect 'z' until now
>      z;
>    }
>
>
> $ clang --analyze test.c -Xclang -analyzer-checker -Xclang
> debug.ExprInspection
>
>
> Store (direct and default bindings), 0x7fcf3114a5e8 :
>   (z,0,direct) : conj_$0{int64_t}
>   (GlobalInternalSpaceRegion,0,default) : conj_$1{int}
>   (GlobalSystemSpaceRegion,0,default) : conj_$2{int}
>
> Expressions:
>   (0x7fcf30579920,0x7fcf3113c0c0) clang_analyzer_printState :
> &code{clang_analyzer_printState}
>
> Ranges of symbol values:
>   conj_$0{int64_t} : { [-9223372036854775808, -1], [1, 20] }
>
>
>
> On 29/01/2018 12:25 AM, - yrp via cfe-dev wrote:
>> Hello,
>>
>> I'm attempting to write a checker to look for certain signedness
>> issues. I have code similar to the following:
>>
>> int32_t z = -1;
>> read (STDIN_FILENO, &z, sizeof(z));
>>
>> if (z > 20) exit (EXIT_FAILURE);
>>
>> memcpy (x, y, z);
>>
>> Inside the checker I am printing the constraints on the SVal 'z' with
>> a PreCall, filtering on memcpy:
>>
>> ProgramStateRef State = C.getState();
>> State->getConstraintManager().print(State, llvm::outs(), "\n", " ");
>>
>> I have two questions related to this:
>>
>> 1. The constraint on the above value is printed as 'conj_$0{int32_t} :
>> { [-2147483648, -1], [1, 20] }'. What happened to zero? My best guess
>> is the static analyzer realizing that 'memcpy(x, y, 0)' is a nop and
>> pruning that program state?
>>
>> 2. The above code, aside from the zero exception, makes sense when z
>> is an int8_t, int16_t, or an int32_t:
>>
>>   conj_$0{int8_t} : { [-128, -1], [1, 20] }
>>   conj_$0{int16_t} : { [-32768, -1], [1, 20] }
>>   conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }
>>
>> However, if I switch this to an int64_t I get:
>>
>>   conj_$0{int64_t} : { [1, 20] }
>>
>> What happened to all the negative numbers?
>>
>> Apologies if I've missed something obvious.
>>
>> Cheers!
>> -yrp
>>
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Static Analyzer and signed 64bit constraints

Louis Dionne via cfe-dev
Arg, man that's frustrating but makes sense when you point it out. Sorry for wasting your time with it, and thanks again for the clarification on the other point.

Cheers,
-yrp




On Tuesday, January 30, 2018, 10:02:34 AM PST, Artem Dergachev <[hidden email]> wrote:





Hmm, this looks correct. Since typeof(sizeof(...)) is size_t, which is
uint64_t on that architecture, and C promotion rules for comparison of
int64_t and uint64_t would require conversion of both operands into
uint64_t, all negative values of 'z' would turn out to be larger than
20. This is different from comparison of, say, int16_t and uint16_t,
which would be converted to int for the purposes of comparison.

Like, the analyzer doesn't always do a good job around integral casts,
but this case looks correct.

On 29/01/2018 9:28 PM, - yrp wrote:

>
> Hi Artem,
>
> Thanks for the reply and info. After minimizing it appears this is related to sizeof + 64bit perhaps? I had to modify your testcase slightly to demonstrate the behavior Im seeing:
>
>
> #include <stdlib.h>
> #include <stdio.h>
>
> void foo(void) {
>    int64_t z = -1;
>    read(1, &z, sizeof(z));
>
>    char x[20] = { 0 };
>    char y[20] = { 0 };
>
>    // if (z > 20)
>    if (z > sizeof x)
>      exit(-1);
>    memcpy(x, y, z);
>    // triggers state dump via ExprInspection
>    clang_analyzer_printState();
>    // don't garbage-collect 'z' until now
>    z;
> }
>
>
> $ clang --analyze test.c -Xclang -analyzer-checker -Xclang debug.ExprInspection
> ...
>   conj_$0{int64_t} : { [1, 20] }
>
> If I change the type of 'z' to int32_t I get:
>
>   conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }
>
> If I change the if guard on 'z' to the literal '20' and keep the type as int64_t I get the expected range:
>
>   conj_$0{int64_t} : { [-9223372036854775808, -1], [1, 20] }
>
>
> Have I misunderstood some C behavior regarding sizeof and this is expected? I'll feel pretty silly if this is the case...
>
> In case it's relevant: clang version 6.0.0 (trunk 318002) running on linux
>
> Cheers!
> yrp
>
>
>
>
> On Monday, January 29, 2018, 6:18:51 PM PST, Artem Dergachev <[hidden email]> wrote:
>
>
>
>
>
>> The constraint on the above value is printed as 'conj_$0{int32_t} : {
> [-2147483648, -1], [1, 20] }'. What happened to zero? My best guess is
> the static analyzer realizing that 'memcpy(x, y, 0)' is a nop and
> pruning that program state?
>
> Yes, indeed, it is an effect of how we model 'memcpy'. I don't think
> it's the desired behavior though. In particular, we don't display a
> warning on 'memcpy(NULL, y, 0)', even though it's UB. So, yeah, sounds
> like either a minor bug. It's not obvious for me how often paths with
> zero-size memcpys are actually feasible in real-world programs.
>
>
>
>> However, if I switch this to an int64_t I get:
>> conj_$0{int64_t} : { [1, 20] }
> Hmm, weird, i'm not able to reproduce it. Could you provide more info?
> I'm trying:
>
>
> $ cat test.c
>
>
>    #include <stdlib.h>
>    #include <stdio.h>
>
>    void foo(void *x, void *y) {
>      int64_t z = -1;
>      read(1, &z, sizeof(z));
>      if (z > 20)
>        exit(-1);
>      memcpy(x, y, z);
>      // triggers state dump via ExprInspection
>      clang_analyzer_printState();
>      // don't garbage-collect 'z' until now
>      z;
>    }
>
>
> $ clang --analyze test.c -Xclang -analyzer-checker -Xclang
> debug.ExprInspection
>
>
> Store (direct and default bindings), 0x7fcf3114a5e8 :
>   (z,0,direct) : conj_$0{int64_t}
>   (GlobalInternalSpaceRegion,0,default) : conj_$1{int}
>   (GlobalSystemSpaceRegion,0,default) : conj_$2{int}
>
> Expressions:
>   (0x7fcf30579920,0x7fcf3113c0c0) clang_analyzer_printState :
> &code{clang_analyzer_printState}
>
> Ranges of symbol values:
>   conj_$0{int64_t} : { [-9223372036854775808, -1], [1, 20] }
>
>
>
> On 29/01/2018 12:25 AM, - yrp via cfe-dev wrote:
>> Hello,
>>
>> I'm attempting to write a checker to look for certain signedness
>> issues. I have code similar to the following:
>>
>> int32_t z = -1;
>> read (STDIN_FILENO, &z, sizeof(z));
>>
>> if (z > 20) exit (EXIT_FAILURE);
>>
>> memcpy (x, y, z);
>>
>> Inside the checker I am printing the constraints on the SVal 'z' with
>> a PreCall, filtering on memcpy:
>>
>> ProgramStateRef State = C.getState();
>> State->getConstraintManager().print(State, llvm::outs(), "\n", " ");
>>
>> I have two questions related to this:
>>
>> 1. The constraint on the above value is printed as 'conj_$0{int32_t} :
>> { [-2147483648, -1], [1, 20] }'. What happened to zero? My best guess
>> is the static analyzer realizing that 'memcpy(x, y, 0)' is a nop and
>> pruning that program state?
>>
>> 2. The above code, aside from the zero exception, makes sense when z
>> is an int8_t, int16_t, or an int32_t:
>>
>>   conj_$0{int8_t} : { [-128, -1], [1, 20] }
>>   conj_$0{int16_t} : { [-32768, -1], [1, 20] }
>>   conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }
>>
>> However, if I switch this to an int64_t I get:
>>
>>   conj_$0{int64_t} : { [1, 20] }
>>
>> What happened to all the negative numbers?
>>
>> Apologies if I've missed something obvious.
>>
>> Cheers!
>> -yrp
>>
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev