[analyzer][taint] More precise taint modelling on arrays

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[analyzer][taint] More precise taint modelling on arrays

Hans Wennborg via cfe-dev
I made the following test case for checking the modeling of taint propagation on the `strcpy` function.
As I observed, only the first byte of the array became tainted, even though all bytes should be treated tainted.
In the test, you can see my expectations and the actual result.

```
void strcpy_unbounded_tainted_buffer(char *buf) {
  scanf("%s", buf);

  char dst[32];
  strcpy(dst, buf);                       //        expected---vvv   vvv--- actual
  clang_analyzer_isTainted_char(dst[0]);  // expected-warning{{YES}} YES
  clang_analyzer_isTainted_char(dst[1]);  // expected-warning{{YES}} NO
  clang_analyzer_isTainted_char(dst[31]); // expected-warning{{YES}} NO
}

void strcpy_bounded_tainted_buffer(char *buf) {
  scanf("%s", buf);
  buf[10] = '\0';
  clang_analyzer_isTainted_char(buf[0]);  // expected-warning{{YES}} YES
  clang_analyzer_isTainted_char(buf[1]);  // expected-warning{{YES}} NO
  clang_analyzer_isTainted_char(buf[10]); // expected-warning{{NO}}  NO
  clang_analyzer_isTainted_char(buf[20]); // expected-warning{{YES}} NO

  char dst[32];
  strcpy(dst, buf);
  clang_analyzer_isTainted_char(dst[0]);  // expected-warning{{YES}} YES
  clang_analyzer_isTainted_char(dst[1]);  // expected-warning{{YES}} NO
  clang_analyzer_isTainted_char(dst[10]); // expected-warning{{NO}}  NO
  clang_analyzer_isTainted_char(dst[20]); // expected-warning{{NO}}  NO
}

```

Some clarification about `TaintedSubRegions` and tainting `nonloc::LazyCompoundVal`s would be also helpful since it might be related to this topic.

What are the reasons for this limitation on modeling taintedness regarding arrays?


Background and expectation:
This change would be the first step in migrating the diagnostic emitting parts of the `GenericTaintChecker`.
Eg.: `checkUncontrolledFormatString`, `checkSystemCall`, `checkTaintedBufferSize`.
As a result, multiple checkers will consume taintedness information for reporting warnings in the future and letting the `GenericTaintChecker` do only modeling and propagation.

Regards, Balazs.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][taint] More precise taint modelling on arrays

Hans Wennborg via cfe-dev
It's not much of a limitation, just a bug. When `scanf` produces a
default conjured symbol binding in the buffer, this is *the* symbol that
should carry the taint; all derived symbols will automatically be
treated as tainted. If we're being even more precise and consider the
length of the scanned string, then the derived symbol will need to carry
partial taint (i.e., specify TaintedSubRegions such that derived<conj,
R> is tainted iff R is a TaintedSubRegion of buf for conj). I.e., what
i'm saying is that RegionStore is more-or-less flexible enough to
implement this correctly, it's only a matter of getting this done. Those
string functions are just too many and each of them requires individual
attention.

On 2/10/20 2:43 PM, Balázs Benics via cfe-dev wrote:

> I made the following test case for checking the modeling of taint
> propagation on the `strcpy` function.
> As I observed, only the first byte of the array became tainted, even
> though all bytes should be treated tainted.
> In the test, you can see my expectations and the actual result.
>
> ```
> void strcpy_unbounded_tainted_buffer(char *buf) {
>   scanf("%s", buf);
>
>   char dst[32];
>   strcpy(dst, buf);                       //  expected---vvv   vvv---
> actual
>   clang_analyzer_isTainted_char(dst[0]);  // expected-warning{{YES}} YES
>   clang_analyzer_isTainted_char(dst[1]);  // expected-warning{{YES}} NO
>   clang_analyzer_isTainted_char(dst[31]); // expected-warning{{YES}} NO
> }
>
> void strcpy_bounded_tainted_buffer(char *buf) {
>   scanf("%s", buf);
>   buf[10] = '\0';
>   clang_analyzer_isTainted_char(buf[0]);  // expected-warning{{YES}} YES
>   clang_analyzer_isTainted_char(buf[1]);  // expected-warning{{YES}} NO
>   clang_analyzer_isTainted_char(buf[10]); // expected-warning{{NO}}  NO
>   clang_analyzer_isTainted_char(buf[20]); // expected-warning{{YES}} NO
>
>   char dst[32];
>   strcpy(dst, buf);
>   clang_analyzer_isTainted_char(dst[0]);  // expected-warning{{YES}} YES
>   clang_analyzer_isTainted_char(dst[1]);  // expected-warning{{YES}} NO
>   clang_analyzer_isTainted_char(dst[10]); // expected-warning{{NO}}  NO
>   clang_analyzer_isTainted_char(dst[20]); // expected-warning{{NO}}  NO
> }
> ```
>
> Some clarification about `TaintedSubRegions` and tainting
> `nonloc::LazyCompoundVal`s would be also helpful since it might be
> related to this topic.
>
> What are the reasons for this limitation on modeling taintedness
> regarding arrays?
>
>
> Background and expectation:
> This change would be the first step in migrating the diagnostic
> emitting parts of the `GenericTaintChecker`.
> Eg.: `checkUncontrolledFormatString`, `checkSystemCall`,
> `checkTaintedBufferSize`.
> As a result, multiple checkers will consume taintedness information
> for reporting warnings in the future and letting the
> `GenericTaintChecker` do only modeling and propagation.
>
> Regards, Balazs.
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][taint] More precise taint modelling on arrays

Hans Wennborg via cfe-dev
Thank you for the clarification, Artem.

I'm going to have a look into that.

Artem Dergachev <[hidden email]> ezt írta (időpont: 2020. febr. 11., K, 17:27):
It's not much of a limitation, just a bug. When `scanf` produces a
default conjured symbol binding in the buffer, this is *the* symbol that
should carry the taint; all derived symbols will automatically be
treated as tainted. If we're being even more precise and consider the
length of the scanned string, then the derived symbol will need to carry
partial taint (i.e., specify TaintedSubRegions such that derived<conj,
R> is tainted iff R is a TaintedSubRegion of buf for conj). I.e., what
i'm saying is that RegionStore is more-or-less flexible enough to
implement this correctly, it's only a matter of getting this done. Those
string functions are just too many and each of them requires individual
attention.

On 2/10/20 2:43 PM, Balázs Benics via cfe-dev wrote:
> I made the following test case for checking the modeling of taint
> propagation on the `strcpy` function.
> As I observed, only the first byte of the array became tainted, even
> though all bytes should be treated tainted.
> In the test, you can see my expectations and the actual result.
>
> ```
> void strcpy_unbounded_tainted_buffer(char *buf) {
>   scanf("%s", buf);
>
>   char dst[32];
>   strcpy(dst, buf);                       //  expected---vvv   vvv---
> actual
>   clang_analyzer_isTainted_char(dst[0]);  // expected-warning{{YES}} YES
>   clang_analyzer_isTainted_char(dst[1]);  // expected-warning{{YES}} NO
>   clang_analyzer_isTainted_char(dst[31]); // expected-warning{{YES}} NO
> }
>
> void strcpy_bounded_tainted_buffer(char *buf) {
>   scanf("%s", buf);
>   buf[10] = '\0';
>   clang_analyzer_isTainted_char(buf[0]);  // expected-warning{{YES}} YES
>   clang_analyzer_isTainted_char(buf[1]);  // expected-warning{{YES}} NO
>   clang_analyzer_isTainted_char(buf[10]); // expected-warning{{NO}}  NO
>   clang_analyzer_isTainted_char(buf[20]); // expected-warning{{YES}} NO
>
>   char dst[32];
>   strcpy(dst, buf);
>   clang_analyzer_isTainted_char(dst[0]);  // expected-warning{{YES}} YES
>   clang_analyzer_isTainted_char(dst[1]);  // expected-warning{{YES}} NO
>   clang_analyzer_isTainted_char(dst[10]); // expected-warning{{NO}}  NO
>   clang_analyzer_isTainted_char(dst[20]); // expected-warning{{NO}}  NO
> }
> ```
>
> Some clarification about `TaintedSubRegions` and tainting
> `nonloc::LazyCompoundVal`s would be also helpful since it might be
> related to this topic.
>
> What are the reasons for this limitation on modeling taintedness
> regarding arrays?
>
>
> Background and expectation:
> This change would be the first step in migrating the diagnostic
> emitting parts of the `GenericTaintChecker`.
> Eg.: `checkUncontrolledFormatString`, `checkSystemCall`,
> `checkTaintedBufferSize`.
> As a result, multiple checkers will consume taintedness information
> for reporting warnings in the future and letting the
> `GenericTaintChecker` do only modeling and propagation.
>
> Regards, Balazs.
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev