[analyzer] Discovering information not contained in the bugpath

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[analyzer] Discovering information not contained in the bugpath

Nathan Ridge via cfe-dev
Hi!

I'm currently working on a GSoC project that aims to improve the description we provide for bug reports the Static Analyzer emits.

We divided the problem (that the bugreports didn't explain how the bug came about very well) into two parts:

1. The information is available in the bug path (basically parts of the code the analyzer actually explored on that particular path of execution). We refer to these as the "must-have-happened" case.
2. The information isn't available, possibly not even in the ExplodedGraph (parts of the program the analyzer explored on all paths of execution). We call this the "should-not-have-happened" case.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 int main() {

09   int *x = 0; // x initialized to 0

10   flag = 1;

11   foo(); // no note, but there should be one

12   if (flag) // assumed false

13     x = new int;

14   foo(); // no note, but there should be one

15

16   if (flag) // assumed true

17     *x = 5; // warn

18 }


The first branch is a good example for the "should-not-have-happened" case: the bug path doesn't contain line 13, so we don't really know whether the condition on line 12 is a big deal, which in this case it is (unlike if it only guarded a print of some sort)

The second branch is a good example for the "must-have-happened" case: we know that the condition on line 16 is very important.

I like to think that I made a very decent progress on the first part, as it's objectively a lot simpler. A great addition to the analyzer is that it now understands (at least, when my patches land) control dependencies in the CFG. With that, we can figure out that flag's value on line 16 is crucial, so we track it, resulting in notes being added on line 14, and on line 5, explaining that flag's value changed.

However, we are very unsure about how to tackle the second part on the project. I gave this a lot of thought, we had a couple meetings in private, and I'd like to share where I am with this.

My project proposed to implement a static backward program slicing algorithm which would definitely be amazing, but seems to be an unrealistic approach. Basically, any slicing algorithm that is interprocedural would require a system dependence graph, something we just simply don't have, not even for LLVM.

Whenever pointer semantics are in the picture, we have a really hard time: it's something that is super difficult to reason about without symbolic execution, and it would probably be a good idea not to tackle such cases until we come up with something clever (maybe the pset calculation Gábor Horváth is working on?). We should, however, being able to handle reference parameters.

So then, what's the goal exactly?

We'd like to teach the analyzer that some code blocks (even in other functions) could have written the variable that is responsible for the bug (x, in the included example), and is very important to the bug report. We could use this information to track variables similarly to the "must-have-happened" case: track conditions that prevented the write from happening.

To the specifics. I think it's reasonable to implement an intraprocedural slicing algorithm. For the included example, I think we should be able to discover the potential write to x on line 9. We don't need anything we already don't have to achieve this.

Reasoning across function boundaries could be achieved by inlining functions (as I understand it, simply insert the function's CFGBlocks), but there are big unknowns in this.

So my question is, do you think such inlining is possible? If not, should I just create a primitive variable-to-parameter mapping? Is there something else on the table?

Cheers!
Kristóf Umann

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] Discovering information not contained in the bugpath

Nathan Ridge via cfe-dev
I gave this some thought, and realized that we could divide the "should-not-have-happened" cases into 2 categories:

1. The unexplored block is within a function that the analyzer visited.
2. The unexplored block lies within a function that the analyzer didn't even visit.

For example:

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  if (flag) // assumed false

10    i = new int;

11 }

12

13 int main() {

14   int *x = 0; // x initialized to 0

15   flag = 1;

16   foo(); // no note, but there should be one

17   bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


Observe the subtle difference that on line 17, function bar is called, and the first branch lies in that function. This is definitely a "category 1" case, since the analyzer inlined and visited bar(), but again, failed the realize the importance of flag due to not visiting line 10, and makes no mention of line 16 in the bugreport.


01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  i = new int;

10 }

11

12 int main() {

13   int *x = 0; // x initialized to 0

14   flag = 1;

15   foo(); // no note, but there should be one

16 if (flag) // assumed false

17    bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


In this example, the if statement stays, but the assignment to x is now found in bar(). In this case, the analyzer didn't explore bar(), so it's a "category 2" case.


Now, the reason why these categories are important is that if the analyzer actually explored a function, it solves the problem of parameters: for the example shown for "category 1", we can simply ask the analyzer whether x in function main() is the same as i in function bar(). This is, as stated in my previous letter, far more difficult for "category 2" cases.


I think when I'm finished with the "must-have-happened" cases, I could start enhancing NoStoreFuncVisitor to gather conditions possibly worth tracking for "category 1" cases, see whether an intraprocedural slicing algorithm makes sense, could worry about resolving parameters for the other case one when I get there.


Any thoughts on this approach? :)


On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <[hidden email]> wrote:
Hi!

I'm currently working on a GSoC project that aims to improve the description we provide for bug reports the Static Analyzer emits.

We divided the problem (that the bugreports didn't explain how the bug came about very well) into two parts:

1. The information is available in the bug path (basically parts of the code the analyzer actually explored on that particular path of execution). We refer to these as the "must-have-happened" case.
2. The information isn't available, possibly not even in the ExplodedGraph (parts of the program the analyzer explored on all paths of execution). We call this the "should-not-have-happened" case.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 int main() {

09   int *x = 0; // x initialized to 0

10   flag = 1;

11   foo(); // no note, but there should be one

12   if (flag) // assumed false

13     x = new int;

14   foo(); // no note, but there should be one

15

16   if (flag) // assumed true

17     *x = 5; // warn

18 }


The first branch is a good example for the "should-not-have-happened" case: the bug path doesn't contain line 13, so we don't really know whether the condition on line 12 is a big deal, which in this case it is (unlike if it only guarded a print of some sort)

The second branch is a good example for the "must-have-happened" case: we know that the condition on line 16 is very important.

I like to think that I made a very decent progress on the first part, as it's objectively a lot simpler. A great addition to the analyzer is that it now understands (at least, when my patches land) control dependencies in the CFG. With that, we can figure out that flag's value on line 16 is crucial, so we track it, resulting in notes being added on line 14, and on line 5, explaining that flag's value changed.

However, we are very unsure about how to tackle the second part on the project. I gave this a lot of thought, we had a couple meetings in private, and I'd like to share where I am with this.

My project proposed to implement a static backward program slicing algorithm which would definitely be amazing, but seems to be an unrealistic approach. Basically, any slicing algorithm that is interprocedural would require a system dependence graph, something we just simply don't have, not even for LLVM.

Whenever pointer semantics are in the picture, we have a really hard time: it's something that is super difficult to reason about without symbolic execution, and it would probably be a good idea not to tackle such cases until we come up with something clever (maybe the pset calculation Gábor Horváth is working on?). We should, however, being able to handle reference parameters.

So then, what's the goal exactly?

We'd like to teach the analyzer that some code blocks (even in other functions) could have written the variable that is responsible for the bug (x, in the included example), and is very important to the bug report. We could use this information to track variables similarly to the "must-have-happened" case: track conditions that prevented the write from happening.

To the specifics. I think it's reasonable to implement an intraprocedural slicing algorithm. For the included example, I think we should be able to discover the potential write to x on line 9. We don't need anything we already don't have to achieve this.

Reasoning across function boundaries could be achieved by inlining functions (as I understand it, simply insert the function's CFGBlocks), but there are big unknowns in this.

So my question is, do you think such inlining is possible? If not, should I just create a primitive variable-to-parameter mapping? Is there something else on the table?

Cheers!
Kristóf Umann

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] Discovering information not contained in the bugpath

Nathan Ridge via cfe-dev
The problem we're solving is *entirely* about inter-procedural analysis. Like, as long as there are no inlined calls on the path (and therefore no pruning is being done), then there's no problem at all: the user can easily see that the value is untouched by direct observation.

In this sense i find the lack of note in example 2 completely non-problematic (as long as control dependency tracking lands). It's not a problem that the user isn't told that bar() could have initialized 'x', because the user already knows that bar() is not even called in the first place.

However, there may be a combination of category 1 and category 2 that is much more interesting: what if main() unconditionally calls foo() that conditionally calls bar() that unconditionally initializes the value? Eg.:

void bar(int *x) {
  *x = 1;  // the interesting event that doesn't happen
}

void foo(int *x) {
  if (rand())  // control dependency of the interesting event
    bar(x);
}

int main() {
  int x;
  foo(&x);
  use(x); // use of uninitialized value
}

In this case it is essential to highlight the path within foo() so that to explain how it fails to call bar().

On 09.06.2019 17:50, Kristóf Umann wrote:
I gave this some thought, and realized that we could divide the "should-not-have-happened" cases into 2 categories:

1. The unexplored block is within a function that the analyzer visited.
2. The unexplored block lies within a function that the analyzer didn't even visit.

For example:

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  if (flag) // assumed false

10    i = new int;

11 }

12

13 int main() {

14   int *x = 0; // x initialized to 0

15   flag = 1;

16   foo(); // no note, but there should be one

17   bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


Observe the subtle difference that on line 17, function bar is called, and the first branch lies in that function. This is definitely a "category 1" case, since the analyzer inlined and visited bar(), but again, failed the realize the importance of flag due to not visiting line 10, and makes no mention of line 16 in the bugreport.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  i = new int;

10 }

11

12 int main() {

13   int *x = 0; // x initialized to 0

14   flag = 1;

15   foo(); // no note, but there should be one

16 if (flag) // assumed false

17    bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


In this example, the if statement stays, but the assignment to x is now found in bar(). In this case, the analyzer didn't explore bar(), so it's a "category 2" case.


Now, the reason why these categories are important is that if the analyzer actually explored a function, it solves the problem of parameters: for the example shown for "category 1", we can simply ask the analyzer whether x in function main() is the same as i in function bar(). This is, as stated in my previous letter, far more difficult for "category 2" cases.


I think when I'm finished with the "must-have-happened" cases, I could start enhancing NoStoreFuncVisitor to gather conditions possibly worth tracking for "category 1" cases, see whether an intraprocedural slicing algorithm makes sense, could worry about resolving parameters for the other case one when I get there.


Any thoughts on this approach? :)


On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <[hidden email]> wrote:
Hi!

I'm currently working on a GSoC project that aims to improve the description we provide for bug reports the Static Analyzer emits.

We divided the problem (that the bugreports didn't explain how the bug came about very well) into two parts:

1. The information is available in the bug path (basically parts of the code the analyzer actually explored on that particular path of execution). We refer to these as the "must-have-happened" case.
2. The information isn't available, possibly not even in the ExplodedGraph (parts of the program the analyzer explored on all paths of execution). We call this the "should-not-have-happened" case.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 int main() {

09   int *x = 0; // x initialized to 0

10   flag = 1;

11   foo(); // no note, but there should be one

12   if (flag) // assumed false

13     x = new int;

14   foo(); // no note, but there should be one

15

16   if (flag) // assumed true

17     *x = 5; // warn

18 }


The first branch is a good example for the "should-not-have-happened" case: the bug path doesn't contain line 13, so we don't really know whether the condition on line 12 is a big deal, which in this case it is (unlike if it only guarded a print of some sort)

The second branch is a good example for the "must-have-happened" case: we know that the condition on line 16 is very important.

I like to think that I made a very decent progress on the first part, as it's objectively a lot simpler. A great addition to the analyzer is that it now understands (at least, when my patches land) control dependencies in the CFG. With that, we can figure out that flag's value on line 16 is crucial, so we track it, resulting in notes being added on line 14, and on line 5, explaining that flag's value changed.

However, we are very unsure about how to tackle the second part on the project. I gave this a lot of thought, we had a couple meetings in private, and I'd like to share where I am with this.

My project proposed to implement a static backward program slicing algorithm which would definitely be amazing, but seems to be an unrealistic approach. Basically, any slicing algorithm that is interprocedural would require a system dependence graph, something we just simply don't have, not even for LLVM.

Whenever pointer semantics are in the picture, we have a really hard time: it's something that is super difficult to reason about without symbolic execution, and it would probably be a good idea not to tackle such cases until we come up with something clever (maybe the pset calculation Gábor Horváth is working on?). We should, however, being able to handle reference parameters.

So then, what's the goal exactly?

We'd like to teach the analyzer that some code blocks (even in other functions) could have written the variable that is responsible for the bug (x, in the included example), and is very important to the bug report. We could use this information to track variables similarly to the "must-have-happened" case: track conditions that prevented the write from happening.

To the specifics. I think it's reasonable to implement an intraprocedural slicing algorithm. For the included example, I think we should be able to discover the potential write to x on line 9. We don't need anything we already don't have to achieve this.

Reasoning across function boundaries could be achieved by inlining functions (as I understand it, simply insert the function's CFGBlocks), but there are big unknowns in this.

So my question is, do you think such inlining is possible? If not, should I just create a primitive variable-to-parameter mapping? Is there something else on the table?

Cheers!
Kristóf Umann


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] Discovering information not contained in the bugpath

Nathan Ridge via cfe-dev


On Mon, 10 Jun 2019 at 05:03, Artem Dergachev <[hidden email]> wrote:
The problem we're solving is *entirely* about inter-procedural analysis. Like, as long as there are no inlined calls on the path (and therefore no pruning is being done), then there's no problem at all: the user can easily see that the value is untouched by direct observation.

I have explained myself poorly. I intend to implement an intraprocedural slicing, but that would work interprocedurally by resolving function calls with the help of the analyzer. This is important, because for the by-the-book interprocedural slicing you'd need a system dependence graph. Now, if a function isn't inlined, we need to do this on our own. I see a couple unknowns here (how deep do we explore such functions? how to we resolve non-trivial parameter passing? are we sure that tracking expressions here is actually meaningful?) but I think tackling simple cases is achievable.
 
In this sense i find the lack of note in example 2 completely non-problematic (as long as control dependency tracking lands). It's not a problem that the user isn't told that bar() could have initialized 'x', because the user already knows that bar() is not even called in the first place.

Would you agree that tracking flag however would be good?
 
However, there may be a combination of category 1 and category 2 that is much more interesting: what if main() unconditionally calls foo() that conditionally calls bar() that unconditionally initializes the value? Eg.:

void bar(int *x) {
  *x = 1;  // the interesting event that doesn't happen
}

void foo(int *x) {
  if (rand())  // control dependency of the interesting event
    bar(x);
}

int main() {
  int x;
  foo(&x);
  use(x); // use of uninitialized value
}

In this case it is essential to highlight the path within foo() so that to explain how it fails to call bar().
On 09.06.2019 17:50, Kristóf Umann wrote:
I gave this some thought, and realized that we could divide the "should-not-have-happened" cases into 2 categories:

1. The unexplored block is within a function that the analyzer visited.
2. The unexplored block lies within a function that the analyzer didn't even visit.

For example:

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  if (flag) // assumed false

10    i = new int;

11 }

12

13 int main() {

14   int *x = 0; // x initialized to 0

15   flag = 1;

16   foo(); // no note, but there should be one

17   bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


Observe the subtle difference that on line 17, function bar is called, and the first branch lies in that function. This is definitely a "category 1" case, since the analyzer inlined and visited bar(), but again, failed the realize the importance of flag due to not visiting line 10, and makes no mention of line 16 in the bugreport.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  i = new int;

10 }

11

12 int main() {

13   int *x = 0; // x initialized to 0

14   flag = 1;

15   foo(); // no note, but there should be one

16 if (flag) // assumed false

17    bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


In this example, the if statement stays, but the assignment to x is now found in bar(). In this case, the analyzer didn't explore bar(), so it's a "category 2" case.


Now, the reason why these categories are important is that if the analyzer actually explored a function, it solves the problem of parameters: for the example shown for "category 1", we can simply ask the analyzer whether x in function main() is the same as i in function bar(). This is, as stated in my previous letter, far more difficult for "category 2" cases.


I think when I'm finished with the "must-have-happened" cases, I could start enhancing NoStoreFuncVisitor to gather conditions possibly worth tracking for "category 1" cases, see whether an intraprocedural slicing algorithm makes sense, could worry about resolving parameters for the other case one when I get there.


Any thoughts on this approach? :)


On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <[hidden email]> wrote:
Hi!

I'm currently working on a GSoC project that aims to improve the description we provide for bug reports the Static Analyzer emits.

We divided the problem (that the bugreports didn't explain how the bug came about very well) into two parts:

1. The information is available in the bug path (basically parts of the code the analyzer actually explored on that particular path of execution). We refer to these as the "must-have-happened" case.
2. The information isn't available, possibly not even in the ExplodedGraph (parts of the program the analyzer explored on all paths of execution). We call this the "should-not-have-happened" case.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 int main() {

09   int *x = 0; // x initialized to 0

10   flag = 1;

11   foo(); // no note, but there should be one

12   if (flag) // assumed false

13     x = new int;

14   foo(); // no note, but there should be one

15

16   if (flag) // assumed true

17     *x = 5; // warn

18 }


The first branch is a good example for the "should-not-have-happened" case: the bug path doesn't contain line 13, so we don't really know whether the condition on line 12 is a big deal, which in this case it is (unlike if it only guarded a print of some sort)

The second branch is a good example for the "must-have-happened" case: we know that the condition on line 16 is very important.

I like to think that I made a very decent progress on the first part, as it's objectively a lot simpler. A great addition to the analyzer is that it now understands (at least, when my patches land) control dependencies in the CFG. With that, we can figure out that flag's value on line 16 is crucial, so we track it, resulting in notes being added on line 14, and on line 5, explaining that flag's value changed.

However, we are very unsure about how to tackle the second part on the project. I gave this a lot of thought, we had a couple meetings in private, and I'd like to share where I am with this.

My project proposed to implement a static backward program slicing algorithm which would definitely be amazing, but seems to be an unrealistic approach. Basically, any slicing algorithm that is interprocedural would require a system dependence graph, something we just simply don't have, not even for LLVM.

Whenever pointer semantics are in the picture, we have a really hard time: it's something that is super difficult to reason about without symbolic execution, and it would probably be a good idea not to tackle such cases until we come up with something clever (maybe the pset calculation Gábor Horváth is working on?). We should, however, being able to handle reference parameters.

So then, what's the goal exactly?

We'd like to teach the analyzer that some code blocks (even in other functions) could have written the variable that is responsible for the bug (x, in the included example), and is very important to the bug report. We could use this information to track variables similarly to the "must-have-happened" case: track conditions that prevented the write from happening.

To the specifics. I think it's reasonable to implement an intraprocedural slicing algorithm. For the included example, I think we should be able to discover the potential write to x on line 9. We don't need anything we already don't have to achieve this.

Reasoning across function boundaries could be achieved by inlining functions (as I understand it, simply insert the function's CFGBlocks), but there are big unknowns in this.

So my question is, do you think such inlining is possible? If not, should I just create a primitive variable-to-parameter mapping? Is there something else on the table?

Cheers!
Kristóf Umann


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] Discovering information not contained in the bugpath

Nathan Ridge via cfe-dev
I rewrote my sketch algorithm that showcases how I imagine this to work.


On Mon, 10 Jun 2019 at 20:03, Kristóf Umann <[hidden email]> wrote:


On Mon, 10 Jun 2019 at 05:03, Artem Dergachev <[hidden email]> wrote:
The problem we're solving is *entirely* about inter-procedural analysis. Like, as long as there are no inlined calls on the path (and therefore no pruning is being done), then there's no problem at all: the user can easily see that the value is untouched by direct observation.

I have explained myself poorly. I intend to implement an intraprocedural slicing, but that would work interprocedurally by resolving function calls with the help of the analyzer. This is important, because for the by-the-book interprocedural slicing you'd need a system dependence graph. Now, if a function isn't inlined, we need to do this on our own. I see a couple unknowns here (how deep do we explore such functions? how to we resolve non-trivial parameter passing? are we sure that tracking expressions here is actually meaningful?) but I think tackling simple cases is achievable.
 
In this sense i find the lack of note in example 2 completely non-problematic (as long as control dependency tracking lands). It's not a problem that the user isn't told that bar() could have initialized 'x', because the user already knows that bar() is not even called in the first place.

Would you agree that tracking flag however would be good?
 
However, there may be a combination of category 1 and category 2 that is much more interesting: what if main() unconditionally calls foo() that conditionally calls bar() that unconditionally initializes the value? Eg.:

void bar(int *x) {
  *x = 1;  // the interesting event that doesn't happen
}

void foo(int *x) {
  if (rand())  // control dependency of the interesting event
    bar(x);
}

int main() {
  int x;
  foo(&x);
  use(x); // use of uninitialized value
}

In this case it is essential to highlight the path within foo() so that to explain how it fails to call bar().
On 09.06.2019 17:50, Kristóf Umann wrote:
I gave this some thought, and realized that we could divide the "should-not-have-happened" cases into 2 categories:

1. The unexplored block is within a function that the analyzer visited.
2. The unexplored block lies within a function that the analyzer didn't even visit.

For example:

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  if (flag) // assumed false

10    i = new int;

11 }

12

13 int main() {

14   int *x = 0; // x initialized to 0

15   flag = 1;

16   foo(); // no note, but there should be one

17   bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


Observe the subtle difference that on line 17, function bar is called, and the first branch lies in that function. This is definitely a "category 1" case, since the analyzer inlined and visited bar(), but again, failed the realize the importance of flag due to not visiting line 10, and makes no mention of line 16 in the bugreport.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  i = new int;

10 }

11

12 int main() {

13   int *x = 0; // x initialized to 0

14   flag = 1;

15   foo(); // no note, but there should be one

16 if (flag) // assumed false

17    bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


In this example, the if statement stays, but the assignment to x is now found in bar(). In this case, the analyzer didn't explore bar(), so it's a "category 2" case.


Now, the reason why these categories are important is that if the analyzer actually explored a function, it solves the problem of parameters: for the example shown for "category 1", we can simply ask the analyzer whether x in function main() is the same as i in function bar(). This is, as stated in my previous letter, far more difficult for "category 2" cases.


I think when I'm finished with the "must-have-happened" cases, I could start enhancing NoStoreFuncVisitor to gather conditions possibly worth tracking for "category 1" cases, see whether an intraprocedural slicing algorithm makes sense, could worry about resolving parameters for the other case one when I get there.


Any thoughts on this approach? :)


On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <[hidden email]> wrote:
Hi!

I'm currently working on a GSoC project that aims to improve the description we provide for bug reports the Static Analyzer emits.

We divided the problem (that the bugreports didn't explain how the bug came about very well) into two parts:

1. The information is available in the bug path (basically parts of the code the analyzer actually explored on that particular path of execution). We refer to these as the "must-have-happened" case.
2. The information isn't available, possibly not even in the ExplodedGraph (parts of the program the analyzer explored on all paths of execution). We call this the "should-not-have-happened" case.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 int main() {

09   int *x = 0; // x initialized to 0

10   flag = 1;

11   foo(); // no note, but there should be one

12   if (flag) // assumed false

13     x = new int;

14   foo(); // no note, but there should be one

15

16   if (flag) // assumed true

17     *x = 5; // warn

18 }


The first branch is a good example for the "should-not-have-happened" case: the bug path doesn't contain line 13, so we don't really know whether the condition on line 12 is a big deal, which in this case it is (unlike if it only guarded a print of some sort)

The second branch is a good example for the "must-have-happened" case: we know that the condition on line 16 is very important.

I like to think that I made a very decent progress on the first part, as it's objectively a lot simpler. A great addition to the analyzer is that it now understands (at least, when my patches land) control dependencies in the CFG. With that, we can figure out that flag's value on line 16 is crucial, so we track it, resulting in notes being added on line 14, and on line 5, explaining that flag's value changed.

However, we are very unsure about how to tackle the second part on the project. I gave this a lot of thought, we had a couple meetings in private, and I'd like to share where I am with this.

My project proposed to implement a static backward program slicing algorithm which would definitely be amazing, but seems to be an unrealistic approach. Basically, any slicing algorithm that is interprocedural would require a system dependence graph, something we just simply don't have, not even for LLVM.

Whenever pointer semantics are in the picture, we have a really hard time: it's something that is super difficult to reason about without symbolic execution, and it would probably be a good idea not to tackle such cases until we come up with something clever (maybe the pset calculation Gábor Horváth is working on?). We should, however, being able to handle reference parameters.

So then, what's the goal exactly?

We'd like to teach the analyzer that some code blocks (even in other functions) could have written the variable that is responsible for the bug (x, in the included example), and is very important to the bug report. We could use this information to track variables similarly to the "must-have-happened" case: track conditions that prevented the write from happening.

To the specifics. I think it's reasonable to implement an intraprocedural slicing algorithm. For the included example, I think we should be able to discover the potential write to x on line 9. We don't need anything we already don't have to achieve this.

Reasoning across function boundaries could be achieved by inlining functions (as I understand it, simply insert the function's CFGBlocks), but there are big unknowns in this.

So my question is, do you think such inlining is possible? If not, should I just create a primitive variable-to-parameter mapping? Is there something else on the table?

Cheers!
Kristóf Umann


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] Discovering information not contained in the bugpath

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev


On 6/10/19 11:03 AM, Kristóf Umann wrote:


On Mon, 10 Jun 2019 at 05:03, Artem Dergachev <[hidden email]> wrote:
The problem we're solving is *entirely* about inter-procedural analysis. Like, as long as there are no inlined calls on the path (and therefore no pruning is being done), then there's no problem at all: the user can easily see that the value is untouched by direct observation.

I have explained myself poorly. I intend to implement an intraprocedural slicing, but that would work interprocedurally by resolving function calls with the help of the analyzer. This is important, because for the by-the-book interprocedural slicing you'd need a system dependence graph. Now, if a function isn't inlined, we need to do this on our own. I see a couple unknowns here (how deep do we explore such functions? how to we resolve non-trivial parameter passing? are we sure that tracking expressions here is actually meaningful?) but I think tackling simple cases is achievable.
 
In this sense i find the lack of note in example 2 completely non-problematic (as long as control dependency tracking lands). It's not a problem that the user isn't told that bar() could have initialized 'x', because the user already knows that bar() is not even called in the first place.

Would you agree that tracking flag however would be good?

Yes, totally.

 
However, there may be a combination of category 1 and category 2 that is much more interesting: what if main() unconditionally calls foo() that conditionally calls bar() that unconditionally initializes the value? Eg.:

void bar(int *x) {
  *x = 1;  // the interesting event that doesn't happen
}

void foo(int *x) {
  if (rand())  // control dependency of the interesting event
    bar(x);
}

int main() {
  int x;
  foo(&x);
  use(x); // use of uninitialized value
}

In this case it is essential to highlight the path within foo() so that to explain how it fails to call bar().
On 09.06.2019 17:50, Kristóf Umann wrote:
I gave this some thought, and realized that we could divide the "should-not-have-happened" cases into 2 categories:

1. The unexplored block is within a function that the analyzer visited.
2. The unexplored block lies within a function that the analyzer didn't even visit.

For example:

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  if (flag) // assumed false

10    i = new int;

11 }

12

13 int main() {

14   int *x = 0; // x initialized to 0

15   flag = 1;

16   foo(); // no note, but there should be one

17   bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


Observe the subtle difference that on line 17, function bar is called, and the first branch lies in that function. This is definitely a "category 1" case, since the analyzer inlined and visited bar(), but again, failed the realize the importance of flag due to not visiting line 10, and makes no mention of line 16 in the bugreport.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  i = new int;

10 }

11

12 int main() {

13   int *x = 0; // x initialized to 0

14   flag = 1;

15   foo(); // no note, but there should be one

16 if (flag) // assumed false

17    bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


In this example, the if statement stays, but the assignment to x is now found in bar(). In this case, the analyzer didn't explore bar(), so it's a "category 2" case.


Now, the reason why these categories are important is that if the analyzer actually explored a function, it solves the problem of parameters: for the example shown for "category 1", we can simply ask the analyzer whether x in function main() is the same as i in function bar(). This is, as stated in my previous letter, far more difficult for "category 2" cases.


I think when I'm finished with the "must-have-happened" cases, I could start enhancing NoStoreFuncVisitor to gather conditions possibly worth tracking for "category 1" cases, see whether an intraprocedural slicing algorithm makes sense, could worry about resolving parameters for the other case one when I get there.


Any thoughts on this approach? :)


On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <[hidden email]> wrote:
Hi!

I'm currently working on a GSoC project that aims to improve the description we provide for bug reports the Static Analyzer emits.

We divided the problem (that the bugreports didn't explain how the bug came about very well) into two parts:

1. The information is available in the bug path (basically parts of the code the analyzer actually explored on that particular path of execution). We refer to these as the "must-have-happened" case.
2. The information isn't available, possibly not even in the ExplodedGraph (parts of the program the analyzer explored on all paths of execution). We call this the "should-not-have-happened" case.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 int main() {

09   int *x = 0; // x initialized to 0

10   flag = 1;

11   foo(); // no note, but there should be one

12   if (flag) // assumed false

13     x = new int;

14   foo(); // no note, but there should be one

15

16   if (flag) // assumed true

17     *x = 5; // warn

18 }


The first branch is a good example for the "should-not-have-happened" case: the bug path doesn't contain line 13, so we don't really know whether the condition on line 12 is a big deal, which in this case it is (unlike if it only guarded a print of some sort)

The second branch is a good example for the "must-have-happened" case: we know that the condition on line 16 is very important.

I like to think that I made a very decent progress on the first part, as it's objectively a lot simpler. A great addition to the analyzer is that it now understands (at least, when my patches land) control dependencies in the CFG. With that, we can figure out that flag's value on line 16 is crucial, so we track it, resulting in notes being added on line 14, and on line 5, explaining that flag's value changed.

However, we are very unsure about how to tackle the second part on the project. I gave this a lot of thought, we had a couple meetings in private, and I'd like to share where I am with this.

My project proposed to implement a static backward program slicing algorithm which would definitely be amazing, but seems to be an unrealistic approach. Basically, any slicing algorithm that is interprocedural would require a system dependence graph, something we just simply don't have, not even for LLVM.

Whenever pointer semantics are in the picture, we have a really hard time: it's something that is super difficult to reason about without symbolic execution, and it would probably be a good idea not to tackle such cases until we come up with something clever (maybe the pset calculation Gábor Horváth is working on?). We should, however, being able to handle reference parameters.

So then, what's the goal exactly?

We'd like to teach the analyzer that some code blocks (even in other functions) could have written the variable that is responsible for the bug (x, in the included example), and is very important to the bug report. We could use this information to track variables similarly to the "must-have-happened" case: track conditions that prevented the write from happening.

To the specifics. I think it's reasonable to implement an intraprocedural slicing algorithm. For the included example, I think we should be able to discover the potential write to x on line 9. We don't need anything we already don't have to achieve this.

Reasoning across function boundaries could be achieved by inlining functions (as I understand it, simply insert the function's CFGBlocks), but there are big unknowns in this.

So my question is, do you think such inlining is possible? If not, should I just create a primitive variable-to-parameter mapping? Is there something else on the table?

Cheers!
Kristóf Umann



_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] Discovering information not contained in the bugpath

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev
Hmm, what do you mean by this line?:

        trackExpressionValue(N, RHS of assignment, Report)

Like, how does the existing visitor infrastructure behave when the statement is, by construction, never appears in the bug path? It doesn't even have a value to track, as it never appeared in the Environment.

P.S. I just realized that this discussion would probably be messy to read later through archives, as old revisions of the google doc are no longer available.

On 6/10/19 6:46 PM, Kristóf Umann wrote:
I rewrote my sketch algorithm that showcases how I imagine this to work.


On Mon, 10 Jun 2019 at 20:03, Kristóf Umann <[hidden email]> wrote:


On Mon, 10 Jun 2019 at 05:03, Artem Dergachev <[hidden email]> wrote:
The problem we're solving is *entirely* about inter-procedural analysis. Like, as long as there are no inlined calls on the path (and therefore no pruning is being done), then there's no problem at all: the user can easily see that the value is untouched by direct observation.

I have explained myself poorly. I intend to implement an intraprocedural slicing, but that would work interprocedurally by resolving function calls with the help of the analyzer. This is important, because for the by-the-book interprocedural slicing you'd need a system dependence graph. Now, if a function isn't inlined, we need to do this on our own. I see a couple unknowns here (how deep do we explore such functions? how to we resolve non-trivial parameter passing? are we sure that tracking expressions here is actually meaningful?) but I think tackling simple cases is achievable.
 
In this sense i find the lack of note in example 2 completely non-problematic (as long as control dependency tracking lands). It's not a problem that the user isn't told that bar() could have initialized 'x', because the user already knows that bar() is not even called in the first place.

Would you agree that tracking flag however would be good?
 
However, there may be a combination of category 1 and category 2 that is much more interesting: what if main() unconditionally calls foo() that conditionally calls bar() that unconditionally initializes the value? Eg.:

void bar(int *x) {
  *x = 1;  // the interesting event that doesn't happen
}

void foo(int *x) {
  if (rand())  // control dependency of the interesting event
    bar(x);
}

int main() {
  int x;
  foo(&x);
  use(x); // use of uninitialized value
}

In this case it is essential to highlight the path within foo() so that to explain how it fails to call bar().
On 09.06.2019 17:50, Kristóf Umann wrote:
I gave this some thought, and realized that we could divide the "should-not-have-happened" cases into 2 categories:

1. The unexplored block is within a function that the analyzer visited.
2. The unexplored block lies within a function that the analyzer didn't even visit.

For example:

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  if (flag) // assumed false

10    i = new int;

11 }

12

13 int main() {

14   int *x = 0; // x initialized to 0

15   flag = 1;

16   foo(); // no note, but there should be one

17   bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


Observe the subtle difference that on line 17, function bar is called, and the first branch lies in that function. This is definitely a "category 1" case, since the analyzer inlined and visited bar(), but again, failed the realize the importance of flag due to not visiting line 10, and makes no mention of line 16 in the bugreport.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  i = new int;

10 }

11

12 int main() {

13   int *x = 0; // x initialized to 0

14   flag = 1;

15   foo(); // no note, but there should be one

16 if (flag) // assumed false

17    bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


In this example, the if statement stays, but the assignment to x is now found in bar(). In this case, the analyzer didn't explore bar(), so it's a "category 2" case.


Now, the reason why these categories are important is that if the analyzer actually explored a function, it solves the problem of parameters: for the example shown for "category 1", we can simply ask the analyzer whether x in function main() is the same as i in function bar(). This is, as stated in my previous letter, far more difficult for "category 2" cases.


I think when I'm finished with the "must-have-happened" cases, I could start enhancing NoStoreFuncVisitor to gather conditions possibly worth tracking for "category 1" cases, see whether an intraprocedural slicing algorithm makes sense, could worry about resolving parameters for the other case one when I get there.


Any thoughts on this approach? :)


On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <[hidden email]> wrote:
Hi!

I'm currently working on a GSoC project that aims to improve the description we provide for bug reports the Static Analyzer emits.

We divided the problem (that the bugreports didn't explain how the bug came about very well) into two parts:

1. The information is available in the bug path (basically parts of the code the analyzer actually explored on that particular path of execution). We refer to these as the "must-have-happened" case.
2. The information isn't available, possibly not even in the ExplodedGraph (parts of the program the analyzer explored on all paths of execution). We call this the "should-not-have-happened" case.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 int main() {

09   int *x = 0; // x initialized to 0

10   flag = 1;

11   foo(); // no note, but there should be one

12   if (flag) // assumed false

13     x = new int;

14   foo(); // no note, but there should be one

15

16   if (flag) // assumed true

17     *x = 5; // warn

18 }


The first branch is a good example for the "should-not-have-happened" case: the bug path doesn't contain line 13, so we don't really know whether the condition on line 12 is a big deal, which in this case it is (unlike if it only guarded a print of some sort)

The second branch is a good example for the "must-have-happened" case: we know that the condition on line 16 is very important.

I like to think that I made a very decent progress on the first part, as it's objectively a lot simpler. A great addition to the analyzer is that it now understands (at least, when my patches land) control dependencies in the CFG. With that, we can figure out that flag's value on line 16 is crucial, so we track it, resulting in notes being added on line 14, and on line 5, explaining that flag's value changed.

However, we are very unsure about how to tackle the second part on the project. I gave this a lot of thought, we had a couple meetings in private, and I'd like to share where I am with this.

My project proposed to implement a static backward program slicing algorithm which would definitely be amazing, but seems to be an unrealistic approach. Basically, any slicing algorithm that is interprocedural would require a system dependence graph, something we just simply don't have, not even for LLVM.

Whenever pointer semantics are in the picture, we have a really hard time: it's something that is super difficult to reason about without symbolic execution, and it would probably be a good idea not to tackle such cases until we come up with something clever (maybe the pset calculation Gábor Horváth is working on?). We should, however, being able to handle reference parameters.

So then, what's the goal exactly?

We'd like to teach the analyzer that some code blocks (even in other functions) could have written the variable that is responsible for the bug (x, in the included example), and is very important to the bug report. We could use this information to track variables similarly to the "must-have-happened" case: track conditions that prevented the write from happening.

To the specifics. I think it's reasonable to implement an intraprocedural slicing algorithm. For the included example, I think we should be able to discover the potential write to x on line 9. We don't need anything we already don't have to achieve this.

Reasoning across function boundaries could be achieved by inlining functions (as I understand it, simply insert the function's CFGBlocks), but there are big unknowns in this.

So my question is, do you think such inlining is possible? If not, should I just create a primitive variable-to-parameter mapping? Is there something else on the table?

Cheers!
Kristóf Umann



_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] Discovering information not contained in the bugpath

Nathan Ridge via cfe-dev
I enhanced the sketch greatly.

On Tue, 11 Jun 2019 at 04:19, Artem Dergachev <[hidden email]> wrote:
Hmm, what do you mean by this line?:

        trackExpressionValue(N, RHS of assignment, Report)

Like, how does the existing visitor infrastructure behave when the statement is, by construction, never appears in the bug path? It doesn't even have a value to track, as it never appeared in the Environment.

Well the thinking was that there's no need for that as TrackControlDependecy only needs a CFG to track conditions. Now, obviously, the actual tracking of the right hand side of the assignment wouldn't be possible, but that could be a plus: I think tracking too much hypothetical "this could have prevented the bug, but ultimately did not" events could litter the bug report.

I changed this in the sketch.
 
P.S. I just realized that this discussion would probably be messy to read later through archives, as old revisions of the google doc are no longer available.

Good point. Here's a copy-paste from the document:

The following slicing algorithm, which is heavily tailored for the Clang Static Analyzer, somewhat redefines the goal of slicing: traditionally, we’d gather a set of relevant variables, which would initially be the variables in the slicing criterion, and expand it if we find a variable relevant to any variable within the set. Along this, we’d also calculate a set of relevant statements, which is a subset of the program, either executable on it’s own or not.


My approach is vastly different: the reason behind this is the function called trackExpressionValue. In a nutshell, trackExpressionValue tracks a variable, and adds notes at where the tracked variable was written, alongside the path that leads there, essentially gathering the set of relevant statements. Teaching it to discover variables immediate relevant to the tracked variable, we could (in theory) calculate the set of relevant variables transitively: if X is tracked, and A would be added to the set because of X, and B would be added because of A, tracking X would trigger tracking A, and that would trigger tracking B.


Our starting point (slicing criterion) is the ExplodedNode and CFGBlock where report originates from, and the variable that was initially tracked.

function insertIfLastWrite(CFG, SetOfLastWrites : set of CFGBlocks,

                          NewBlock : CFGBlock)

 for each W block in SetOfLastWrites do

   if NewBlock post dominates W then

     SetOfLastWrites -= W

     SetOfLastWrites += NewBlock        

   else if W post dominates B then

     break

   fi

 od

 if no elements post dominate B or vise versa then

   SetOfLastWrites += NewBlock

 fi


// Traverses N’s CFG to discover branches that are not a part of the

// bug path. Returns true if it discovers a write to X that is

// inevitable.

function conditionTracker(N : ExplodedNode, X : variable,

                         OriginB : CFGBlock, Report : BugReport)

 SetOfLastWrites = {} : set of CFGBlocks


 for all B blocks in N->getCFG() in order do

   if there is no path from B to OriginB then

     continue

   fi


   for all E elements in B in order do

     if E writes X then

       insertIfLastWrite(N->getCFG(), SetOfLastWrites, B)

       continue

     fi


     if E is a function call to F and the analyzer inlined F then

       ParamSet := set of F's parameters to which

                   X is passed to as a non-const reference

       NC := N's predecessor that is F's CallEnter

        

       for all NewX variables in ParamSet do

         if conditionTracker(NC, NewX, F’s exit block, Report) then

           insertIfLastWrite(N->getCFG(), SetOfLastWrites, B)

         fi

       od

       // X is global/static, or a field and F is a method, etc...

       if F would have access to X regardless then

         if conditionTracker(NC, NewX, F’s exit block, Report) then

           insertIfLastWrite(N->getCFG(), SetOfLastWrites, B)

         fi

       fi

       continue

     fi


   od

 od


 for all W blocks in SetOfLastWrites do

   if W is not a part of the bug path then

     track control dependencies of W

   fi

   // Otherwise, let regular visitors handle it.

 od


 if any of the blocks in SetOfLastWrites is in the bug path

   return true

 fi

 return false


 
On 6/10/19 6:46 PM, Kristóf Umann wrote:
I rewrote my sketch algorithm that showcases how I imagine this to work.


On Mon, 10 Jun 2019 at 20:03, Kristóf Umann <[hidden email]> wrote:


On Mon, 10 Jun 2019 at 05:03, Artem Dergachev <[hidden email]> wrote:
The problem we're solving is *entirely* about inter-procedural analysis. Like, as long as there are no inlined calls on the path (and therefore no pruning is being done), then there's no problem at all: the user can easily see that the value is untouched by direct observation.

I have explained myself poorly. I intend to implement an intraprocedural slicing, but that would work interprocedurally by resolving function calls with the help of the analyzer. This is important, because for the by-the-book interprocedural slicing you'd need a system dependence graph. Now, if a function isn't inlined, we need to do this on our own. I see a couple unknowns here (how deep do we explore such functions? how to we resolve non-trivial parameter passing? are we sure that tracking expressions here is actually meaningful?) but I think tackling simple cases is achievable.
 
In this sense i find the lack of note in example 2 completely non-problematic (as long as control dependency tracking lands). It's not a problem that the user isn't told that bar() could have initialized 'x', because the user already knows that bar() is not even called in the first place.

Would you agree that tracking flag however would be good?
 
However, there may be a combination of category 1 and category 2 that is much more interesting: what if main() unconditionally calls foo() that conditionally calls bar() that unconditionally initializes the value? Eg.:

void bar(int *x) {
  *x = 1;  // the interesting event that doesn't happen
}

void foo(int *x) {
  if (rand())  // control dependency of the interesting event
    bar(x);
}

int main() {
  int x;
  foo(&x);
  use(x); // use of uninitialized value
}

In this case it is essential to highlight the path within foo() so that to explain how it fails to call bar().
On 09.06.2019 17:50, Kristóf Umann wrote:
I gave this some thought, and realized that we could divide the "should-not-have-happened" cases into 2 categories:

1. The unexplored block is within a function that the analyzer visited.
2. The unexplored block lies within a function that the analyzer didn't even visit.

For example:

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  if (flag) // assumed false

10    i = new int;

11 }

12

13 int main() {

14   int *x = 0; // x initialized to 0

15   flag = 1;

16   foo(); // no note, but there should be one

17   bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


Observe the subtle difference that on line 17, function bar is called, and the first branch lies in that function. This is definitely a "category 1" case, since the analyzer inlined and visited bar(), but again, failed the realize the importance of flag due to not visiting line 10, and makes no mention of line 16 in the bugreport.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  i = new int;

10 }

11

12 int main() {

13   int *x = 0; // x initialized to 0

14   flag = 1;

15   foo(); // no note, but there should be one

16 if (flag) // assumed false

17    bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }


In this example, the if statement stays, but the assignment to x is now found in bar(). In this case, the analyzer didn't explore bar(), so it's a "category 2" case.


Now, the reason why these categories are important is that if the analyzer actually explored a function, it solves the problem of parameters: for the example shown for "category 1", we can simply ask the analyzer whether x in function main() is the same as i in function bar(). This is, as stated in my previous letter, far more difficult for "category 2" cases.


I think when I'm finished with the "must-have-happened" cases, I could start enhancing NoStoreFuncVisitor to gather conditions possibly worth tracking for "category 1" cases, see whether an intraprocedural slicing algorithm makes sense, could worry about resolving parameters for the other case one when I get there.


Any thoughts on this approach? :)


On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <[hidden email]> wrote:
Hi!

I'm currently working on a GSoC project that aims to improve the description we provide for bug reports the Static Analyzer emits.

We divided the problem (that the bugreports didn't explain how the bug came about very well) into two parts:

1. The information is available in the bug path (basically parts of the code the analyzer actually explored on that particular path of execution). We refer to these as the "must-have-happened" case.
2. The information isn't available, possibly not even in the ExplodedGraph (parts of the program the analyzer explored on all paths of execution). We call this the "should-not-have-happened" case.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 int main() {

09   int *x = 0; // x initialized to 0

10   flag = 1;

11   foo(); // no note, but there should be one

12   if (flag) // assumed false

13     x = new int;

14   foo(); // no note, but there should be one

15

16   if (flag) // assumed true

17     *x = 5; // warn

18 }


The first branch is a good example for the "should-not-have-happened" case: the bug path doesn't contain line 13, so we don't really know whether the condition on line 12 is a big deal, which in this case it is (unlike if it only guarded a print of some sort)

The second branch is a good example for the "must-have-happened" case: we know that the condition on line 16 is very important.

I like to think that I made a very decent progress on the first part, as it's objectively a lot simpler. A great addition to the analyzer is that it now understands (at least, when my patches land) control dependencies in the CFG. With that, we can figure out that flag's value on line 16 is crucial, so we track it, resulting in notes being added on line 14, and on line 5, explaining that flag's value changed.

However, we are very unsure about how to tackle the second part on the project. I gave this a lot of thought, we had a couple meetings in private, and I'd like to share where I am with this.

My project proposed to implement a static backward program slicing algorithm which would definitely be amazing, but seems to be an unrealistic approach. Basically, any slicing algorithm that is interprocedural would require a system dependence graph, something we just simply don't have, not even for LLVM.

Whenever pointer semantics are in the picture, we have a really hard time: it's something that is super difficult to reason about without symbolic execution, and it would probably be a good idea not to tackle such cases until we come up with something clever (maybe the pset calculation Gábor Horváth is working on?). We should, however, being able to handle reference parameters.

So then, what's the goal exactly?

We'd like to teach the analyzer that some code blocks (even in other functions) could have written the variable that is responsible for the bug (x, in the included example), and is very important to the bug report. We could use this information to track variables similarly to the "must-have-happened" case: track conditions that prevented the write from happening.

To the specifics. I think it's reasonable to implement an intraprocedural slicing algorithm. For the included example, I think we should be able to discover the potential write to x on line 9. We don't need anything we already don't have to achieve this.

Reasoning across function boundaries could be achieved by inlining functions (as I understand it, simply insert the function's CFGBlocks), but there are big unknowns in this.

So my question is, do you think such inlining is possible? If not, should I just create a primitive variable-to-parameter mapping? Is there something else on the table?

Cheers!
Kristóf Umann



_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev