Statements appear twice in CFG

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Statements appear twice in CFG

Adam Cieszkiel via cfe-dev

Hello all,

I’m writing a tool that uses the CFG with different functions.

I have a problem that when iterating over the elements of a CFGBlock, some of the statements appear twice.

It is most prevalent in function calls but I saw it in some other cases as well.

For example, the following code:

 

double i = pow(2.0, 3);

 

will create two elements in the CFGBlock:

[B1]

   1: pow(2., 3)

   2: double i = pow(2., 3);

 

And when I try to reproduce the code from this block (by iterating over the elements and pretty print them), I get two function calls:

pow(2., 3);

double i = pow(2., 3);

 

I saw the same behavior also in binary operator when the opcode is a comma and in conditional operator.

Here are some examples:

1)

int i, k = 10;

for (i = 0; i < 10; i++, k--){

// some code

}

 

The block that corresponds to the step of the for loop looks like that:

[B2]

   1: i++

   2: k--

   3: [B2.1], [B2.2]

 

Which creates the code:

i++;

k--;

i++, k--;

 

2)

return a > 0? a : -a;

 

Creates this block:

1: [B4.2] ? [B2.1] : [B3.1]

   2: return [B1.1];

 

Which pretty prints the following code:

a > 0 ? a : -a;

return a > 0 ? a : -a;

 

 

I suspect that the reason for this behavior is that the CFG is built recursively and some of the statements are visited and appended twice to the block.

In the case of the binary operator with comma, I solved this problem by removing the call to appendStmt in the code below (Which is in CFGBuilder::VisitBinaryOperator() in CFG.cpp file)

 

if (B->getOpcode() == BO_Comma) { // ,

    autoCreateBlock();

    //appendStmt(Block, B);

    addStmt(B->getRHS());

    return addStmt(B->getLHS());

  }

 

I tried to solve the problem also by changing VisitCallExpr() and VisitConditioalOperator() but couldn’t solve it there.

I would appreciate any help and insights about this issue.

Thanks,

Rachel.


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Statements appear twice in CFG

Adam Cieszkiel via cfe-dev
Hi Rachel,

The CFG encodes control flow between both statements and (important) sub-expressions.  It’s been a while since I’ve touched the CFG, but IIRC we don’t do a complete linearization of all expressions — but we do for many of them.

Let’s take your example:

[B1]
   1: pow(2., 3)
   2: double i = pow(2., 3);

This is actually an expression, B1.1, and its containing statement, B1.2.  The CFG encodes that “pow(2., 3)” happens before the initialization of ‘i’.  This is NOT the case of the same thing being added twice.  The idea is that an analysis will know the call happens first.  By the time a statement appears in the CFG, it is a precondition that any sub-expressions that appear in the CFG will dominate the statement appearing in the CFG.  Again, we don’t do a complete linearization (because that would be unnecessary) but we do for things like calls, binary expressions, and anything that has control flow like ‘&&’, etc.

The way this SHOULD be printed up is something like:

   1: i++
   2: k--
   3: [B2.1], [B2.2]

In this case we see that the sub-expression ‘i++’ gets evaluated first, followed by ‘k—‘, and then the comma expression gets evaluated as a whole after both of these.

Does that help?  It is not the case that the same exact thing is added twice.

Ted


On Jun 7, 2017, at 7:17 AM, Rachel HaCohen (rahacohe) via cfe-dev <[hidden email]> wrote:

Hello all,
I’m writing a tool that uses the CFG with different functions.
I have a problem that when iterating over the elements of a CFGBlock, some of the statements appear twice.
It is most prevalent in function calls but I saw it in some other cases as well.
For example, the following code:
 
double i = pow(2.0, 3);
 
will create two elements in the CFGBlock:
[B1]
   1: pow(2., 3)
   2: double i = pow(2., 3);
 
And when I try to reproduce the code from this block (by iterating over the elements and pretty print them), I get two function calls:
pow(2., 3);
double i = pow(2., 3);
 
I saw the same behavior also in binary operator when the opcode is a comma and in conditional operator.
Here are some examples:
1)
int i, k = 10;
for (i = 0; i < 10; i++, k--){
// some code
}
 
The block that corresponds to the step of the for loop looks like that:
[B2]
   1: i++
   2: k--
   3: [B2.1], [B2.2]
 
Which creates the code:
i++;
k--;
i++, k--;
 
2)
return a > 0? a : -a;
 
Creates this block:
1: [B4.2] ? [B2.1] : [B3.1]
   2: return [B1.1];
 
Which pretty prints the following code:
a > 0 ? a : -a;
return a > 0 ? a : -a;
 
 
I suspect that the reason for this behavior is that the CFG is built recursively and some of the statements are visited and appended twice to the block.
In the case of the binary operator with comma, I solved this problem by removing the call to appendStmt in the code below (Which is in CFGBuilder::VisitBinaryOperator() in CFG.cpp file)
 
if (B->getOpcode() == BO_Comma) { // ,
    autoCreateBlock();
    //appendStmt(Block, B);
    addStmt(B->getRHS());
    return addStmt(B->getLHS());
  }
 
I tried to solve the problem also by changing VisitCallExpr() and VisitConditioalOperator() but couldn’t solve it there.
I would appreciate any help and insights about this issue.
Thanks,
Rachel. 
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Statements appear twice in CFG

Rachel HaCohen
This post has NOT been accepted by the mailing list yet.
Thanks for the response!
Of course, you are right. It’s not accurate to say that the same *exact* statement is added twice.
I meant that sub expressions appear twice, in different contexts.

So let’s start from the end:
My goal is to reproduce the code block exactly as it appears in the original function. I did it by printing the CFGBlock’s elements but, as you can see in the examples, this doesn’t always work as I expected.
I understand now the logic that it is meant to show the order of execution but unfortunately, it doesn’t help me...
So do you have any idea how can I achieve my goal?
I guess what I need is to change the CFG source code so that it doesn’t process recursively the sub-expressions.
Do you think it’s the right approach? I had some difficulties doing it in practice and would appreciate a little help with that, if it’s possible.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Statements appear twice in CFG

Rachel HaCohen
This post has NOT been accepted by the mailing list yet.
In reply to this post by Adam Cieszkiel via cfe-dev
Thanks for the response!
Of course, you are right. It’s not accurate to say that the same *exact* statement is added twice.
I meant that sub expressions appear twice, in different contexts.

So let’s start from the end:
My goal is to reproduce the code block exactly as it appears in the original function. I did it by printing the CFGBlock’s elements but, as you can see in the examples, this doesn’t always work as I expected.
I understand now the logic that it is meant to show the order of execution but unfortunately, it doesn’t help me...
So do you have any idea how can I achieve my goal?
I guess what I need is to change the CFG source code so that it doesn’t process recursively the sub-expressions.
Do you think it’s the right approach? I had some difficulties doing it in practice and would appreciate a little help with that, if it’s possible.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Statements appear twice in CFG

Ted Kremenek
This post has NOT been accepted by the mailing list yet.


On Jul 5, 2017, at 3:28 AM, Rachel HaCohen [via Clang Developers] <[hidden email]> wrote:

Thanks for the response!
Of course, you are right. It’s not accurate to say that the same *exact* statement is added twice.
I meant that sub expressions appear twice, in different contexts.

So let’s start from the end:
My goal is to reproduce the code block exactly as it appears in the original function.

Hi Rachel,

To be honest, I do not really understand the goal.  The CFG just captures the control flow from an AST that it is created from. It is not meant to be a syntactic structure.  If you need that, can you just go back to the underlying AST?  I know that is essentially what happens when you print out a statement, but the CFG consists of actual pointers into the AST which you can do more with.

Can you elaborate a bit more on what you are trying to achieve?

I did it by printing the CFGBlock’s elements but, as you can see in the examples, this doesn’t always work as I expected.
I understand now the logic that it is meant to show the order of execution but unfortunately, it doesn’t help me...
So do you have any idea how can I achieve my goal?
I guess what I need is to change the CFG source code so that it doesn’t process recursively the sub-expressions.
Do you think it’s the right approach? I had some difficulties doing it in practice and would appreciate a little help with that, if it’s possible.

Changing the CFG doesn't strike me as the right approach, because it is no longer a CFG anymore.  A bunch of stuff in the compiler depends on the contents of the CFG.  The reason why it recursively contains pieces of the AST is that control flow is accurately represented.  Going back to my question above, I am still a bit confused what specifically you are trying to accomplish so it is difficult for me to provide the ideal recommendation.

Cheers,
Ted



If you reply to this email, your message will be added to the discussion below:
http://clang-developers.42468.n3.nabble.com/Statements-appear-twice-in-CFG-tp4056829p4057228.html
This email was sent by Rachel HaCohen (via Nabble)
To receive all replies by email, subscribe to this discussion
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Statements appear twice in CFG

Rachel HaCohen
This post has NOT been accepted by the mailing list yet.
So more generally, the goal is to build a source-to-source tool that flattens a function and transforms it into a big switch block (like described here)
I found that the CFG helps a lot with that because the CFG Blocks represent exactly the cases I want to put in the switch. All the control flow analysis is already implemented and I just need to put it in the cases of the switch.
So that's what I did. I built an empty switchStmt, built a caseStmt corresponding to each CFGBlock (except the entry block) and then inserted to each case the Stmts that appear in its block (By iterating over the CFGElements and accessing their Stmt).
It worked nice but then I encountered the problem I described, that functions are being called twice etc...

Hope this is clear now.
Thanks.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Statements appear twice in CFG

Ted Kremenek
This post has NOT been accepted by the mailing list yet.
Thanks Rachel.   That really helps.

I only glanced at the paper, so I can’t weigh in whether or not using the CFG in this way will be feasible given all the odd cases it needs to handle, especially for C++, but I have a much better idea what you are doing now.

My intuition is that modifying the CFG while it is being constructed is problematic.  There’s a bunch of little invariants in CFG construction that are tricky to reason about, and you’ll be put in a place to maintain an out-of-tree patch for CFG building.  It sounds to me that you mostly want to do some post-facto filtering of CFGElements as you are doing the traversal of the CFG.  There are various ways to do this.  It might not be directly applicable, but I suggest taking a look at the printing logic for the CFG:


You’ll notice that the printing logic builds up a hash table that records which basic block and slot # each statement appears so it can nicely print back references from one statement to another.  You could use something similar to record stuff that you are currently double printing, and then use that data structure to prune out printing things twice.  You’re probably going to find a bunch of edge cases you need to handle.

I hope this helps.  This is really just the first idea I thought of, and given I haven’t really read the paper or worked with the problem as directly as you are the suggestion may be flawed, but this seemed to map well to the problem of CFG printing.

Cheers,
Ted

On Jul 5, 2017, at 9:13 PM, Rachel HaCohen [via Clang Developers] <[hidden email]> wrote:

So more generally, the goal is to build a source-to-source tool that flattens a function and transforms it into a big switch block (like described here)
I found that the CFG helps a lot with that because the CFG Blocks represent exactly the cases I want to put in the switch. All the control flow analysis is already implemented and I just need to put it in the cases of the switch.
So that's what I did. I built an empty switchStmt, built a caseStmt corresponding to each CFGBlock (except the entry block) and then inserted to each case the Stmts that appear in its block (By iterating over the CFGElements and accessing their Stmt).
It worked nice but then I encountered the problem I described, that functions are being called twice etc...

Hope this is clear now.
Thanks.


If you reply to this email, your message will be added to the discussion below:
http://clang-developers.42468.n3.nabble.com/Statements-appear-twice-in-CFG-tp4056829p4057244.html
This email was sent by Rachel HaCohen (via Nabble)
To receive all replies by email, subscribe to this discussion
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Statements appear twice in CFG

Rachel HaCohen
This post has NOT been accepted by the mailing list yet.
Thanks!! That helped a lot!
I finally solved the problem by building a variation of the StmtMap, as you suggested.
Before adding each stmt to the map, I check recursively if its sub-stmts are already in the map and if they are, I tag them. Finally I print only un-tagged stmts.
Loading...