Visiting statements of a CFG one time

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Visiting statements of a CFG one time

Hollman, Daisy Sophia via cfe-dev
Hi all,

I'm trying to iterate over all top-level functions of a C file and perform
control-flow analysis of each function body.  My current approach looks as
follows:

class MyASTVisitor : public DeclVisitor<MyASTVisitor> {
  ASTContext &Context;
public:
  explicit MyASTVisitor(ASTContext &Context) : Context(Context) { }

  void VisitFunctionDecl(FunctionDecl *FD) {
    if (!FD->hasBody())
      return;

    FD->getNameInfo().getName().dump();

    const CFG::BuildOptions opts;
    auto CFG = CFG::buildCFG(FD, FD->getBody(), &Context, opts);
    CFG->dump(Context.getLangOpts(), true);
    // perform the actual analysis over the CFG
  }
};

class MyConsumer : public ASTConsumer {
  MyASTVisitor Visitor;
public:
  explicit MyConsumer(ASTContext &Context) : Visitor(Context) { }

  bool HandleTopLevelDecl(DeclGroupRef DR) override {
    for (auto *I : DR)
      Visitor.Visit(I);
    return true;
  }
};

What confuses me right now is how the CFG is structured.  Consider the
following C code:

extern int *pi;
extern unsigned char *pc;
void foo(void) {
    *(pi=(int *)pc, pi) = 42;
}

The dump of the corresponding CFG looks as follows:

 [B2 (ENTRY)]
   Succs (1): B1

 [B1]
   1: pi = (int *)pc
   2: pi (ImplicitCastExpr, LValueToRValue, int *)
   3: ... , [B1.2]
   4: *([B1.3]) = 42
   Preds (1): B2
   Succs (1): B0

 [B0 (EXIT)]
   Preds (1): B1

The single statement
  *(pi=(int *)pc, pi) = 42;
is broken up into 4 statements in the CFG.  That means, if I iterate over each
statement of a block like

for (const CFGBlock *block : *CFG) {
  for (const auto &I : *block) {
    if (Optional<CFGStmt> cs = I.getAs<CFGStmt>())
      TF.Visit(cs->getStmt());
  }
}

and the transfer functions TF recursively visit each subexpression, I will
visit some of them twice.  My plan was actually to construct transfer functions
of type ConstStmtVisitor in order to examine each statement recursively.  In
the past I ran into a similar problem which was related to wrongly set
CFG::BuildOptions.  However, this does not seem to be the case this time.

Is it possible to construct a CFG where statements are not split?  Or is there
another approach to walk over all statements of a basic block exactly one time?

Cheers,
Stefan
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Visiting statements of a CFG one time

Hollman, Daisy Sophia via cfe-dev
I ran into exactly this problem last week. I wanted to visit top-level statements (and expressions since expression-statements are just expressions) and recursively work through their ASTs. I just wrote a small pre-pass over the CFG elements that would construct a set of top-level statements by noting which expressions occurred as subexpressions and then extract the statements from there.

There's probably a better way to do this, but here's what my implementation of that function looks like.

    void findStatements(CFGBlock *B, llvm::SmallVectorImpl<const Stmt *> &SS) {
      llvm::SmallDenseMap<const Stmt*, bool> Map;

      // Mark subexpressions of each element in the block.
      for (auto I = B->begin(); I != B->end(); ++I) {
        CFGElement E = *I;
        if (auto SE = E.getAs<CFGStmt>()) {
          const Stmt *S = SE->getStmt();
          for (const Stmt *K : S->children())
            Map[K] = true;
        }
      }

      // Any expressions not in Map are statements.
      for (auto I = B->begin(); I != B->end(); ++I) {
        CFGElement E = *I;
        if (auto SE = E.getAs<CFGStmt>()) {
          const Stmt *S = SE->getStmt();
          if (Map.find(S) == Map.end())
            SS.push_back(S);
        }
      }
    }

Andrew Sutton


On Tue, Sep 8, 2020 at 9:44 AM Stefan Schulze Frielinghaus via cfe-dev <[hidden email]> wrote:
Hi all,

I'm trying to iterate over all top-level functions of a C file and perform
control-flow analysis of each function body.  My current approach looks as
follows:

class MyASTVisitor : public DeclVisitor<MyASTVisitor> {
  ASTContext &Context;
public:
  explicit MyASTVisitor(ASTContext &Context) : Context(Context) { }

  void VisitFunctionDecl(FunctionDecl *FD) {
    if (!FD->hasBody())
      return;

    FD->getNameInfo().getName().dump();

    const CFG::BuildOptions opts;
    auto CFG = CFG::buildCFG(FD, FD->getBody(), &Context, opts);
    CFG->dump(Context.getLangOpts(), true);
    // perform the actual analysis over the CFG
  }
};

class MyConsumer : public ASTConsumer {
  MyASTVisitor Visitor;
public:
  explicit MyConsumer(ASTContext &Context) : Visitor(Context) { }

  bool HandleTopLevelDecl(DeclGroupRef DR) override {
    for (auto *I : DR)
      Visitor.Visit(I);
    return true;
  }
};

What confuses me right now is how the CFG is structured.  Consider the
following C code:

extern int *pi;
extern unsigned char *pc;
void foo(void) {
    *(pi=(int *)pc, pi) = 42;
}

The dump of the corresponding CFG looks as follows:

 [B2 (ENTRY)]
   Succs (1): B1

 [B1]
   1: pi = (int *)pc
   2: pi (ImplicitCastExpr, LValueToRValue, int *)
   3: ... , [B1.2]
   4: *([B1.3]) = 42
   Preds (1): B2
   Succs (1): B0

 [B0 (EXIT)]
   Preds (1): B1

The single statement
  *(pi=(int *)pc, pi) = 42;
is broken up into 4 statements in the CFG.  That means, if I iterate over each
statement of a block like

for (const CFGBlock *block : *CFG) {
  for (const auto &I : *block) {
    if (Optional<CFGStmt> cs = I.getAs<CFGStmt>())
      TF.Visit(cs->getStmt());
  }
}

and the transfer functions TF recursively visit each subexpression, I will
visit some of them twice.  My plan was actually to construct transfer functions
of type ConstStmtVisitor in order to examine each statement recursively.  In
the past I ran into a similar problem which was related to wrongly set
CFG::BuildOptions.  However, this does not seem to be the case this time.

Is it possible to construct a CFG where statements are not split?  Or is there
another approach to walk over all statements of a basic block exactly one time?

Cheers,
Stefan
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Visiting statements of a CFG one time

Hollman, Daisy Sophia via cfe-dev
Note that the surrounding statement is not necessarily in the same block. Say, your algorithm would fail to find full statement 'x ? y : z;' by looking at its sub-expression 'y'.

As far as i'm aware, no existing data flow analyses operate the way you describe.

On 9/8/20 7:15 AM, Andrew Sutton via cfe-dev wrote:
I ran into exactly this problem last week. I wanted to visit top-level statements (and expressions since expression-statements are just expressions) and recursively work through their ASTs. I just wrote a small pre-pass over the CFG elements that would construct a set of top-level statements by noting which expressions occurred as subexpressions and then extract the statements from there.

There's probably a better way to do this, but here's what my implementation of that function looks like.

    void findStatements(CFGBlock *B, llvm::SmallVectorImpl<const Stmt *> &SS) {
      llvm::SmallDenseMap<const Stmt*, bool> Map;

      // Mark subexpressions of each element in the block.
      for (auto I = B->begin(); I != B->end(); ++I) {
        CFGElement E = *I;
        if (auto SE = E.getAs<CFGStmt>()) {
          const Stmt *S = SE->getStmt();
          for (const Stmt *K : S->children())
            Map[K] = true;
        }
      }

      // Any expressions not in Map are statements.
      for (auto I = B->begin(); I != B->end(); ++I) {
        CFGElement E = *I;
        if (auto SE = E.getAs<CFGStmt>()) {
          const Stmt *S = SE->getStmt();
          if (Map.find(S) == Map.end())
            SS.push_back(S);
        }
      }
    }

Andrew Sutton


On Tue, Sep 8, 2020 at 9:44 AM Stefan Schulze Frielinghaus via cfe-dev <[hidden email]> wrote:
Hi all,

I'm trying to iterate over all top-level functions of a C file and perform
control-flow analysis of each function body.  My current approach looks as
follows:

class MyASTVisitor : public DeclVisitor<MyASTVisitor> {
  ASTContext &Context;
public:
  explicit MyASTVisitor(ASTContext &Context) : Context(Context) { }

  void VisitFunctionDecl(FunctionDecl *FD) {
    if (!FD->hasBody())
      return;

    FD->getNameInfo().getName().dump();

    const CFG::BuildOptions opts;
    auto CFG = CFG::buildCFG(FD, FD->getBody(), &Context, opts);
    CFG->dump(Context.getLangOpts(), true);
    // perform the actual analysis over the CFG
  }
};

class MyConsumer : public ASTConsumer {
  MyASTVisitor Visitor;
public:
  explicit MyConsumer(ASTContext &Context) : Visitor(Context) { }

  bool HandleTopLevelDecl(DeclGroupRef DR) override {
    for (auto *I : DR)
      Visitor.Visit(I);
    return true;
  }
};

What confuses me right now is how the CFG is structured.  Consider the
following C code:

extern int *pi;
extern unsigned char *pc;
void foo(void) {
    *(pi=(int *)pc, pi) = 42;
}

The dump of the corresponding CFG looks as follows:

 [B2 (ENTRY)]
   Succs (1): B1

 [B1]
   1: pi = (int *)pc
   2: pi (ImplicitCastExpr, LValueToRValue, int *)
   3: ... , [B1.2]
   4: *([B1.3]) = 42
   Preds (1): B2
   Succs (1): B0

 [B0 (EXIT)]
   Preds (1): B1

The single statement
  *(pi=(int *)pc, pi) = 42;
is broken up into 4 statements in the CFG.  That means, if I iterate over each
statement of a block like

for (const CFGBlock *block : *CFG) {
  for (const auto &I : *block) {
    if (Optional<CFGStmt> cs = I.getAs<CFGStmt>())
      TF.Visit(cs->getStmt());
  }
}

and the transfer functions TF recursively visit each subexpression, I will
visit some of them twice.  My plan was actually to construct transfer functions
of type ConstStmtVisitor in order to examine each statement recursively.  In
the past I ran into a similar problem which was related to wrongly set
CFG::BuildOptions.  However, this does not seem to be the case this time.

Is it possible to construct a CFG where statements are not split?  Or is there
another approach to walk over all statements of a basic block exactly one time?

Cheers,
Stefan
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Visiting statements of a CFG one time

Hollman, Daisy Sophia via cfe-dev
On Tue, Sep 08, 2020 at 07:44:52PM -0700, Artem Dergachev wrote:
> Note that the surrounding statement is not necessarily in the same block.
> Say, your algorithm would fail to find full statement 'x ? y : z;' by
> looking at its sub-expression 'y'.
>
> As far as i'm aware, no existing data flow analyses operate the way you
> describe.

In which cases are subexpressions pulled out?  Is the idea behind the
splitting to come up with something similar to expressions in three
address format?

Cheers,
Stefan
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Visiting statements of a CFG one time

Hollman, Daisy Sophia via cfe-dev
In reply to this post by Hollman, Daisy Sophia via cfe-dev
On Tue, Sep 08, 2020 at 10:15:31AM -0400, Andrew Sutton wrote:

> I ran into exactly this problem last week. I wanted to visit top-level
> statements (and expressions since expression-statements are just
> expressions) and recursively work through their ASTs. I just wrote a small
> pre-pass over the CFG elements that would construct a set of top-level
> statements by noting which expressions occurred as subexpressions and then
> extract the statements from there.
>
> There's probably a better way to do this, but here's what my implementation
> of that function looks like.
>
>     void findStatements(CFGBlock *B, llvm::SmallVectorImpl<const Stmt *>
> &SS) {
>       llvm::SmallDenseMap<const Stmt*, bool> Map;
>
>       // Mark subexpressions of each element in the block.
>       for (auto I = B->begin(); I != B->end(); ++I) {
>         CFGElement E = *I;
>         if (auto SE = E.getAs<CFGStmt>()) {
>           const Stmt *S = SE->getStmt();
>           for (const Stmt *K : S->children())
>             Map[K] = true;
>         }
>       }
>
>       // Any expressions not in Map are statements.
>       for (auto I = B->begin(); I != B->end(); ++I) {
>         CFGElement E = *I;
>         if (auto SE = E.getAs<CFGStmt>()) {
>           const Stmt *S = SE->getStmt();
>           if (Map.find(S) == Map.end())
>             SS.push_back(S);
>         }
>       }
>     }

Interesting, though I think you need to do it recursively since
children() is not recursive.  In my case one of the expressions which
got pulled out had an extra ParenExpr around in the embedded expression.

I will further play around with this idea.

Thanks,
Stefan
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev