Match callback invocation order

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Match callback invocation order

Vassil Vassilev via cfe-dev
Hi All,

I'm struggling to figure out the order in which MatchFinder invokes callbacks.

Here are my matches. Each match callback just dumps the file and line no.

    ClangTool Tool(OptionsParser.getCompilations(),
        OptionsParser.getSourcePathList());

    MyMatchCallback Mcb;
    MatchFinder Finder;

    Finder.addMatcher(
        functionDecl().bind("fn"),
        &Mcb);
       
    ...
   
    Finder.addMatcher(
        returnStmt().bind("rtn"),
        &Mcb);
   
    Finder.addMatcher(
        callExpr().bind("call"),
        &Mcb);
       
    return Tool.run(newFrontendActionFactory(&Finder).get());

I run the tool on this src file:

  1 int fn1(void) {return 1;}
  2 int fn2(void) {return 1;}
  3 int fn3(void) {return 1;}
  4 int fn4(void) {return 1;}
  5
  6 int (main)(void) {
  7     fn1();
  8     return 1;
  9
 10     if (fn2())
 11         return 1;
 12
 13     { /* arbitrary block */
 14         if(fn3())
 15             return 1;
 16     }
 17
 18     fn4();
 19
 20     return 1;
 21 }
 
This is the output I get:  <file>:<line_no>

    fnDecl main /home/.../match_order_test.c:6
    fn call fn1/home/.../match_order_test.c:7
    Rtn /home/.../match_order_test.c:8
    fn call fn4/home/.../match_order_test.c:18    << hey! line 10 should be next?
    Rtn /home/.../match_order_test.c:20
    fn call fn2/home/.../match_order_test.c:10
    Rtn /home/.../match_order_test.c:11
    fn call fn3/home/.../match_order_test.c:14
    Rtn /home/.../match_order_test.c:15
   
I was expecting the matches to be in line-number order. In fact I'm really depending on that for the custom tool I want to build as it needs to analyze the order in which certain fns are called.

Looking at the ast-dump of the src file and adding in the order in which the CallExprs and ReturnStmt's were matched MatchFinder looks to be doing a kind of breadth-first traversal.
 
    TranslationUnitDecl 0x7fffc2a2b238 <<invalid sloc>> <invalid sloc>
    |-TypedefDecl 0x7fffc2a2bad0 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
    ...
    `-FunctionDecl 0x7fffc2a8a588 <line:6:1, line:21:1> line:6:6 main 'int (void)':'int (void)'
      `-CompoundStmt 0x7fffc2a8a8f0 <col:18, line:21:1>
    #1  |-CallExpr 0x7fffc2a8a6c0 <line:7:5, col:9> 'int'
        | `-ImplicitCastExpr 0x7fffc2a8a6a8 <col:5> 'int (*)(void)' <FunctionToPointerDecay>
        |   `-DeclRefExpr 0x7fffc2a8a658 <col:5> 'int (void)' Function 0x7fffc2a89ef0 'fn1' 'int (void)'
    #2  |-ReturnStmt 0x7fffc2a8a700 <line:8:5, col:12>
        | `-IntegerLiteral 0x7fffc2a8a6e0 <col:12> 'int' 1
        |-IfStmt 0x7fffc2a8a798 <line:10:5, line:11:16>
    #5  | |-CallExpr 0x7fffc2a8a748 <line:10:9, col:13> 'int'
        | | `-ImplicitCastExpr 0x7fffc2a8a730 <col:9> 'int (*)(void)' <FunctionToPointerDecay>
        | |   `-DeclRefExpr 0x7fffc2a8a710 <col:9> 'int (void)' Function 0x7fffc2a8a0c0 'fn2' 'int (void)'
    #6  | `-ReturnStmt 0x7fffc2a8a788 <line:11:9, col:16>
        |   `-IntegerLiteral 0x7fffc2a8a768 <col:16> 'int' 1
        |-CompoundStmt 0x7fffc2a8a850 <line:13:5, line:16:5>
        | `-IfStmt 0x7fffc2a8a838 <line:14:9, line:15:20>
    #7  |   |-CallExpr 0x7fffc2a8a7e8 <line:14:12, col:16> 'int'
        |   | `-ImplicitCastExpr 0x7fffc2a8a7d0 <col:12> 'int (*)(void)' <FunctionToPointerDecay>
        |   |   `-DeclRefExpr 0x7fffc2a8a7b0 <col:12> 'int (void)' Function 0x7fffc2a8a248 'fn3' 'int (void)'
    #8  |   `-ReturnStmt 0x7fffc2a8a828 <line:15:13, col:20>
        |     `-IntegerLiteral 0x7fffc2a8a808 <col:20> 'int' 1
    #3  |-CallExpr 0x7fffc2a8a8a0 <line:18:5, col:9> 'int'
        | `-ImplicitCastExpr 0x7fffc2a8a888 <col:5> 'int (*)(void)' <FunctionToPointerDecay>
        |   `-DeclRefExpr 0x7fffc2a8a868 <col:5> 'int (void)' Function 0x7fffc2a8a3d0 'fn4' 'int (void)'
    #4   `-ReturnStmt 0x7fffc2a8a8e0 <line:20:5, col:12>
          `-IntegerLiteral 0x7fffc2a8a8c0 <col:12> 'int' 1
         
         
In the documentation for MatchFinder it says "The order of matches is guaranteed to be equivalent to doing a pre-order traversal on the AST, and applying the matchers in the order in which they were added to the MatchFinder." but I can't reconcile what I see with what I found on the wikipedia for 'pre-order traversal'.

Is this a bug?

How can I influence the traversal order so that my matchers fire in source-code order?

Thanks,
Billy.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Match callback invocation order

Vassil Vassilev via cfe-dev
I am not familiar with ASTMatchers, but am with RecursiveASTVisitor and the issue seems to what some call the "data recursion" optimization that is specific to TraverseStmt.  While e.g. TraverseDecl(Decl *) and TraverseType(QualType), for statements the default signature is TraverseStmt(Stmt *, DataRecursionQueue *Queue); this queue argument is employed to avoid excessive stack depths that apparently occur with some very big expressions.  The way it works, I believe, is by enqueuing nested statements instead of processing them right away, in exactly the manner you describe.

Fortunately, it’s simple to turn off this optimization: just define a TraverseStmt(Stmt *S) in your Derived (I guess in this case, add this definition to whatever component of ASTMatchers inherits from RecursiveASTVisitor<…>). 

To avoid this confusion in the future (the same problem vexed me awhile back in a different context), It might be wise to alter the default TraverseStmt implementation so that, by default, it keeps track of the stack depth and only employs the data recursion queue when the stack depths really do get too great, and otherwise performs nested traversals in the same manner as TraverseDecl and TraverseType, if that makes sense.

Dave


In the documentation for MatchFinder it says "The order of matches is
guaranteed to be equivalent to doing a pre-order traversal on the AST, and
applying the matchers in the order in which they were added to the
MatchFinder." but I can't reconcile what I see with what I found on the
wikipedia for 'pre-order traversal'.

Is this a bug?

How can I influence the traversal order so that my matchers fire in
source-code order?

Thanks,
Billy.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Match callback invocation order

Vassil Vassilev via cfe-dev
In reply to this post by Vassil Vassilev via cfe-dev

On 15/06/2020 14:28, Billy O'Mahony via cfe-dev wrote:
> I was expecting the matches to be in line-number order. In fact I'm
> really depending on that for the custom tool I want to build as it
> needs to analyze the order in which certain fns are called.


I'm not sure line numbers are the thing to rely on if that is your goal.
Consider for example a lambda inside a function (though maybe you are
only working on C, not C++?). Your matcher for callExpr() would need to
distinguish between calls inside and outside the lambda.

Regardless of whether C++ is relevant to you, as David pointed out, this
appears to be an optimization, so an alternative implementation might
make sense anyway.

I would recommend you match functions as you are with functionDecl() and
then in your match callback, use something like (not tested)

ast_matchers::match(functionDecl(

forEachDescendant(callExpr(forFunction(functionDecl(equalsNode(FN)))))

), *FN, *Result.Context);


The forFunction() matcher is the important part which will ensure that
for example callExpr() in nested lambdas are not part of your result set.

Then order the resulting nodes by their getBeginLoc() perhaps. (Macros
may need to be accounted for).

Thanks,

Stephen.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Match callback invocation order

Vassil Vassilev via cfe-dev
Hi Stephen,

thanks for the input.

Fortunately it's just C I'm interested in.

It's not just the order of certain function calls that are relevant; the ordering of some conditional expressions and return stmts are also relevant. And some of them are done via macros in which case, if I recall, I see the source location being the location of the macro definition - not the location of the macro usage.

It still sounds to me like MatchFinder is not doing exactly what it says on the tin "The order of matches is guaranteed to be equivalent to doing a pre-order traversal on the AST, and applying the matchers in the order in which they were added to the MatchFinder." 

I've experimented with RecursiveASTVisitor and that does invoke the Visit<Node> functions in a natural/source-code order. So I will stick with that strategy for now.

Cheers,
Billy.

On Wed, 17 Jun 2020 at 23:21, Stephen Kelly <[hidden email]> wrote:

On 15/06/2020 14:28, Billy O'Mahony via cfe-dev wrote:
> I was expecting the matches to be in line-number order. In fact I'm
> really depending on that for the custom tool I want to build as it
> needs to analyze the order in which certain fns are called.


I'm not sure line numbers are the thing to rely on if that is your goal.
Consider for example a lambda inside a function (though maybe you are
only working on C, not C++?). Your matcher for callExpr() would need to
distinguish between calls inside and outside the lambda.

Regardless of whether C++ is relevant to you, as David pointed out, this
appears to be an optimization, so an alternative implementation might
make sense anyway.

I would recommend you match functions as you are with functionDecl() and
then in your match callback, use something like (not tested)

ast_matchers::match(functionDecl(

forEachDescendant(callExpr(forFunction(functionDecl(equalsNode(FN)))))

), *FN, *Result.Context);


The forFunction() matcher is the important part which will ensure that
for example callExpr() in nested lambdas are not part of your result set.

Then order the resulting nodes by their getBeginLoc() perhaps. (Macros
may need to be accounted for).

Thanks,

Stephen.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Match callback invocation order

Vassil Vassilev via cfe-dev
In reply to this post by Vassil Vassilev via cfe-dev
Hi David,

thanks for that detail. I did have a quick look at the MatchFinder code in the source but couldn't see where it might be changed (typically the example of MatchFinder do not involve inheriting and specialising it) or where I might override it I did inherit from it. But last time I looked at C++ <algorithm> was the new hotness so that might be a bridge too far for me right now.

In any case, I've started a new impl using RecusiveASTVisitor and I don't have the out-of-order callback issue.

/Billy.

On Mon, 15 Jun 2020 at 22:36, David Rector via cfe-dev <[hidden email]> wrote:
I am not familiar with ASTMatchers, but am with RecursiveASTVisitor and the issue seems to what some call the "data recursion" optimization that is specific to TraverseStmt.  While e.g. TraverseDecl(Decl *) and TraverseType(QualType), for statements the default signature is TraverseStmt(Stmt *, DataRecursionQueue *Queue); this queue argument is employed to avoid excessive stack depths that apparently occur with some very big expressions.  The way it works, I believe, is by enqueuing nested statements instead of processing them right away, in exactly the manner you describe.

Fortunately, it’s simple to turn off this optimization: just define a TraverseStmt(Stmt *S) in your Derived (I guess in this case, add this definition to whatever component of ASTMatchers inherits from RecursiveASTVisitor<…>). 

To avoid this confusion in the future (the same problem vexed me awhile back in a different context), It might be wise to alter the default TraverseStmt implementation so that, by default, it keeps track of the stack depth and only employs the data recursion queue when the stack depths really do get too great, and otherwise performs nested traversals in the same manner as TraverseDecl and TraverseType, if that makes sense.

Dave


In the documentation for MatchFinder it says "The order of matches is
guaranteed to be equivalent to doing a pre-order traversal on the AST, and
applying the matchers in the order in which they were added to the
MatchFinder." but I can't reconcile what I see with what I found on the
wikipedia for 'pre-order traversal'.

Is this a bug?

How can I influence the traversal order so that my matchers fire in
source-code order?

Thanks,
Billy.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Match callback invocation order

Vassil Vassilev via cfe-dev
I took a look, the RecursiveASTVisitor-derived component is in clang/lib/ASTMatchers/ASTMatchFinder.cpp.

I am not familiar with ASTMatchers, and don’t know whether others experience the same issue Billy is experiencing, but for those who might want to fix this issue, since we’re already discussing other ways of making ASTMatchers more user-friendly out of the box, here’s what I notice in my clang 10 code:

There seem to be two visitors; a MatchASTVisitor and a MatchChildASTVisitor.  I don’t know the role of each, but both have a TraverseStmt(Stmt *S, DataRecursionQueue *Queue) implementation, which means that in either visitor, so long as a non null Queue is provided to the base RecursiveASTVisitor<…>::TraverseStmt call, it will be used, so that nested Stmt traversals will be out of order, confusing to Billy and others.

But, if either a) a null Queue were provided to the base RecursiveASTVisitor::TraverseStmt call, or b) the Queue parameter were eliminated altogether from the signatures of both, I would expect that there would be no queueing, everything would be done in the normal stack order that Billy and many other users expect.

Interestingly, the MatchChildASTVisitor *does* seem to have some logic to disable the Queue when the stack depth is sufficiently small, which would seem to be a great compromise that perhaps should even be in the RecursiveASTVisitor base TraverseStmt logic.

However, the MatchASTVisitor does not have this logic; it always passes the queue to the base TraverseStmt.

Billy and/or others might want to experiment with disabling the Queue parameter, or nullifying it, in MatchASTVisitor::TraverseStmt.  If that fixes things, perhaps the conditional Queue-nullifying code in MatchChildASTVisitor needs to be replicated up there.

Another possibility to consider, though unlikely it will work: if the ASTMatchers code could be contained in VisitStmt or WalkUpFromStmt in TraverseStmt, I believe the WalkUpFrom/VisitStmts are always performed in the right order, if you look closely at how the Queue is processed in RecursiveASTVisitor::TraverseStmt.

That is as far as I can dive into this, as my own "stack depth" of other issues is fairly high right now.  (I definitely would benefit from employing a "Queue" parameter in my own life :)

Good luck,

Dave

On Jun 19, 2020, at 10:46 AM, Billy O'Mahony <[hidden email]> wrote:

Hi David,

thanks for that detail. I did have a quick look at the MatchFinder code in the source but couldn't see where it might be changed (typically the example of MatchFinder do not involve inheriting and specialising it) or where I might override it I did inherit from it. But last time I looked at C++ <algorithm> was the new hotness so that might be a bridge too far for me right now.

In any case, I've started a new impl using RecusiveASTVisitor and I don't have the out-of-order callback issue.

/Billy.

On Mon, 15 Jun 2020 at 22:36, David Rector via cfe-dev <[hidden email]> wrote:
I am not familiar with ASTMatchers, but am with RecursiveASTVisitor and the issue seems to what some call the "data recursion" optimization that is specific to TraverseStmt.  While e.g. TraverseDecl(Decl *) and TraverseType(QualType), for statements the default signature is TraverseStmt(Stmt *, DataRecursionQueue *Queue); this queue argument is employed to avoid excessive stack depths that apparently occur with some very big expressions.  The way it works, I believe, is by enqueuing nested statements instead of processing them right away, in exactly the manner you describe.

Fortunately, it’s simple to turn off this optimization: just define a TraverseStmt(Stmt *S) in your Derived (I guess in this case, add this definition to whatever component of ASTMatchers inherits from RecursiveASTVisitor<…>). 

To avoid this confusion in the future (the same problem vexed me awhile back in a different context), It might be wise to alter the default TraverseStmt implementation so that, by default, it keeps track of the stack depth and only employs the data recursion queue when the stack depths really do get too great, and otherwise performs nested traversals in the same manner as TraverseDecl and TraverseType, if that makes sense.

Dave


In the documentation for MatchFinder it says "The order of matches is
guaranteed to be equivalent to doing a pre-order traversal on the AST, and
applying the matchers in the order in which they were added to the
MatchFinder." but I can't reconcile what I see with what I found on the
wikipedia for 'pre-order traversal'.

Is this a bug?

How can I influence the traversal order so that my matchers fire in
source-code order?

Thanks,
Billy.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Match callback invocation order

Vassil Vassilev via cfe-dev
In reply to this post by Vassil Vassilev via cfe-dev

On 19/06/2020 15:38, Billy O'Mahony wrote:

> Hi Stephen,
>
> thanks for the input.
>
> Fortunately it's just C I'm interested in.
>
> It's not just the order of certain function calls that are relevant;
> the ordering of some conditional expressions and return stmts are also
> relevant. And some of them are done via macros in which case, if I
> recall, I see the source location being the location of the macro
> definition - not the location of the macro usage.


Yes, macros add complexity.


>
> It still sounds to me like MatchFinder is not doing exactly what it
> says on the tin "The order of matches is guaranteed to be equivalent
> to doing a pre-order traversal on the AST, and applying the matchers
> in the order in which they were added to the MatchFinder."


Yes, this is at least a documentation bug, if the implementation isn't
changed.

Thanks,

Stephen.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Match callback invocation order

Vassil Vassilev via cfe-dev
Thanks to Stephen and David for your help. I've filed a bug report  https://bugs.llvm.org/show_bug.cgi?id=46423

On Sat, 20 Jun 2020 at 17:26, Stephen Kelly <[hidden email]> wrote:

On 19/06/2020 15:38, Billy O'Mahony wrote:
> Hi Stephen,
>
> thanks for the input.
>
> Fortunately it's just C I'm interested in.
>
> It's not just the order of certain function calls that are relevant;
> the ordering of some conditional expressions and return stmts are also
> relevant. And some of them are done via macros in which case, if I
> recall, I see the source location being the location of the macro
> definition - not the location of the macro usage.


Yes, macros add complexity.


>
> It still sounds to me like MatchFinder is not doing exactly what it
> says on the tin "The order of matches is guaranteed to be equivalent
> to doing a pre-order traversal on the AST, and applying the matchers
> in the order in which they were added to the MatchFinder."


Yes, this is at least a documentation bug, if the implementation isn't
changed.

Thanks,

Stephen.



_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev