Order in which matchers are invoked

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Order in which matchers are invoked

Tom Stellard via cfe-dev
Hi,

I've been writing a project specific clang-tidy check (thanks to Artem and Steve for their help). I've matchers written and working but was seeing some strange things about the order in which my matcher callbacks were being called.

I noticed that in the documentation for MatchFinder it says " The order of matches is guaranteed to be equivalent to doing a pre-order traversal on the AST, and applying the matchers in the order in which they were added to the MatchFinder."

As my checker is checking that certain functions follow certain steps in order (check the "device" arg ptr for NULL, lock the device, do NOT return while the device is locked, unlock the device) I was expecting (depending on) in AST-order of the match callbacks. Not all the device-ptr-check matches first, then all the return matches, then all the etc etc.matches.


So two questions:
* Can this call-back order be changed somehow? - worth a shot ;)

* Failing this I can maintain information about the function and line number of all the steps in all the functions and figure out the AST-order from that. But I notice that using srcLocation often returns the same value for several statements (this is because in the code the Lock is actually done in a macro which locks the device but also returns if the lock fails - so I end up with a lock and a return both reporting the same srcLocation  - is there a way to determine which comes before which. (I think I came across a way of matching on macros too but I had already written working AST matchers at that point).

Thanks for reading,

Billy.


--------------


Here's an example of one of the type of function I want to check with some examples of errors in the code order:

int api_foo(device_t *dev) {
    int ret_val = 0;

    bar();  // fn calls & decls before api_enter is ok- just don't access dev.
    dev->bla = 1; // NO! device access before api_enter() called
    api_enter(dev);   // error if this call is not present exactly once

    if (dev->bla)
        return; // NO! didn't call api_exit before rtn. Also two return points

    if (dev->ma) {
        ret_val = 1;
        goto cleanup;
    }
    tweak(dev);

cleanup:
    api_exit(dev); // error if this is not present exactly once
    dev->bla = 1; //NO! device access after api_exit()
    return ret_val;
}

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Order in which matchers are invoked

Tom Stellard via cfe-dev
RecursiveASTVisitor has an overridable which enables post-order visitation instead of pre-order. Another approach you could take is to match the functions directly, and in the callback fire another matcher - you could manually call clang::ast_matchers::match(MATCHER, Node); (several other overloads exist) which is essentially "fetch me the matched nodes and give me back a data structure on the subtree started by 'Node'." Creating custom matchers local to your code, you can use the "static const auto M = declarative(matcher(syntax()))", or the AST_MATCHER macro, in case your matching requires manually investigating the node via directly calling C++ functions on the node, or calculating some result.

I don't know how much of this could reasonably be channeled from the checker's code to Clang-Tidy and the execution, or more importantly, how much it could mess up potential other checkers running on the same invocation...

Another option you could go for is using your check() only as a "data collection" callback and build some (reasonably small!) data structure, and using "void onEndOfTranslationUnit()" at the end to read your data structure, calculate some reasonable result from it, and emit your diagnostics.

Billy O'Mahony via cfe-dev <[hidden email]> ezt írta (időpont: 2019. nov. 25., H, 11:48):
Hi,

I've been writing a project specific clang-tidy check (thanks to Artem and Steve for their help). I've matchers written and working but was seeing some strange things about the order in which my matcher callbacks were being called.

I noticed that in the documentation for MatchFinder it says " The order of matches is guaranteed to be equivalent to doing a pre-order traversal on the AST, and applying the matchers in the order in which they were added to the MatchFinder."

As my checker is checking that certain functions follow certain steps in order (check the "device" arg ptr for NULL, lock the device, do NOT return while the device is locked, unlock the device) I was expecting (depending on) in AST-order of the match callbacks. Not all the device-ptr-check matches first, then all the return matches, then all the etc etc.matches.


So two questions:
* Can this call-back order be changed somehow? - worth a shot ;)

* Failing this I can maintain information about the function and line number of all the steps in all the functions and figure out the AST-order from that. But I notice that using srcLocation often returns the same value for several statements (this is because in the code the Lock is actually done in a macro which locks the device but also returns if the lock fails - so I end up with a lock and a return both reporting the same srcLocation  - is there a way to determine which comes before which. (I think I came across a way of matching on macros too but I had already written working AST matchers at that point).

Thanks for reading,

Billy.


--------------


Here's an example of one of the type of function I want to check with some examples of errors in the code order:

int api_foo(device_t *dev) {
    int ret_val = 0;

    bar();  // fn calls & decls before api_enter is ok- just don't access dev.
    dev->bla = 1; // NO! device access before api_enter() called
    api_enter(dev);   // error if this call is not present exactly once

    if (dev->bla)
        return; // NO! didn't call api_exit before rtn. Also two return points

    if (dev->ma) {
        ret_val = 1;
        goto cleanup;
    }
    tweak(dev);

cleanup:
    api_exit(dev); // error if this is not present exactly once
    dev->bla = 1; //NO! device access after api_exit()
    return ret_val;
}
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Order in which matchers are invoked

Tom Stellard via cfe-dev
Hi Whisperity,

thanks for the tips. Some follow-ups below..

On Mon, 25 Nov 2019 at 13:36, Whisperity <[hidden email]> wrote:
RecursiveASTVisitor has an overridable which enables post-order visitation instead of pre-order.

From the context of my clang-tidy check's registerMatchers(MatchFinder *Finder) is this possible?
 
Another approach you could take is to match the functions directly, and in the callback fire another matcher - you could manually call clang::ast_matchers::match(MATCHER, Node); (several other overloads exist) which is essentially "fetch me the matched nodes and give me back a data structure on the subtree started by 'Node'." Creating custom matchers local to your code, you can use the "static const auto M = declarative(matcher(syntax()))", or the AST_MATCHER macro, in case your matching requires manually investigating the node via directly calling C++ functions on the node, or calculating some result.

So in my existing clang-tidy ::check(const MatchFinder::MatchResult &Result) I can get the function's Node from Result.Context? Can I register several matchers here and ensure post-order visitation? Without those two features the matcher-in-callback strategy would have the same out-of-order issue.

On the other hand maybe I should forget about clang-tidy? (and do a stand-alone tool with RecursiveASTVisitor like you suggest).  As far as I know it has drawbacks in that  in order to distribute the project-specific test i have to distribute a full and entire clang-tidy - I can't just distribute an .so and plug that into an existing clang-tidy install at run-time.


I don't know how much of this could reasonably be channeled from the checker's code to Clang-Tidy and the execution, or more importantly, how much it could mess up potential other checkers running on the same invocation...

Another option you could go for is using your check() only as a "data collection" callback and build some (reasonably small!) data structure, and using "void onEndOfTranslationUnit()" at the end to read your data structure, calculate some reasonable result from it, and emit your diagnostics.

That's my current plan for a work around. Good to learn about EndOfTu though.


Billy O'Mahony via cfe-dev <[hidden email]> ezt írta (időpont: 2019. nov. 25., H, 11:48):
Hi,

I've been writing a project specific clang-tidy check (thanks to Artem and Steve for their help). I've matchers written and working but was seeing some strange things about the order in which my matcher callbacks were being called.

I noticed that in the documentation for MatchFinder it says " The order of matches is guaranteed to be equivalent to doing a pre-order traversal on the AST, and applying the matchers in the order in which they were added to the MatchFinder."

As my checker is checking that certain functions follow certain steps in order (check the "device" arg ptr for NULL, lock the device, do NOT return while the device is locked, unlock the device) I was expecting (depending on) in AST-order of the match callbacks. Not all the device-ptr-check matches first, then all the return matches, then all the etc etc.matches.


So two questions:
* Can this call-back order be changed somehow? - worth a shot ;)

* Failing this I can maintain information about the function and line number of all the steps in all the functions and figure out the AST-order from that. But I notice that using srcLocation often returns the same value for several statements (this is because in the code the Lock is actually done in a macro which locks the device but also returns if the lock fails - so I end up with a lock and a return both reporting the same srcLocation  - is there a way to determine which comes before which. (I think I came across a way of matching on macros too but I had already written working AST matchers at that point).

Thanks for reading,

Billy.


--------------


Here's an example of one of the type of function I want to check with some examples of errors in the code order:

int api_foo(device_t *dev) {
    int ret_val = 0;

    bar();  // fn calls & decls before api_enter is ok- just don't access dev.
    dev->bla = 1; // NO! device access before api_enter() called
    api_enter(dev);   // error if this call is not present exactly once

    if (dev->bla)
        return; // NO! didn't call api_exit before rtn. Also two return points

    if (dev->ma) {
        ret_val = 1;
        goto cleanup;
    }
    tweak(dev);

cleanup:
    api_exit(dev); // error if this is not present exactly once
    dev->bla = 1; //NO! device access after api_exit()
    return ret_val;
}
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev