I'd like to write a rule for either clang-tidy or static analyzer to help catch some potential errors in a project I'm working on.
My questions are:
a) is only one or the other will be able to do what I want to do?
b) if both are feasible which would have the simpler implementation?
The project involves writing an API that will run in a multi-threaded application and is responsible for serializing all access to a device structure. Therefore the first thing in every function in the API must be to call api_enter (which will among other things acquire a mutex on the device) and the last thing before returning must be to call api_exit. Also I want to enforce single exit point from every API function - or certainly if there are any return points that bypass the api_exit call.
So here is an example function with errors I want to catch highlighted.
int api_foo(device_t *dev) {
int ret_val = 0;
bar();
// fn calls & decls before api_enter is ok- just don't access dev.
dev->bla = 1; // NO! device access before api_enter() called
api_enter(dev); // error if this call is not present exactly once
if (dev->bla)
return; // NO! didn't call api_exit before rtn. Also two return points
if (dev->ma) {
ret_val = 1;
goto cleanup;
}
tweak(dev);
cleanup:
api_exit(dev); // error if this is not present exactly once
dev->bla = 1; //NO! device access after api_exit()
return ret_val;
}
I don't think it matters but the project is C compiled with gcc.
Also if both are feasible any other pointers, tips or good resources would be appreciated. E.g is there a totally different methodology I'm not considering - e.g. would using something like pycparser be a lot easier - though I'd prefer to keep it in clang as we plan to use tidy & static analyzer in any case for standard QA.
It depends on how strict do you want the checking be and on the
details of the rule. If you're designing a new API from scratch and
stuck with gcc forever, i wouldn't mind using the gcc
__attribute__((cleanup())) for your purpose.
The rule you described should be reasonably easy to implement with
the Static Analyzer. The good side of it is that you get a lot of
semantic modeling for free. For instance, if the developer copies
`dev` into a local variable and then uses that local variable
outside of api_enter..api_exit, the tool will be able to handle
transparently, as it deals with values rather than with variables.
Also it will probably be the easiest tool for your problem. The
downside would be that it's not guaranteed to find all bugs; it'll
inevitably give up on complicated code with high cyclomatic
complexity :) So if you want strict/paranoid enforcement of rules,
the Static Analyzer is not the right tool. But if you want to simply
find some bugs for free, it's the right tool.
It sounds as if your problem is not inter-procedural. Let me
double-check this: would you have another api_enter..api_exit pair
in the body of your tweak() function? Or is just one
api_enter..api_exit enough? Or is it a bug to call api_enter twice
without an api_exit in between?
If you have to write api_enter..api_exit in *every* function that
deals with devices, then the problem is not inter-procedural, which
makes it much easier. In particular, you should be able to come up
with a purely syntactic analysis ("every function that accesses a
device_t must start with api_enter() and must end in exactly one
spot with api_exit()"). Such analysis should be easily do-able in
clang-tidy as long as you're satisfied with this level of
aggressiveness. In particular, you'll have to be willing to
sacrifice code like this:
But it may be perfectly fine if you seriously want to enforce a
strict structure on all your functions that deal with devices.
I think the truly-truly right tool for your problem would be to come
up with a custom analysis over Clang CFG. It would be harder to
implement, but it would allow you to express things like "every
execution path within a function that accesses `dev` must have a
api_enter before it and an api_exit after it; you are not allowed to
copy `dev` around". This would strictly enforce the rule. At the
same time it'll allow you to lift the requirement of exactly one
return point - you would still be able to ensure that all accesses
are covered. If you need to allow to copy `dev` around, it should
still be doable, but it will be significantly more difficult to
implement.
On 7/2/19 12:13 PM, Billy O'Mahony via
cfe-dev wrote:
Hello,
I'd like to write a rule for either clang-tidy or static
analyzer to help catch some potential errors in a project I'm
working on.
My questions are:
a) is only one or the other will be able to do what I
want to do?
b) if both are feasible which would have the simpler
implementation?
The project involves writing an API that will run in a
multi-threaded application and is responsible for serializing
all access to a device structure. Therefore the first thing in
every function in the API must be to call api_enter (which
will among other things acquire a mutex on the device) and the
last thing before returning must be to call api_exit. Also I
want to enforce single exit point from every API function - or
certainly if there are any return points that bypass the
api_exit call.
So here is an example function with errors I want to catch
highlighted.
int api_foo(device_t *dev) {
int ret_val = 0;
bar(); // fn calls & decls before api_enter is ok-
just don't access dev.
dev->bla = 1; // NO! device access before
api_enter() called
api_enter(dev); // error if this call is not present
exactly once
if (dev->bla)
return; // NO! didn't call api_exit before rtn.
Also two return points
if (dev->ma) {
ret_val = 1;
goto cleanup;
}
tweak(dev);
cleanup:
api_exit(dev); // error if this is not present exactly
once
dev->bla = 1; //NO! device access after api_exit()
return ret_val;
}
I don't think it matters but the project is C compiled with
gcc.
Also if both are feasible any other pointers, tips or good
resources would be appreciated. E.g is there a totally
different methodology I'm not considering - e.g. would using
something like pycparser be a lot easier - though I'd prefer
to keep it in clang as we plan to use tidy & static
analyzer in any case for standard QA.
Instead of writing directly to the previously mutexed device,
get the next number from the atomic increment and write to a
memory file with that number as a name. Each process will then
write a separate file with a different incremented name. Since
filenames are handled as strings we may want to, say, add a
large number to the increment in order to have time sequenced
files by name with all names having the same number of digits.
This is not the only way to get a file ordering but is an
example. Alternatively, on Linux we might use ls -1t *.
At this point there is no waiting for the multiple threads.
Now we need a thread to read the files in order to feed the
device. If we want to cut out the slack time between files for
finding, opening, closing, deleting, we can ping-pong between
two coordinated processes where the second has the next file
ready to go when the first process yields the device.
There are obvious issues with the above, for example, one being
that the throughput of the device is slower than the combined
throughput of the threads, another being wraparound on file
numbering.
The first issue is the throughput ratio between the device and
the number of threads. The ratio should be evenly balanced
around 1.0. This suggests that the number of threads can be
moved up and down to obtain the best balance, with perhaps some
sufficient volume of data pending to avoid any device slack
time.
Also with a little effort we can read, for sending to the
device, a file that is being written by a process in order to
avoid an initial device utilization delay, if that becomes
useful. But that should only be an issue for the first one or
two files.
Neil Nelson
On 7/2/19 1:13 PM,
Billy O'Mahony via cfe-dev wrote:
Hello,
I'd like to write a rule for either
clang-tidy or static analyzer to help catch some potential
errors in a project I'm working on.
My questions are:
a) is only one or the other will be
able to do what I want to do?
b) if both are feasible which would
have the simpler implementation?
The project involves writing an API that
will run in a multi-threaded application and is responsible
for serializing all access to a device structure. Therefore
the first thing in every function in the API must be to call
api_enter (which will among other things acquire a mutex on
the device) and the last thing before returning must be to
call api_exit. Also I want to enforce single exit point from
every API function - or certainly if there are any return
points that bypass the api_exit call.
So here is an example function with errors
I want to catch highlighted.
int api_foo(device_t *dev) {
int ret_val = 0;
bar(); // fn calls & decls before
api_enter is ok- just don't access dev.
dev->bla = 1; // NO! device access
before api_enter() called
api_enter(dev); // error if this call
is not present exactly once
if (dev->bla)
return; // NO! didn't call api_exit
before rtn. Also two return points
if (dev->ma) {
ret_val = 1;
goto cleanup;
}
tweak(dev);
cleanup:
api_exit(dev); // error if this is not
present exactly once
dev->bla = 1; //NO! device access
after api_exit()
return ret_val;
}
I don't think it matters but the project is
C compiled with gcc.
Also if both are feasible any other
pointers, tips or good resources would be appreciated. E.g
is there a totally different methodology I'm not considering
- e.g. would using something like pycparser be a lot easier
- though I'd prefer to keep it in clang as we plan to use
tidy & static analyzer in any case for standard QA.
In reply to this post by shirley breuer via cfe-dev
Hi Artem,
thanks for your well thought-out and useful reply. It sounds like clang-tidy will be the sweet spot between usefulness and effort to implement.
I have a few other responses down below.
Regards,
Billy.
On Wed, 3 Jul 2019 at 23:23, Artem Dergachev <[hidden email]> wrote:
Hi,
It depends on how strict do you want the checking be and on the
details of the rule. If you're designing a new API from scratch and
stuck with gcc forever, i wouldn't mind using the gcc
__attribute__((cleanup())) for your purpose.
I didn't know about that gcc attrib. I need to read the gcc manual attrib section! I should've added that while we will be developing on gcc the code should be compilable on other toolchains/OSs also, so we are avoiding any gcc extensions (e.g. gcc has extensions for thread local storage but we are not using those for the same reason).
The rule you described should be reasonably easy to implement with
the Static Analyzer. The good side of it is that you get a lot of
semantic modeling for free. For instance, if the developer copies
`dev` into a local variable and then uses that local variable
outside of api_enter..api_exit, the tool will be able to handle
transparently, as it deals with values rather than with variables.
That is really cool.
Also it will probably be the easiest tool for your problem. The
downside would be that it's not guaranteed to find all bugs; it'll
inevitably give up on complicated code with high cyclomatic
complexity :) So if you want strict/paranoid enforcement of rules,
the Static Analyzer is not the right tool. But if you want to simply
find some bugs for free, it's the right tool.
It sounds as if your problem is not inter-procedural. Let me
double-check this: would you have another api_enter..api_exit pair
in the body of your tweak() function? Or is just one
api_enter..api_exit enough? Or is it a bug to call api_enter twice
without an api_exit in between?
Yes in this case tweak() could be another public fn of the api that would also have an enter/exit pair. But we are using recursive mutexes (a thread can acquire the same mutex N times (ie call api_enter) so long as it also releases N times) so that will be okay.
If you have to write api_enter..api_exit in *every* function that
deals with devices, then the problem is not inter-procedural, which
makes it much easier. In particular, you should be able to come up
with a purely syntactic analysis ("every function that accesses a
device_t must start with api_enter() and must end in exactly one
spot with api_exit()").
We will only insist on enter/exit in public API functions. Which are the only ones a client application can call. Internal private functions we won't have locking (as they can only be called ultimately from a public function so the device will be locked.) We are going to allow private fns to call back into the api via the public i/face. But we will have the public functions in specific files and have a specific naming prefix.
Such analysis should be easily do-able in
clang-tidy as long as you're satisfied with this level of
aggressiveness. In particular, you'll have to be willing to
sacrifice code like this:
But it may be perfectly fine if you seriously want to enforce a
strict structure on all your functions that deal with devices.
So is it the case that clang-tidy kind of passes info to the checker-extension in a syntactic code-parsing order. Whereas the static analyzer passes information to the checker in a simulated run-time order?
E.g in your foo() above my proposed checker gets fed 1) theres a function called foo, 2) theres an if with a call to flip_a_coin 3) in the true case there is a call to enter then exit 4) in the else there is nothing 5) there is a return (at which point my checker would need to be pretty smart and hold a lot of state to figure out something was wrong) . And to compare for the static analyzer it's more like 1) there is fn foo 2.1) there is a code path through foo with enter/exit 2.2) there is a code path with just return (at which point my reasonably simple checker would raise an error).
I think the truly-truly right tool for your problem would be to come
up with a custom analysis over Clang CFG.
.
By CFG you mean the Clang Static Analyzer?
It would be harder to
implement, but it would allow you to express things like "every
execution path within a function that accesses `dev` must have a
api_enter before it and an api_exit after it; you are not allowed to
copy `dev` around". This would strictly enforce the rule.
Yes that would be great. But I think just using clang-tidy from what you are saying would get us a long way. And there are heaps of simpler checks we would like to implement also.
At the
same time it'll allow you to lift the requirement of exactly one
return point - you would still be able to ensure that all accesses
are covered. If you need to allow to copy `dev` around, it should
still be doable, but it will be significantly more difficult to
implement
Does 'copy around' include passing to my private fns such as tweak()?. We don't need to copy dev anywhere within the public fns but we do need it to pass it to private fns.
On 7/2/19 12:13 PM, Billy O'Mahony via
cfe-dev wrote:
Hello,
I'd like to write a rule for either clang-tidy or static
analyzer to help catch some potential errors in a project I'm
working on.
My questions are:
a) is only one or the other will be able to do what I
want to do?
b) if both are feasible which would have the simpler
implementation?
The project involves writing an API that will run in a
multi-threaded application and is responsible for serializing
all access to a device structure. Therefore the first thing in
every function in the API must be to call api_enter (which
will among other things acquire a mutex on the device) and the
last thing before returning must be to call api_exit. Also I
want to enforce single exit point from every API function - or
certainly if there are any return points that bypass the
api_exit call.
So here is an example function with errors I want to catch
highlighted.
int api_foo(device_t *dev) {
int ret_val = 0;
bar(); // fn calls & decls before api_enter is ok-
just don't access dev.
dev->bla = 1; // NO! device access before
api_enter() called
api_enter(dev); // error if this call is not present
exactly once
if (dev->bla)
return; // NO! didn't call api_exit before rtn.
Also two return points
if (dev->ma) {
ret_val = 1;
goto cleanup;
}
tweak(dev);
cleanup:
api_exit(dev); // error if this is not present exactly
once
dev->bla = 1; //NO! device access after api_exit()
return ret_val;
}
I don't think it matters but the project is C compiled with
gcc.
Also if both are feasible any other pointers, tips or good
resources would be appreciated. E.g is there a totally
different methodology I'm not considering - e.g. would using
something like pycparser be a lot easier - though I'd prefer
to keep it in clang as we plan to use tidy & static
analyzer in any case for standard QA.
> It sounds like clang-tidy will be the sweet spot between
usefulness and effort to implement.
Yup, i approve. You can experiment with clang-query real quick and
it should give you an idea. If you find out that some of the useful
ASTMatchers are missing, feel free to add them - either locally or
upstreamed, as it's super easy.
It's a "source-level" control flow graph that consists of pointers
to AST statements, ordered in the control flow order (i.e., in order
of execution). A directed path in the CFG from statement A to
statement B would tell you that "B can potentially be executed after
A". This allows you to use your favorite graph algorithms to reason
about the program's behavior.
The Static Analyzer works over Clang CFG, but it's much more than
that: the Static Analyzer also understands the semantics of every
statement and can model their effects over the "symbolic" state of
the program. For example, in the following code:
void foo(bool x) {
if (x) {
}
if (x) {
}
}
the CFG would be a double-diamond: the control branches off at the
first statement, then joins back at the end, then branches off
again, then joins back. But the Static Analyzer would understand
that the if-conditions are the same, therefore these are simply two
unrelated execution paths, and the execution path in which the first
branch goes to true and the second branch goes to false is in fact
impossible. And if 'x' is overwritten in between, the Static
Analyzer would also know that.
Apart from the Static Analyzer, the CFG is used for some clang
warnings, such as -Wuninitialized (see AnalysisBasedWarnings.cpp).
There are some ready-made analyses for common data flow problems
implemented over Clang CFG (such as live variables analysis,
dominators analysis, etc.).
Clang CFG is not used during the actual compilation / code
generation. When Chris Lattner talks about MLIR and mentions that
Clang should have had a "CIL", he means that the Clang CFG (or
something similar) should have been used for compilation *as well
as* for analysis. This would, in particular, allow semantic
optimizations that require information that's already lost in LLVM
IR.
Because the CFG is not used for compilation, it's not necessarily
correct. There are a few known bugs in it, mostly around complicated
C++ and also around GNU extensions such as the *binary* operator ?:
or statement-expressions. But for plain C it should be nearly
perfect.
> Does 'copy around' include passing to my private fns such as
tweak()?
That's entirely up to you. What i was trying to say is that it if
you allow copying 'dev' into another variable, it will become much
harder to implement your analysis, so you'd much rather forbid the
user to do this. You might also allow the user to do this and suffer
from imprecision in your analysis. At the same time, because your
analysis is not inter-procedural, passing 'dev' into a function
should be fine. The function can still return it back "laundered" so
the user would be able to assign it into a variable behind your
back. But these are the usual trade-offs of static analysis, don't
be too scared by them - after all it all boils down to the halting
problem :)
On 7/4/19 3:40 AM, Billy O'Mahony
wrote:
Hi Artem,
thanks for your well thought-out and useful reply. It
sounds like clang-tidy will be the sweet spot between
usefulness and effort to implement.
I have a few other responses down below.
Regards,
Billy.
On Wed, 3 Jul 2019 at 23:23,
Artem Dergachev <[hidden email]> wrote:
Hi,
It depends on how strict do you want the checking be and
on the details of the rule. If you're designing a new API
from scratch and stuck with gcc forever, i wouldn't mind
using the gcc __attribute__((cleanup())) for your purpose.
I didn't know about that gcc attrib. I need to read the
gcc manual attrib section! I should've added that while we
will be developing on gcc the code should be compilable on
other toolchains/OSs also, so we are avoiding any gcc
extensions (e.g. gcc has extensions for thread local storage
but we are not using those for the same reason).
The rule you described should be
reasonably easy to implement with the Static Analyzer. The
good side of it is that you get a lot of semantic modeling
for free. For instance, if the developer copies `dev` into
a local variable and then uses that local variable outside
of api_enter..api_exit, the tool will be able to handle
transparently, as it deals with values rather than with
variables.
That is really cool.
Also it will probably be the easiest
tool for your problem. The downside would be that it's not
guaranteed to find all bugs; it'll inevitably give up on
complicated code with high cyclomatic complexity :) So if
you want strict/paranoid enforcement of rules, the Static
Analyzer is not the right tool. But if you want to simply
find some bugs for free, it's the right tool.
It sounds as if your problem is not inter-procedural. Let
me double-check this: would you have another
api_enter..api_exit pair in the body of your tweak()
function? Or is just one api_enter..api_exit enough? Or is
it a bug to call api_enter twice without an api_exit in
between?
Yes in this case tweak() could be another public fn of
the api that would also have an enter/exit pair. But we are
using recursive mutexes (a thread can acquire the same mutex
N times (ie call api_enter) so long as it also releases N
times) so that will be okay.
If you have to write api_enter..api_exit in *every*
function that deals with devices, then the problem is not
inter-procedural, which makes it much easier. In
particular, you should be able to come up with a purely
syntactic analysis ("every function that accesses a
device_t must start with api_enter() and must end in
exactly one spot with api_exit()").
We will only insist on enter/exit in public API
functions. Which are the only ones a client application can
call. Internal private functions we won't have locking (as
they can only be called ultimately from a public function so
the device will be locked.) We are going to allow private
fns to call back into the api via the public i/face. But we
will have the public functions in specific files and have a
specific naming prefix.
Such analysis should be easily do-able in clang-tidy
as long as you're satisfied with this level of
aggressiveness. In particular, you'll have to be willing
to sacrifice code like this:
But it may be perfectly fine if you seriously want to
enforce a strict structure on all your functions that deal
with devices.
So is it the case that clang-tidy kind of passes info to
the checker-extension in a syntactic code-parsing order.
Whereas the static analyzer passes information to the
checker in a simulated run-time order?
E.g in your foo() above my proposed checker gets fed 1)
theres a function called foo, 2) theres an if with a call to
flip_a_coin 3) in the true case there is a call to enter
then exit 4) in the else there is nothing 5) there is a
return (at which point my checker would need to be pretty
smart and hold a lot of state to figure out something was
wrong) . And to compare for the static analyzer it's more
like 1) there is fn foo 2.1) there is a code path through
foo with enter/exit 2.2) there is a code path with just
return (at which point my reasonably simple checker would
raise an error).
I think the truly-truly right tool for your problem would
be to come up with a custom analysis over Clang CFG.
.
By CFG you mean the Clang Static Analyzer?
It would be harder to implement, but it would allow
you to express things like "every execution path within a
function that accesses `dev` must have a api_enter before
it and an api_exit after it; you are not allowed to copy
`dev` around". This would strictly enforce the rule.
Yes that would be great. But I think just using
clang-tidy from what you are saying would get us a long way.
And there are heaps of simpler checks we would like to
implement also.
At the same time it'll allow you to lift the
requirement of exactly one return point - you would still
be able to ensure that all accesses are covered. If you
need to allow to copy `dev` around, it should still be
doable, but it will be significantly more difficult to
implement
Does 'copy around' include passing to my private fns
such as tweak()?. We don't need to copy dev anywhere within
the public fns but we do need it to pass it to private fns.
On
7/2/19 12:13 PM, Billy O'Mahony via cfe-dev wrote:
Hello,
I'd like to write a rule for either clang-tidy or
static analyzer to help catch some potential errors
in a project I'm working on.
My questions are:
a) is only one or the other will be able to
do what I want to do?
b) if both are feasible which would have the
simpler implementation?
The project involves writing an API that will run
in a multi-threaded application and is responsible
for serializing all access to a device structure.
Therefore the first thing in every function in the
API must be to call api_enter (which will among
other things acquire a mutex on the device) and the
last thing before returning must be to call
api_exit. Also I want to enforce single exit point
from every API function - or certainly if there are
any return points that bypass the api_exit call.
So here is an example function with errors I want
to catch highlighted.
int api_foo(device_t *dev) {
int ret_val = 0;
bar(); // fn calls & decls before
api_enter is ok- just don't access dev.
dev->bla = 1; // NO! device access before
api_enter() called
api_enter(dev); // error if this call is
not present exactly once
if (dev->bla)
return; // NO! didn't call api_exit
before rtn. Also two return points
if (dev->ma) {
ret_val = 1;
goto cleanup;
}
tweak(dev);
cleanup:
api_exit(dev); // error if this is not
present exactly once
dev->bla = 1; //NO! device access after
api_exit()
return ret_val;
}
I don't think it matters but the project is C
compiled with gcc.
Also if both are feasible any other pointers,
tips or good resources would be appreciated. E.g is
there a totally different methodology I'm not
considering - e.g. would using something like
pycparser be a lot easier - though I'd prefer to
keep it in clang as we plan to use tidy & static
analyzer in any case for standard QA.
super. Good to know I'm on the right path. And thanks for the extra info - plenty of clang goodness to be looking into there.
Cheers,
Billy.
On Sat, 6 Jul 2019 at 04:05, Artem Dergachev <[hidden email]> wrote:
> It sounds like clang-tidy will be the sweet spot between
usefulness and effort to implement.
Yup, i approve. You can experiment with clang-query real quick and
it should give you an idea. If you find out that some of the useful
ASTMatchers are missing, feel free to add them - either locally or
upstreamed, as it's super easy.
It's a "source-level" control flow graph that consists of pointers
to AST statements, ordered in the control flow order (i.e., in order
of execution). A directed path in the CFG from statement A to
statement B would tell you that "B can potentially be executed after
A". This allows you to use your favorite graph algorithms to reason
about the program's behavior.
The Static Analyzer works over Clang CFG, but it's much more than
that: the Static Analyzer also understands the semantics of every
statement and can model their effects over the "symbolic" state of
the program. For example, in the following code:
void foo(bool x) {
if (x) {
}
if (x) {
}
}
the CFG would be a double-diamond: the control branches off at the
first statement, then joins back at the end, then branches off
again, then joins back. But the Static Analyzer would understand
that the if-conditions are the same, therefore these are simply two
unrelated execution paths, and the execution path in which the first
branch goes to true and the second branch goes to false is in fact
impossible. And if 'x' is overwritten in between, the Static
Analyzer would also know that.
Apart from the Static Analyzer, the CFG is used for some clang
warnings, such as -Wuninitialized (see AnalysisBasedWarnings.cpp).
There are some ready-made analyses for common data flow problems
implemented over Clang CFG (such as live variables analysis,
dominators analysis, etc.).
Clang CFG is not used during the actual compilation / code
generation. When Chris Lattner talks about MLIR and mentions that
Clang should have had a "CIL", he means that the Clang CFG (or
something similar) should have been used for compilation *as well
as* for analysis. This would, in particular, allow semantic
optimizations that require information that's already lost in LLVM
IR.
Because the CFG is not used for compilation, it's not necessarily
correct. There are a few known bugs in it, mostly around complicated
C++ and also around GNU extensions such as the *binary* operator ?:
or statement-expressions. But for plain C it should be nearly
perfect.
> Does 'copy around' include passing to my private fns such as
tweak()?
That's entirely up to you. What i was trying to say is that it if
you allow copying 'dev' into another variable, it will become much
harder to implement your analysis, so you'd much rather forbid the
user to do this. You might also allow the user to do this and suffer
from imprecision in your analysis. At the same time, because your
analysis is not inter-procedural, passing 'dev' into a function
should be fine. The function can still return it back "laundered" so
the user would be able to assign it into a variable behind your
back. But these are the usual trade-offs of static analysis, don't
be too scared by them - after all it all boils down to the halting
problem :)
On 7/4/19 3:40 AM, Billy O'Mahony
wrote:
Hi Artem,
thanks for your well thought-out and useful reply. It
sounds like clang-tidy will be the sweet spot between
usefulness and effort to implement.
I have a few other responses down below.
Regards,
Billy.
On Wed, 3 Jul 2019 at 23:23,
Artem Dergachev <[hidden email]> wrote:
Hi,
It depends on how strict do you want the checking be and
on the details of the rule. If you're designing a new API
from scratch and stuck with gcc forever, i wouldn't mind
using the gcc __attribute__((cleanup())) for your purpose.
I didn't know about that gcc attrib. I need to read the
gcc manual attrib section! I should've added that while we
will be developing on gcc the code should be compilable on
other toolchains/OSs also, so we are avoiding any gcc
extensions (e.g. gcc has extensions for thread local storage
but we are not using those for the same reason).
The rule you described should be
reasonably easy to implement with the Static Analyzer. The
good side of it is that you get a lot of semantic modeling
for free. For instance, if the developer copies `dev` into
a local variable and then uses that local variable outside
of api_enter..api_exit, the tool will be able to handle
transparently, as it deals with values rather than with
variables.
That is really cool.
Also it will probably be the easiest
tool for your problem. The downside would be that it's not
guaranteed to find all bugs; it'll inevitably give up on
complicated code with high cyclomatic complexity :) So if
you want strict/paranoid enforcement of rules, the Static
Analyzer is not the right tool. But if you want to simply
find some bugs for free, it's the right tool.
It sounds as if your problem is not inter-procedural. Let
me double-check this: would you have another
api_enter..api_exit pair in the body of your tweak()
function? Or is just one api_enter..api_exit enough? Or is
it a bug to call api_enter twice without an api_exit in
between?
Yes in this case tweak() could be another public fn of
the api that would also have an enter/exit pair. But we are
using recursive mutexes (a thread can acquire the same mutex
N times (ie call api_enter) so long as it also releases N
times) so that will be okay.
If you have to write api_enter..api_exit in *every*
function that deals with devices, then the problem is not
inter-procedural, which makes it much easier. In
particular, you should be able to come up with a purely
syntactic analysis ("every function that accesses a
device_t must start with api_enter() and must end in
exactly one spot with api_exit()").
We will only insist on enter/exit in public API
functions. Which are the only ones a client application can
call. Internal private functions we won't have locking (as
they can only be called ultimately from a public function so
the device will be locked.) We are going to allow private
fns to call back into the api via the public i/face. But we
will have the public functions in specific files and have a
specific naming prefix.
Such analysis should be easily do-able in clang-tidy
as long as you're satisfied with this level of
aggressiveness. In particular, you'll have to be willing
to sacrifice code like this:
But it may be perfectly fine if you seriously want to
enforce a strict structure on all your functions that deal
with devices.
So is it the case that clang-tidy kind of passes info to
the checker-extension in a syntactic code-parsing order.
Whereas the static analyzer passes information to the
checker in a simulated run-time order?
E.g in your foo() above my proposed checker gets fed 1)
theres a function called foo, 2) theres an if with a call to
flip_a_coin 3) in the true case there is a call to enter
then exit 4) in the else there is nothing 5) there is a
return (at which point my checker would need to be pretty
smart and hold a lot of state to figure out something was
wrong) . And to compare for the static analyzer it's more
like 1) there is fn foo 2.1) there is a code path through
foo with enter/exit 2.2) there is a code path with just
return (at which point my reasonably simple checker would
raise an error).
I think the truly-truly right tool for your problem would
be to come up with a custom analysis over Clang CFG.
.
By CFG you mean the Clang Static Analyzer?
It would be harder to implement, but it would allow
you to express things like "every execution path within a
function that accesses `dev` must have a api_enter before
it and an api_exit after it; you are not allowed to copy
`dev` around". This would strictly enforce the rule.
Yes that would be great. But I think just using
clang-tidy from what you are saying would get us a long way.
And there are heaps of simpler checks we would like to
implement also.
At the same time it'll allow you to lift the
requirement of exactly one return point - you would still
be able to ensure that all accesses are covered. If you
need to allow to copy `dev` around, it should still be
doable, but it will be significantly more difficult to
implement
Does 'copy around' include passing to my private fns
such as tweak()?. We don't need to copy dev anywhere within
the public fns but we do need it to pass it to private fns.
On
7/2/19 12:13 PM, Billy O'Mahony via cfe-dev wrote:
Hello,
I'd like to write a rule for either clang-tidy or
static analyzer to help catch some potential errors
in a project I'm working on.
My questions are:
a) is only one or the other will be able to
do what I want to do?
b) if both are feasible which would have the
simpler implementation?
The project involves writing an API that will run
in a multi-threaded application and is responsible
for serializing all access to a device structure.
Therefore the first thing in every function in the
API must be to call api_enter (which will among
other things acquire a mutex on the device) and the
last thing before returning must be to call
api_exit. Also I want to enforce single exit point
from every API function - or certainly if there are
any return points that bypass the api_exit call.
So here is an example function with errors I want
to catch highlighted.
int api_foo(device_t *dev) {
int ret_val = 0;
bar(); // fn calls & decls before
api_enter is ok- just don't access dev.
dev->bla = 1; // NO! device access before
api_enter() called
api_enter(dev); // error if this call is
not present exactly once
if (dev->bla)
return; // NO! didn't call api_exit
before rtn. Also two return points
if (dev->ma) {
ret_val = 1;
goto cleanup;
}
tweak(dev);
cleanup:
api_exit(dev); // error if this is not
present exactly once
dev->bla = 1; //NO! device access after
api_exit()
return ret_val;
}
I don't think it matters but the project is C
compiled with gcc.
Also if both are feasible any other pointers,
tips or good resources would be appreciated. E.g is
there a totally different methodology I'm not
considering - e.g. would using something like
pycparser be a lot easier - though I'd prefer to
keep it in clang as we plan to use tidy & static
analyzer in any case for standard QA.
I managed to create a dummy clang-tidy check and run it.
For the tests I want to write I've decided the easiest first step is to just check for more than one return statement in my functions (this will only be applying to specific functions all with a common prefix e.g "api_foo".
So I managed to write a matcher which seems to work ok.
Finder->addMatcher(returnStmt().bind("x"), this);
And get it to print a warning for each rtn statement in MyCheck::check:
const auto *MatchedDecl = Result.Nodes.getNodeAs<ReturnStmt>("x"); diag(MatchedDecl->getReturnLoc(), "found rtn stmt");
So next I want to be able to find the name of the function that contains the returnStatement but I can't figure out how to do this.
Or maybe I should start with a FunctionDecl matcher that matches fns named (api_...) and then somehow traverse that function's AST and count the returnStmt's. ?
On Sat, 6 Jul 2019 at 13:41, Billy O'Mahony <[hidden email]> wrote:
Hi Artem,
super. Good to know I'm on the right path. And thanks for the extra info - plenty of clang goodness to be looking into there.
Cheers,
Billy.
On Sat, 6 Jul 2019 at 04:05, Artem Dergachev <[hidden email]> wrote:
> It sounds like clang-tidy will be the sweet spot between
usefulness and effort to implement.
Yup, i approve. You can experiment with clang-query real quick and
it should give you an idea. If you find out that some of the useful
ASTMatchers are missing, feel free to add them - either locally or
upstreamed, as it's super easy.
It's a "source-level" control flow graph that consists of pointers
to AST statements, ordered in the control flow order (i.e., in order
of execution). A directed path in the CFG from statement A to
statement B would tell you that "B can potentially be executed after
A". This allows you to use your favorite graph algorithms to reason
about the program's behavior.
The Static Analyzer works over Clang CFG, but it's much more than
that: the Static Analyzer also understands the semantics of every
statement and can model their effects over the "symbolic" state of
the program. For example, in the following code:
void foo(bool x) {
if (x) {
}
if (x) {
}
}
the CFG would be a double-diamond: the control branches off at the
first statement, then joins back at the end, then branches off
again, then joins back. But the Static Analyzer would understand
that the if-conditions are the same, therefore these are simply two
unrelated execution paths, and the execution path in which the first
branch goes to true and the second branch goes to false is in fact
impossible. And if 'x' is overwritten in between, the Static
Analyzer would also know that.
Apart from the Static Analyzer, the CFG is used for some clang
warnings, such as -Wuninitialized (see AnalysisBasedWarnings.cpp).
There are some ready-made analyses for common data flow problems
implemented over Clang CFG (such as live variables analysis,
dominators analysis, etc.).
Clang CFG is not used during the actual compilation / code
generation. When Chris Lattner talks about MLIR and mentions that
Clang should have had a "CIL", he means that the Clang CFG (or
something similar) should have been used for compilation *as well
as* for analysis. This would, in particular, allow semantic
optimizations that require information that's already lost in LLVM
IR.
Because the CFG is not used for compilation, it's not necessarily
correct. There are a few known bugs in it, mostly around complicated
C++ and also around GNU extensions such as the *binary* operator ?:
or statement-expressions. But for plain C it should be nearly
perfect.
> Does 'copy around' include passing to my private fns such as
tweak()?
That's entirely up to you. What i was trying to say is that it if
you allow copying 'dev' into another variable, it will become much
harder to implement your analysis, so you'd much rather forbid the
user to do this. You might also allow the user to do this and suffer
from imprecision in your analysis. At the same time, because your
analysis is not inter-procedural, passing 'dev' into a function
should be fine. The function can still return it back "laundered" so
the user would be able to assign it into a variable behind your
back. But these are the usual trade-offs of static analysis, don't
be too scared by them - after all it all boils down to the halting
problem :)
On 7/4/19 3:40 AM, Billy O'Mahony
wrote:
Hi Artem,
thanks for your well thought-out and useful reply. It
sounds like clang-tidy will be the sweet spot between
usefulness and effort to implement.
I have a few other responses down below.
Regards,
Billy.
On Wed, 3 Jul 2019 at 23:23,
Artem Dergachev <[hidden email]> wrote:
Hi,
It depends on how strict do you want the checking be and
on the details of the rule. If you're designing a new API
from scratch and stuck with gcc forever, i wouldn't mind
using the gcc __attribute__((cleanup())) for your purpose.
I didn't know about that gcc attrib. I need to read the
gcc manual attrib section! I should've added that while we
will be developing on gcc the code should be compilable on
other toolchains/OSs also, so we are avoiding any gcc
extensions (e.g. gcc has extensions for thread local storage
but we are not using those for the same reason).
The rule you described should be
reasonably easy to implement with the Static Analyzer. The
good side of it is that you get a lot of semantic modeling
for free. For instance, if the developer copies `dev` into
a local variable and then uses that local variable outside
of api_enter..api_exit, the tool will be able to handle
transparently, as it deals with values rather than with
variables.
That is really cool.
Also it will probably be the easiest
tool for your problem. The downside would be that it's not
guaranteed to find all bugs; it'll inevitably give up on
complicated code with high cyclomatic complexity :) So if
you want strict/paranoid enforcement of rules, the Static
Analyzer is not the right tool. But if you want to simply
find some bugs for free, it's the right tool.
It sounds as if your problem is not inter-procedural. Let
me double-check this: would you have another
api_enter..api_exit pair in the body of your tweak()
function? Or is just one api_enter..api_exit enough? Or is
it a bug to call api_enter twice without an api_exit in
between?
Yes in this case tweak() could be another public fn of
the api that would also have an enter/exit pair. But we are
using recursive mutexes (a thread can acquire the same mutex
N times (ie call api_enter) so long as it also releases N
times) so that will be okay.
If you have to write api_enter..api_exit in *every*
function that deals with devices, then the problem is not
inter-procedural, which makes it much easier. In
particular, you should be able to come up with a purely
syntactic analysis ("every function that accesses a
device_t must start with api_enter() and must end in
exactly one spot with api_exit()").
We will only insist on enter/exit in public API
functions. Which are the only ones a client application can
call. Internal private functions we won't have locking (as
they can only be called ultimately from a public function so
the device will be locked.) We are going to allow private
fns to call back into the api via the public i/face. But we
will have the public functions in specific files and have a
specific naming prefix.
Such analysis should be easily do-able in clang-tidy
as long as you're satisfied with this level of
aggressiveness. In particular, you'll have to be willing
to sacrifice code like this:
But it may be perfectly fine if you seriously want to
enforce a strict structure on all your functions that deal
with devices.
So is it the case that clang-tidy kind of passes info to
the checker-extension in a syntactic code-parsing order.
Whereas the static analyzer passes information to the
checker in a simulated run-time order?
E.g in your foo() above my proposed checker gets fed 1)
theres a function called foo, 2) theres an if with a call to
flip_a_coin 3) in the true case there is a call to enter
then exit 4) in the else there is nothing 5) there is a
return (at which point my checker would need to be pretty
smart and hold a lot of state to figure out something was
wrong) . And to compare for the static analyzer it's more
like 1) there is fn foo 2.1) there is a code path through
foo with enter/exit 2.2) there is a code path with just
return (at which point my reasonably simple checker would
raise an error).
I think the truly-truly right tool for your problem would
be to come up with a custom analysis over Clang CFG.
.
By CFG you mean the Clang Static Analyzer?
It would be harder to implement, but it would allow
you to express things like "every execution path within a
function that accesses `dev` must have a api_enter before
it and an api_exit after it; you are not allowed to copy
`dev` around". This would strictly enforce the rule.
Yes that would be great. But I think just using
clang-tidy from what you are saying would get us a long way.
And there are heaps of simpler checks we would like to
implement also.
At the same time it'll allow you to lift the
requirement of exactly one return point - you would still
be able to ensure that all accesses are covered. If you
need to allow to copy `dev` around, it should still be
doable, but it will be significantly more difficult to
implement
Does 'copy around' include passing to my private fns
such as tweak()?. We don't need to copy dev anywhere within
the public fns but we do need it to pass it to private fns.
On
7/2/19 12:13 PM, Billy O'Mahony via cfe-dev wrote:
Hello,
I'd like to write a rule for either clang-tidy or
static analyzer to help catch some potential errors
in a project I'm working on.
My questions are:
a) is only one or the other will be able to
do what I want to do?
b) if both are feasible which would have the
simpler implementation?
The project involves writing an API that will run
in a multi-threaded application and is responsible
for serializing all access to a device structure.
Therefore the first thing in every function in the
API must be to call api_enter (which will among
other things acquire a mutex on the device) and the
last thing before returning must be to call
api_exit. Also I want to enforce single exit point
from every API function - or certainly if there are
any return points that bypass the api_exit call.
So here is an example function with errors I want
to catch highlighted.
int api_foo(device_t *dev) {
int ret_val = 0;
bar(); // fn calls & decls before
api_enter is ok- just don't access dev.
dev->bla = 1; // NO! device access before
api_enter() called
api_enter(dev); // error if this call is
not present exactly once
if (dev->bla)
return; // NO! didn't call api_exit
before rtn. Also two return points
if (dev->ma) {
ret_val = 1;
goto cleanup;
}
tweak(dev);
cleanup:
api_exit(dev); // error if this is not
present exactly once
dev->bla = 1; //NO! device access after
api_exit()
return ret_val;
}
I don't think it matters but the project is C
compiled with gcc.
Also if both are feasible any other pointers,
tips or good resources would be appreciated. E.g is
there a totally different methodology I'm not
considering - e.g. would using something like
pycparser be a lot easier - though I'd prefer to
keep it in clang as we plan to use tidy & static
analyzer in any case for standard QA.
You can always do this in an inside out, but it most likely has
performance implications (i never really understood how ASTMatcher
performance works as i've never had any real performance problems
with them):
I managed to create a dummy clang-tidy check and run it.
For the tests I want to write I've decided the easiest
first step is to just check for more than one return
statement in my functions (this will only be applying to
specific functions all with a common prefix e.g "api_foo".
So I managed to write a matcher which seems to work ok.
Finder->addMatcher(returnStmt().bind("x"), this);
And get it to print a warning for each rtn statement in
MyCheck::check:
const auto *MatchedDecl =
Result.Nodes.getNodeAs<ReturnStmt>("x");
diag(MatchedDecl->getReturnLoc(), "found rtn stmt");
So next I want to be able to find the name of the
function that contains the returnStatement but I can't
figure out how to do this.
Or maybe I should start with a FunctionDecl matcher that
matches fns named (api_...) and then somehow traverse that
function's AST and count the returnStmt's. ?
On Sat, 6 Jul 2019 at 13:41,
Billy O'Mahony <[hidden email]>
wrote:
Hi Artem,
super. Good to know I'm on the right path. And thanks
for the extra info - plenty of clang goodness to be
looking into there.
Cheers,
Billy.
On Sat, 6 Jul 2019 at
04:05, Artem Dergachev <[hidden email]>
wrote:
> It sounds like clang-tidy
will be the sweet spot between usefulness and effort
to implement.
Yup, i approve. You can experiment with clang-query
real quick and it should give you an idea. If you find
out that some of the useful ASTMatchers are missing,
feel free to add them - either locally or upstreamed,
as it's super easy.
It's a "source-level" control flow graph that consists
of pointers to AST statements, ordered in the control
flow order (i.e., in order of execution). A directed
path in the CFG from statement A to statement B would
tell you that "B can potentially be executed after A".
This allows you to use your favorite graph algorithms
to reason about the program's behavior.
The Static Analyzer works over Clang CFG, but it's
much more than that: the Static Analyzer also
understands the semantics of every statement and can
model their effects over the "symbolic" state of the
program. For example, in the following code:
void foo(bool x) {
if (x) {
}
if (x) {
}
}
the CFG would be a double-diamond: the control
branches off at the first statement, then joins back
at the end, then branches off again, then joins back.
But the Static Analyzer would understand that the
if-conditions are the same, therefore these are simply
two unrelated execution paths, and the execution path
in which the first branch goes to true and the second
branch goes to false is in fact impossible. And if 'x'
is overwritten in between, the Static Analyzer would
also know that.
Apart from the Static Analyzer, the CFG is used for
some clang warnings, such as -Wuninitialized (see
AnalysisBasedWarnings.cpp). There are some ready-made
analyses for common data flow problems implemented
over Clang CFG (such as live variables analysis,
dominators analysis, etc.).
Clang CFG is not used during the actual compilation /
code generation. When Chris Lattner talks about MLIR
and mentions that Clang should have had a "CIL", he
means that the Clang CFG (or something similar) should
have been used for compilation *as well as* for
analysis. This would, in particular, allow semantic
optimizations that require information that's already
lost in LLVM IR.
Because the CFG is not used for compilation, it's not
necessarily correct. There are a few known bugs in it,
mostly around complicated C++ and also around GNU
extensions such as the *binary* operator ?: or
statement-expressions. But for plain C it should be
nearly perfect.
> Does 'copy around' include passing to my private
fns such as tweak()?
That's entirely up to you. What i was trying to say is
that it if you allow copying 'dev' into another
variable, it will become much harder to implement your
analysis, so you'd much rather forbid the user to do
this. You might also allow the user to do this and
suffer from imprecision in your analysis. At the same
time, because your analysis is not inter-procedural,
passing 'dev' into a function should be fine. The
function can still return it back "laundered" so the
user would be able to assign it into a variable behind
your back. But these are the usual trade-offs of
static analysis, don't be too scared by them - after
all it all boils down to the halting problem :)
On
7/4/19 3:40 AM, Billy O'Mahony wrote:
Hi Artem,
thanks for your well thought-out and useful
reply. It sounds like clang-tidy will be the
sweet spot between usefulness and effort to
implement.
I have a few other responses down below.
Regards,
Billy.
On Wed, 3 Jul
2019 at 23:23, Artem Dergachev <[hidden email]>
wrote:
Hi,
It depends on how strict do you want the
checking be and on the details of the rule.
If you're designing a new API from scratch
and stuck with gcc forever, i wouldn't mind
using the gcc __attribute__((cleanup())) for
your purpose.
I didn't know about that gcc attrib. I need
to read the gcc manual attrib section! I
should've added that while we will be
developing on gcc the code should be
compilable on other toolchains/OSs also, so we
are avoiding any gcc extensions (e.g. gcc has
extensions for thread local storage but we are
not using those for the same reason).
The rule you described
should be reasonably easy to implement with
the Static Analyzer. The good side of it is
that you get a lot of semantic modeling for
free. For instance, if the developer copies
`dev` into a local variable and then uses
that local variable outside of
api_enter..api_exit, the tool will be able
to handle transparently, as it deals with
values rather than with variables.
That is really cool.
Also it will probably
be the easiest tool for your problem. The
downside would be that it's not guaranteed
to find all bugs; it'll inevitably give up
on complicated code with high cyclomatic
complexity :) So if you want strict/paranoid
enforcement of rules, the Static Analyzer is
not the right tool. But if you want to
simply find some bugs for free, it's the
right tool.
It sounds as if your problem is not
inter-procedural. Let me double-check this:
would you have another api_enter..api_exit
pair in the body of your tweak() function?
Or is just one api_enter..api_exit enough?
Or is it a bug to call api_enter twice
without an api_exit in between?
Yes in this case tweak() could be another
public fn of the api that would also have an
enter/exit pair. But we are using recursive
mutexes (a thread can acquire the same mutex N
times (ie call api_enter) so long as it also
releases N times) so that will be okay.
If you have to write api_enter..api_exit in
*every* function that deals with devices,
then the problem is not inter-procedural,
which makes it much easier. In particular,
you should be able to come up with a purely
syntactic analysis ("every function that
accesses a device_t must start with
api_enter() and must end in exactly one spot
with api_exit()").
We will only insist on enter/exit in public
API functions. Which are the only ones a
client application can call. Internal private
functions we won't have locking (as they can
only be called ultimately from a public
function so the device will be locked.) We are
going to allow private fns to call back into
the api via the public i/face. But we will
have the public functions in specific files
and have a specific naming prefix.
Such analysis should be easily do-able
in clang-tidy as long as you're satisfied
with this level of aggressiveness. In
particular, you'll have to be willing to
sacrifice code like this:
But it may be perfectly fine if you
seriously want to enforce a strict structure
on all your functions that deal with
devices.
So is it the case that clang-tidy kind of
passes info to the checker-extension in a
syntactic code-parsing order. Whereas the
static analyzer passes information to the
checker in a simulated run-time order?
E.g in your foo() above my proposed checker
gets fed 1) theres a function called foo, 2)
theres an if with a call to flip_a_coin 3) in
the true case there is a call to enter then
exit 4) in the else there is nothing 5) there
is a return (at which point my checker would
need to be pretty smart and hold a lot of
state to figure out something was wrong) . And
to compare for the static analyzer it's more
like 1) there is fn foo 2.1) there is a code
path through foo with enter/exit 2.2) there is
a code path with just return (at which point
my reasonably simple checker would raise an
error).
I think the truly-truly right tool for your
problem would be to come up with a custom
analysis over Clang CFG.
.
By CFG you mean the Clang Static
Analyzer?
It would be harder to implement, but it
would allow you to express things like
"every execution path within a function that
accesses `dev` must have a api_enter before
it and an api_exit after it; you are not
allowed to copy `dev` around". This would
strictly enforce the rule.
Yes that would be great. But I think just
using clang-tidy from what you are saying
would get us a long way. And there are heaps
of simpler checks we would like to implement
also.
At the same time it'll allow you to lift
the requirement of exactly one return point
- you would still be able to ensure that all
accesses are covered. If you need to allow
to copy `dev` around, it should still be
doable, but it will be significantly more
difficult to implement
Does 'copy around' include passing to my
private fns such as tweak()?. We don't need to
copy dev anywhere within the public fns but we
do need it to pass it to private fns.
On
7/2/19 12:13 PM, Billy O'Mahony via
cfe-dev wrote:
Hello,
I'd like to write a rule for either
clang-tidy or static analyzer to help
catch some potential errors in a
project I'm working on.
My questions are:
a) is only one or the other
will be able to do what I want to do?
b) if both are feasible which
would have the simpler implementation?
The project involves writing an API
that will run in a multi-threaded
application and is responsible for
serializing all access to a device
structure. Therefore the first thing
in every function in the API must be
to call api_enter (which will among
other things acquire a mutex on the
device) and the last thing before
returning must be to call api_exit.
Also I want to enforce single exit
point from every API function - or
certainly if there are any return
points that bypass the api_exit call.
So here is an example function with
errors I want to catch highlighted.
int api_foo(device_t *dev) {
int ret_val = 0;
bar(); // fn calls & decls
before api_enter is ok- just don't
access dev.
dev->bla = 1; // NO! device
access before api_enter() called
api_enter(dev); // error if
this call is not present exactly once
if (dev->bla)
return; // NO! didn't call
api_exit before rtn. Also two return
points
if (dev->ma) {
ret_val = 1;
goto cleanup;
}
tweak(dev);
cleanup:
api_exit(dev); // error if this
is not present exactly once
dev->bla = 1; //NO! device
access after api_exit()
return ret_val;
}
I don't think it matters but the
project is C compiled with gcc.
Also if both are feasible any other
pointers, tips or good resources would
be appreciated. E.g is there a
totally different methodology I'm not
considering - e.g. would using
something like pycparser be a lot
easier - though I'd prefer to keep it
in clang as we plan to use tidy &
static analyzer in any case for
standard QA.
In reply to this post by shirley breuer via cfe-dev
On 10/09/2019 22:47, Billy O'Mahony via cfe-dev wrote:
> Hi Artem,
>
> (or anyone else :D ).
>
> I managed to create a dummy clang-tidy check and run it.
>
> For the tests I want to write I've decided the easiest first step is to
> just check for more than one return statement in my functions (this will
> only be applying to specific functions all with a common prefix e.g
> "api_foo".
>
> So I managed to write a matcher which seems to work ok.
> Finder->addMatcher(returnStmt().bind("x"), this);
>
> And get it to print a warning for each rtn statement in MyCheck::check:
> const auto *MatchedDecl = Result.Nodes.getNodeAs<ReturnStmt>("x");
> diag(MatchedDecl->getReturnLoc(), "found rtn stmt");
>
> So next I want to be able to find the name of the function that contains
> the returnStatement but I can't figure out how to do this.
Your `check` function will then be called once for each return statement
and you can extract both bound nodes.
>
> Or maybe I should start with a FunctionDecl matcher that matches fns
> named (api_...) and then somehow traverse that function's AST and count
> the returnStmt's. ?
If you wish to do counting, then traversing from the Function might
indeed make more sense.
In reply to this post by shirley breuer via cfe-dev
On 10/09/2019 23:46, Artem Dergachev via cfe-dev wrote:
> Yup, the most principled way of doing this with ASTMatchers is to start
> with the function decl and then recurse inside it:
>
> functionDecl(matchesName(...), forEachDescendant(returnStmt(...)))
>
> You can always do this in an inside out, but it most likely has
> performance implications (i never really understood how ASTMatcher
> performance works as i've never had any real performance problems with
> them):
>
> returnStmt(..., hasAncestor(functionDecl(matchesName(...))))
>
These will give unexpected results in the presence of lambdas for
example. The `forFunction` expression should be used instead.
On 9/10/19 3:54 PM, Stephen Kelly via cfe-dev wrote:
> On 10/09/2019 23:46, Artem Dergachev via cfe-dev wrote:
>> Yup, the most principled way of doing this with ASTMatchers is to
>> start with the function decl and then recurse inside it:
>>
>> functionDecl(matchesName(...), forEachDescendant(returnStmt(...)))
>>
>> You can always do this in an inside out, but it most likely has
>> performance implications (i never really understood how ASTMatcher
>> performance works as i've never had any real performance problems
>> with them):
>>
>> returnStmt(..., hasAncestor(functionDecl(matchesName(...))))
>>
> These will give unexpected results in the presence of lambdas for
> example. The `forFunction` expression should be used instead.
>
Yay nice, i never noticed this one, thanks!
Would it make sense to make a "direct" variant of this matcher as well,
i.e. a variant of forEachDescendant for functionDecls that only scans
statements within that function and doesn't descend into nested
declarations?
On 11/09/2019 01:13, Artem Dergachev via cfe-dev wrote:
>
>
> On 9/10/19 3:54 PM, Stephen Kelly via cfe-dev wrote:
>> On 10/09/2019 23:46, Artem Dergachev via cfe-dev wrote:
>>> Yup, the most principled way of doing this with ASTMatchers is to
>>> start with the function decl and then recurse inside it:
>>>
>>> functionDecl(matchesName(...), forEachDescendant(returnStmt(...)))
>>>
>>> You can always do this in an inside out, but it most likely has
>>> performance implications (i never really understood how ASTMatcher
>>> performance works as i've never had any real performance problems
>>> with them):
>>>
>>> returnStmt(..., hasAncestor(functionDecl(matchesName(...))))
>>>
>> These will give unexpected results in the presence of lambdas for
>> example. The `forFunction` expression should be used instead.
>>
>
> Yay nice, i never noticed this one, thanks!
>
> Would it make sense to make a "direct" variant of this matcher as well,
> i.e. a variant of forEachDescendant for functionDecls that only scans
> statements within that function and doesn't descend into nested
> declarations?
I've found that in practice, forEachDescendant is far less useful than
it seems. For example, you might think that it helps you match
descendants which are ints in a function called takeValue:
takeValue(someInt);
Imagine there is also a string overload of takeValue too.
However, if you use forEachDescendant to match the ints, you'll match
the wrong stuff:
takeValue(toString(someInt));
Always use a semantically specific matcher instead of
has/hasDescendant/forEachDescendant etc. And if a semantically specific
matcher doesn't exist, create it (either in your own code or upstream).
In reply to this post by shirley breuer via cfe-dev
Hi Artem, Stephen,
Success \o/
I can detect multiple returns now and should be able to figure out the other things I want to do also.
One or two comments in-line.
Thank you both for your help,
Billy.
Stephen wrote:
> On 10/09/2019 22:47, Billy O'Mahony via cfe-dev wrote:
> > Hi Artem, > > > > (or anyone else :D ). > > > > I managed to create a dummy clang-tidy check and run it. > > > > For the tests I want to write I've decided the easiest first step is to > > just check for more than one return statement in my functions (this will > > only be applying to specific functions all with a common prefix e.g > > "api_foo". > > > > So I managed to write a matcher which seems to work ok. > > Finder->addMatcher(returnStmt().bind("x"), this); > > > > And get it to print a warning for each rtn statement in MyCheck::check: > > const auto *MatchedDecl = Result.Nodes.getNodeAs<ReturnStmt>("x"); > > diag(MatchedDecl->getReturnLoc(), "found rtn stmt"); > > > > So next I want to be able to find the name of the function that contains > > the returnStatement but I can't figure out how to do this. > > . > Finder->addMatcher( > returnStmt( > forFunction(functionDecl().bind("func")) > ).bind("returnStmt"),
> this);
> > > Your `check` function will then be called once for each return statement
> and you can extract both bound nodes.
This is doing the trick very nicely.
>
> > > > Or maybe I should start with a FunctionDecl matcher that matches fns > > named (api_...) and then somehow traverse that function's AST and count > > the returnStmt's. ? > > If you wish to do counting, then traversing from the Function might
> indeed make more sense.
Ok. If I can't figure it out I'll post again. But using the double-bind and recording fn names in a
You can always do this in an inside out, but it most likely has
performance implications (i never really understood how ASTMatcher
performance works as i've never had any real performance problems
with them):
I managed to create a dummy clang-tidy check and run it.
For the tests I want to write I've decided the easiest
first step is to just check for more than one return
statement in my functions (this will only be applying to
specific functions all with a common prefix e.g "api_foo".
So I managed to write a matcher which seems to work ok.
Finder->addMatcher(returnStmt().bind("x"), this);
And get it to print a warning for each rtn statement in
MyCheck::check:
const auto *MatchedDecl =
Result.Nodes.getNodeAs<ReturnStmt>("x");
diag(MatchedDecl->getReturnLoc(), "found rtn stmt");
So next I want to be able to find the name of the
function that contains the returnStatement but I can't
figure out how to do this.
Or maybe I should start with a FunctionDecl matcher that
matches fns named (api_...) and then somehow traverse that
function's AST and count the returnStmt's. ?
On Sat, 6 Jul 2019 at 13:41,
Billy O'Mahony <[hidden email]>
wrote:
Hi Artem,
super. Good to know I'm on the right path. And thanks
for the extra info - plenty of clang goodness to be
looking into there.
Cheers,
Billy.
On Sat, 6 Jul 2019 at
04:05, Artem Dergachev <[hidden email]>
wrote:
> It sounds like clang-tidy
will be the sweet spot between usefulness and effort
to implement.
Yup, i approve. You can experiment with clang-query
real quick and it should give you an idea. If you find
out that some of the useful ASTMatchers are missing,
feel free to add them - either locally or upstreamed,
as it's super easy.
It's a "source-level" control flow graph that consists
of pointers to AST statements, ordered in the control
flow order (i.e., in order of execution). A directed
path in the CFG from statement A to statement B would
tell you that "B can potentially be executed after A".
This allows you to use your favorite graph algorithms
to reason about the program's behavior.
The Static Analyzer works over Clang CFG, but it's
much more than that: the Static Analyzer also
understands the semantics of every statement and can
model their effects over the "symbolic" state of the
program. For example, in the following code:
void foo(bool x) {
if (x) {
}
if (x) {
}
}
the CFG would be a double-diamond: the control
branches off at the first statement, then joins back
at the end, then branches off again, then joins back.
But the Static Analyzer would understand that the
if-conditions are the same, therefore these are simply
two unrelated execution paths, and the execution path
in which the first branch goes to true and the second
branch goes to false is in fact impossible. And if 'x'
is overwritten in between, the Static Analyzer would
also know that.
Apart from the Static Analyzer, the CFG is used for
some clang warnings, such as -Wuninitialized (see
AnalysisBasedWarnings.cpp). There are some ready-made
analyses for common data flow problems implemented
over Clang CFG (such as live variables analysis,
dominators analysis, etc.).
Clang CFG is not used during the actual compilation /
code generation. When Chris Lattner talks about MLIR
and mentions that Clang should have had a "CIL", he
means that the Clang CFG (or something similar) should
have been used for compilation *as well as* for
analysis. This would, in particular, allow semantic
optimizations that require information that's already
lost in LLVM IR.
Because the CFG is not used for compilation, it's not
necessarily correct. There are a few known bugs in it,
mostly around complicated C++ and also around GNU
extensions such as the *binary* operator ?: or
statement-expressions. But for plain C it should be
nearly perfect.
> Does 'copy around' include passing to my private
fns such as tweak()?
That's entirely up to you. What i was trying to say is
that it if you allow copying 'dev' into another
variable, it will become much harder to implement your
analysis, so you'd much rather forbid the user to do
this. You might also allow the user to do this and
suffer from imprecision in your analysis. At the same
time, because your analysis is not inter-procedural,
passing 'dev' into a function should be fine. The
function can still return it back "laundered" so the
user would be able to assign it into a variable behind
your back. But these are the usual trade-offs of
static analysis, don't be too scared by them - after
all it all boils down to the halting problem :)
On
7/4/19 3:40 AM, Billy O'Mahony wrote:
Hi Artem,
thanks for your well thought-out and useful
reply. It sounds like clang-tidy will be the
sweet spot between usefulness and effort to
implement.
I have a few other responses down below.
Regards,
Billy.
On Wed, 3 Jul
2019 at 23:23, Artem Dergachev <[hidden email]>
wrote:
Hi,
It depends on how strict do you want the
checking be and on the details of the rule.
If you're designing a new API from scratch
and stuck with gcc forever, i wouldn't mind
using the gcc __attribute__((cleanup())) for
your purpose.
I didn't know about that gcc attrib. I need
to read the gcc manual attrib section! I
should've added that while we will be
developing on gcc the code should be
compilable on other toolchains/OSs also, so we
are avoiding any gcc extensions (e.g. gcc has
extensions for thread local storage but we are
not using those for the same reason).
The rule you described
should be reasonably easy to implement with
the Static Analyzer. The good side of it is
that you get a lot of semantic modeling for
free. For instance, if the developer copies
`dev` into a local variable and then uses
that local variable outside of
api_enter..api_exit, the tool will be able
to handle transparently, as it deals with
values rather than with variables.
That is really cool.
Also it will probably
be the easiest tool for your problem. The
downside would be that it's not guaranteed
to find all bugs; it'll inevitably give up
on complicated code with high cyclomatic
complexity :) So if you want strict/paranoid
enforcement of rules, the Static Analyzer is
not the right tool. But if you want to
simply find some bugs for free, it's the
right tool.
It sounds as if your problem is not
inter-procedural. Let me double-check this:
would you have another api_enter..api_exit
pair in the body of your tweak() function?
Or is just one api_enter..api_exit enough?
Or is it a bug to call api_enter twice
without an api_exit in between?
Yes in this case tweak() could be another
public fn of the api that would also have an
enter/exit pair. But we are using recursive
mutexes (a thread can acquire the same mutex N
times (ie call api_enter) so long as it also
releases N times) so that will be okay.
If you have to write api_enter..api_exit in
*every* function that deals with devices,
then the problem is not inter-procedural,
which makes it much easier. In particular,
you should be able to come up with a purely
syntactic analysis ("every function that
accesses a device_t must start with
api_enter() and must end in exactly one spot
with api_exit()").
We will only insist on enter/exit in public
API functions. Which are the only ones a
client application can call. Internal private
functions we won't have locking (as they can
only be called ultimately from a public
function so the device will be locked.) We are
going to allow private fns to call back into
the api via the public i/face. But we will
have the public functions in specific files
and have a specific naming prefix.
Such analysis should be easily do-able
in clang-tidy as long as you're satisfied
with this level of aggressiveness. In
particular, you'll have to be willing to
sacrifice code like this:
But it may be perfectly fine if you
seriously want to enforce a strict structure
on all your functions that deal with
devices.
So is it the case that clang-tidy kind of
passes info to the checker-extension in a
syntactic code-parsing order. Whereas the
static analyzer passes information to the
checker in a simulated run-time order?
E.g in your foo() above my proposed checker
gets fed 1) theres a function called foo, 2)
theres an if with a call to flip_a_coin 3) in
the true case there is a call to enter then
exit 4) in the else there is nothing 5) there
is a return (at which point my checker would
need to be pretty smart and hold a lot of
state to figure out something was wrong) . And
to compare for the static analyzer it's more
like 1) there is fn foo 2.1) there is a code
path through foo with enter/exit 2.2) there is
a code path with just return (at which point
my reasonably simple checker would raise an
error).
I think the truly-truly right tool for your
problem would be to come up with a custom
analysis over Clang CFG.
.
By CFG you mean the Clang Static
Analyzer?
It would be harder to implement, but it
would allow you to express things like
"every execution path within a function that
accesses `dev` must have a api_enter before
it and an api_exit after it; you are not
allowed to copy `dev` around". This would
strictly enforce the rule.
Yes that would be great. But I think just
using clang-tidy from what you are saying
would get us a long way. And there are heaps
of simpler checks we would like to implement
also.
At the same time it'll allow you to lift
the requirement of exactly one return point
- you would still be able to ensure that all
accesses are covered. If you need to allow
to copy `dev` around, it should still be
doable, but it will be significantly more
difficult to implement
Does 'copy around' include passing to my
private fns such as tweak()?. We don't need to
copy dev anywhere within the public fns but we
do need it to pass it to private fns.
On
7/2/19 12:13 PM, Billy O'Mahony via
cfe-dev wrote:
Hello,
I'd like to write a rule for either
clang-tidy or static analyzer to help
catch some potential errors in a
project I'm working on.
My questions are:
a) is only one or the other
will be able to do what I want to do?
b) if both are feasible which
would have the simpler implementation?
The project involves writing an API
that will run in a multi-threaded
application and is responsible for
serializing all access to a device
structure. Therefore the first thing
in every function in the API must be
to call api_enter (which will among
other things acquire a mutex on the
device) and the last thing before
returning must be to call api_exit.
Also I want to enforce single exit
point from every API function - or
certainly if there are any return
points that bypass the api_exit call.
So here is an example function with
errors I want to catch highlighted.
int api_foo(device_t *dev) {
int ret_val = 0;
bar(); // fn calls & decls
before api_enter is ok- just don't
access dev.
dev->bla = 1; // NO! device
access before api_enter() called
api_enter(dev); // error if
this call is not present exactly once
if (dev->bla)
return; // NO! didn't call
api_exit before rtn. Also two return
points
if (dev->ma) {
ret_val = 1;
goto cleanup;
}
tweak(dev);
cleanup:
api_exit(dev); // error if this
is not present exactly once
dev->bla = 1; //NO! device
access after api_exit()
return ret_val;
}
I don't think it matters but the
project is C compiled with gcc.
Also if both are feasible any other
pointers, tips or good resources would
be appreciated. E.g is there a
totally different methodology I'm not
considering - e.g. would using
something like pycparser be a lot
easier - though I'd prefer to keep it
in clang as we plan to use tidy &
static analyzer in any case for
standard QA.