[analyzer][RFC] Attribute(s) to enhance/configure the analysis

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[analyzer][RFC] Attribute(s) to enhance/configure the analysis

Manas via cfe-dev
Hi,

There is an evolving need to configure the Clang Static Analyzer within the analyzed source code itself. We'd like to
1) suppress specific checkers (we already have an ongoing discussion at D89638)
2) express summaries (mainly argument constraints)
3) express taint propagation rules for functions (or for global variables like std::cin)

What if we had one attribute for CSA with a StringArgument?
(Actually, we already have that with the `annotate` attribute.)

So we'd have something like this:
1) [[clang::csa("supress.somecheck.somefunctionality")]]
2) [[clang::csa("summary.std::fread.BufferSize.Buffer(0).BufSize(1).BufSizeMultiplier(2)")]]
3) [[clang::csa("taint.sink.myNamespace::mySink")]]

Disadvantages: we must process strings whenever a node has the 'csa' attr attached, we have to come up with a "DSL".
Advantages: total flexibility.

I'd like to explore the possible approaches that we could have. For example, Aaron suggested alternatively for the suppression:
[[clang::suppress("analyzer.somecheck.somefunctionality")]]
[[clang::suppress("compiler.warning.12345")]]
[[clang::suppress("tidy.check-name.whatever")]]

Thanks,
Gabor

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] Attribute(s) to enhance/configure the analysis

Manas via cfe-dev
My example is a bit flawed, so it could look like this:

1) 
[[clang::csa("supress.somecheck.somefunctionality")]] void foo();

2)
[[clang::csa("summary.BufferSize.Buffer(0).BufSize(1).BufSizeMultiplier(2)")]]
std::size_t fread( void* buffer, std::size_t size, std::size_t count, std::FILE* stream );

3) 
namespace myNamespace {
[[clang::csa("taint.sink")]]
void mySink(int x);
} // myNamespace

On Tue, Oct 20, 2020 at 5:34 PM Gábor Márton <[hidden email]> wrote:
Hi,

There is an evolving need to configure the Clang Static Analyzer within the analyzed source code itself. We'd like to
1) suppress specific checkers (we already have an ongoing discussion at D89638)
2) express summaries (mainly argument constraints)
3) express taint propagation rules for functions (or for global variables like std::cin)

What if we had one attribute for CSA with a StringArgument?
(Actually, we already have that with the `annotate` attribute.)

So we'd have something like this:
1) [[clang::csa("supress.somecheck.somefunctionality")]]
2) [[clang::csa("summary.std::fread.BufferSize.Buffer(0).BufSize(1).BufSizeMultiplier(2)")]]
3) [[clang::csa("taint.sink.myNamespace::mySink")]]

Disadvantages: we must process strings whenever a node has the 'csa' attr attached, we have to come up with a "DSL".
Advantages: total flexibility.

I'd like to explore the possible approaches that we could have. For example, Aaron suggested alternatively for the suppression:
[[clang::suppress("analyzer.somecheck.somefunctionality")]]
[[clang::suppress("compiler.warning.12345")]]
[[clang::suppress("tidy.check-name.whatever")]]

Thanks,
Gabor

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] Attribute(s) to enhance/configure the analysis

Manas via cfe-dev
Hi folks,

@Gabor thank you for opening the conversation, I think it is extremely important!

TBH, at the moment, I am more interested in the first item on the list, so it will be mostly about that item from me.

I want to start off with one common thing for all the items on the list.  It is actually something that we should always keep in mind as the top priority, - how easy and pleasant it is to use.  If we ask users to add something in their code, it should not look like a gimmick.  If it looks elegant and adds value to the code, the users will be happy to add those annotations.  In the perfect scenario, it makes code more clear even for users who don’t use the analyzer (one example here is Python and type annotations for Mypy).

So, what’s my take on suppressions (aka item #1).  I believe that Aaron’s suggestion is better for two reasons:
  • It is more clear in its intentions.  From the very first glance you see that this is a suppression.
  • We already have “suppress” attribute!  We just need to adopt it (and maybe add another possible spelling).

However, the hardest choice here is the string argument for the attribute.  Again, it would be perfect if we can avoid weird identifiers like “java.com.warnings.alpha.whatever.2020”, it would be good if that string will make it clear for people who don’t use the analyzer what’s going on here.
Checker names that we use in command line are a bit too verbose, error messages are prone to changes, bug categories are too broad and abstract (the intention is pretty vague if you suppress “Logic error”).  

My suggestion would be using so-called “Bug types” (we already show them in the summary table in index.html).  Each diagnostic has one, it’s unique and more-or-less descriptive.  If we revise all the existing bug types and try to make them concise and clear, I think it can work!

As for other items, summaries… Oof that's a tough one.  It looks like we need to parse raw strings anyway, right?  So, keeping in mind UX, maybe it should be something entirely different?  Something like a DSL specifically for summaries.  I don’t have one direct suggestion, but I believe that summaries of all things can be something that all developers can benefit from.

Here is another thought here though.  Summaries is not something magical, one summary can not and should not describe the entirety of a particular function.  Summaries usually model specific aspects interesting for one particular checker.  So one function, can have a good number of checker-specific summaries.  It can state that it frees the first argument, dereferences the second argument when the third argument is not equal to 42, and returns tainted data.  Expressing everything in one annotation is way too much.  Having a bunch of summaries directly in the code can be fine, but can be terrifying (so many lines before the actual function).  One way to deal with it is APINotes, but that is a completely different topic.

Honestly, I think the third item is essentially the same thing as summaries, but maybe I don’t see some sort of peculiarity here.

Phew, I got a bit wordy!

Cheers!

Valeriy

On 20 Oct 2020, at 18:38, Gábor Márton <[hidden email]> wrote:

My example is a bit flawed, so it could look like this:

1) 
[[clang::csa("supress.somecheck.somefunctionality")]] void foo();

2)
[[clang::csa("summary.BufferSize.Buffer(0).BufSize(1).BufSizeMultiplier(2)")]]
std::size_t fread( void* buffer, std::size_t size, std::size_t count, std::FILE* stream );

3) 
namespace myNamespace {
[[clang::csa("taint.sink")]]
void mySink(int x);
} // myNamespace

On Tue, Oct 20, 2020 at 5:34 PM Gábor Márton <[hidden email]> wrote:
Hi,

There is an evolving need to configure the Clang Static Analyzer within the analyzed source code itself. We'd like to
1) suppress specific checkers (we already have an ongoing discussion at D89638)
2) express summaries (mainly argument constraints)
3) express taint propagation rules for functions (or for global variables like std::cin)

What if we had one attribute for CSA with a StringArgument?
(Actually, we already have that with the `annotate` attribute.)

So we'd have something like this:
1) [[clang::csa("supress.somecheck.somefunctionality")]]
2) [[clang::csa("summary.std::fread.BufferSize.Buffer(0).BufSize(1).BufSizeMultiplier(2)")]]
3) [[clang::csa("taint.sink.myNamespace::mySink")]]

Disadvantages: we must process strings whenever a node has the 'csa' attr attached, we have to come up with a "DSL".
Advantages: total flexibility.

I'd like to explore the possible approaches that we could have. For example, Aaron suggested alternatively for the suppression:
[[clang::suppress("analyzer.somecheck.somefunctionality")]]
[[clang::suppress("compiler.warning.12345")]]
[[clang::suppress("tidy.check-name.whatever")]]

Thanks,
Gabor


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] Attribute(s) to enhance/configure the analysis

Manas via cfe-dev
Hi!

Speaking of suppressions, in an ideal world the user would never need that. I.e. when the analyzer misunderstands the code, the user would be able to add an assert or rewrite the code slightly and the result would be easier to understand both for humans and the analyzer. Unfortunately, this is not the case. We do have false positives when there is no clear way of rewriting the code or adding asserts to make the warning disappear. While it is useful to have an annotation for those cases, there are also some risks involved. Such an annotation can bitrot and redundant annotations can linger even after the code is changed or the analyzer is improved. Those annotations can suppress true positive results. Moreover, users might start to rely on those annotations instead of trying to make the code clearer first, amplifying the effects I mentioned earlier.

I think an ideal suppress annotation would:
* be applicable to a wide range of scopes, not just function scope
* have clear documentation indicating this is the last resort and refer to some guidelines how to suppress false positives by improving the code
* come with an easy mechanism to check whether the annotation makes any difference, so we can easily get rid of redundant ones
* come with an easy mechanism to check how many issues are suppressed by the same annotation (or even expect the user to specify this number as an argument and warn if that does not match the reality?)

We could also warn for checks when we do believe rewriting the code in a cleaner way should always be possible. I think overall those problems that would require the use of suppression annotations should be treated as a high priority.

Cheers,
Gabor

On Tue, 20 Oct 2020 at 20:24, Valeriy Savchenko via cfe-dev <[hidden email]> wrote:
Hi folks,

@Gabor thank you for opening the conversation, I think it is extremely important!

TBH, at the moment, I am more interested in the first item on the list, so it will be mostly about that item from me.

I want to start off with one common thing for all the items on the list.  It is actually something that we should always keep in mind as the top priority, - how easy and pleasant it is to use.  If we ask users to add something in their code, it should not look like a gimmick.  If it looks elegant and adds value to the code, the users will be happy to add those annotations.  In the perfect scenario, it makes code more clear even for users who don’t use the analyzer (one example here is Python and type annotations for Mypy).

So, what’s my take on suppressions (aka item #1).  I believe that Aaron’s suggestion is better for two reasons:
  • It is more clear in its intentions.  From the very first glance you see that this is a suppression.
  • We already have “suppress” attribute!  We just need to adopt it (and maybe add another possible spelling).

However, the hardest choice here is the string argument for the attribute.  Again, it would be perfect if we can avoid weird identifiers like “java.com.warnings.alpha.whatever.2020”, it would be good if that string will make it clear for people who don’t use the analyzer what’s going on here.
Checker names that we use in command line are a bit too verbose, error messages are prone to changes, bug categories are too broad and abstract (the intention is pretty vague if you suppress “Logic error”).  

My suggestion would be using so-called “Bug types” (we already show them in the summary table in index.html).  Each diagnostic has one, it’s unique and more-or-less descriptive.  If we revise all the existing bug types and try to make them concise and clear, I think it can work!

As for other items, summaries… Oof that's a tough one.  It looks like we need to parse raw strings anyway, right?  So, keeping in mind UX, maybe it should be something entirely different?  Something like a DSL specifically for summaries.  I don’t have one direct suggestion, but I believe that summaries of all things can be something that all developers can benefit from.

Here is another thought here though.  Summaries is not something magical, one summary can not and should not describe the entirety of a particular function.  Summaries usually model specific aspects interesting for one particular checker.  So one function, can have a good number of checker-specific summaries.  It can state that it frees the first argument, dereferences the second argument when the third argument is not equal to 42, and returns tainted data.  Expressing everything in one annotation is way too much.  Having a bunch of summaries directly in the code can be fine, but can be terrifying (so many lines before the actual function).  One way to deal with it is APINotes, but that is a completely different topic.

Honestly, I think the third item is essentially the same thing as summaries, but maybe I don’t see some sort of peculiarity here.

Phew, I got a bit wordy!

Cheers!

Valeriy

On 20 Oct 2020, at 18:38, Gábor Márton <[hidden email]> wrote:

My example is a bit flawed, so it could look like this:

1) 
[[clang::csa("supress.somecheck.somefunctionality")]] void foo();

2)
[[clang::csa("summary.BufferSize.Buffer(0).BufSize(1).BufSizeMultiplier(2)")]]
std::size_t fread( void* buffer, std::size_t size, std::size_t count, std::FILE* stream );

3) 
namespace myNamespace {
[[clang::csa("taint.sink")]]
void mySink(int x);
} // myNamespace

On Tue, Oct 20, 2020 at 5:34 PM Gábor Márton <[hidden email]> wrote:
Hi,

There is an evolving need to configure the Clang Static Analyzer within the analyzed source code itself. We'd like to
1) suppress specific checkers (we already have an ongoing discussion at D89638)
2) express summaries (mainly argument constraints)
3) express taint propagation rules for functions (or for global variables like std::cin)

What if we had one attribute for CSA with a StringArgument?
(Actually, we already have that with the `annotate` attribute.)

So we'd have something like this:
1) [[clang::csa("supress.somecheck.somefunctionality")]]
2) [[clang::csa("summary.std::fread.BufferSize.Buffer(0).BufSize(1).BufSizeMultiplier(2)")]]
3) [[clang::csa("taint.sink.myNamespace::mySink")]]

Disadvantages: we must process strings whenever a node has the 'csa' attr attached, we have to come up with a "DSL".
Advantages: total flexibility.

I'd like to explore the possible approaches that we could have. For example, Aaron suggested alternatively for the suppression:
[[clang::suppress("analyzer.somecheck.somefunctionality")]]
[[clang::suppress("compiler.warning.12345")]]
[[clang::suppress("tidy.check-name.whatever")]]

Thanks,
Gabor

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] Attribute(s) to enhance/configure the analysis

Manas via cfe-dev
I totally agree with everything you said!

I just want to add something that I think was not very clear from my previous email.  The existing “suppress” attribute is a statement attribute, which gives it a very fine level of granularity.

> On 20 Oct 2020, at 23:06, Gábor Horváth <[hidden email]> wrote:
>
> Hi!
>
> Speaking of suppressions, in an ideal world the user would never need that. I.e. when the analyzer misunderstands the code, the user would be able to add an assert or rewrite the code slightly and the result would be easier to understand both for humans and the analyzer. Unfortunately, this is not the case. We do have false positives when there is no clear way of rewriting the code or adding asserts to make the warning disappear. While it is useful to have an annotation for those cases, there are also some risks involved. Such an annotation can bitrot and redundant annotations can linger even after the code is changed or the analyzer is improved. Those annotations can suppress true positive results. Moreover, users might start to rely on those annotations instead of trying to make the code clearer first, amplifying the effects I mentioned earlier.
>
> I think an ideal suppress annotation would:
> * be applicable to a wide range of scopes, not just function scope
> * have clear documentation indicating this is the last resort and refer to some guidelines how to suppress false positives by improving the code
> * come with an easy mechanism to check whether the annotation makes any difference, so we can easily get rid of redundant ones
> * come with an easy mechanism to check how many issues are suppressed by the same annotation (or even expect the user to specify this number as an argument and warn if that does not match the reality?)
>
> We could also warn for checks when we do believe rewriting the code in a cleaner way should always be possible. I think overall those problems that would require the use of suppression annotations should be treated as a high priority.
>
> Cheers,
> Gabor
>

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] Attribute(s) to enhance/configure the analysis

Manas via cfe-dev
In reply to this post by Manas via cfe-dev
Hi!

Sorry for the duplicate emails, I just realized that there is one more dimension of suppression mechanisms. There are separate tools such as CodeChecker that already support suppressing individual reports (either using comments or bug hashes). So adding an annotation could make it unclear who is responsible for the suppression and which feature should the user use.

Cheers,
Gabor


On Tue, 20 Oct 2020 at 22:06, Gábor Horváth <[hidden email]> wrote:
Hi!

Speaking of suppressions, in an ideal world the user would never need that. I.e. when the analyzer misunderstands the code, the user would be able to add an assert or rewrite the code slightly and the result would be easier to understand both for humans and the analyzer. Unfortunately, this is not the case. We do have false positives when there is no clear way of rewriting the code or adding asserts to make the warning disappear. While it is useful to have an annotation for those cases, there are also some risks involved. Such an annotation can bitrot and redundant annotations can linger even after the code is changed or the analyzer is improved. Those annotations can suppress true positive results. Moreover, users might start to rely on those annotations instead of trying to make the code clearer first, amplifying the effects I mentioned earlier.

I think an ideal suppress annotation would:
* be applicable to a wide range of scopes, not just function scope
* have clear documentation indicating this is the last resort and refer to some guidelines how to suppress false positives by improving the code
* come with an easy mechanism to check whether the annotation makes any difference, so we can easily get rid of redundant ones
* come with an easy mechanism to check how many issues are suppressed by the same annotation (or even expect the user to specify this number as an argument and warn if that does not match the reality?)

We could also warn for checks when we do believe rewriting the code in a cleaner way should always be possible. I think overall those problems that would require the use of suppression annotations should be treated as a high priority.

Cheers,
Gabor

On Tue, 20 Oct 2020 at 20:24, Valeriy Savchenko via cfe-dev <[hidden email]> wrote:
Hi folks,

@Gabor thank you for opening the conversation, I think it is extremely important!

TBH, at the moment, I am more interested in the first item on the list, so it will be mostly about that item from me.

I want to start off with one common thing for all the items on the list.  It is actually something that we should always keep in mind as the top priority, - how easy and pleasant it is to use.  If we ask users to add something in their code, it should not look like a gimmick.  If it looks elegant and adds value to the code, the users will be happy to add those annotations.  In the perfect scenario, it makes code more clear even for users who don’t use the analyzer (one example here is Python and type annotations for Mypy).

So, what’s my take on suppressions (aka item #1).  I believe that Aaron’s suggestion is better for two reasons:
  • It is more clear in its intentions.  From the very first glance you see that this is a suppression.
  • We already have “suppress” attribute!  We just need to adopt it (and maybe add another possible spelling).

However, the hardest choice here is the string argument for the attribute.  Again, it would be perfect if we can avoid weird identifiers like “java.com.warnings.alpha.whatever.2020”, it would be good if that string will make it clear for people who don’t use the analyzer what’s going on here.
Checker names that we use in command line are a bit too verbose, error messages are prone to changes, bug categories are too broad and abstract (the intention is pretty vague if you suppress “Logic error”).  

My suggestion would be using so-called “Bug types” (we already show them in the summary table in index.html).  Each diagnostic has one, it’s unique and more-or-less descriptive.  If we revise all the existing bug types and try to make them concise and clear, I think it can work!

As for other items, summaries… Oof that's a tough one.  It looks like we need to parse raw strings anyway, right?  So, keeping in mind UX, maybe it should be something entirely different?  Something like a DSL specifically for summaries.  I don’t have one direct suggestion, but I believe that summaries of all things can be something that all developers can benefit from.

Here is another thought here though.  Summaries is not something magical, one summary can not and should not describe the entirety of a particular function.  Summaries usually model specific aspects interesting for one particular checker.  So one function, can have a good number of checker-specific summaries.  It can state that it frees the first argument, dereferences the second argument when the third argument is not equal to 42, and returns tainted data.  Expressing everything in one annotation is way too much.  Having a bunch of summaries directly in the code can be fine, but can be terrifying (so many lines before the actual function).  One way to deal with it is APINotes, but that is a completely different topic.

Honestly, I think the third item is essentially the same thing as summaries, but maybe I don’t see some sort of peculiarity here.

Phew, I got a bit wordy!

Cheers!

Valeriy

On 20 Oct 2020, at 18:38, Gábor Márton <[hidden email]> wrote:

My example is a bit flawed, so it could look like this:

1) 
[[clang::csa("supress.somecheck.somefunctionality")]] void foo();

2)
[[clang::csa("summary.BufferSize.Buffer(0).BufSize(1).BufSizeMultiplier(2)")]]
std::size_t fread( void* buffer, std::size_t size, std::size_t count, std::FILE* stream );

3) 
namespace myNamespace {
[[clang::csa("taint.sink")]]
void mySink(int x);
} // myNamespace

On Tue, Oct 20, 2020 at 5:34 PM Gábor Márton <[hidden email]> wrote:
Hi,

There is an evolving need to configure the Clang Static Analyzer within the analyzed source code itself. We'd like to
1) suppress specific checkers (we already have an ongoing discussion at D89638)
2) express summaries (mainly argument constraints)
3) express taint propagation rules for functions (or for global variables like std::cin)

What if we had one attribute for CSA with a StringArgument?
(Actually, we already have that with the `annotate` attribute.)

So we'd have something like this:
1) [[clang::csa("supress.somecheck.somefunctionality")]]
2) [[clang::csa("summary.std::fread.BufferSize.Buffer(0).BufSize(1).BufSizeMultiplier(2)")]]
3) [[clang::csa("taint.sink.myNamespace::mySink")]]

Disadvantages: we must process strings whenever a node has the 'csa' attr attached, we have to come up with a "DSL".
Advantages: total flexibility.

I'd like to explore the possible approaches that we could have. For example, Aaron suggested alternatively for the suppression:
[[clang::suppress("analyzer.somecheck.somefunctionality")]]
[[clang::suppress("compiler.warning.12345")]]
[[clang::suppress("tidy.check-name.whatever")]]

Thanks,
Gabor

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] Attribute(s) to enhance/configure the analysis

Manas via cfe-dev
In reply to this post by Manas via cfe-dev
On Tue, Oct 20, 2020 at 11:34 AM Gábor Márton <[hidden email]> wrote:

>
> Hi,
>
> There is an evolving need to configure the Clang Static Analyzer within the analyzed source code itself. We'd like to
> 1) suppress specific checkers (we already have an ongoing discussion at D89638)
> 2) express summaries (mainly argument constraints)
> 3) express taint propagation rules for functions (or for global variables like std::cin)
>
> What if we had one attribute for CSA with a StringArgument?
> (Actually, we already have that with the `annotate` attribute.)
>
> So we'd have something like this:
> 1) [[clang::csa("supress.somecheck.somefunctionality")]]
> 2) [[clang::csa("summary.std::fread.BufferSize.Buffer(0).BufSize(1).BufSizeMultiplier(2)")]]
> 3) [[clang::csa("taint.sink.myNamespace::mySink")]]
>
> Disadvantages: we must process strings whenever a node has the 'csa' attr attached, we have to come up with a "DSL".
> Advantages: total flexibility.
>
> I'd like to explore the possible approaches that we could have. For example, Aaron suggested alternatively for the suppression:
> [[clang::suppress("analyzer.somecheck.somefunctionality")]]
> [[clang::suppress("compiler.warning.12345")]]
> [[clang::suppress("tidy.check-name.whatever")]]

Thank you for exploring the options in this space -- I think having a
uniform way for users to suppress diagnostics using attributes is a
really interesting idea. I have personally never really liked using
pragmas to do this work because that approach is usually quite verbose
when applied to a single line of code.

I think there are some questions we should keep in mind:
* How do we name diagnostics for the various components that issue
diagnostics? Does Clang need to introduce a per-warning diagnostic
identifier (e.g., C1234 like done in MSVC)
* Do we want to suppress at the declaration/statement level, or do we
want to support suppressing a range of lines? (I think we only want
decl/stmt level as pragma can be used for a range of lines.)
* Do we want the user to be able to suppress more than diagnostics by
identifier in the same attribute? e.g.,
[[clang::suppress("compiler.warning.12345", "tidy.frobble.wonky")]]
(My answer is yes.)
* Do we want the user to be able to suppress all diagnostics? e.g.,
[[clang::suppress]] (My answer is yes.)
* Do we want the user to be able to suppress diagnostics using some
grouping technique? e.g., [[clang::suppress("tidy.cert.*")]] or
[[clang::suppress("compiler.attributes")]] akin to -Wno-attributes
* If the user has a suppression attribute and the diagnostic listed is
*not* issued, do we want to give a user a way to notice that so they
can remove potentially stale annotations? e.g.,
-Wsuppressed-but-not-diagnosed
* Should the user be able to suppress warnings that have been upgraded
into an error via the command line?
* Does suppressing a diagnostic also suppress associated notes? (I
would assume the answer is yes.)

Regarding the name of the attribute, I would recommend avoiding "csa"
as that's not likely to be an acronym users will have a lot of
experience with. Given that all the tools that emit diagnostics which
would use this functionality are clang-based, I think using the
"clang" vendor namespace is sufficient (I'd rather not have
csa::suppress, clang::suppress, and tidy::suppress unless there's a
really good reason for it).

~Aaron
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] Attribute(s) to enhance/configure the analysis

Manas via cfe-dev
Hi all,

I'm sorry I'm a little late to the party!

I have couple thoughts first and a proposal at the end.

On Oct 20, 2020, at 1:06 PM, Gábor Horváth via cfe-dev <[hidden email]> wrote:
I think an ideal suppress annotation would:
* be applicable to a wide range of scopes, not just function scope
* have clear documentation indicating this is the last resort and refer to some guidelines how to suppress false positives by improving the code
* come with an easy mechanism to check whether the annotation makes any difference, so we can easily get rid of redundant ones
* come with an easy mechanism to check how many issues are suppressed by the same annotation (or even expect the user to specify this number as an argument and warn if that does not match the reality?)

We must consider projects that need to support multiple versions of clang and by proxy multiple versions of Static Analyzer. Since there can be both false positives and false negatives in Analyzer we need to have a good story for situations when an attribute doesn't make sense for a particular Analyzer version (for example older one) but the maintainers still want to keep it in source. Maybe we could produce just lower severity diagnostics than warning?
(In case someone asks - I would strongly prefer to not introduce versioning of these attributes.)

On Oct 20, 2020, at 11:24 AM, Valeriy Savchenko via cfe-dev <[hidden email]> wrote:
However, the hardest choice here is the string argument for the attribute.  Again, it would be perfect if we can avoid weird identifiers like “java.com.warnings.alpha.whatever.2020”, it would be good if that string will make it clear for people who don’t use the analyzer what’s going on here.
Checker names that we use in command line are a bit too verbose, error messages are prone to changes, bug categories are too broad and abstract (the intention is pretty vague if you suppress “Logic error”).  

My suggestion would be using so-called “Bug types” (we already show them in the summary table in index.html).  Each diagnostic has one, it’s unique and more-or-less descriptive.  If we revise all the existing bug types and try to make them concise and clear, I think it can work!

Just to make sure we're on the same page - I'll use one of the WebKit checkers that I need to have a suppression for as an example:

checker name:
webkit.UncountedLambdaCapturesChecker
error messages (depend on the context):
warning: Captured raw-pointer 'ref_countable' to uncounted type is unsafe [webkit.UncountedLambdaCapturesChecker]
warning: Implicitly captured raw-pointer 'ref_countable' to uncounted type is unsafe [webkit.UncountedLambdaCapturesChecker]
bug category:
WebKit coding guidelines
bug type:
Uncounted call argument for a raw pointer/reference parameter

Assuming I got it right from all the above I have a clear preference which is the checker name. In a way it would be analogous to pragmas for cc1 diagnostics suppression - the same identifier is used in CLI and in the source for suppression.
My second choice would be to introduce another string identifier as I have a hard time imagining any other of the existing strings being introduce to codebases.

On Oct 20, 2020, at 1:11 PM, Valeriy Savchenko via cfe-dev <[hidden email]> wrote:

I just want to add something that I think was not very clear from my previous email.  The existing “suppress” attribute is a statement attribute, which gives it a very fine level of granularity.

I have couple use-cases for declaration attributes (like ParmVarDecl, FiledDecl, VarDecl) but I assume that no matter what mechanism we devise it should be applicable for declaration attributes too.
One thing I feel we need is "kind safety". While it's trivial for single-purpose attribute to define what kinds of AST nodes is it applicable to it becomes a bit more work with different arguments of a generic attribute.

On Oct 20, 2020, at 1:12 PM, Gábor Horváth via cfe-dev <[hidden email]> wrote:
Sorry for the duplicate emails, I just realized that there is one more dimension of suppression mechanisms. There are separate tools such as CodeChecker that already support suppressing individual reports (either using comments or bug hashes). So adding an annotation could make it unclear who is responsible for the suppression and which feature should the user use.

I assume that the status quo is that there's no common standard for in-source suppression for bug-finding tools. That means that arguably we won't make it significantly worse by adding n+1.
Hypothetically if we design the annotation system well we might convince other projects to use it. And if I had to pick one tool to take into the account I would maybe think of GCC analyzer?


One thing to consider here is that if we want to have some cross-tool story we probably shouldn't use any vendor-specific prefix like clang::suppress or csa::suppress.

On Oct 21, 2020, at 11:57 AM, Aaron Ballman via cfe-dev <[hidden email]> wrote:

Thank you for exploring the options in this space -- I think having a
uniform way for users to suppress diagnostics using attributes is a
really interesting idea. I have personally never really liked using
pragmas to do this work because that approach is usually quite verbose
when applied to a single line of code.

IIUC we have use-cases for suppression of "diagnostics produced by Analyzer" only. While attribute-based suppression of cc1 diagnostics sounds interesting I'm worried that the necessary broad discussion would effectively halt this.
Would you mind if we keep the focus on Analyzer for now or do you believe we'd not converge to the right design unless we consider cc1 too?

Re: "stable identifiers". I feel that no matter what identifiers we decide to use we still might want some escape hatch in case we end up having to rename a "stable identifier". 
I am not aware of any solution for such situation with pragmas.

But I am thinking that basically all we need is a notion of identifier alias - if we can have two names for an attribute we're fine.
Imagine we add a warning identifier "foo" to Analyzer in clang-12 and add an alias "bar" for it in clang-13. Projects using the "bar" identifier are fine and any project that is ok with minimal supported version being clang-13 can change the spelling in their sources to "bar".

Overall I suggest we:
- Adopt the suppress attribute Valeriy mentioned and extend it to declarations.
- Don't use prefix like clang:: or csa:: to make it an obvious interface for suppression to be used by other tools too.
- MVP can support just a single string identifier, later extend to comma-separated list of string identifiers.
- Use checker name as an identifier.


Cheers,

Jan


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] Attribute(s) to enhance/configure the analysis

Manas via cfe-dev
On Thu, Oct 22, 2020 at 10:31 PM Jan Korous <[hidden email]> wrote:

>
> Hi all,
>
> I'm sorry I'm a little late to the party!
>
> I have couple thoughts first and a proposal at the end.
>
> On Oct 20, 2020, at 1:06 PM, Gábor Horváth via cfe-dev <[hidden email]> wrote:
>
> I think an ideal suppress annotation would:
> * be applicable to a wide range of scopes, not just function scope
> * have clear documentation indicating this is the last resort and refer to some guidelines how to suppress false positives by improving the code
> * come with an easy mechanism to check whether the annotation makes any difference, so we can easily get rid of redundant ones
> * come with an easy mechanism to check how many issues are suppressed by the same annotation (or even expect the user to specify this number as an argument and warn if that does not match the reality?)
>
>
> We must consider projects that need to support multiple versions of clang and by proxy multiple versions of Static Analyzer. Since there can be both false positives and false negatives in Analyzer we need to have a good story for situations when an attribute doesn't make sense for a particular Analyzer version (for example older one) but the maintainers still want to keep it in source. Maybe we could produce just lower severity diagnostics than warning?
> (In case someone asks - I would strongly prefer to not introduce versioning of these attributes.)
>
> On Oct 20, 2020, at 11:24 AM, Valeriy Savchenko via cfe-dev <[hidden email]> wrote:
> However, the hardest choice here is the string argument for the attribute.  Again, it would be perfect if we can avoid weird identifiers like “java.com.warnings.alpha.whatever.2020”, it would be good if that string will make it clear for people who don’t use the analyzer what’s going on here.
> Checker names that we use in command line are a bit too verbose, error messages are prone to changes, bug categories are too broad and abstract (the intention is pretty vague if you suppress “Logic error”).
>
> My suggestion would be using so-called “Bug types” (we already show them in the summary table in index.html).  Each diagnostic has one, it’s unique and more-or-less descriptive.  If we revise all the existing bug types and try to make them concise and clear, I think it can work!
>
>
> Just to make sure we're on the same page - I'll use one of the WebKit checkers that I need to have a suppression for as an example:
>
> checker name:
> webkit.UncountedLambdaCapturesChecker
> error messages (depend on the context):
> warning: Captured raw-pointer 'ref_countable' to uncounted type is unsafe [webkit.UncountedLambdaCapturesChecker]
> warning: Implicitly captured raw-pointer 'ref_countable' to uncounted type is unsafe [webkit.UncountedLambdaCapturesChecker]
> bug category:
> WebKit coding guidelines
> bug type:
> Uncounted call argument for a raw pointer/reference parameter
>
> Assuming I got it right from all the above I have a clear preference which is the checker name. In a way it would be analogous to pragmas for cc1 diagnostics suppression - the same identifier is used in CLI and in the source for suppression.
> My second choice would be to introduce another string identifier as I have a hard time imagining any other of the existing strings being introduce to codebases.
>
> On Oct 20, 2020, at 1:11 PM, Valeriy Savchenko via cfe-dev <[hidden email]> wrote:
>
> I just want to add something that I think was not very clear from my previous email.  The existing “suppress” attribute is a statement attribute, which gives it a very fine level of granularity.
>
>
> I have couple use-cases for declaration attributes (like ParmVarDecl, FiledDecl, VarDecl) but I assume that no matter what mechanism we devise it should be applicable for declaration attributes too.
> One thing I feel we need is "kind safety". While it's trivial for single-purpose attribute to define what kinds of AST nodes is it applicable to it becomes a bit more work with different arguments of a generic attribute.
>
> On Oct 20, 2020, at 1:12 PM, Gábor Horváth via cfe-dev <[hidden email]> wrote:
>
> Sorry for the duplicate emails, I just realized that there is one more dimension of suppression mechanisms. There are separate tools such as CodeChecker that already support suppressing individual reports (either using comments or bug hashes). So adding an annotation could make it unclear who is responsible for the suppression and which feature should the user use.
>
>
> I assume that the status quo is that there's no common standard for in-source suppression for bug-finding tools. That means that arguably we won't make it significantly worse by adding n+1.
> Hypothetically if we design the annotation system well we might convince other projects to use it. And if I had to pick one tool to take into the account I would maybe think of GCC analyzer?
>
> https://developers.redhat.com/blog/2020/03/26/static-analysis-in-gcc-10/
>
> One thing to consider here is that if we want to have some cross-tool story we probably shouldn't use any vendor-specific prefix like clang::suppress or csa::suppress.

There's not a standardized identifier for this attribute and so I
believe we should in fact use a vendor-specific scope. Putting
something in the standard's attribute namespace would not be the right
approach, at least. My preference is to use the clang vendor namespace
because it's already a vendor namespace our users are used to seeing
and because the name covers the broadest range of diagnostics that can
be disabled.

> On Oct 21, 2020, at 11:57 AM, Aaron Ballman via cfe-dev <[hidden email]> wrote:
>
> Thank you for exploring the options in this space -- I think having a
> uniform way for users to suppress diagnostics using attributes is a
> really interesting idea. I have personally never really liked using
> pragmas to do this work because that approach is usually quite verbose
> when applied to a single line of code.
>
> IIUC we have use-cases for suppression of "diagnostics produced by Analyzer" only.
> While attribute-based suppression of cc1 diagnostics sounds interesting I'm worried that the necessary broad discussion would effectively halt this.
> Would you mind if we keep the focus on Analyzer for now or do you believe we'd not converge to the right design unless we consider cc1 too?

I don't think that's the case, though. We have at least three
different tools that are all tightly integrated together and none of
them share a common mechanism for suppressing diagnostics. In Clang,
you either have to use command line arguments or a pragma. In
clang-tidy, we support // NOLINT comments. Now we're talking about
adding an attribute for the static analyzer. I think it's fine for
there to be more than one way to suppress a diagnostic, but we need a
common way for users that doesn't require them to understand what part
of the toolchain issued a diagnostic in order to suppress it properly.

I'm fine focusing on the static analyzer's needs, but I personally
think it's a requirement that whatever we come up with is a general
solution that covers at least the frontend, clang-tidy, and the static
analyzer with buy-in from the people heavily involved in those tools,
even if there's not a commitment to get that work done immediately for
all three tools.

> Re: "stable identifiers". I feel that no matter what identifiers we decide to use we still might want some escape hatch in case we end up having to rename a "stable identifier".
> I am not aware of any solution for such situation with pragmas.
> https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas

I'm not overly worried about the stability of identifiers -- we change
diagnostics on occasion, but I think we tweak wording or remove/add
diagnostics far more than we change a diagnostic's meaning
sufficiently to warrant giving it a new stable identifier.

That said, it does bring up a different kind of diagnostic we should
keep in mind: [[clang::suppress("I no longer exist")]] where the
string literal doesn't refer to a diagnostic name. e.g., the user was
suppressing a diagnostic we removed, or the user has a typo in the
string literal, etc. We should perhaps diagnose this case with typo
correction to help alert users of the confusion rather than silently
accepting an attribute that has no impact.

> But I am thinking that basically all we need is a notion of identifier alias - if we can have two names for an attribute we're fine.
> Imagine we add a warning identifier "foo" to Analyzer in clang-12 and add an alias "bar" for it in clang-13. Projects using the "bar" identifier are fine and any project that is ok with minimal supported version being clang-13 can change the spelling in their sources to "bar".

I think this is an interesting idea and is worth exploring. For
instance, in clang-tidy we have checks that are aliased into multiple
groupings and I could imagine the user saying they want to suppress a
readability-based diagnostic but still allow the diagnostic if another
module is turned on even if it's the same information (e.g., maybe
they enable the CERT module and suddenly the issue is no longer about
readability but security as far as the user is concerned).

> Overall I suggest we:
> - Adopt the suppress attribute Valeriy mentioned and extend it to declarations.

Agreed, though we should also consider whether we have any type-based
warnings where we want the attribute on a type (I can't think of any
off the top of my head)

> - Don't use prefix like clang:: or csa:: to make it an obvious interface for suppression to be used by other tools too.

-1, this steps on design space that's not ours to step on.

> - MVP can support just a single string identifier, later extend to comma-separated list of string identifiers.

I can't imagine supporting a list of identifiers is actually more work
than supporting a single identifier. That said, I don't think it's
critical for the MVP, but if someone wants to implement it in the
initial offerings, I say go for it.

> - Use checker name as an identifier.

This approach seems to work fine if the checker only issues a few
diagnostics and they generally aren't on the same statement or
declaration, but it would be a problem if we had a checker which
issues multiple unrelated (or semi-related) warnings on the same
statement. e.g.,

// note, this is all one statement with newlines for clarity
foo() + // say Checker A issues a "don't call foo" warning here
bar() / baz(); // and Checker A also issues a "potential divide by
zero" warning here

If the user wanted to use an attribute to suppress the potential
divide by zero warning, using the checker name would also mean the
"don't call foo" warning is silenced because the attribute is at the
statement level. My impression is that we don't have many checkers
that would be in this situation, so maybe it's fine, but I do wonder a
bit about the frontend diagnostics. I'd imagine we could use the
warning group name as the "checker" name in that situation, but I
could imagine code like this:

[[gnu::not_supported_in_clang]] // want to suppress the ignored
unknown attribute diagnostic here
[[clang::supproted_in_clang]] // don't want to suppress the ignored
unknown attribute diagnostic here (from the typo)
void func();

I think it's worth intentionally deciding whether we want to support
cases like this or not up front. We also probably should talk about
some edge cases like whether aliases are treated the same as the
original checker or not, does it support families of checkers, etc.

~Aaron

>
>
> Cheers,
>
> Jan
>
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] Attribute(s) to enhance/configure the analysis

Manas via cfe-dev


On Oct 23, 2020, at 5:34 AM, Aaron Ballman <[hidden email]> wrote:

On Thu, Oct 22, 2020 at 10:31 PM Jan Korous <[hidden email]> wrote:

Hi all,

I'm sorry I'm a little late to the party!

I have couple thoughts first and a proposal at the end.

On Oct 20, 2020, at 1:06 PM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

I think an ideal suppress annotation would:
* be applicable to a wide range of scopes, not just function scope
* have clear documentation indicating this is the last resort and refer to some guidelines how to suppress false positives by improving the code
* come with an easy mechanism to check whether the annotation makes any difference, so we can easily get rid of redundant ones
* come with an easy mechanism to check how many issues are suppressed by the same annotation (or even expect the user to specify this number as an argument and warn if that does not match the reality?)


We must consider projects that need to support multiple versions of clang and by proxy multiple versions of Static Analyzer. Since there can be both false positives and false negatives in Analyzer we need to have a good story for situations when an attribute doesn't make sense for a particular Analyzer version (for example older one) but the maintainers still want to keep it in source. Maybe we could produce just lower severity diagnostics than warning?
(In case someone asks - I would strongly prefer to not introduce versioning of these attributes.)

On Oct 20, 2020, at 11:24 AM, Valeriy Savchenko via cfe-dev <[hidden email]> wrote:
However, the hardest choice here is the string argument for the attribute.  Again, it would be perfect if we can avoid weird identifiers like “java.com.warnings.alpha.whatever.2020”, it would be good if that string will make it clear for people who don’t use the analyzer what’s going on here.
Checker names that we use in command line are a bit too verbose, error messages are prone to changes, bug categories are too broad and abstract (the intention is pretty vague if you suppress “Logic error”).

My suggestion would be using so-called “Bug types” (we already show them in the summary table in index.html).  Each diagnostic has one, it’s unique and more-or-less descriptive.  If we revise all the existing bug types and try to make them concise and clear, I think it can work!


Just to make sure we're on the same page - I'll use one of the WebKit checkers that I need to have a suppression for as an example:

checker name:
webkit.UncountedLambdaCapturesChecker
error messages (depend on the context):
warning: Captured raw-pointer 'ref_countable' to uncounted type is unsafe [webkit.UncountedLambdaCapturesChecker]
warning: Implicitly captured raw-pointer 'ref_countable' to uncounted type is unsafe [webkit.UncountedLambdaCapturesChecker]
bug category:
WebKit coding guidelines
bug type:
Uncounted call argument for a raw pointer/reference parameter

Assuming I got it right from all the above I have a clear preference which is the checker name. In a way it would be analogous to pragmas for cc1 diagnostics suppression - the same identifier is used in CLI and in the source for suppression.
My second choice would be to introduce another string identifier as I have a hard time imagining any other of the existing strings being introduce to codebases.

On Oct 20, 2020, at 1:11 PM, Valeriy Savchenko via cfe-dev <[hidden email]> wrote:

I just want to add something that I think was not very clear from my previous email.  The existing “suppress” attribute is a statement attribute, which gives it a very fine level of granularity.


I have couple use-cases for declaration attributes (like ParmVarDecl, FiledDecl, VarDecl) but I assume that no matter what mechanism we devise it should be applicable for declaration attributes too.
One thing I feel we need is "kind safety". While it's trivial for single-purpose attribute to define what kinds of AST nodes is it applicable to it becomes a bit more work with different arguments of a generic attribute.

On Oct 20, 2020, at 1:12 PM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Sorry for the duplicate emails, I just realized that there is one more dimension of suppression mechanisms. There are separate tools such as CodeChecker that already support suppressing individual reports (either using comments or bug hashes). So adding an annotation could make it unclear who is responsible for the suppression and which feature should the user use.


I assume that the status quo is that there's no common standard for in-source suppression for bug-finding tools. That means that arguably we won't make it significantly worse by adding n+1.
Hypothetically if we design the annotation system well we might convince other projects to use it. And if I had to pick one tool to take into the account I would maybe think of GCC analyzer?

https://developers.redhat.com/blog/2020/03/26/static-analysis-in-gcc-10/

One thing to consider here is that if we want to have some cross-tool story we probably shouldn't use any vendor-specific prefix like clang::suppress or csa::suppress.

There's not a standardized identifier for this attribute and so I
believe we should in fact use a vendor-specific scope. Putting
something in the standard's attribute namespace would not be the right
approach, at least. My preference is to use the clang vendor namespace
because it's already a vendor namespace our users are used to seeing
and because the name covers the broadest range of diagnostics that can
be disabled.

I see, thanks for explaining this - I'm actually not familiar with constraints imposed by the C++ standard. In general the clang:: prefix should work for us.
The only concern I have is - we need to make sure we support also C. Is there a notion of namespace in __attribute__ (())?
(Grepping clang/test for __attribute__ didn't make me much wiser.)


On Oct 21, 2020, at 11:57 AM, Aaron Ballman via cfe-dev <[hidden email]> wrote:

Thank you for exploring the options in this space -- I think having a
uniform way for users to suppress diagnostics using attributes is a
really interesting idea. I have personally never really liked using
pragmas to do this work because that approach is usually quite verbose
when applied to a single line of code.

IIUC we have use-cases for suppression of "diagnostics produced by Analyzer" only.
While attribute-based suppression of cc1 diagnostics sounds interesting I'm worried that the necessary broad discussion would effectively halt this.
Would you mind if we keep the focus on Analyzer for now or do you believe we'd not converge to the right design unless we consider cc1 too?

I don't think that's the case, though. We have at least three
different tools that are all tightly integrated together and none of
them share a common mechanism for suppressing diagnostics. In Clang,
you either have to use command line arguments or a pragma. In
clang-tidy, we support // NOLINT comments. Now we're talking about
adding an attribute for the static analyzer. I think it's fine for
there to be more than one way to suppress a diagnostic, but we need a
common way for users that doesn't require them to understand what part
of the toolchain issued a diagnostic in order to suppress it properly.

I hear you although I don't agree that users shouldn't need to know what tool produces a warning. There are multiple different tools in the toolchain and there's no unified abstraction over their diagnostics.
Clang-tidy and Static Analyzer are opt-in - users need to explicitly use them (at least by proxy of an IDE or some other tool).

BTW are we reasonably sure that there won't be any weird corner-cases for attribute-based diagnostics suppression in Frontend? There are couple hundred attribute-related warnings just in DiagnosticSemaKinds.td.

I'm fine focusing on the static analyzer's needs, but I personally
think it's a requirement that whatever we come up with is a general
solution that covers at least the frontend, clang-tidy, and the static
analyzer with buy-in from the people heavily involved in those tools,
even if there's not a commitment to get that work done immediately for
all three tools.

The situation I specifically want to avoid is to be held back by Frontend. I am not aware of any motivation for attribute-based warning suppression coming from that side of things and don't think that Analyzer need is the right justification for that.
That being said - ultimately we are more or less talking about an attribute with a textual identifier. I consider such specification broad enough to support pretty much any possible use-case. The only thing we might want to do is to keep the specification as trivial as possible. I don't think we need to come up with a plan for Frontend as long as we do our best to avoid potential blockers in the design.

Re: "stable identifiers". I feel that no matter what identifiers we decide to use we still might want some escape hatch in case we end up having to rename a "stable identifier".
I am not aware of any solution for such situation with pragmas.
https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas

I'm not overly worried about the stability of identifiers -- we change
diagnostics on occasion, but I think we tweak wording or remove/add
diagnostics far more than we change a diagnostic's meaning
sufficiently to warrant giving it a new stable identifier.

That said, it does bring up a different kind of diagnostic we should
keep in mind: [[clang::suppress("I no longer exist")]] where the
string literal doesn't refer to a diagnostic name. e.g., the user was
suppressing a diagnostic we removed, or the user has a typo in the
string literal, etc. We should perhaps diagnose this case with typo
correction to help alert users of the confusion rather than silently
accepting an attribute that has no impact.

I think that while this is important it is orthogonal to the discussion of how to name the attribute and what kind of identifier to use and we can deal with this separately later.

Note that this would still be a problem for projects that need to support multiple toolchains & versions - e. g. LLVM still supports clang-3.5:

If we ever remove an identifier then I don't see what a maintainer of such project should do:
- When they build and analyze the project with older toolchain they support the identifier would work as expected.
- When they build with a newer toolchain they support they get a warning which is not actionable for them - assuming we don't want to encourage a suppression suppression mechanism...

Maybe the bottom line is that we simply shouldn't ever remove an identifier?


But I am thinking that basically all we need is a notion of identifier alias - if we can have two names for an attribute we're fine.
Imagine we add a warning identifier "foo" to Analyzer in clang-12 and add an alias "bar" for it in clang-13. Projects using the "bar" identifier are fine and any project that is ok with minimal supported version being clang-13 can change the spelling in their sources to "bar".

I think this is an interesting idea and is worth exploring. For
instance, in clang-tidy we have checks that are aliased into multiple
groupings and I could imagine the user saying they want to suppress a
readability-based diagnostic but still allow the diagnostic if another
module is turned on even if it's the same information (e.g., maybe
they enable the CERT module and suddenly the issue is no longer about
readability but security as far as the user is concerned).

Overall I suggest we:
- Adopt the suppress attribute Valeriy mentioned and extend it to declarations.

Agreed, though we should also consider whether we have any type-based
warnings where we want the attribute on a type (I can't think of any
off the top of my head)

Agreed. I feel that we can just make sure this works for attributes on a type but don't have to necessarily implement this until there's a use-case.


- Don't use prefix like clang:: or csa:: to make it an obvious interface for suppression to be used by other tools too.

-1, this steps on design space that's not ours to step on.

I put my concern about C above. If that's ok I'm fine with clang:: prefix.

- MVP can support just a single string identifier, later extend to comma-separated list of string identifiers.

I can't imagine supporting a list of identifiers is actually more work
than supporting a single identifier. That said, I don't think it's
critical for the MVP, but if someone wants to implement it in the
initial offerings, I say go for it.

Agreed.

- Use checker name as an identifier.

This approach seems to work fine if the checker only issues a few
diagnostics and they generally aren't on the same statement or
declaration, but it would be a problem if we had a checker which
issues multiple unrelated (or semi-related) warnings on the same
statement. e.g.,

// note, this is all one statement with newlines for clarity
foo() + // say Checker A issues a "don't call foo" warning here
bar() / baz(); // and Checker A also issues a "potential divide by
zero" warning here

You're right that having a per-checker suppression would either constitute a constraint on implementation details of the checkers or force users to work around it by refactoring their code.

Also, there's a problem with using checker name as the identifier - since we have alpha checkers it's somewhat expected that such checker will at some point get renamed:
alpha.PackageFoo.CheckerBar -> PackageFoo.CheckerBar

The above makes me lean towards the direction that we should devise a new identifier then.

If the user wanted to use an attribute to suppress the potential
divide by zero warning, using the checker name would also mean the
"don't call foo" warning is silenced because the attribute is at the
statement level. My impression is that we don't have many checkers
that would be in this situation, so maybe it's fine, but I do wonder a
bit about the frontend diagnostics. I'd imagine we could use the
warning group name as the "checker" name in that situation, but I
could imagine code like this:

[[gnu::not_supported_in_clang]] // want to suppress the ignored
unknown attribute diagnostic here
[[clang::supproted_in_clang]] // don't want to suppress the ignored
unknown attribute diagnostic here (from the typo)
void func();

Just to make sure we're on the same page - Analyzer warnings aren't defined in a tablegen file like Frontend warnings are and they also don't have associated -Wfoo/-Wno-foo command line flags.
The warning text is produced ad-hoc in each checker's implementation. The identifier that's printed as part of the warning is the checker name.

I basically don't see a connection between checker name and Frontend warning names which makes me think that it doesn't really matter what identifier we decide to use for suppression in Analyzer as that's going to be an Analyzer-only concept anyway and Frontend is welcome to use any suitable identifier.

I think it's worth intentionally deciding whether we want to support
cases like this or not up front. We also probably should talk about
some edge cases like whether aliases are treated the same as the
original checker or not, does it support families of checkers, etc.

I would strongly prefer to keep the design as simple as possible (at least initially).

Cheers,

Jan

~Aaron



Cheers,

Jan


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] Attribute(s) to enhance/configure the analysis

Manas via cfe-dev
On Fri, Oct 23, 2020 at 7:49 PM Jan Korous <[hidden email]> wrote:

> On Oct 23, 2020, at 5:34 AM, Aaron Ballman <[hidden email]> wrote:
> On Thu, Oct 22, 2020 at 10:31 PM Jan Korous <[hidden email]> wrote:
>
>
> Hi all,
>
> I'm sorry I'm a little late to the party!
>
> I have couple thoughts first and a proposal at the end.
>
> On Oct 20, 2020, at 1:06 PM, Gábor Horváth via cfe-dev <[hidden email]> wrote:
>
> I think an ideal suppress annotation would:
> * be applicable to a wide range of scopes, not just function scope
> * have clear documentation indicating this is the last resort and refer to some guidelines how to suppress false positives by improving the code
> * come with an easy mechanism to check whether the annotation makes any difference, so we can easily get rid of redundant ones
> * come with an easy mechanism to check how many issues are suppressed by the same annotation (or even expect the user to specify this number as an argument and warn if that does not match the reality?)
>
>
> We must consider projects that need to support multiple versions of clang and by proxy multiple versions of Static Analyzer. Since there can be both false positives and false negatives in Analyzer we need to have a good story for situations when an attribute doesn't make sense for a particular Analyzer version (for example older one) but the maintainers still want to keep it in source. Maybe we could produce just lower severity diagnostics than warning?
> (In case someone asks - I would strongly prefer to not introduce versioning of these attributes.)
>
> On Oct 20, 2020, at 11:24 AM, Valeriy Savchenko via cfe-dev <[hidden email]> wrote:
> However, the hardest choice here is the string argument for the attribute.  Again, it would be perfect if we can avoid weird identifiers like “java.com.warnings.alpha.whatever.2020”, it would be good if that string will make it clear for people who don’t use the analyzer what’s going on here.
> Checker names that we use in command line are a bit too verbose, error messages are prone to changes, bug categories are too broad and abstract (the intention is pretty vague if you suppress “Logic error”).
>
> My suggestion would be using so-called “Bug types” (we already show them in the summary table in index.html).  Each diagnostic has one, it’s unique and more-or-less descriptive.  If we revise all the existing bug types and try to make them concise and clear, I think it can work!
>
>
> Just to make sure we're on the same page - I'll use one of the WebKit checkers that I need to have a suppression for as an example:
>
> checker name:
> webkit.UncountedLambdaCapturesChecker
> error messages (depend on the context):
> warning: Captured raw-pointer 'ref_countable' to uncounted type is unsafe [webkit.UncountedLambdaCapturesChecker]
> warning: Implicitly captured raw-pointer 'ref_countable' to uncounted type is unsafe [webkit.UncountedLambdaCapturesChecker]
> bug category:
> WebKit coding guidelines
> bug type:
> Uncounted call argument for a raw pointer/reference parameter
>
> Assuming I got it right from all the above I have a clear preference which is the checker name. In a way it would be analogous to pragmas for cc1 diagnostics suppression - the same identifier is used in CLI and in the source for suppression.
> My second choice would be to introduce another string identifier as I have a hard time imagining any other of the existing strings being introduce to codebases.
>
> On Oct 20, 2020, at 1:11 PM, Valeriy Savchenko via cfe-dev <[hidden email]> wrote:
>
> I just want to add something that I think was not very clear from my previous email.  The existing “suppress” attribute is a statement attribute, which gives it a very fine level of granularity.
>
>
> I have couple use-cases for declaration attributes (like ParmVarDecl, FiledDecl, VarDecl) but I assume that no matter what mechanism we devise it should be applicable for declaration attributes too.
> One thing I feel we need is "kind safety". While it's trivial for single-purpose attribute to define what kinds of AST nodes is it applicable to it becomes a bit more work with different arguments of a generic attribute.
>
> On Oct 20, 2020, at 1:12 PM, Gábor Horváth via cfe-dev <[hidden email]> wrote:
>
> Sorry for the duplicate emails, I just realized that there is one more dimension of suppression mechanisms. There are separate tools such as CodeChecker that already support suppressing individual reports (either using comments or bug hashes). So adding an annotation could make it unclear who is responsible for the suppression and which feature should the user use.
>
>
> I assume that the status quo is that there's no common standard for in-source suppression for bug-finding tools. That means that arguably we won't make it significantly worse by adding n+1.
> Hypothetically if we design the annotation system well we might convince other projects to use it. And if I had to pick one tool to take into the account I would maybe think of GCC analyzer?
>
> https://developers.redhat.com/blog/2020/03/26/static-analysis-in-gcc-10/
>
> One thing to consider here is that if we want to have some cross-tool story we probably shouldn't use any vendor-specific prefix like clang::suppress or csa::suppress.
>
>
> There's not a standardized identifier for this attribute and so I
> believe we should in fact use a vendor-specific scope. Putting
> something in the standard's attribute namespace would not be the right
> approach, at least. My preference is to use the clang vendor namespace
> because it's already a vendor namespace our users are used to seeing
> and because the name covers the broadest range of diagnostics that can
> be disabled.
>
>
> I see, thanks for explaining this - I'm actually not familiar with constraints imposed by the C++ standard. In general the clang:: prefix should work for us.
> The only concern I have is - we need to make sure we support also C. Is there a notion of namespace in __attribute__ (())?
> (Grepping clang/test for __attribute__ didn't make me much wiser.)

GNU-style attributes do not have namespaces, but C supports [[]]-style
attributes starting in C2x, and -fdouble-square-bracket-attributes can
be used in any language standard mode. I think we should focus on
[[]]-style attributes because those have a more regular syntax for
determining what the attribute appertains to.

> On Oct 21, 2020, at 11:57 AM, Aaron Ballman via cfe-dev <[hidden email]> wrote:
>
> Thank you for exploring the options in this space -- I think having a
> uniform way for users to suppress diagnostics using attributes is a
> really interesting idea. I have personally never really liked using
> pragmas to do this work because that approach is usually quite verbose
> when applied to a single line of code.
>
> IIUC we have use-cases for suppression of "diagnostics produced by Analyzer" only.
> While attribute-based suppression of cc1 diagnostics sounds interesting I'm worried that the necessary broad discussion would effectively halt this.
> Would you mind if we keep the focus on Analyzer for now or do you believe we'd not converge to the right design unless we consider cc1 too?
>
>
> I don't think that's the case, though. We have at least three
> different tools that are all tightly integrated together and none of
> them share a common mechanism for suppressing diagnostics. In Clang,
> you either have to use command line arguments or a pragma. In
> clang-tidy, we support // NOLINT comments. Now we're talking about
> adding an attribute for the static analyzer. I think it's fine for
> there to be more than one way to suppress a diagnostic, but we need a
> common way for users that doesn't require them to understand what part
> of the toolchain issued a diagnostic in order to suppress it properly.
>
>
> I hear you although I don't agree that users shouldn't need to know what tool produces a warning. There are multiple different tools in the toolchain and there's no unified abstraction over their diagnostics.
> Clang-tidy and Static Analyzer are opt-in - users need to explicitly use them (at least by proxy of an IDE or some other tool).

That's a fair point, but trying to come up with a unified naming
convention for identifying diagnostics (or groups of them) across the
various tools seems like a very big project. Using per-tool
identifiers may not be idea, but it also seems like an easier problem
to solve.

> BTW are we reasonably sure that there won't be any weird corner-cases for attribute-based diagnostics suppression in Frontend? There are couple hundred attribute-related warnings just in DiagnosticSemaKinds.td.

I suspect there will be weird corner cases with just about anything we
devise (attributes, comments, something else entirely). Do you have a
kind of corner-case in mind that you're worried about?

> I'm fine focusing on the static analyzer's needs, but I personally
> think it's a requirement that whatever we come up with is a general
> solution that covers at least the frontend, clang-tidy, and the static
> analyzer with buy-in from the people heavily involved in those tools,
> even if there's not a commitment to get that work done immediately for
> all three tools.
>
>
> The situation I specifically want to avoid is to be held back by Frontend. I am not aware of any motivation for attribute-based warning suppression coming from that side of things and don't think that Analyzer need is the right justification for that.

The fact that Clang supports -Wno command line flags to suppress
diagnostics as well as a pragma to do the same (potentially only for a
single instance of the diagnostic) certainly suggests there's
motivation for suppressing diagnostics in the frontend. Put another
way: why are the static analyzer's diagnostics any more or less
special than the frontend's diagnostics in terms of a user wanting to
suppress them?

> That being said - ultimately we are more or less talking about an attribute with a textual identifier. I consider such specification broad enough to support pretty much any possible use-case. The only thing we might want to do is to keep the specification as trivial as possible. I don't think we need to come up with a plan for Frontend as long as we do our best to avoid potential blockers in the design.

Agreed -- I just want to make sure we don't paint ourselves into a
corner with the initial design by picking names or features that won't
translate well to other diagnostic-producing constructs in the
toolchain. While it'd be awesome to see the attribute implemented
everywhere right off the bat, I suspect we'll roll it out gradually.

> Re: "stable identifiers". I feel that no matter what identifiers we decide to use we still might want some escape hatch in case we end up having to rename a "stable identifier".
> I am not aware of any solution for such situation with pragmas.
> https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas
>
>
> I'm not overly worried about the stability of identifiers -- we change
> diagnostics on occasion, but I think we tweak wording or remove/add
> diagnostics far more than we change a diagnostic's meaning
> sufficiently to warrant giving it a new stable identifier.
>
> That said, it does bring up a different kind of diagnostic we should
> keep in mind: [[clang::suppress("I no longer exist")]] where the
> string literal doesn't refer to a diagnostic name. e.g., the user was
> suppressing a diagnostic we removed, or the user has a typo in the
> string literal, etc. We should perhaps diagnose this case with typo
> correction to help alert users of the confusion rather than silently
> accepting an attribute that has no impact.
>
>
> I think that while this is important it is orthogonal to the discussion of how to name the attribute and what kind of identifier to use and we can deal with this separately later.
>
> Note that this would still be a problem for projects that need to support multiple toolchains & versions - e. g. LLVM still supports clang-3.5:
> https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library
>
> If we ever remove an identifier then I don't see what a maintainer of such project should do:
> - When they build and analyze the project with older toolchain they support the identifier would work as expected.
> - When they build with a newer toolchain they support they get a warning which is not actionable for them - assuming we don't want to encourage a suppression suppression mechanism...
>
> Maybe the bottom line is that we simply shouldn't ever remove an identifier?

I can see how this is orthogonal, but at the same time, I'd like us to
have a policy for how all the tools would deal with this so that users
get consistent behavior from the attribute, which suggests we should
be thinking about it early on in the design process. I think the
points you bring up are great ones for us to keep in mind and perhaps
not removing a diagnostic identifier for these attributes is
reasonable (I'd have to think on it more).

>
>
> But I am thinking that basically all we need is a notion of identifier alias - if we can have two names for an attribute we're fine.
> Imagine we add a warning identifier "foo" to Analyzer in clang-12 and add an alias "bar" for it in clang-13. Projects using the "bar" identifier are fine and any project that is ok with minimal supported version being clang-13 can change the spelling in their sources to "bar".
>
>
> I think this is an interesting idea and is worth exploring. For
> instance, in clang-tidy we have checks that are aliased into multiple
> groupings and I could imagine the user saying they want to suppress a
> readability-based diagnostic but still allow the diagnostic if another
> module is turned on even if it's the same information (e.g., maybe
> they enable the CERT module and suddenly the issue is no longer about
> readability but security as far as the user is concerned).
>
> Overall I suggest we:
> - Adopt the suppress attribute Valeriy mentioned and extend it to declarations.
>
>
> Agreed, though we should also consider whether we have any type-based
> warnings where we want the attribute on a type (I can't think of any
> off the top of my head)
>
>
> Agreed. I feel that we can just make sure this works for attributes on a type but don't have to necessarily implement this until there's a use-case.
>
>
> - Don't use prefix like clang:: or csa:: to make it an obvious interface for suppression to be used by other tools too.
>
>
> -1, this steps on design space that's not ours to step on.
>
>
> I put my concern about C above. If that's ok I'm fine with clang:: prefix.
>
> - MVP can support just a single string identifier, later extend to comma-separated list of string identifiers.
>
> I can't imagine supporting a list of identifiers is actually more work
> than supporting a single identifier. That said, I don't think it's
> critical for the MVP, but if someone wants to implement it in the
> initial offerings, I say go for it.
>
>
> Agreed.
>
> - Use checker name as an identifier.
>
> This approach seems to work fine if the checker only issues a few
> diagnostics and they generally aren't on the same statement or
> declaration, but it would be a problem if we had a checker which
> issues multiple unrelated (or semi-related) warnings on the same
> statement. e.g.,
>
> // note, this is all one statement with newlines for clarity
> foo() + // say Checker A issues a "don't call foo" warning here
> bar() / baz(); // and Checker A also issues a "potential divide by
> zero" warning here
>
>
> You're right that having a per-checker suppression would either constitute a constraint on implementation details of the checkers or force users to work around it by refactoring their code.
>
> Also, there's a problem with using checker name as the identifier - since we have alpha checkers it's somewhat expected that such checker will at some point get renamed:
> alpha.PackageFoo.CheckerBar -> PackageFoo.CheckerBar

Good point! There have also been discussions about adding an
"experimental" module in clang-tidy that would likely suffer the same
problem.

> The above makes me lean towards the direction that we should devise a new identifier then.
>
> If the user wanted to use an attribute to suppress the potential
> divide by zero warning, using the checker name would also mean the
> "don't call foo" warning is silenced because the attribute is at the
> statement level. My impression is that we don't have many checkers
> that would be in this situation, so maybe it's fine, but I do wonder a
> bit about the frontend diagnostics. I'd imagine we could use the
> warning group name as the "checker" name in that situation, but I
> could imagine code like this:
>
> [[gnu::not_supported_in_clang]] // want to suppress the ignored
> unknown attribute diagnostic here
> [[clang::supproted_in_clang]] // don't want to suppress the ignored
> unknown attribute diagnostic here (from the typo)
> void func();
>
>
> Just to make sure we're on the same page - Analyzer warnings aren't defined in a tablegen file like Frontend warnings are and they also don't have associated -Wfoo/-Wno-foo command line flags.
> The warning text is produced ad-hoc in each checker's implementation. The identifier that's printed as part of the warning is the checker name.

Agreed.

> I basically don't see a connection between checker name and Frontend warning names which makes me think that it doesn't really matter what identifier we decide to use for suppression in Analyzer as that's going to be an Analyzer-only concept anyway and Frontend is welcome to use any suitable identifier.

Oh, definitely! I was speaking more to the point that some checkers
will produce multiple of the same diagnostics on the same
declaration/statement and the user may want to suppress only one of
them. I used attributes as an example because it was a similar case of
I'm talking about -- the user will get two diagnostics (only variation
is the name of the ignored attribute but the diagnostics come from the
same "checker") and they may want to suppress one but not the other.
I'm not certain how many checkers (either real checkers in the
analyzer or clang-tidy, or notional checkers from the frontend) this
applies to.

> I think it's worth intentionally deciding whether we want to support
> cases like this or not up front. We also probably should talk about
> some edge cases like whether aliases are treated the same as the
> original checker or not, does it support families of checkers, etc.
>
>
> I would strongly prefer to keep the design as simple as possible (at least initially).

Same here -- I think we can extend the functionality as we learn of
more real-world use cases, but anything like this that we spend a bit
of time thinking on and decide to explicitly support/not support
should be documented.

~Aaron

>
> Cheers,
>
> Jan
>
> ~Aaron
>
>
>
> Cheers,
>
> Jan
>
>
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev