[RFC] Adding lifetime analysis to clang

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
Thank you for summarizing the discussion and raising all these points.
Based on your response I propose the following plan moving forward:

* Start submitting patches regarding the non-controversial parts of the analysis, including:
  - Adding the annotations to Clang
  - Generalize existing statement-local warnings
    - Technical question here: should we expect the STL vendors to annotate the types, should we hard-code the annotations into the compiler, or should we actually implement type inference but restrict it to STL types?
  - Adding some extra statement-local warnings
  - Adding the flow sensitive analysis

Can we add you as a reviewer to those patches?

While we add the non-controversial part we can continue to discuss the rest on the mailing list and act according to the consensus we have later on.
Fortunately, not having the inference upstream will not stop us from leveraging the rest of the work.

To start the discussions regarding the type category inference, as far as I understand you are more worried about the false negatives that can give a bad impression about the analysis rather than the false positives. One idea we had is to have a refactoring that will automatically annotate the user defined types based on the inference we have. This could give the users a way to understand the root causes of such false negatives.
In case the community will not want automatic type category inference in the compiler we are still likely to implement it as a Tidy check with a refactoring to make it easier for users to adopt this analysis.

What do you think?

Cheers,
Gabor

On Fri, 12 Apr 2019 at 16:04, Dmitri Gribenko <[hidden email]> wrote:
Gabor and I chatted privately, and I wanted to summarize the discussion.

=== Concerns about inference of type categories ===

I agree that having type categories is a good idea.  I think this is
one of the most important user-facing features in the proposal.

I don't agree with inferring type categories though.  The rules are
too complex to allow the inference to be done implicitly without
hurting readability.  I think the users should be always required to
annotate their Owners and Pointers, but not Aggregates and not Values.
The implementation could still check the rules, if they are crucial.

If the rules were simpler (e.g., "Pointer is any type that satisfies
the iterator concept"), I would consider inference more acceptable,
but still borderline.  Iterator is a complex concept by itself, there
are many broken iterators out there that don't satisfy the concept,
but the simpler usages in practice compile anyway.  However, just the
Pointer rules way more complex than that -- they are like half a page
long.

I don't think we can simplify the rules though -- they need to cover
the necessary C++ concepts.

My primary concern with inference is not false positives, but
understandability and debuggability of the system.  When I get a
warning, how do I, as a human, determine if the types I use are being
correctly classified?  As an API vendor, how can I be sure that my
types are correctly classified so that users will get correct
warnings?  The answer is "add an annotation".  So wouldn't every API
provider want to add an annotation regardless then?

My worry is that a fully automatic system where the user is not
required to understand the rules, will not match users' expectations.
If a provider of a type thinks that it is a Pointer, but it actually
isn't, the failure mode is false negatives, not false positives. Users
will just think the analysis is too weak and can't detect the problem.
So the API providers will have to test that their types are indeed
recognized by the compiler with the correct type category.  At that
point they could as well add an annotation.

This is basically the same reason why C++ added the `override`
keyword, and C++ having backwards compatibility constraints made
`override` optional.  The inference rules for `override` are much
simpler than inference of type categories.  However, style guides
often require writing `override` where possible.

Some types don't conform to the Pointer category formally, even though
in spirit they are pointers.  For example, if style guide used in the
project prohibits operator overloading: https://godbolt.org/z/IYJwHw .
However, users who "don't fully understand the rules" could reasonably
think that the compiler should be able to figure out that this type is
a Pointer.  If the annotation was required, the problem would be
instantly discovered when the compiler would rightfully complain that
the type does not conform to Pointer.

I understand there are two primary reasons why inference is desired:
reduction of annotation burden, and third-party libraries.

To evaluate the annotation burden in practice, I think LLVM is not a
great example.  LLVM has an above-average number of Owners and
Pointers because it tries to implement standard library bits from
future C++ versions.  Also, those Owners and Pointers are in "ADT" and
"Support" libraries, not in regular application code.

Third-party libraries can be annotated through the "API Notes"
approach, that Swift's Objective-C interop uses.  API notes allow the
user to pass an annotations file that injects the necessary attributes
into the library headers.

=== Future work idea: "Optional" type category ===

I also think it would be great to have more type categories, in
particular "optional" -- which will include pointers (that can contain
null), `std::optional` (that can contain nullopt), and whatever other
types with an empty state people might have in their projects.  Then
we could use the same dataflow-sensitive rules as the proposal uses
for tracking null pointers to track checks of empty optionals etc.

However, the rules for assuming nullability seem to be loose (return
values and members are assumed non-null), I'm not very happy about
that: https://godbolt.org/z/CMxEVv . I understand this was probably
done to lower the annotation burden.  I don't think it makes for a
very understandable model though.  I also couldn't find a single place
in the proposal that summarized these assumptions, I had to piece them
together from different sections.

Dmitri

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
On Mon, Apr 15, 2019 at 2:26 PM Gábor Horváth <[hidden email]> wrote:

>
> Thank you for summarizing the discussion and raising all these points.
> Based on your response I propose the following plan moving forward:
>
> * Start submitting patches regarding the non-controversial parts of the analysis, including:
>   - Adding the annotations to Clang
>   - Generalize existing statement-local warnings
>     - Technical question here: should we expect the STL vendors to annotate the types, should we hard-code the annotations into the compiler, or should we actually implement type inference but restrict it to STL types?
>   - Adding some extra statement-local warnings
>   - Adding the flow sensitive analysis
>
> Can we add you as a reviewer to those patches?

Happy to help :)

> To start the discussions regarding the type category inference, as far as I understand you are more worried about the false negatives that can give a bad impression about the analysis rather than the false positives.

I would say my concerns are around the understandability of the
system, more so than the false negatives of inference.

> One idea we had is to have a refactoring that will automatically annotate the user defined types based on the inference we have. This could give the users a way to understand the root causes of such false negatives.
> In case the community will not want automatic type category inference in the compiler we are still likely to implement it as a Tidy check with a refactoring to make it easier for users to adopt this analysis.

That'd be helpful, however, relying purely on such analysis is not
much better than inference.  Users should annotate types that conform
to type categories.  We should also figure out a way to learn if it is
common for users to have types that are in spirit conforming to type
categories, but don't conform syntactically (e.g., have named methods
instead of overloaded operators).  If those are important, we would
need to build corresponding annotations.

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <[hidden email]>*/
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev


Dmitri Gribenko <[hidden email]> ezt írta (időpont: 2019. ápr. 15., H 16:02):
On Mon, Apr 15, 2019 at 2:26 PM Gábor Horváth <[hidden email]> wrote:
>
> Thank you for summarizing the discussion and raising all these points.
> Based on your response I propose the following plan moving forward:
>
> * Start submitting patches regarding the non-controversial parts of the analysis, including:
>   - Adding the annotations to Clang
>   - Generalize existing statement-local warnings
>     - Technical question here: should we expect the STL vendors to annotate the types, should we hard-code the annotations into the compiler, or should we actually implement type inference but restrict it to STL types?
>   - Adding some extra statement-local warnings
>   - Adding the flow sensitive analysis
>
> Can we add you as a reviewer to those patches?

Happy to help :)

> To start the discussions regarding the type category inference, as far as I understand you are more worried about the false negatives that can give a bad impression about the analysis rather than the false positives.

I would say my concerns are around the understandability of the
system, more so than the false negatives of inference.

> One idea we had is to have a refactoring that will automatically annotate the user defined types based on the inference we have. This could give the users a way to understand the root causes of such false negatives.
> In case the community will not want automatic type category inference in the compiler we are still likely to implement it as a Tidy check with a refactoring to make it easier for users to adopt this analysis.

That'd be helpful, however, relying purely on such analysis is not
much better than inference.  Users should annotate types that conform
to type categories.  We should also figure out a way to learn if it is
common for users to have types that are in spirit conforming to type
categories, but don't conform syntactically (e.g., have named methods
instead of overloaded operators).  If those are important, we would
need to build corresponding annotations.

Great! So let's move on with upstreaming the non-controversial parts and return to discussing the inference problem once we have more experience from users and other data. 

Thanks,
Gábor


Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <[hidden email]>*/

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
In reply to this post by Kristóf Umann via cfe-dev
Hi!

I wanted to give you a small update on the process of upstreaming these changes. The type category annotations along with most of the statement local warnings are already upstreamed. Clang will not guess the type categories automatically, types need to be annotated explicitly. For some STL types, Clang will automatically append the annotations even if they are not present in the source code. This way the new warnings will trigger even if the STL implementation does not have the annotations in place yet.

Furthermore, it looks like these warnings really do catch bugs! :) See the two examples we found in the LLVM repository by annotationg llvm::StringRef [1]. We also found bugs in other popular open source projects most of which was fixed very soon after reporting them. We did not see any false positives when running current Clang top of tree on other projects, but as these warnings are relatively new it is not impossible to have some. The Chromium and LLVM builds are clean though, so there should not be any obvious problem. If you notice any spurious warnings please let us know!

We plan to share a more detailed evaluation/report after the CppCon talk on the subject [2].

Cheers,
Gabor



On Thu, 29 Nov 2018 at 08:02, Gábor Horváth <[hidden email]> wrote:

Hi!


This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.

Summary from the paper:

“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”

More details can be found in the paper [1] or in the CppCon keynote [3].


Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.


Plans for upstreaming


1. Upstream Type Categorization


Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.


This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.


Consider the following code example:


std::reference_wrapper<const int> get_data() {

   const int i = 3;

   return {i};

}


Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.


As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.


2. Extensions of existing CFG-less analyses


2a. Initialization from temporaries

The goal is to detect Pointers that dangle on initialization, such as

std::string_view sv = “test”s;

By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).


2b. Return of locals

The goal is to detect returning Pointers to local variables, e.g.

std::reference_wrapper<const int> get_data() {

   const int i = 3;

   return {i};

}

Similar to 2a also restricted to single statement.


2c. Member pointer that dangles once construction is complete

struct X {

   std::string_view sv;

   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor

};


2d. New of a Pointer that dangles after the end of the full-expression

new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression


3. Intra-function analysis across basic blocks, excluding function call expressions

Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing


int* p = &i;

if(condition)

 p = nullptr;

*p; // ERROR: p is possibly null


We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.


4. Function calls


auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];


string_view sv = find(“needle”, haystack);   

sv[0]; // OK

string_view sv = find(needle, “temporaryhaystack”);   

sv[0]; // ERROR: sv is dangling


This includes the following subparts.


4a. Precondition checks

Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.


4b. Postcondition checks

Check that the psets returned from a function adhere to its advertised return/output psets.

Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.


4c. Lifetimes annotations


The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).


5. Implementing use after move analysis and exception support


These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.


Questions


Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?


Thanks in advance,

Gabor, Matthias, and Herb


[1] https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf

[2] https://github.com/mgehre/clang

[3] https://www.youtube.com/watch?v=80BZxujhY38

[4] https://godbolt.org/z/90puuu



_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
Thanks for the update!

I looked at this a bit now that it exists, and I have a question about the Pointer/Owner attributes: It looks like they go on a class to mark the class as an owner or a pointer – but the examples I've seen so far are all for classes that have a single member variable.

If I have a class with multiple member variables, some of them weak, others strong, can I use these attributes? Say I have something like

class Foo {
  std::unique_ptr<Bar> b;
  C* c;  // Owned by b, or something else entirely
  D* d; // Also weak
  E* e; // Happens to be owned by Foo, but not yet a unique_ptr
};

Since unique_ptr is Owner, `b` should be fine, but Foo is a Pointer to c and d and an owner of E. Can I express this with the current attributes? Or is this intentionally out of scope?

Thanks,
Nico

On Wed, Aug 21, 2019 at 12:35 PM Gábor Horváth via cfe-dev <[hidden email]> wrote:
Hi!

I wanted to give you a small update on the process of upstreaming these changes. The type category annotations along with most of the statement local warnings are already upstreamed. Clang will not guess the type categories automatically, types need to be annotated explicitly. For some STL types, Clang will automatically append the annotations even if they are not present in the source code. This way the new warnings will trigger even if the STL implementation does not have the annotations in place yet.

Furthermore, it looks like these warnings really do catch bugs! :) See the two examples we found in the LLVM repository by annotationg llvm::StringRef [1]. We also found bugs in other popular open source projects most of which was fixed very soon after reporting them. We did not see any false positives when running current Clang top of tree on other projects, but as these warnings are relatively new it is not impossible to have some. The Chromium and LLVM builds are clean though, so there should not be any obvious problem. If you notice any spurious warnings please let us know!

We plan to share a more detailed evaluation/report after the CppCon talk on the subject [2].

Cheers,
Gabor



On Thu, 29 Nov 2018 at 08:02, Gábor Horváth <[hidden email]> wrote:

Hi!


This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.

Summary from the paper:

“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”

More details can be found in the paper [1] or in the CppCon keynote [3].


Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.


Plans for upstreaming


1. Upstream Type Categorization


Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.


This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.


Consider the following code example:


std::reference_wrapper<const int> get_data() {

   const int i = 3;

   return {i};

}


Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.


As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.


2. Extensions of existing CFG-less analyses


2a. Initialization from temporaries

The goal is to detect Pointers that dangle on initialization, such as

std::string_view sv = “test”s;

By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).


2b. Return of locals

The goal is to detect returning Pointers to local variables, e.g.

std::reference_wrapper<const int> get_data() {

   const int i = 3;

   return {i};

}

Similar to 2a also restricted to single statement.


2c. Member pointer that dangles once construction is complete

struct X {

   std::string_view sv;

   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor

};


2d. New of a Pointer that dangles after the end of the full-expression

new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression


3. Intra-function analysis across basic blocks, excluding function call expressions

Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing


int* p = &i;

if(condition)

 p = nullptr;

*p; // ERROR: p is possibly null


We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.


4. Function calls


auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];


string_view sv = find(“needle”, haystack);   

sv[0]; // OK

string_view sv = find(needle, “temporaryhaystack”);   

sv[0]; // ERROR: sv is dangling


This includes the following subparts.


4a. Precondition checks

Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.


4b. Postcondition checks

Check that the psets returned from a function adhere to its advertised return/output psets.

Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.


4c. Lifetimes annotations


The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).


5. Implementing use after move analysis and exception support


These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.


Questions


Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?


Thanks in advance,

Gabor, Matthias, and Herb


[1] https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf

[2] https://github.com/mgehre/clang

[3] https://www.youtube.com/watch?v=80BZxujhY38

[4] https://godbolt.org/z/90puuu


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
Hi Nico!

Currently, the statement-local warnings have a very limited scope. If you create a Pointer from an Owner using a conversion operator or a constructor it will assume that the Pointer points 'into' the owner. To make these warnings more useful we hardcoded the semantics of some methods in the STL. So in case you have another type `Bar` that can be created from `Foo` and `Bar` will point to something owned by `Foo` it make a lot of sense to annotate both classes. Otherwise, it has relatively little use right now.

Improving the situation, however, is not out of scope. We plan to introduce function annotations as well. With those function annotations you can annotate which methods will return a pointer to an owned entity. This way having multiple pointees will not be a problem.

One thing, however, that is out of the scope, is more complex ownership models. For example, when a function can return both owned and non-owned pointers depending on their arguments. We do not scan to support that scenario, and simply not annotating such functions should avoid the false positives.

So to summarize, currently the most value of these warnings are coming from using STL types and user defined conversions. But we do plan to extend these to be able to catch additional problems but that will require the users to annotate functions/methods. We also plan to suggest/infer those annotations automatically, but his feature will certainly be off by default and might be provided by a separate tool, like a clang-tidy check (or just a separate warning flag).

Cheers,
Gabor



On Fri, 23 Aug 2019 at 07:58, Nico Weber <[hidden email]> wrote:
Thanks for the update!

I looked at this a bit now that it exists, and I have a question about the Pointer/Owner attributes: It looks like they go on a class to mark the class as an owner or a pointer – but the examples I've seen so far are all for classes that have a single member variable.

If I have a class with multiple member variables, some of them weak, others strong, can I use these attributes? Say I have something like

class Foo {
  std::unique_ptr<Bar> b;
  C* c;  // Owned by b, or something else entirely
  D* d; // Also weak
  E* e; // Happens to be owned by Foo, but not yet a unique_ptr
};

Since unique_ptr is Owner, `b` should be fine, but Foo is a Pointer to c and d and an owner of E. Can I express this with the current attributes? Or is this intentionally out of scope?

Thanks,
Nico

On Wed, Aug 21, 2019 at 12:35 PM Gábor Horváth via cfe-dev <[hidden email]> wrote:
Hi!

I wanted to give you a small update on the process of upstreaming these changes. The type category annotations along with most of the statement local warnings are already upstreamed. Clang will not guess the type categories automatically, types need to be annotated explicitly. For some STL types, Clang will automatically append the annotations even if they are not present in the source code. This way the new warnings will trigger even if the STL implementation does not have the annotations in place yet.

Furthermore, it looks like these warnings really do catch bugs! :) See the two examples we found in the LLVM repository by annotationg llvm::StringRef [1]. We also found bugs in other popular open source projects most of which was fixed very soon after reporting them. We did not see any false positives when running current Clang top of tree on other projects, but as these warnings are relatively new it is not impossible to have some. The Chromium and LLVM builds are clean though, so there should not be any obvious problem. If you notice any spurious warnings please let us know!

We plan to share a more detailed evaluation/report after the CppCon talk on the subject [2].

Cheers,
Gabor



On Thu, 29 Nov 2018 at 08:02, Gábor Horváth <[hidden email]> wrote:

Hi!


This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.

Summary from the paper:

“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”

More details can be found in the paper [1] or in the CppCon keynote [3].


Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.


Plans for upstreaming


1. Upstream Type Categorization


Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.


This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.


Consider the following code example:


std::reference_wrapper<const int> get_data() {

   const int i = 3;

   return {i};

}


Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.


As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.


2. Extensions of existing CFG-less analyses


2a. Initialization from temporaries

The goal is to detect Pointers that dangle on initialization, such as

std::string_view sv = “test”s;

By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).


2b. Return of locals

The goal is to detect returning Pointers to local variables, e.g.

std::reference_wrapper<const int> get_data() {

   const int i = 3;

   return {i};

}

Similar to 2a also restricted to single statement.


2c. Member pointer that dangles once construction is complete

struct X {

   std::string_view sv;

   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor

};


2d. New of a Pointer that dangles after the end of the full-expression

new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression


3. Intra-function analysis across basic blocks, excluding function call expressions

Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing


int* p = &i;

if(condition)

 p = nullptr;

*p; // ERROR: p is possibly null


We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.


4. Function calls


auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];


string_view sv = find(“needle”, haystack);   

sv[0]; // OK

string_view sv = find(needle, “temporaryhaystack”);   

sv[0]; // ERROR: sv is dangling


This includes the following subparts.


4a. Precondition checks

Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.


4b. Postcondition checks

Check that the psets returned from a function adhere to its advertised return/output psets.

Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.


4c. Lifetimes annotations


The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).


5. Implementing use after move analysis and exception support


These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.


Questions


Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?


Thanks in advance,

Gabor, Matthias, and Herb


[1] https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf

[2] https://github.com/mgehre/clang

[3] https://www.youtube.com/watch?v=80BZxujhY38

[4] https://godbolt.org/z/90puuu


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
12