[RFC] Adding lifetime analysis to clang

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev

Hi!


This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.

Summary from the paper:

“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”

More details can be found in the paper [1] or in the CppCon keynote [3].


Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.


Plans for upstreaming


1. Upstream Type Categorization


Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.


This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.


Consider the following code example:


std::reference_wrapper<const int> get_data() {

   const int i = 3;

   return {i};

}


Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.


As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.


2. Extensions of existing CFG-less analyses


2a. Initialization from temporaries

The goal is to detect Pointers that dangle on initialization, such as

std::string_view sv = “test”s;

By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).


2b. Return of locals

The goal is to detect returning Pointers to local variables, e.g.

std::reference_wrapper<const int> get_data() {

   const int i = 3;

   return {i};

}

Similar to 2a also restricted to single statement.


2c. Member pointer that dangles once construction is complete

struct X {

   std::string_view sv;

   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor

};


2d. New of a Pointer that dangles after the end of the full-expression

new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression


3. Intra-function analysis across basic blocks, excluding function call expressions

Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing


int* p = &i;

if(condition)

 p = nullptr;

*p; // ERROR: p is possibly null


We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.


4. Function calls


auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];


string_view sv = find(“needle”, haystack);   

sv[0]; // OK

string_view sv = find(needle, “temporaryhaystack”);   

sv[0]; // ERROR: sv is dangling


This includes the following subparts.


4a. Precondition checks

Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.


4b. Postcondition checks

Check that the psets returned from a function adhere to its advertised return/output psets.

Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.


4c. Lifetimes annotations


The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).


5. Implementing use after move analysis and exception support


These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.


Questions


Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?


Thanks in advance,

Gabor, Matthias, and Herb


[1] https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf

[2] https://github.com/mgehre/clang

[3] https://www.youtube.com/watch?v=80BZxujhY38

[4] https://godbolt.org/z/90puuu



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev


On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

This makes perfect sense. We will report back with the results.
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
Thanks! In our experience with the static analyzer it is always better to make these analyses benchmark-driven,
so TBH I’m a bit puzzled that the implementation was fully developed without continuous evaluation.

The overall approach (dataflow + heuristics + annotations to override heuristics) makes sense to me,
but then the heuristics might need further tuning.
As a somewhat orthogonal question, do you think the symbolic execution checker developed by Réka could also benefit from the heuristic/annotation
combination you are proposing?

On Dec 3, 2018, at 7:30 AM, Gábor Horváth <[hidden email]> wrote:



On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

This makes perfect sense. We will report back with the results.
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
Maybe this needs a bit of clarification.

Clang already has some very useful statement local analysis for lifetime issues that could benefit from the type categories we developed for the CFG based analysis. The did not test how these warnings perform after being enhanced with type categories because we did not implement the enhanced versions yet (we were focusing on the CFG based analysis, which is also not fully implemented). But extending those warnings should be trivial, so we can report back how well the type categories perform in that context without too much effort to help the decision making process.

As far as I understand, the biggest obstacle with InnerPointer checker at the moment is how to model the state of complex C++ objects (e.g.: string_views) in an implementation independent way. Using type categories it should be simple to pinpoint types that have similar semantics to string_views and strings, so it will be easier to generalize the check to arbitrary user defined types. Unfortunately, it will not help with the modelling part.

BTW thanks for the support, we will definitely add you as a reviewer for the patches :)

On Mon, 3 Dec 2018 at 19:57, George Karpenkov <[hidden email]> wrote:
Thanks! In our experience with the static analyzer it is always better to make these analyses benchmark-driven,
so TBH I’m a bit puzzled that the implementation was fully developed without continuous evaluation.

The overall approach (dataflow + heuristics + annotations to override heuristics) makes sense to me,
but then the heuristics might need further tuning.
As a somewhat orthogonal question, do you think the symbolic execution checker developed by Réka could also benefit from the heuristic/annotation
combination you are proposing?

On Dec 3, 2018, at 7:30 AM, Gábor Horváth <[hidden email]> wrote:



On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

This makes perfect sense. We will report back with the results.
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
In reply to this post by Kristóf Umann via cfe-dev
Hi Manuel,

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

Sorry for the delay, but finally I found some time to actually implement the CFG-less checks and run them on LLVM. The quick prototype is available here: https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
Literally, all of the false positives I encountered after running on LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an owner (because of its user-defined destructor).
So annotating this class would render the LLVM codebase clean of these new lifetime errors. Do you think this is a strong enough evidence to start upstreaming the type categories or do you think we should do additional measurements?
I also plan to look up some lifetime fixes/reverts in the commit history to check if some of those problems would have been caught by these warnings. I will share the results in a EuroLLVM technical talk.

P.S.:
Does anybody have a good idea how to get a list of buildbot breaking commits where the reason was lifetime related (other than grepping git history)?

Regards,
Gábor
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <[hidden email]> wrote:
Hi Manuel,

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

Sorry for the delay, but finally I found some time to actually implement the CFG-less checks and run them on LLVM. The quick prototype is available here: https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
Literally, all of the false positives I encountered after running on LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an owner (because of its user-defined destructor).
So annotating this class would render the LLVM codebase clean of these new lifetime errors. Do you think this is a strong enough evidence to start upstreaming the type categories or do you think we should do additional measurements?

I'd like to get Richard's opinion on this.
 
I also plan to look up some lifetime fixes/reverts in the commit history to check if some of those problems would have been caught by these warnings. I will share the results in a EuroLLVM technical talk.

P.S.:
Does anybody have a good idea how to get a list of buildbot breaking commits where the reason was lifetime related (other than grepping git history)?

Regards,
Gábor
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
-apple George + google George to prevent bounces :)

On Fri, Mar 1, 2019 at 12:06 PM Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <[hidden email]> wrote:
Hi Manuel,

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

Sorry for the delay, but finally I found some time to actually implement the CFG-less checks and run them on LLVM. The quick prototype is available here: https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
Literally, all of the false positives I encountered after running on LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an owner (because of its user-defined destructor).
So annotating this class would render the LLVM codebase clean of these new lifetime errors. Do you think this is a strong enough evidence to start upstreaming the type categories or do you think we should do additional measurements?

I'd like to get Richard's opinion on this.
 
I also plan to look up some lifetime fixes/reverts in the commit history to check if some of those problems would have been caught by these warnings. I will share the results in a EuroLLVM technical talk.

P.S.:
Does anybody have a good idea how to get a list of buildbot breaking commits where the reason was lifetime related (other than grepping git history)?

Regards,
Gábor
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
On Fri, Mar 1, 2019 at 12:10 PM Manuel Klimek <[hidden email]> wrote:
-apple George + google George to prevent bounces :)

On Fri, Mar 1, 2019 at 12:06 PM Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <[hidden email]> wrote:
Hi Manuel,

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

Sorry for the delay, but finally I found some time to actually implement the CFG-less checks and run them on LLVM. The quick prototype is available here: https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
Literally, all of the false positives I encountered after running on LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an owner (because of its user-defined destructor).
So annotating this class would render the LLVM codebase clean of these new lifetime errors. Do you think this is a strong enough evidence to start upstreaming the type categories or do you think we should do additional measurements?

I'd like to get Richard's opinion on this.

So just to make sure I understand correctly:
The result is that in LLVM we have:
- 1 false positive root cause (causing X false positives)
- 0 true positives
?
 
 
I also plan to look up some lifetime fixes/reverts in the commit history to check if some of those problems would have been caught by these warnings. I will share the results in a EuroLLVM technical talk.

P.S.:
Does anybody have a good idea how to get a list of buildbot breaking commits where the reason was lifetime related (other than grepping git history)?

Regards,
Gábor
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev


On Fri, 1 Mar 2019 at 12:13, Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 12:10 PM Manuel Klimek <[hidden email]> wrote:
-apple George + google George to prevent bounces :)

On Fri, Mar 1, 2019 at 12:06 PM Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <[hidden email]> wrote:
Hi Manuel,

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

Sorry for the delay, but finally I found some time to actually implement the CFG-less checks and run them on LLVM. The quick prototype is available here: https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
Literally, all of the false positives I encountered after running on LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an owner (because of its user-defined destructor).
So annotating this class would render the LLVM codebase clean of these new lifetime errors. Do you think this is a strong enough evidence to start upstreaming the type categories or do you think we should do additional measurements?

I'd like to get Richard's opinion on this.

So just to make sure I understand correctly:
The result is that in LLVM we have:
- 1 false positive root cause (causing X false positives)
- 0 true positives
?
 

Correct.
 
 
I also plan to look up some lifetime fixes/reverts in the commit history to check if some of those problems would have been caught by these warnings. I will share the results in a EuroLLVM technical talk.

P.S.:
Does anybody have a good idea how to get a list of buildbot breaking commits where the reason was lifetime related (other than grepping git history)?

Regards,
Gábor
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
(correcting dmitri's email address)

On Fri, Mar 1, 2019 at 12:24 PM Gábor Horváth <[hidden email]> wrote:


On Fri, 1 Mar 2019 at 12:13, Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 12:10 PM Manuel Klimek <[hidden email]> wrote:
-apple George + google George to prevent bounces :)

On Fri, Mar 1, 2019 at 12:06 PM Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <[hidden email]> wrote:
Hi Manuel,

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

Sorry for the delay, but finally I found some time to actually implement the CFG-less checks and run them on LLVM. The quick prototype is available here: https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
Literally, all of the false positives I encountered after running on LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an owner (because of its user-defined destructor).
So annotating this class would render the LLVM codebase clean of these new lifetime errors. Do you think this is a strong enough evidence to start upstreaming the type categories or do you think we should do additional measurements?

I'd like to get Richard's opinion on this.

So just to make sure I understand correctly:
The result is that in LLVM we have:
- 1 false positive root cause (causing X false positives)
- 0 true positives
?
 

Correct.

Unfortunately that doesn't sound like a really strong argument in favor - if you have something that works cleanly against clang trunk I can try to find somebody willing to try it on a bit of our internal codebase, but I can't promise anything.

Basically what I'd be looking for is some evidence that the false positive ratio is not too high, and for that we'd need at least a codebase containing true positives.
 
 
 
I also plan to look up some lifetime fixes/reverts in the commit history to check if some of those problems would have been caught by these warnings. I will share the results in a EuroLLVM technical talk.

P.S.:
Does anybody have a good idea how to get a list of buildbot breaking commits where the reason was lifetime related (other than grepping git history)?

Regards,
Gábor
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev


On Fri, 1 Mar 2019 at 12:54, Manuel Klimek <[hidden email]> wrote:
(correcting dmitri's email address)

On Fri, Mar 1, 2019 at 12:24 PM Gábor Horváth <[hidden email]> wrote:


On Fri, 1 Mar 2019 at 12:13, Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 12:10 PM Manuel Klimek <[hidden email]> wrote:
-apple George + google George to prevent bounces :)

On Fri, Mar 1, 2019 at 12:06 PM Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <[hidden email]> wrote:
Hi Manuel,

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

Sorry for the delay, but finally I found some time to actually implement the CFG-less checks and run them on LLVM. The quick prototype is available here: https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
Literally, all of the false positives I encountered after running on LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an owner (because of its user-defined destructor).
So annotating this class would render the LLVM codebase clean of these new lifetime errors. Do you think this is a strong enough evidence to start upstreaming the type categories or do you think we should do additional measurements?

I'd like to get Richard's opinion on this.

So just to make sure I understand correctly:
The result is that in LLVM we have:
- 1 false positive root cause (causing X false positives)
- 0 true positives
?
 

Correct.

Unfortunately that doesn't sound like a really strong argument in favor - if you have something that works cleanly against clang trunk I can try to find somebody willing to try it on a bit of our internal codebase, but I can't promise anything.

Basically what I'd be looking for is some evidence that the false positive ratio is not too high, and for that we'd need at least a codebase containing true positives.

I do not agree. LLVM has good coverage and tests are regularly run with sanitizers on. Thus, I would except most of the lifetime related issues being caught by dynamic analysis in our case. I do remember, however, such problems to slip through the reviews and break the build bots occasionally. I think, the value of those warnings are to catch those errors early, saving time for the developers. So even though if it is unlikely to find true positives in sanitizer clean builds for certain projects I still see the value. Note that, this is true for most of the similar warnings in clang. LLVM (and probably most other projects) is clean of -Wreturn-stack-address, still we do not question whether that warning is useful.
 
 
 
 
I also plan to look up some lifetime fixes/reverts in the commit history to check if some of those problems would have been caught by these warnings. I will share the results in a EuroLLVM technical talk.

P.S.:
Does anybody have a good idea how to get a list of buildbot breaking commits where the reason was lifetime related (other than grepping git history)?

Regards,
Gábor
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev


On Fri, Mar 1, 2019 at 1:49 PM Gábor Horváth <[hidden email]> wrote:


On Fri, 1 Mar 2019 at 12:54, Manuel Klimek <[hidden email]> wrote:
(correcting dmitri's email address)

On Fri, Mar 1, 2019 at 12:24 PM Gábor Horváth <[hidden email]> wrote:


On Fri, 1 Mar 2019 at 12:13, Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 12:10 PM Manuel Klimek <[hidden email]> wrote:
-apple George + google George to prevent bounces :)

On Fri, Mar 1, 2019 at 12:06 PM Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <[hidden email]> wrote:
Hi Manuel,

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

Sorry for the delay, but finally I found some time to actually implement the CFG-less checks and run them on LLVM. The quick prototype is available here: https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
Literally, all of the false positives I encountered after running on LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an owner (because of its user-defined destructor).
So annotating this class would render the LLVM codebase clean of these new lifetime errors. Do you think this is a strong enough evidence to start upstreaming the type categories or do you think we should do additional measurements?

I'd like to get Richard's opinion on this.

So just to make sure I understand correctly:
The result is that in LLVM we have:
- 1 false positive root cause (causing X false positives)
- 0 true positives
?
 

Correct.

Unfortunately that doesn't sound like a really strong argument in favor - if you have something that works cleanly against clang trunk I can try to find somebody willing to try it on a bit of our internal codebase, but I can't promise anything.

Basically what I'd be looking for is some evidence that the false positive ratio is not too high, and for that we'd need at least a codebase containing true positives.

I do not agree.

Sorry if what I said came over stronger than I intended it to - with "not a strong argument in favor" I didn't mean that it's an argument against, just that it doesn't seem like overwhelming data for.
 
LLVM has good coverage and tests are regularly run with sanitizers on. Thus, I would except most of the lifetime related issues being caught by dynamic analysis in our case. I do remember, however, such problems to slip through the reviews and break the build bots occasionally.

Would be interesting to have some evidence / examples for this that this check would have caught - I definitely agree that lifetime issues fall through the cracks, the question is whether those actually lead to true positives, or slip through the simple check.
 
I think, the value of those warnings are to catch those errors early, saving time for the developers. So even though if it is unlikely to find true positives in sanitizer clean builds for certain projects I still see the value. Note that, this is true for most of the similar warnings in clang. LLVM (and probably most other projects) is clean of -Wreturn-stack-address, still we do not question whether that warning is useful.

Agreed. I'm really mainly looking for evidence for the decision. I'm not saying that evidence doesn't exist, but that we have to make sure to find it.
 

 
 
 
 
I also plan to look up some lifetime fixes/reverts in the commit history to check if some of those problems would have been caught by these warnings. I will share the results in a EuroLLVM technical talk.

P.S.:
Does anybody have a good idea how to get a list of buildbot breaking commits where the reason was lifetime related (other than grepping git history)?

Regards,
Gábor
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
> I do not agree. LLVM has good coverage and tests are regularly run with sanitizers on.

I'm not entirely convinced: coverage is only line-based and does not cover all executions,
and e.g. fuzzers still regularly find new bugs.

But in any case it should be easy to find other projects which have more true positives, right?


On Fri, Mar 1, 2019 at 5:03 AM Manuel Klimek <[hidden email]> wrote:


On Fri, Mar 1, 2019 at 1:49 PM Gábor Horváth <[hidden email]> wrote:


On Fri, 1 Mar 2019 at 12:54, Manuel Klimek <[hidden email]> wrote:
(correcting dmitri's email address)

On Fri, Mar 1, 2019 at 12:24 PM Gábor Horváth <[hidden email]> wrote:


On Fri, 1 Mar 2019 at 12:13, Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 12:10 PM Manuel Klimek <[hidden email]> wrote:
-apple George + google George to prevent bounces :)

On Fri, Mar 1, 2019 at 12:06 PM Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <[hidden email]> wrote:
Hi Manuel,

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

Sorry for the delay, but finally I found some time to actually implement the CFG-less checks and run them on LLVM. The quick prototype is available here: https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
Literally, all of the false positives I encountered after running on LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an owner (because of its user-defined destructor).
So annotating this class would render the LLVM codebase clean of these new lifetime errors. Do you think this is a strong enough evidence to start upstreaming the type categories or do you think we should do additional measurements?

I'd like to get Richard's opinion on this.

So just to make sure I understand correctly:
The result is that in LLVM we have:
- 1 false positive root cause (causing X false positives)
- 0 true positives
?
 

Correct.

Unfortunately that doesn't sound like a really strong argument in favor - if you have something that works cleanly against clang trunk I can try to find somebody willing to try it on a bit of our internal codebase, but I can't promise anything.

Basically what I'd be looking for is some evidence that the false positive ratio is not too high, and for that we'd need at least a codebase containing true positives.

I do not agree.

Sorry if what I said came over stronger than I intended it to - with "not a strong argument in favor" I didn't mean that it's an argument against, just that it doesn't seem like overwhelming data for.
 
LLVM has good coverage and tests are regularly run with sanitizers on. Thus, I would except most of the lifetime related issues being caught by dynamic analysis in our case. I do remember, however, such problems to slip through the reviews and break the build bots occasionally.

Would be interesting to have some evidence / examples for this that this check would have caught - I definitely agree that lifetime issues fall through the cracks, the question is whether those actually lead to true positives, or slip through the simple check.
 
I think, the value of those warnings are to catch those errors early, saving time for the developers. So even though if it is unlikely to find true positives in sanitizer clean builds for certain projects I still see the value. Note that, this is true for most of the similar warnings in clang. LLVM (and probably most other projects) is clean of -Wreturn-stack-address, still we do not question whether that warning is useful.

Agreed. I'm really mainly looking for evidence for the decision. I'm not saying that evidence doesn't exist, but that we have to make sure to find it.
 

 
 
 
 
I also plan to look up some lifetime fixes/reverts in the commit history to check if some of those problems would have been caught by these warnings. I will share the results in a EuroLLVM technical talk.

P.S.:
Does anybody have a good idea how to get a list of buildbot breaking commits where the reason was lifetime related (other than grepping git history)?

Regards,
Gábor
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev


On Mon, 4 Mar 2019 at 19:08, George Karpenkov <[hidden email]> wrote:
> I do not agree. LLVM has good coverage and tests are regularly run with sanitizers on.

I'm not entirely convinced: coverage is only line-based and does not cover all executions,
and e.g. fuzzers still regularly find new bugs.

But in any case it should be easy to find other projects which have more true positives, right?

Yeah, we will look into some other projects as well. But, for me, more interesting result will be to rerun it on reverted commits.
Also, note that we are talking now about the local versions of these lifetime warnings.
Here are some examples that the local analysis would catch:

new MyPointer(MyOwner{});
MyPointer p = Y{}.a;
MyPointer p = MyOwner{};
struct S {
  MyPointer p;
  S(int i) : p(&i) {}
  S() : p(MyOwner{}) {}
};
MyPointer f() { int j; return &j; }
MyPointer f() { return MyOwner{}; }

Since no flow sensitive analysis is going on when all lines are covered we are pretty unlikely to have such errors.
They are also pretty independent of the runtime values.
 


On Fri, Mar 1, 2019 at 5:03 AM Manuel Klimek <[hidden email]> wrote:


On Fri, Mar 1, 2019 at 1:49 PM Gábor Horváth <[hidden email]> wrote:


On Fri, 1 Mar 2019 at 12:54, Manuel Klimek <[hidden email]> wrote:
(correcting dmitri's email address)

On Fri, Mar 1, 2019 at 12:24 PM Gábor Horváth <[hidden email]> wrote:


On Fri, 1 Mar 2019 at 12:13, Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 12:10 PM Manuel Klimek <[hidden email]> wrote:
-apple George + google George to prevent bounces :)

On Fri, Mar 1, 2019 at 12:06 PM Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <[hidden email]> wrote:
Hi Manuel,

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

Sorry for the delay, but finally I found some time to actually implement the CFG-less checks and run them on LLVM. The quick prototype is available here: https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
Literally, all of the false positives I encountered after running on LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an owner (because of its user-defined destructor).
So annotating this class would render the LLVM codebase clean of these new lifetime errors. Do you think this is a strong enough evidence to start upstreaming the type categories or do you think we should do additional measurements?

I'd like to get Richard's opinion on this.

So just to make sure I understand correctly:
The result is that in LLVM we have:
- 1 false positive root cause (causing X false positives)
- 0 true positives
?
 

Correct.

Unfortunately that doesn't sound like a really strong argument in favor - if you have something that works cleanly against clang trunk I can try to find somebody willing to try it on a bit of our internal codebase, but I can't promise anything.

Basically what I'd be looking for is some evidence that the false positive ratio is not too high, and for that we'd need at least a codebase containing true positives.

I do not agree.

Sorry if what I said came over stronger than I intended it to - with "not a strong argument in favor" I didn't mean that it's an argument against, just that it doesn't seem like overwhelming data for.
 
LLVM has good coverage and tests are regularly run with sanitizers on. Thus, I would except most of the lifetime related issues being caught by dynamic analysis in our case. I do remember, however, such problems to slip through the reviews and break the build bots occasionally.

Would be interesting to have some evidence / examples for this that this check would have caught - I definitely agree that lifetime issues fall through the cracks, the question is whether those actually lead to true positives, or slip through the simple check.
 
I think, the value of those warnings are to catch those errors early, saving time for the developers. So even though if it is unlikely to find true positives in sanitizer clean builds for certain projects I still see the value. Note that, this is true for most of the similar warnings in clang. LLVM (and probably most other projects) is clean of -Wreturn-stack-address, still we do not question whether that warning is useful.

Agreed. I'm really mainly looking for evidence for the decision. I'm not saying that evidence doesn't exist, but that we have to make sure to find it.
 

 
 
 
 
I also plan to look up some lifetime fixes/reverts in the commit history to check if some of those problems would have been caught by these warnings. I will share the results in a EuroLLVM technical talk.

P.S.:
Does anybody have a good idea how to get a list of buildbot breaking commits where the reason was lifetime related (other than grepping git history)?

Regards,
Gábor
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
On Mon, Mar 4, 2019 at 7:27 PM Gábor Horváth <[hidden email]> wrote:

>
> On Mon, 4 Mar 2019 at 19:08, George Karpenkov <[hidden email]> wrote:
>>
>> > I do not agree. LLVM has good coverage and tests are regularly run with sanitizers on.
>>
>> I'm not entirely convinced: coverage is only line-based and does not cover all executions,
>> and e.g. fuzzers still regularly find new bugs.
>>
>> But in any case it should be easy to find other projects which have more true positives, right?
>
>
> Yeah, we will look into some other projects as well. But, for me, more interesting result will be to rerun it on reverted commits.
> Also, note that we are talking now about the local versions of these lifetime warnings.
> Here are some examples that the local analysis would catch:

Part of the reason why the lifetime analysis is not finding issues in
LLVM and Clang could be that our code allocates most objects, that are
passed around as raw pointers, on arenas (BumpPtrAllocator, included
in ASTContext, for example).

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <[hidden email]>*/
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
In reply to this post by Kristóf Umann via cfe-dev
Hi!

I started to collect some more evidence on why type category-based local warnings are a worthy addition. I worked used Clang as a target of this experiment because we are all familiar with this project. The method was to grep for some keywords like temporary, dangling, StringRef, and ArrayRef. I filtered the results since some of the commits were not related to lifetime issues at all (of course, there is no guarantee that I did not miss a lot of interesting commits). After that, I compiled each commit with the modified clang and categorized the results:

Errors that would have been caught by the warning (so less noise on the build bots):
* https://github.com/llvm-mirror/clang/commit/be0345e47273a01065874e568ad673f89b75317a (StringRef RetTyName = RetTy.getAsString();)
* https://github.com/llvm-mirror/clang/commit/7140ad92debc63af78a9c50dcad2a0b6d4ebc680

Old revision did not compile with modern Clang but the problem is very likely to be caught:

Errors that are not caught because assignments are not checked, only initializations (could be added as a separate flag?):

Errors that are not caught because they would need some semantic information about APIs that are not covered by type categories or CFG based analysis:
* https://github.com/llvm-mirror/clang/commit/9fec733e8e62b1e2ccf0d39916899f2c2d92c764 (The problem is related to const PrintingPolicy &Policy; member of FailedBooleanConditionPrinterHelper)

Errors not caught, probably bug in the prototype (and should be feasible to catch), need to investigate:

Skipped, did not see any code that could trigger the behavior while examining the diff and did not bother looking deeper:

All of these commits slipped through the review process. I feel like this data could justify these checks (even more so since we can restrict it to annotated and STL types by default reducing the number of false positives to virtually 0).
What do you think?

Regards,
Gábor


On Fri, 1 Mar 2019 at 14:03, Manuel Klimek <[hidden email]> wrote:


On Fri, Mar 1, 2019 at 1:49 PM Gábor Horváth <[hidden email]> wrote:


On Fri, 1 Mar 2019 at 12:54, Manuel Klimek <[hidden email]> wrote:
(correcting dmitri's email address)

On Fri, Mar 1, 2019 at 12:24 PM Gábor Horváth <[hidden email]> wrote:


On Fri, 1 Mar 2019 at 12:13, Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 12:10 PM Manuel Klimek <[hidden email]> wrote:
-apple George + google George to prevent bounces :)

On Fri, Mar 1, 2019 at 12:06 PM Manuel Klimek <[hidden email]> wrote:
On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <[hidden email]> wrote:
Hi Manuel,

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <[hidden email]> wrote:


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <[hidden email]> wrote:
Unfortunately, we do not have real world experience on a large codebase for the following reasons:
* The implementation we have does not support annotations yet, and large projects are likely to contain exceptions to the default rules (even if only a few)
* The analysis has an expected coding style, e.g. it does not reason about arithmetic on raw pointers

We did run it on some projects like LLVM and there were a lot of noise mainly due to the unsupported language features like pointer arithmetic.
The rest of the false positives were mostly due to the basic assumption of the analysis that every path is feasible.
We did find some true positives with a similar analysis looking for use after moves in LLVM. Those true positives were redundant std::forward calls that are not going to cause any runtime error but still removing them would make the code cleaner and less likely to misbehave in the future. See: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894 (The same arguments forwarded multiple times in a loop. Since the callee never actually moves the arguments we will not end up using a moved from object. But if we never move the argument, why would we use std::forward in the first place?)

But even if the cfg-based lifetime analysis end up being a coding style specific check, the rest of the (cfg-less) clang warnings would become much more powerful after type categories are upstreamed. They could catch a lot of errors without having any false positives by default. Thus, I do see the value going forward, even if we do not have much real world experience yet.

Could you run the restricted checks on llvm?

Sorry for the delay, but finally I found some time to actually implement the CFG-less checks and run them on LLVM. The quick prototype is available here: https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
Literally, all of the false positives I encountered after running on LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an owner (because of its user-defined destructor).
So annotating this class would render the LLVM codebase clean of these new lifetime errors. Do you think this is a strong enough evidence to start upstreaming the type categories or do you think we should do additional measurements?

I'd like to get Richard's opinion on this.

So just to make sure I understand correctly:
The result is that in LLVM we have:
- 1 false positive root cause (causing X false positives)
- 0 true positives
?
 

Correct.

Unfortunately that doesn't sound like a really strong argument in favor - if you have something that works cleanly against clang trunk I can try to find somebody willing to try it on a bit of our internal codebase, but I can't promise anything.

Basically what I'd be looking for is some evidence that the false positive ratio is not too high, and for that we'd need at least a codebase containing true positives.

I do not agree.

Sorry if what I said came over stronger than I intended it to - with "not a strong argument in favor" I didn't mean that it's an argument against, just that it doesn't seem like overwhelming data for.
 
LLVM has good coverage and tests are regularly run with sanitizers on. Thus, I would except most of the lifetime related issues being caught by dynamic analysis in our case. I do remember, however, such problems to slip through the reviews and break the build bots occasionally.

Would be interesting to have some evidence / examples for this that this check would have caught - I definitely agree that lifetime issues fall through the cracks, the question is whether those actually lead to true positives, or slip through the simple check.
 
I think, the value of those warnings are to catch those errors early, saving time for the developers. So even though if it is unlikely to find true positives in sanitizer clean builds for certain projects I still see the value. Note that, this is true for most of the similar warnings in clang. LLVM (and probably most other projects) is clean of -Wreturn-stack-address, still we do not question whether that warning is useful.

Agreed. I'm really mainly looking for evidence for the decision. I'm not saying that evidence doesn't exist, but that we have to make sure to find it.
 

 
 
 
 
I also plan to look up some lifetime fixes/reverts in the commit history to check if some of those problems would have been caught by these warnings. I will share the results in a EuroLLVM technical talk.

P.S.:
Does anybody have a good idea how to get a list of buildbot breaking commits where the reason was lifetime related (other than grepping git history)?

Regards,
Gábor
 
 

Regards,
Gabor

On Thu, 29 Nov 2018 at 20:58, George Karpenkov <[hidden email]> wrote:
Thanks, the idea looks great!

The heuristic described in a talk on matching returned and accepted references looked fragile to me initially,
but after a bit of time I could not find obvious and common counterexamples which would confuse it.

Has the evaluation been done on any large project? E.g. LLVM itself, Chrome, Webkit, or any other large C++ codebase?
What is the false positive rate? What is the annotation burden to suppress those?
How many useful bugs were found?

I’m also very happy that more dataflow analyses are being implemented in Clang,
and I could help with reviews (though obviously I’m not the code owner for most of the components).

On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <[hidden email]> wrote:

Hi!

This is a proposal to implement Lifetime Analysis [1] defined by Herb Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of dangling (use-after-free) in C++ code, using only local analysis to report them as deterministic readable errors at compile time. The approach is to identify variables that are of generalized “Owner” types (e.g., smart pointers, containers, string) and “Pointer” types (e.g., int*, string_view, span, iterators, and ranges), and then use a local simple acyclic control flow graph (ACFG) analysis to track what each Pointer points to and identify when modifying an Owner invalidates a Pointer. The analysis leverages C++’s existing strong notions of scopes, object lifetimes, and const that carry rich information already available in reasonably modern C++ source code. Interestingly, it appears that with minor extension this analysis can also detect uses of local moved-from variables (use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].

Matthias Gehre and myself had been working on a prototype in Clang [2]. The changes are rather large, so we are planning to take an incremental approach to upstreaming the features should the community want to see this upstream.

Plans for upstreaming

1. Upstream Type Categorization

Clang already performs statement-local lifetime analyses that would benefit from type categorization even before adding any other analysis.

This includes annotating types as Owners and Pointers, and automatically inferring Owner or Point without annotation to minimize annotation burden.

Consider the following code example:

std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}

Unfortunately, today compilers do not warn on this case of returning a dangling reference. They do warn if we return a raw pointer or reference, but the compiler does not know that std::reference_wrapper also is a non-owning indirection. In the Lifetime analysis, this is diagnosed because std::reference_wrapper is recognized as a Pointer type.

As a first step we would upstream the type categorization part of the analysis and make some clang warnings optionally use it. We would also upstream a set of annotations to give the users a way to fix potential false positives due to miscategorization. (This should be very rare according to our experience so far). By default, we could constrain the categorization for std types, whose semantics are known.

2. Extensions of existing CFG-less analyses

2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = “test”s;
By restricting the analysis to single statements, it has a low false-positive rate and can be done without building a CFG (i.e. faster).

2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
   const int i = 3;
   return {i};
}
Similar to 2a also restricted to single statement.

2c. Member pointer that dangles once construction is complete
struct X {
   std::string_view sv;
   X() : sv("test"s) {} // warning: string_view member bound to string temporary whose lifetime ends within the constructor
};

2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s) // warning: dynamically-allocated string_view refers to string whose lifetime ends at the end of the full-expression

3. Intra-function analysis across basic blocks, excluding function call expressions
Propagate point-to sets of Pointers across branches/loops intra-function, e.g. analysing

int* p = &i;
if(condition)
 p = nullptr;
*p; // ERROR: p is possibly null

We have some CFG patches and some code traversing the CFG and propagating the analysis state. With the type categories already in place, this patch should be smaller. We could split these patches further by implementing null tracking in a separate patch.

4. Function calls

auto find(const string& needle, const string& haystack) -> string_view [[gsl::lifetime(haystack)]];

string_view sv = find(“needle”, haystack);   
sv[0]; // OK
string_view sv = find(needle, “temporaryhaystack”);   
sv[0]; // ERROR: sv is dangling

This includes the following subparts.

4a. Precondition checks
Check that the psets of the arguments are valid at call site according to the lifetime annotations of the callee.

4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised return/output psets.
Rigorous checking of not just the function arguments but also the returned values is crucial part of the analysis.

4c. Lifetimes annotations

The analysis gets pretty usable at this point. Most of the time the user does not need any annotations, but it is crucial to have them before a project can adapt it. For example, the user will occasionally want to explicitly state that a member function is “const as far as Lifetime is concerned” even though the function itself is not actually declared const (e.g., vector::operator[] does not invalidate any Pointers, such as iterators or raw pointers).

5. Implementing use after move analysis and exception support

These parts are not implemented yet in our prototype, but they will be useful additions for the analysis.

Questions

Does that make sense? What is the criteria for this work to be upstreamed? Who is willing to participate in reviewing the patches?

Thanks in advance,
Gabor, Matthias, and Herb


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding lifetime analysis to clang

Kristóf Umann via cfe-dev
Gabor and I chatted privately, and I wanted to summarize the discussion.

=== Concerns about inference of type categories ===

I agree that having type categories is a good idea.  I think this is
one of the most important user-facing features in the proposal.

I don't agree with inferring type categories though.  The rules are
too complex to allow the inference to be done implicitly without
hurting readability.  I think the users should be always required to
annotate their Owners and Pointers, but not Aggregates and not Values.
The implementation could still check the rules, if they are crucial.

If the rules were simpler (e.g., "Pointer is any type that satisfies
the iterator concept"), I would consider inference more acceptable,
but still borderline.  Iterator is a complex concept by itself, there
are many broken iterators out there that don't satisfy the concept,
but the simpler usages in practice compile anyway.  However, just the
Pointer rules way more complex than that -- they are like half a page
long.

I don't think we can simplify the rules though -- they need to cover
the necessary C++ concepts.

My primary concern with inference is not false positives, but
understandability and debuggability of the system.  When I get a
warning, how do I, as a human, determine if the types I use are being
correctly classified?  As an API vendor, how can I be sure that my
types are correctly classified so that users will get correct
warnings?  The answer is "add an annotation".  So wouldn't every API
provider want to add an annotation regardless then?

My worry is that a fully automatic system where the user is not
required to understand the rules, will not match users' expectations.
If a provider of a type thinks that it is a Pointer, but it actually
isn't, the failure mode is false negatives, not false positives. Users
will just think the analysis is too weak and can't detect the problem.
So the API providers will have to test that their types are indeed
recognized by the compiler with the correct type category.  At that
point they could as well add an annotation.

This is basically the same reason why C++ added the `override`
keyword, and C++ having backwards compatibility constraints made
`override` optional.  The inference rules for `override` are much
simpler than inference of type categories.  However, style guides
often require writing `override` where possible.

Some types don't conform to the Pointer category formally, even though
in spirit they are pointers.  For example, if style guide used in the
project prohibits operator overloading: https://godbolt.org/z/IYJwHw .
However, users who "don't fully understand the rules" could reasonably
think that the compiler should be able to figure out that this type is
a Pointer.  If the annotation was required, the problem would be
instantly discovered when the compiler would rightfully complain that
the type does not conform to Pointer.

I understand there are two primary reasons why inference is desired:
reduction of annotation burden, and third-party libraries.

To evaluate the annotation burden in practice, I think LLVM is not a
great example.  LLVM has an above-average number of Owners and
Pointers because it tries to implement standard library bits from
future C++ versions.  Also, those Owners and Pointers are in "ADT" and
"Support" libraries, not in regular application code.

Third-party libraries can be annotated through the "API Notes"
approach, that Swift's Objective-C interop uses.  API notes allow the
user to pass an annotations file that injects the necessary attributes
into the library headers.

=== Future work idea: "Optional" type category ===

I also think it would be great to have more type categories, in
particular "optional" -- which will include pointers (that can contain
null), `std::optional` (that can contain nullopt), and whatever other
types with an empty state people might have in their projects.  Then
we could use the same dataflow-sensitive rules as the proposal uses
for tracking null pointers to track checks of empty optionals etc.

However, the rules for assuming nullability seem to be loose (return
values and members are assumed non-null), I'm not very happy about
that: https://godbolt.org/z/CMxEVv . I understand this was probably
done to lower the annotation burden.  I don't think it makes for a
very understandable model though.  I also couldn't find a single place
in the proposal that summarized these assumptions, I had to piece them
together from different sections.

Dmitri
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
12