[analyzer][RFC] The future of StdLibraryFunctionsChecker and other checkers responsibility

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[analyzer][RFC] The future of StdLibraryFunctionsChecker and other checkers responsibility

Vassil Vassilev via cfe-dev
Hi,

During the review of https://reviews.llvm.org/D82288 Kristof and I have started an independent discussion about the responsibilities of StdLibraryFunctionsChecker. This may involve other checkers too. We'd like to develop a community accepted strategy, so agreed to move this discussion forward to the mailing list.

One of my goals with StdLibraryFunctionsChecker is to add argument constraints to functions in the C, C++ and POSIX standard. It is very easy to gather these argument constraints and apply them massively. By doing this, the analysis can be more precise, so, I see a lot of gain here. However, some of these argument constraints could be handled in more specific checkers too, e.g. in CStringModeling, MallocChecker or in StreamChecker.

Let's take for instance `popen`. In the mentioned patch I am adding a simple summary with argument constraints. However, sooner or later popen could be modeled in StreamChecker once that is mature enough. (1) Should we remove the summary from StdLibraryFunctionsChecker when that happens? (2) Or should we have a declaration that ranges and nullability are checked with summaries while more specific things are checked in their respective checkers? (3) Or the specialized checkers could be dependencies to the generic StdLibraryFunctionsChecker, so any bug related to e.g. popen is prioritized to StreamChecker?

My other goal would be to make it possible to add argument constraints (or branches/cases) to any library functions. These libraries may never be modeled in the analyzer. In this case the above problem is non-existent. Perhaps, we should divide the checker to two different checkers once we reach that point, shouldn't we? We are very close to rename StdLibraryFunctionsChecker to LibraryFunctionsChecker anyway by now.

There is more. In `evalCall` if a checker returns true then that function is no longer processed by any subsequent checkers. That is the case with pure functions in StdLibraryFunctionsChecker. In this case, however, it would be anarchy if another checker also returns true from its own evalCall callback. Only one can rule. It would be great if we could enforce that no two checkers evaluate the same function this way. But I don't see any clear solution to achieve that right now, this is really challenging.

Thanks,
Gabor


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] The future of StdLibraryFunctionsChecker and other checkers responsibility

Vassil Vassilev via cfe-dev

+Balázs Kéri

On Thu, 25 Jun 2020 at 10:39, Gábor Márton <[hidden email]> wrote:
One of my goals with StdLibraryFunctionsChecker is to add argument constraints to functions in the C, C++ and POSIX standard. It is very easy to gather these argument constraints and apply them massively. By doing this, the analysis can be more precise, so, I see a lot of gain here. However, some of these argument constraints could be handled in more specific checkers too, e.g. in CStringModeling, MallocChecker or in StreamChecker.

For those that didn't follow development closely, argument constraints in this context mean the following: For select functions, we define a signature we can use to match a call to this function (for C functions, that is return value, function name and argument types), and a so called "summary", which is a list of constraints on the arguments or the return value. For a function like fread(), the signature is size_t fread(void *, size_t, size_t, FILE *), and the summary is that the stream argument and the buffer must be non-null, and that the buffer must be at least the size of (2nd arg * 3rd arg). We can also constrain the return value, that is is at least (2nd arg * 3rd arg).

If on the function call any of the preconditions isn't met, we can emit a warning, otherwise we can increase the precision of the analysis by constraining the arguments.

This is a form of summary based analysis, and its pretty easy to pull off on standard functions because they have, well, standardized signatures and constraints.

(this isn't pulled from the checker, just a theoretical example)
 
Let's take for instance `popen`. In the mentioned patch I am adding a simple summary with argument constraints. However, sooner or later popen could be modeled in StreamChecker once that is mature enough. (1) Should we remove the summary from StdLibraryFunctionsChecker when that happens? (2) Or should we have a declaration that ranges and nullability are checked with summaries while more specific things are checked in their respective checkers? (3) Or the specialized checkers could be dependencies to the generic StdLibraryFunctionsChecker, so any bug related to e.g. popen is prioritized to StreamChecker?

Yes, this this the big one. What role should batch modeling checkers play in the analyzer alongside specific modeling checkers? Lets take an example that has already presents a problem, and the most glaring of them, to me at least, is stream modeling functions.

Take for instance the above mentioned fread. The *numerical* pre- and postconditions can be enforced just as I detailed. However, the most important effect of this function is of course the stream operation, what happens depending on the validity of the stream object (which is not a numerical property), and what happens to it after the call has finished, successful or not. If the stream hits EOF during the read, the return value is less then (2nd arg * 3rd arg), and is equal to it if the read was successful. What this implies to me is that StdLibraryFunctionChecker cannot really take on the responsibility of establishing a post condition, **unless** the function is question can be described by numerical properties only (like isalpha()).

This is something that this checker already does well I think. Its clear that we don't want to expand it to cover non-numeric cases. So, onto the topic that brought us here.

(2.) This sounds great. I think precondition checking is something that introduces a lot of code duplication in checkers, and I don't think that fread()ing a nullpointer is a stream mismanagement issue necessarily. A notable challenge to overcome, however, is how can we ask StreamChecker to add NoteTags from other checkers' reports (see: https://reviews.llvm.org/D81407?id=270161#inline-750832).

(1.) I think the only thing we could do with this idea is to implement an incomplete post condition modeling until a checker comes along and does it properly. However, even this would only make sense if we have a single post-condition (https://reviews.llvm.org/D79432#inline-757474).

(3.) If we go with option (2), we must introduce strong dependencies so that numerical preconditions are checked beforehand. However, if we *duplicate* the same checking in multiple checkers and want to use dependencies to make the correct code runs first, we would introduce deadcode, or possibly running the same check twice, and the problem of inconsistently distributing the responsibility of analysis gets even worse.
 
My other goal would be to make it possible to add argument constraints (or branches/cases) to any library functions. These libraries may never be modeled in the analyzer. In this case the above problem is non-existent. Perhaps, we should divide the checker to two different checkers once we reach that point, shouldn't we? We are very close to rename StdLibraryFunctionsChecker to LibraryFunctionsChecker anyway by now.

If we want to implement summary based analysis, it would only be fitting to bear the name SpeculativeSummaryBasedFunctionModeling and StdSummaryBasedFunctionModeling or something, right? ;) Anyway, this is the least of the worries.

I agree, pure functions (like the above mentioned isalpha()) is the real learning opportunities for generalized summary based analysis for the time being.
 
There is more. In `evalCall` if a checker returns true then that function is no longer processed by any subsequent checkers. That is the case with pure functions in StdLibraryFunctionsChecker. In this case, however, it would be anarchy if another checker also returns true from its own evalCall callback. Only one can rule. It would be great if we could enforce that no two checkers evaluate the same function this way. But I don't see any clear solution to achieve that right now, this is really challenging.

Yup, the problem is that checker callbacks from CheckerManager's perspective (I might be wrong here!) takes a set of ingoing nodes and returns a set of outgoing ones. So the solution might not be terribly complicated, just ugly (large #ifs).

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] The future of StdLibraryFunctionsChecker and other checkers responsibility

Vassil Vassilev via cfe-dev
No matter how far do we want to push the flexibility of our summaries, this checker will always have limitations. I suggest we treat it as yet another method of interprocedural analysis, i.e. simply an improvement over conservative evaluation of the call.

I don't see a need for a specific one-size-fits-all policy here. We can decide every time on per-function basis. Doing evalCall in two different checkers is obviously bad (and we have an assertion that protects against that, so it won't be "anarchy" but merely a crash, at least in non-release builds but i'd expect such crash to be fairly apparent) but any other separation of concerns sounds reasonable to me.

Like, for some functions it makes sense to have StdLibraryFunctionsChecker do the evalCall and the ranges and let other checkers model various aspects of the call that are of interest to them in their PreCall/PostCall. Specific checkers can also do a better job at reporting bugs than the generic checker. For functions for which StdLibraryFunctionsChecker doesn't do evalCall this makes even more sense.

It probably doesn't make much sense to keep StdLibraryFunctionsChecker's pre/post-call modeling if we have a specific checker for that function.

I suspect that if you end up setting up dependencies, it's almost always better to move the functionality to one checker. But there's no need to merge different aspects of modeling that are completely orthogonal. Note that state splits are idempotent which means that it's often ok to duplicate the post-call state-splitting code in multiple checkers without making them aware of each other.

On 6/25/20 7:19 AM, Kristóf Umann via cfe-dev wrote:

+Balázs Kéri

On Thu, 25 Jun 2020 at 10:39, Gábor Márton <[hidden email]> wrote:
One of my goals with StdLibraryFunctionsChecker is to add argument constraints to functions in the C, C++ and POSIX standard. It is very easy to gather these argument constraints and apply them massively. By doing this, the analysis can be more precise, so, I see a lot of gain here. However, some of these argument constraints could be handled in more specific checkers too, e.g. in CStringModeling, MallocChecker or in StreamChecker.

For those that didn't follow development closely, argument constraints in this context mean the following: For select functions, we define a signature we can use to match a call to this function (for C functions, that is return value, function name and argument types), and a so called "summary", which is a list of constraints on the arguments or the return value. For a function like fread(), the signature is size_t fread(void *, size_t, size_t, FILE *), and the summary is that the stream argument and the buffer must be non-null, and that the buffer must be at least the size of (2nd arg * 3rd arg). We can also constrain the return value, that is is at least (2nd arg * 3rd arg).

If on the function call any of the preconditions isn't met, we can emit a warning, otherwise we can increase the precision of the analysis by constraining the arguments.

This is a form of summary based analysis, and its pretty easy to pull off on standard functions because they have, well, standardized signatures and constraints.

(this isn't pulled from the checker, just a theoretical example)
 
Let's take for instance `popen`. In the mentioned patch I am adding a simple summary with argument constraints. However, sooner or later popen could be modeled in StreamChecker once that is mature enough. (1) Should we remove the summary from StdLibraryFunctionsChecker when that happens? (2) Or should we have a declaration that ranges and nullability are checked with summaries while more specific things are checked in their respective checkers? (3) Or the specialized checkers could be dependencies to the generic StdLibraryFunctionsChecker, so any bug related to e.g. popen is prioritized to StreamChecker?

Yes, this this the big one. What role should batch modeling checkers play in the analyzer alongside specific modeling checkers? Lets take an example that has already presents a problem, and the most glaring of them, to me at least, is stream modeling functions.

Take for instance the above mentioned fread. The *numerical* pre- and postconditions can be enforced just as I detailed. However, the most important effect of this function is of course the stream operation, what happens depending on the validity of the stream object (which is not a numerical property), and what happens to it after the call has finished, successful or not. If the stream hits EOF during the read, the return value is less then (2nd arg * 3rd arg), and is equal to it if the read was successful. What this implies to me is that StdLibraryFunctionChecker cannot really take on the responsibility of establishing a post condition, **unless** the function is question can be described by numerical properties only (like isalpha()).

This is something that this checker already does well I think. Its clear that we don't want to expand it to cover non-numeric cases. So, onto the topic that brought us here.

(2.) This sounds great. I think precondition checking is something that introduces a lot of code duplication in checkers, and I don't think that fread()ing a nullpointer is a stream mismanagement issue necessarily. A notable challenge to overcome, however, is how can we ask StreamChecker to add NoteTags from other checkers' reports (see: https://reviews.llvm.org/D81407?id=270161#inline-750832).

(1.) I think the only thing we could do with this idea is to implement an incomplete post condition modeling until a checker comes along and does it properly. However, even this would only make sense if we have a single post-condition (https://reviews.llvm.org/D79432#inline-757474).

(3.) If we go with option (2), we must introduce strong dependencies so that numerical preconditions are checked beforehand. However, if we *duplicate* the same checking in multiple checkers and want to use dependencies to make the correct code runs first, we would introduce deadcode, or possibly running the same check twice, and the problem of inconsistently distributing the responsibility of analysis gets even worse.
 
My other goal would be to make it possible to add argument constraints (or branches/cases) to any library functions. These libraries may never be modeled in the analyzer. In this case the above problem is non-existent. Perhaps, we should divide the checker to two different checkers once we reach that point, shouldn't we? We are very close to rename StdLibraryFunctionsChecker to LibraryFunctionsChecker anyway by now.

If we want to implement summary based analysis, it would only be fitting to bear the name SpeculativeSummaryBasedFunctionModeling and StdSummaryBasedFunctionModeling or something, right? ;) Anyway, this is the least of the worries.

I agree, pure functions (like the above mentioned isalpha()) is the real learning opportunities for generalized summary based analysis for the time being.
 
There is more. In `evalCall` if a checker returns true then that function is no longer processed by any subsequent checkers. That is the case with pure functions in StdLibraryFunctionsChecker. In this case, however, it would be anarchy if another checker also returns true from its own evalCall callback. Only one can rule. It would be great if we could enforce that no two checkers evaluate the same function this way. But I don't see any clear solution to achieve that right now, this is really challenging.

Yup, the problem is that checker callbacks from CheckerManager's perspective (I might be wrong here!) takes a set of ingoing nodes and returns a set of outgoing ones. So the solution might not be terribly complicated, just ugly (large #ifs).

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] The future of StdLibraryFunctionsChecker and other checkers responsibility

Vassil Vassilev via cfe-dev
What I think now about this:
It may be better but more work to make a kind of generic interface that supports modeling of these summaries. This interface should make specification and apply of summaries easy enough to use at any checker, extensible with checker-specific behavior. Then we can have specific checkers for specific groups of functions. These can model every aspect of the functions so that responsibility for one function is not spread between multiple checkers.
(About 'popen': This is and will be not handled by StreamChecker because it is part of another API with integer "file descriptors". StreamChecker works with 'FILE *' stream handle. Bigger problem is that the integer file descriptor API works relatively similar to the 'FILE *' API so if there is a new checker for these functions it can work similar as StreamChecker, maybe it would be even more simple.)

Artem Dergachev via cfe-dev <[hidden email]> ezt írta (időpont: 2020. jún. 25., Cs, 18:45):
No matter how far do we want to push the flexibility of our summaries, this checker will always have limitations. I suggest we treat it as yet another method of interprocedural analysis, i.e. simply an improvement over conservative evaluation of the call.

I don't see a need for a specific one-size-fits-all policy here. We can decide every time on per-function basis. Doing evalCall in two different checkers is obviously bad (and we have an assertion that protects against that, so it won't be "anarchy" but merely a crash, at least in non-release builds but i'd expect such crash to be fairly apparent) but any other separation of concerns sounds reasonable to me.

Like, for some functions it makes sense to have StdLibraryFunctionsChecker do the evalCall and the ranges and let other checkers model various aspects of the call that are of interest to them in their PreCall/PostCall. Specific checkers can also do a better job at reporting bugs than the generic checker. For functions for which StdLibraryFunctionsChecker doesn't do evalCall this makes even more sense.

It probably doesn't make much sense to keep StdLibraryFunctionsChecker's pre/post-call modeling if we have a specific checker for that function.

I suspect that if you end up setting up dependencies, it's almost always better to move the functionality to one checker. But there's no need to merge different aspects of modeling that are completely orthogonal. Note that state splits are idempotent which means that it's often ok to duplicate the post-call state-splitting code in multiple checkers without making them aware of each other.

On 6/25/20 7:19 AM, Kristóf Umann via cfe-dev wrote:

+Balázs Kéri

On Thu, 25 Jun 2020 at 10:39, Gábor Márton <[hidden email]> wrote:
One of my goals with StdLibraryFunctionsChecker is to add argument constraints to functions in the C, C++ and POSIX standard. It is very easy to gather these argument constraints and apply them massively. By doing this, the analysis can be more precise, so, I see a lot of gain here. However, some of these argument constraints could be handled in more specific checkers too, e.g. in CStringModeling, MallocChecker or in StreamChecker.

For those that didn't follow development closely, argument constraints in this context mean the following: For select functions, we define a signature we can use to match a call to this function (for C functions, that is return value, function name and argument types), and a so called "summary", which is a list of constraints on the arguments or the return value. For a function like fread(), the signature is size_t fread(void *, size_t, size_t, FILE *), and the summary is that the stream argument and the buffer must be non-null, and that the buffer must be at least the size of (2nd arg * 3rd arg). We can also constrain the return value, that is is at least (2nd arg * 3rd arg).

If on the function call any of the preconditions isn't met, we can emit a warning, otherwise we can increase the precision of the analysis by constraining the arguments.

This is a form of summary based analysis, and its pretty easy to pull off on standard functions because they have, well, standardized signatures and constraints.

(this isn't pulled from the checker, just a theoretical example)
 
Let's take for instance `popen`. In the mentioned patch I am adding a simple summary with argument constraints. However, sooner or later popen could be modeled in StreamChecker once that is mature enough. (1) Should we remove the summary from StdLibraryFunctionsChecker when that happens? (2) Or should we have a declaration that ranges and nullability are checked with summaries while more specific things are checked in their respective checkers? (3) Or the specialized checkers could be dependencies to the generic StdLibraryFunctionsChecker, so any bug related to e.g. popen is prioritized to StreamChecker?

Yes, this this the big one. What role should batch modeling checkers play in the analyzer alongside specific modeling checkers? Lets take an example that has already presents a problem, and the most glaring of them, to me at least, is stream modeling functions.

Take for instance the above mentioned fread. The *numerical* pre- and postconditions can be enforced just as I detailed. However, the most important effect of this function is of course the stream operation, what happens depending on the validity of the stream object (which is not a numerical property), and what happens to it after the call has finished, successful or not. If the stream hits EOF during the read, the return value is less then (2nd arg * 3rd arg), and is equal to it if the read was successful. What this implies to me is that StdLibraryFunctionChecker cannot really take on the responsibility of establishing a post condition, **unless** the function is question can be described by numerical properties only (like isalpha()).

This is something that this checker already does well I think. Its clear that we don't want to expand it to cover non-numeric cases. So, onto the topic that brought us here.

(2.) This sounds great. I think precondition checking is something that introduces a lot of code duplication in checkers, and I don't think that fread()ing a nullpointer is a stream mismanagement issue necessarily. A notable challenge to overcome, however, is how can we ask StreamChecker to add NoteTags from other checkers' reports (see: https://reviews.llvm.org/D81407?id=270161#inline-750832).

(1.) I think the only thing we could do with this idea is to implement an incomplete post condition modeling until a checker comes along and does it properly. However, even this would only make sense if we have a single post-condition (https://reviews.llvm.org/D79432#inline-757474).

(3.) If we go with option (2), we must introduce strong dependencies so that numerical preconditions are checked beforehand. However, if we *duplicate* the same checking in multiple checkers and want to use dependencies to make the correct code runs first, we would introduce deadcode, or possibly running the same check twice, and the problem of inconsistently distributing the responsibility of analysis gets even worse.
 
My other goal would be to make it possible to add argument constraints (or branches/cases) to any library functions. These libraries may never be modeled in the analyzer. In this case the above problem is non-existent. Perhaps, we should divide the checker to two different checkers once we reach that point, shouldn't we? We are very close to rename StdLibraryFunctionsChecker to LibraryFunctionsChecker anyway by now.

If we want to implement summary based analysis, it would only be fitting to bear the name SpeculativeSummaryBasedFunctionModeling and StdSummaryBasedFunctionModeling or something, right? ;) Anyway, this is the least of the worries.

I agree, pure functions (like the above mentioned isalpha()) is the real learning opportunities for generalized summary based analysis for the time being.
 
There is more. In `evalCall` if a checker returns true then that function is no longer processed by any subsequent checkers. That is the case with pure functions in StdLibraryFunctionsChecker. In this case, however, it would be anarchy if another checker also returns true from its own evalCall callback. Only one can rule. It would be great if we could enforce that no two checkers evaluate the same function this way. But I don't see any clear solution to achieve that right now, this is really challenging.

Yup, the problem is that checker callbacks from CheckerManager's perspective (I might be wrong here!) takes a set of ingoing nodes and returns a set of outgoing ones. So the solution might not be terribly complicated, just ugly (large #ifs).

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer][RFC] The future of StdLibraryFunctionsChecker and other checkers responsibility

Vassil Vassilev via cfe-dev


On Fri, 26 Jun 2020 at 09:53, Balázs Kéri <[hidden email]> wrote:
What I think now about this:
It may be better but more work to make a kind of generic interface that supports modeling of these summaries. This interface should make specification and apply of summaries easy enough to use at any checker, extensible with checker-specific behavior. Then we can have specific checkers for specific groups of functions. These can model every aspect of the functions so that responsibility for one function is not spread between multiple checkers.

That sounds like a brilliant idea! It would be great if at the very least the signature would be visible from StdLibraryFunctions. Let's take a function that would greatly benefit from this: getline.

ssize_t getline(char **lineptr, size_t *n, FILE *stream);

"getline() reads an entire line from stream, storing the address of the buffer containing the text into *lineptr.  The buffer is null terminated and includes the newline character, if one was found. If *lineptr is set to NULL and *n is set 0 before the call, then getline() will allocate a buffer for storing the line.  This buffer should be freed by the user program even if getline() failed."

This is a function that should be handled by MallocChecker (memory allocation), StreamChecker (validity of the stream), TaintChecker (to mark the buffer as tainted) and possibly CStringChecker (is the stream pointing to a well known buffer?). Checking whether the stream is null or not could be done anywhere, but it would be convenient to keep the signature/summary combo together. Having an interface to retrieve each of the possible post-condition states would make this even better, since if any of them would be impossible (say, the path where feof() returns true for a known-to-be-valid stream) the individual stream could sink that path.

Artem Dergachev via cfe-dev <[hidden email]> ezt írta (időpont: 2020. jún. 25., Cs, 18:45):
No matter how far do we want to push the flexibility of our summaries, this checker will always have limitations. I suggest we treat it as yet another method of interprocedural analysis, i.e. simply an improvement over conservative evaluation of the call.

I don't see a need for a specific one-size-fits-all policy here. We can decide every time on per-function basis. Doing evalCall in two different checkers is obviously bad (and we have an assertion that protects against that, so it won't be "anarchy" but merely a crash, at least in non-release builds but i'd expect such crash to be fairly apparent) but any other separation of concerns sounds reasonable to me.

If we actually produce a crash in that case, then I guess we should use evalCall a lot more often. Sure we miss a lot of aspects of the function (pipe creation etc), but it's not like inlining instead would help on that anyways, and until then, lets enjoy some less conservative invalidation and performance gain.
 
I suspect that if you end up setting up dependencies, it's almost always better to move the functionality to one checker. But there's no need to merge different aspects of modeling that are completely orthogonal. Note that state splits are idempotent which means that it's often ok to duplicate the post-call state-splitting code in multiple checkers without making them aware of each other.

I might be overly cautious, but that I fear that still leaves us vulnerable to hard-to-detect programming errors. I have a general gut feeling about the analyzer that we have a lot of very similar looking functions that do almost the same thing with subtle differences (https://reviews.llvm.org/rG6bedfaf5200474f9a72b059f0d99dd39ece1c03e#925680), and I'm not immediately sure that the same exact state split would happen in those occasions. But then again, this is just a gut feeling, nothing concrete.
 
On 6/25/20 7:19 AM, Kristóf Umann via cfe-dev wrote:

+Balázs Kéri

On Thu, 25 Jun 2020 at 10:39, Gábor Márton <[hidden email]> wrote:
One of my goals with StdLibraryFunctionsChecker is to add argument constraints to functions in the C, C++ and POSIX standard. It is very easy to gather these argument constraints and apply them massively. By doing this, the analysis can be more precise, so, I see a lot of gain here. However, some of these argument constraints could be handled in more specific checkers too, e.g. in CStringModeling, MallocChecker or in StreamChecker.

For those that didn't follow development closely, argument constraints in this context mean the following: For select functions, we define a signature we can use to match a call to this function (for C functions, that is return value, function name and argument types), and a so called "summary", which is a list of constraints on the arguments or the return value. For a function like fread(), the signature is size_t fread(void *, size_t, size_t, FILE *), and the summary is that the stream argument and the buffer must be non-null, and that the buffer must be at least the size of (2nd arg * 3rd arg). We can also constrain the return value, that is is at least (2nd arg * 3rd arg).

If on the function call any of the preconditions isn't met, we can emit a warning, otherwise we can increase the precision of the analysis by constraining the arguments.

This is a form of summary based analysis, and its pretty easy to pull off on standard functions because they have, well, standardized signatures and constraints.

(this isn't pulled from the checker, just a theoretical example)
 
Let's take for instance `popen`. In the mentioned patch I am adding a simple summary with argument constraints. However, sooner or later popen could be modeled in StreamChecker once that is mature enough. (1) Should we remove the summary from StdLibraryFunctionsChecker when that happens? (2) Or should we have a declaration that ranges and nullability are checked with summaries while more specific things are checked in their respective checkers? (3) Or the specialized checkers could be dependencies to the generic StdLibraryFunctionsChecker, so any bug related to e.g. popen is prioritized to StreamChecker?

Yes, this this the big one. What role should batch modeling checkers play in the analyzer alongside specific modeling checkers? Lets take an example that has already presents a problem, and the most glaring of them, to me at least, is stream modeling functions.

Take for instance the above mentioned fread. The *numerical* pre- and postconditions can be enforced just as I detailed. However, the most important effect of this function is of course the stream operation, what happens depending on the validity of the stream object (which is not a numerical property), and what happens to it after the call has finished, successful or not. If the stream hits EOF during the read, the return value is less then (2nd arg * 3rd arg), and is equal to it if the read was successful. What this implies to me is that StdLibraryFunctionChecker cannot really take on the responsibility of establishing a post condition, **unless** the function is question can be described by numerical properties only (like isalpha()).

This is something that this checker already does well I think. Its clear that we don't want to expand it to cover non-numeric cases. So, onto the topic that brought us here.

(2.) This sounds great. I think precondition checking is something that introduces a lot of code duplication in checkers, and I don't think that fread()ing a nullpointer is a stream mismanagement issue necessarily. A notable challenge to overcome, however, is how can we ask StreamChecker to add NoteTags from other checkers' reports (see: https://reviews.llvm.org/D81407?id=270161#inline-750832).

(1.) I think the only thing we could do with this idea is to implement an incomplete post condition modeling until a checker comes along and does it properly. However, even this would only make sense if we have a single post-condition (https://reviews.llvm.org/D79432#inline-757474).

(3.) If we go with option (2), we must introduce strong dependencies so that numerical preconditions are checked beforehand. However, if we *duplicate* the same checking in multiple checkers and want to use dependencies to make the correct code runs first, we would introduce deadcode, or possibly running the same check twice, and the problem of inconsistently distributing the responsibility of analysis gets even worse.
 
My other goal would be to make it possible to add argument constraints (or branches/cases) to any library functions. These libraries may never be modeled in the analyzer. In this case the above problem is non-existent. Perhaps, we should divide the checker to two different checkers once we reach that point, shouldn't we? We are very close to rename StdLibraryFunctionsChecker to LibraryFunctionsChecker anyway by now.

If we want to implement summary based analysis, it would only be fitting to bear the name SpeculativeSummaryBasedFunctionModeling and StdSummaryBasedFunctionModeling or something, right? ;) Anyway, this is the least of the worries.

I agree, pure functions (like the above mentioned isalpha()) is the real learning opportunities for generalized summary based analysis for the time being.
 
There is more. In `evalCall` if a checker returns true then that function is no longer processed by any subsequent checkers. That is the case with pure functions in StdLibraryFunctionsChecker. In this case, however, it would be anarchy if another checker also returns true from its own evalCall callback. Only one can rule. It would be great if we could enforce that no two checkers evaluate the same function this way. But I don't see any clear solution to achieve that right now, this is really challenging.

Yup, the problem is that checker callbacks from CheckerManager's perspective (I might be wrong here!) takes a set of ingoing nodes and returns a set of outgoing ones. So the solution might not be terribly complicated, just ugly (large #ifs).

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev