Check virtual calls in ctor or dtor

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

Check virtual calls in ctor or dtor

Lei Zhang
Hi Clang,

The attached patch adds a C++ syntax Checker to clang, it checks whether there is a virtual call in the constructor or destructor. I add a file to svn to implement this checker, because i think there may be some other syntax checkers we could write here.

I will appreciate it if there are any advice about this patch.

--
Best regards!

Lei Zhang

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

VirtualCallinCtorOrDtor.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Ted Kremenek
Hi Lei,

Do you think this would be better implemented as a compiler warning, instead of a static analysis check?

Ted

On Sep 12, 2011, at 6:45 AM, 章磊 wrote:

> Hi Clang,
>
> The attached patch adds a C++ syntax Checker to clang, it checks whether there is a virtual call in the constructor or destructor. I add a file to svn to implement this checker, because i think there may be some other syntax checkers we could write here.
>
> I will appreciate it if there are any advice about this patch.
>
> --
> Best regards!
>
> Lei Zhang
> <VirtualCallinCtorOrDtor.patch>_______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Lei Zhang
Hi Ted,

I tried to implement this as a compiler warning. But i had some
problem in how to get the body of the ctor or dtor.

I implemented this in Sema::CheckConstructor and
Sema::CheckDestructor, but i can not get the body of them when there
is no body with the declaration yet. And i also need the function body
of the CallExpr in the ctor/dtor to do recursive analysis.

So where should i implement this to give a compiler warning?

2011/9/14, 章磊 <[hidden email]>:

> Hi Ted,
>
> I think it's better to implement it as a compiler warning, where should i
> get involved in?
>
> There may some more syntax checks:
>
>    1. absence of virtual dtor( ), it seems that we need whole program
>    analysis to figure out whether a class is a base class.
>    2. define and initialize member variables in the same order
>    3. const-correctness
>
> Should this also be compiler warnings? How to define which they belong to?
>
> 2011/9/13 Ted Kremenek <[hidden email]>
>
>> Hi Lei,
>>
>> Do you think this would be better implemented as a compiler warning,
>> instead of a static analysis check?
>>
>> Ted
>>
>> On Sep 12, 2011, at 6:45 AM, 章磊 wrote:
>>
>> > Hi Clang,
>> >
>> > The attached patch adds a C++ syntax Checker to clang, it checks
>> > whether
>> there is a virtual call in the constructor or destructor. I add a file to
>> svn to implement this checker, because i think there may be some other
>> syntax checkers we could write here.
>> >
>> > I will appreciate it if there are any advice about this patch.
>> >
>> > --
>> > Best regards!
>> >
>> > Lei Zhang
>> >
>> <VirtualCallinCtorOrDtor.patch>_______________________________________________
>> > cfe-dev mailing list
>> > [hidden email]
>> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>
>>
>
>
> --
> Best regards!
>
> Lei Zhang
>

--
Best regards!

Lei Zhang


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Ted Kremenek
On Sep 25, 2011, at 2:32 AM, 章磊 wrote:

Hi Ted,

I tried to implement this as a compiler warning. But i had some
problem in how to get the body of the ctor or dtor.

I implemented this in Sema::CheckConstructor and
Sema::CheckDestructor, but i can not get the body of them when there
is no body with the declaration yet.

The idea how I thought this would be implemented was to do the analysis when processing CallExprs as we are building them.  We should know up front when we are about to process a constructor body.  Simply set/clear a Sema flag when processing the constructor body, and your check can consult that flag when processing a CallExpr to see if it needs to do the check.  Alternatively, checking the DeclContext may be all that is needed to see if we are currently in a constructor body.

The advantage of this approach is that it keeps all the checking local.  Walking the AST of a constructor body *after* we have constructed it is slow, and not really a great approach for doing compiler warnings in the frontend (if we can at all help it).

And i also need the function body
of the CallExpr in the ctor/dtor to do recursive analysis.

Interesting.  If your goal is to do inter procedural analysis, then this should remain a static analyzer check, or that should be a case caught beyond what a compiler warning should do.  That wasn't clear to me before that this was something you wanted to do.

The tradeoffs between compiler and static analyzer warnings are fairly simple:

1) Compiler warnings have higher visibility because all users see them all the time when building their code.

2) The logic behind compiler warnings must be as fast as possible.  Inter-procedural analysis is usually a show stopper, and ideally the checks are highly local.  Sema is already "walking" all of the code as it builds the AST, so often the warning logic can be put in key places, essentially where the warning would be issued.



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Lei Zhang
Hi Ted,
 
My goal is to check whether there is a virtual call (directly or indirectly) in construction or destruction.
 
In my former patch i do this check to all the constructors and the destructor in checkASTDecl(const CXXRecordDecl ...). My way is walk through the ast body of ctor/dtor: if there is virtual call, warn it; if there is a call, walk through the callee's body recursively.
 
I think it's ok to do this check without the analysis engine and the inline IPA.
 
What you say?
在 2011年9月26日 下午9:37,Ted Kremenek <[hidden email]>写道:
On Sep 25, 2011, at 2:32 AM, 章磊 wrote:

Hi Ted,

I tried to implement this as a compiler warning. But i had some
problem in how to get the body of the ctor or dtor.

I implemented this in Sema::CheckConstructor and
Sema::CheckDestructor, but i can not get the body of them when there
is no body with the declaration yet.

The idea how I thought this would be implemented was to do the analysis when processing CallExprs as we are building them.  We should know up front when we are about to process a constructor body.  Simply set/clear a Sema flag when processing the constructor body, and your check can consult that flag when processing a CallExpr to see if it needs to do the check.  Alternatively, checking the DeclContext may be all that is needed to see if we are currently in a constructor body.

The advantage of this approach is that it keeps all the checking local.  Walking the AST of a constructor body *after* we have constructed it is slow, and not really a great approach for doing compiler warnings in the frontend (if we can at all help it).

And i also need the function body
of the CallExpr in the ctor/dtor to do recursive analysis.

Interesting.  If your goal is to do inter procedural analysis, then this should remain a static analyzer check, or that should be a case caught beyond what a compiler warning should do.  That wasn't clear to me before that this was something you wanted to do.

The tradeoffs between compiler and static analyzer warnings are fairly simple:

1) Compiler warnings have higher visibility because all users see them all the time when building their code.

2) The logic behind compiler warnings must be as fast as possible.  Inter-procedural analysis is usually a show stopper, and ideally the checks are highly local.  Sema is already "walking" all of the code as it builds the AST, so often the warning logic can be put in key places, essentially where the warning would be issued.





--
Best regards!

Lei Zhang

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Ted Kremenek
Hi Lei,

The part that scares me is the recursive walk of the callee's body.  That's not likely to have good enough performance for a compile-time warning.  For a compile-time warning, I think it is fine to just check the body of the constructor, but if you want to analyze the bodies of called functions you'd need to be extremely clever to do it efficiently enough for it to be a compile-time warning.  We really like compiler warnings to have highly predictable performance, and recursively walking the entire classes's method bodies is not really amendable to that.

If you think walking through all the bodies is necessary, then I'd leave it as a static analyzer checker.  You'd certainly get better coverage.  To handle the recursion, I'd probably make it a worklist (i.e., data recursive) to avoid stack blowup and to keep your memory footprint small.  You should also employ dynamic programming to avoid analyzing the same methods again.  The other nice thing about this approach is that it probably can be nicely generalized to global IPA once we have it.

One caveat about doing a recursive analysis of method bodies without any path-based analysis is that you are introducing additional risk of a false positive.  There could be unreachable paths where a constructor can't actually transitively trigger a call to a virtual function because of the context, but your check may report an issue.  I don't think this will typically be a problem, but it is something to consider.

Cheers,
Ted

On Sep 27, 2011, at 1:39 AM, 章磊 wrote:

Hi Ted,
 
My goal is to check whether there is a virtual call (directly or indirectly) in construction or destruction.
 
In my former patch i do this check to all the constructors and the destructor in checkASTDecl(const CXXRecordDecl ...). My way is walk through the ast body of ctor/dtor: if there is virtual call, warn it; if there is a call, walk through the callee's body recursively.
 
I think it's ok to do this check without the analysis engine and the inline IPA.
 
What you say?
在 2011年9月26日 下午9:37,Ted Kremenek <[hidden email]>写道:
On Sep 25, 2011, at 2:32 AM, 章磊 wrote:

Hi Ted,

I tried to implement this as a compiler warning. But i had some
problem in how to get the body of the ctor or dtor.

I implemented this in Sema::CheckConstructor and
Sema::CheckDestructor, but i can not get the body of them when there
is no body with the declaration yet.

The idea how I thought this would be implemented was to do the analysis when processing CallExprs as we are building them.  We should know up front when we are about to process a constructor body.  Simply set/clear a Sema flag when processing the constructor body, and your check can consult that flag when processing a CallExpr to see if it needs to do the check.  Alternatively, checking the DeclContext may be all that is needed to see if we are currently in a constructor body.

The advantage of this approach is that it keeps all the checking local.  Walking the AST of a constructor body *after* we have constructed it is slow, and not really a great approach for doing compiler warnings in the frontend (if we can at all help it).

And i also need the function body
of the CallExpr in the ctor/dtor to do recursive analysis.

Interesting.  If your goal is to do inter procedural analysis, then this should remain a static analyzer check, or that should be a case caught beyond what a compiler warning should do.  That wasn't clear to me before that this was something you wanted to do.

The tradeoffs between compiler and static analyzer warnings are fairly simple:

1) Compiler warnings have higher visibility because all users see them all the time when building their code.

2) The logic behind compiler warnings must be as fast as possible.  Inter-procedural analysis is usually a show stopper, and ideally the checks are highly local.  Sema is already "walking" all of the code as it builds the AST, so often the warning logic can be put in key places, essentially where the warning would be issued.





--
Best regards!

Lei Zhang


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Lei Zhang
Hi Ted,
 
I re-implement this check as a static analyzer checker, and i make the recursion a work list as you suggested.
 
Currently I'm not using dynamic programming but introduced a SmallPtrSet to avoid analyzing the same methods again. It's a top-down analysis, how could i introduce dynamic programming? By adjust the work list? It seems that if we introduce DP we still need a set to record those method we have analyzed.
 
What you Say?
 
2011/10/4, Ted Kremenek <[hidden email]>:
> Hi Lei,
>
> The part that scares me is the recursive walk of the callee's body.  That's
> not likely to have good enough performance for a compile-time warning.  For
> a compile-time warning, I think it is fine to just check the body of the
> constructor, but if you want to analyze the bodies of called functions you'd
> need to be extremely clever to do it efficiently enough for it to be a
> compile-time warning.  We really like compiler warnings to have highly
> predictable performance, and recursively walking the entire classes's method
> bodies is not really amendable to that.
>
> If you think walking through all the bodies is necessary, then I'd leave it
> as a static analyzer checker.  You'd certainly get better coverage.  To
> handle the recursion, I'd probably make it a worklist (i.e., data recursive)
> to avoid stack blowup and to keep your memory footprint small.  You should
> also employ dynamic programming to avoid analyzing the same methods again.
> The other nice thing about this approach is that it probably can be nicely
> generalized to global IPA once we have it.
>
> One caveat about doing a recursive analysis of method bodies without any
> path-based analysis is that you are introducing additional risk of a false
> positive.  There could be unreachable paths where a constructor can't
> actually transitively trigger a call to a virtual function because of the
> context, but your check may report an issue.  I don't think this will
> typically be a problem, but it is something to consider.
>
> Cheers,
> Ted
>
> On Sep 27, 2011, at 1:39 AM, 章磊 wrote:
>
>> Hi Ted,
>>
>> My goal is to check whether there is a virtual call (directly or
>> indirectly) in construction or destruction.
>>
>> In my former patch i do this check to all the constructors and the
>> destructor in checkASTDecl(const CXXRecordDecl ...). My way is walk
>> through the ast body of ctor/dtor: if there is virtual call, warn it; if
>> there is a call, walk through the callee's body recursively.
>>
>> I think it's ok to do this check without the analysis engine and the
>> inline IPA.
>>
>> What you say?
>> 在 2011年9月26日 下午9:37,Ted Kremenek <[hidden email]>写道:
>> On Sep 25, 2011, at 2:32 AM, 章磊 wrote:
>>
>>> Hi Ted,
>>>
>>> I tried to implement this as a compiler warning. But i had some
>>> problem in how to get the body of the ctor or dtor.
>>>
>>> I implemented this in Sema::CheckConstructor and
>>> Sema::CheckDestructor, but i can not get the body of them when there
>>> is no body with the declaration yet.
>>
>> The idea how I thought this would be implemented was to do the analysis
>> when processing CallExprs as we are building them.  We should know up
>> front when we are about to process a constructor body.  Simply set/clear a
>> Sema flag when processing the constructor body, and your check can consult
>> that flag when processing a CallExpr to see if it needs to do the check.
>> Alternatively, checking the DeclContext may be all that is needed to see
>> if we are currently in a constructor body.
>>
>> The advantage of this approach is that it keeps all the checking local.
>> Walking the AST of a constructor body *after* we have constructed it is
>> slow, and not really a great approach for doing compiler warnings in the
>> frontend (if we can at all help it).
>>
>>> And i also need the function body
>>> of the CallExpr in the ctor/dtor to do recursive analysis.
>>
>> Interesting.  If your goal is to do inter procedural analysis, then this
>> should remain a static analyzer check, or that should be a case caught
>> beyond what a compiler warning should do.  That wasn't clear to me before
>> that this was something you wanted to do.
>>
>> The tradeoffs between compiler and static analyzer warnings are fairly
>> simple:
>>
>> 1) Compiler warnings have higher visibility because all users see them all
>> the time when building their code.
>>
>> 2) The logic behind compiler warnings must be as fast as possible.
>> Inter-procedural analysis is usually a show stopper, and ideally the
>> checks are highly local.  Sema is already "walking" all of the code as it
>> builds the AST, so often the warning logic can be put in key places,
>> essentially where the warning would be issued.
>>
>>
>>
>>
>>
>> --
>> Best regards!
>>
>> Lei Zhang
>
>
 
 
--
Best regards!
 
Lei Zhang
 

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

VirtualCallinCtorOrDtor.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Ted Kremenek
Hi Lei,

This is looking good.

I think one way to do DP is by modifying CheckedFunctions slightly.  Instead of CheckedFunctions just recording whether or not a function has been put on the worklist, you record whether or not it calls a virtual method (perhaps by registering the callsite?).  You could thus have three possibilities:

(1) Function hasn't been analyzed (if not, enqueue on the worklist)

(2) Function has been analyzed, and calls a virtual function.

(3) Function has been analyzed, and doesn't call a virtual function.

This should be easy to model with a DenseMap, mapping from the Decl* to the appropriate information.  You can consult the information for 1-3 before deciding the "recurse" into analyzing a called method.

As for propagating the information via dynamic programming, since your worklist is a stack, you know the immediate caller.  Once you see that a virtual method is called, essentially that means the entire chain of calls results in the call of a virtual method.  You can then annotate them all in one swoop.  This is essentially what you are assuming when you emit a warning.

On Oct 25, 2011, at 11:50 PM, 章磊 wrote:

Hi Ted,
 
I re-implement this check as a static analyzer checker, and i make the recursion a work list as you suggested.
 
Currently I'm not using dynamic programming but introduced a SmallPtrSet to avoid analyzing the same methods again. It's a top-down analysis, how could i introduce dynamic programming? By adjust the work list? It seems that if we introduce DP we still need a set to record those method we have analyzed.
 
What you Say?
 
2011/10/4, Ted Kremenek <[hidden email]>:
> Hi Lei,
>
> The part that scares me is the recursive walk of the callee's body.  That's
> not likely to have good enough performance for a compile-time warning.  For
> a compile-time warning, I think it is fine to just check the body of the
> constructor, but if you want to analyze the bodies of called functions you'd
> need to be extremely clever to do it efficiently enough for it to be a
> compile-time warning.  We really like compiler warnings to have highly
> predictable performance, and recursively walking the entire classes's method
> bodies is not really amendable to that.
>
> If you think walking through all the bodies is necessary, then I'd leave it
> as a static analyzer checker.  You'd certainly get better coverage.  To
> handle the recursion, I'd probably make it a worklist (i.e., data recursive)
> to avoid stack blowup and to keep your memory footprint small.  You should
> also employ dynamic programming to avoid analyzing the same methods again.
> The other nice thing about this approach is that it probably can be nicely
> generalized to global IPA once we have it.
>
> One caveat about doing a recursive analysis of method bodies without any
> path-based analysis is that you are introducing additional risk of a false
> positive.  There could be unreachable paths where a constructor can't
> actually transitively trigger a call to a virtual function because of the
> context, but your check may report an issue.  I don't think this will
> typically be a problem, but it is something to consider.
>
> Cheers,
> Ted
>
> On Sep 27, 2011, at 1:39 AM, 章磊 wrote:
>
>> Hi Ted,
>>
>> My goal is to check whether there is a virtual call (directly or
>> indirectly) in construction or destruction.
>>
>> In my former patch i do this check to all the constructors and the
>> destructor in checkASTDecl(const CXXRecordDecl ...). My way is walk
>> through the ast body of ctor/dtor: if there is virtual call, warn it; if
>> there is a call, walk through the callee's body recursively.
>>
>> I think it's ok to do this check without the analysis engine and the
>> inline IPA.
>>
>> What you say?
>> 在 2011年9月26日 下午9:37,Ted Kremenek <[hidden email]>写道:
>> On Sep 25, 2011, at 2:32 AM, 章磊 wrote:
>>
>>> Hi Ted,
>>>
>>> I tried to implement this as a compiler warning. But i had some
>>> problem in how to get the body of the ctor or dtor.
>>>
>>> I implemented this in Sema::CheckConstructor and
>>> Sema::CheckDestructor, but i can not get the body of them when there
>>> is no body with the declaration yet.
>>
>> The idea how I thought this would be implemented was to do the analysis
>> when processing CallExprs as we are building them.  We should know up
>> front when we are about to process a constructor body.  Simply set/clear a
>> Sema flag when processing the constructor body, and your check can consult
>> that flag when processing a CallExpr to see if it needs to do the check.
>> Alternatively, checking the DeclContext may be all that is needed to see
>> if we are currently in a constructor body.
>>
>> The advantage of this approach is that it keeps all the checking local.
>> Walking the AST of a constructor body *after* we have constructed it is
>> slow, and not really a great approach for doing compiler warnings in the
>> frontend (if we can at all help it).
>>
>>> And i also need the function body
>>> of the CallExpr in the ctor/dtor to do recursive analysis.
>>
>> Interesting.  If your goal is to do inter procedural analysis, then this
>> should remain a static analyzer check, or that should be a case caught
>> beyond what a compiler warning should do.  That wasn't clear to me before
>> that this was something you wanted to do.
>>
>> The tradeoffs between compiler and static analyzer warnings are fairly
>> simple:
>>
>> 1) Compiler warnings have higher visibility because all users see them all
>> the time when building their code.
>>
>> 2) The logic behind compiler warnings must be as fast as possible.
>> Inter-procedural analysis is usually a show stopper, and ideally the
>> checks are highly local.  Sema is already "walking" all of the code as it
>> builds the AST, so often the warning logic can be put in key places,
>> essentially where the warning would be issued.
>>
>>
>>
>>
>>
>> --
>> Best regards!
>>
>> Lei Zhang
>
>
 
 
--
Best regards!
 
Lei Zhang
 
<VirtualCallinCtorOrDtor.patch>


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Lei Zhang
Hi Ted,

I re-implement this checker again.

This time i introduce several data structure to do DP and the bugreport.
  1. I modified the worklist a bit, now its unit is a <FunctionDecl *, CallExpr *> pair from the caller's function decl to the call expr.
  2. RuntimeStack(vector of CallExpr) to simulate the call graph dynamiclly. This is for report the entire chain of the call path.
  3. A DenseMap from a FunctionDecl to all call paths in its body.
  4. A SmallPtrSet to  Record which functions have be fully handled. A function fully handled means that all function bodies it called have been handled.
I could not find a right way for this kind of bug report, it's not path diagnostic and we need to annotate the call expr in the call path. So i simply output the name with " <-- " to show the call-called relationship. It will crash on some code because "Name is not a simple identifier"? But if we use the SourceRange, there would not be any crash like this, right?

And i left several things unconsidered.
  1. The memory consume and reclaim.
  2. The readabiliy of the code. It's a little hard to read now.
Do you have any advices on these?

More comments inline, and looking forward to your reply.

在 2011年10月27日 下午12:20,Ted Kremenek <[hidden email]>写道:
Hi Lei,

This is looking good.

I think one way to do DP is by modifying CheckedFunctions slightly.  Instead of CheckedFunctions just recording whether or not a function has been put on the worklist, you record whether or not it calls a virtual method (perhaps by registering the callsite?).  You could thus have three possibilities:

(1) Function hasn't been analyzed (if not, enqueue on the worklist)

(2) Function has been analyzed, and calls a virtual function.

(3) Function has been analyzed, and doesn't call a virtual function.

This should be easy to model with a DenseMap, mapping from the Decl* to the appropriate information.  You can consult the information for 1-3 before deciding the "recurse" into analyzing a called method.

As for propagating the information via dynamic programming, since your worklist is a stack, you know the immediate caller.  Once you see that a virtual method is called, essentially that means the entire chain of calls results in the call of a virtual method.  You can then annotate them all in one swoop.  This is essentially what you are assuming when you emit a warning.

On Oct 25, 2011, at 11:50 PM, 章磊 wrote:

Hi Ted,
 
I re-implement this check as a static analyzer checker, and i make the recursion a work list as you suggested.
 
Currently I'm not using dynamic programming but introduced a SmallPtrSet to avoid analyzing the same methods again. It's a top-down analysis, how could i introduce dynamic programming? By adjust the work list? It seems that if we introduce DP we still need a set to record those method we have analyzed.
 
What you Say?
 
2011/10/4, Ted Kremenek <[hidden email]>:
> Hi Lei,
>
> The part that scares me is the recursive walk of the callee's body.  That's
> not likely to have good enough performance for a compile-time warning.  For
> a compile-time warning, I think it is fine to just check the body of the
> constructor, but if you want to analyze the bodies of called functions you'd
> need to be extremely clever to do it efficiently enough for it to be a
> compile-time warning.  We really like compiler warnings to have highly
> predictable performance, and recursively walking the entire classes's method
> bodies is not really amendable to that.
>
> If you think walking through all the bodies is necessary, then I'd leave it
> as a static analyzer checker.  You'd certainly get better coverage.  To
> handle the recursion, I'd probably make it a worklist (i.e., data recursive)
> to avoid stack blowup and to keep your memory footprint small.  You should
> also employ dynamic programming to avoid analyzing the same methods again.
> The other nice thing about this approach is that it probably can be nicely
> generalized to global IPA once we have it.
>
> One caveat about doing a recursive analysis of method bodies without any
> path-based analysis is that you are introducing additional risk of a false
> positive.  There could be unreachable paths where a constructor can't
> actually transitively trigger a call to a virtual function because of the
> context, but your check may report an issue.  I don't think this will
> typically be a problem, but it is something to consider.
>
> Cheers,
> Ted
>
> On Sep 27, 2011, at 1:39 AM, 章磊 wrote:
>
>> Hi Ted,
>>
>> My goal is to check whether there is a virtual call (directly or
>> indirectly) in construction or destruction.
>>
>> In my former patch i do this check to all the constructors and the
>> destructor in checkASTDecl(const CXXRecordDecl ...). My way is walk
>> through the ast body of ctor/dtor: if there is virtual call, warn it; if
>> there is a call, walk through the callee's body recursively.
>>
>> I think it's ok to do this check without the analysis engine and the
>> inline IPA.
>>
>> What you say?
>> 在 2011年9月26日 下午9:37,Ted Kremenek <[hidden email]>写道:
>> On Sep 25, 2011, at 2:32 AM, 章磊 wrote:
>>
>>> Hi Ted,
>>>
>>> I tried to implement this as a compiler warning. But i had some
>>> problem in how to get the body of the ctor or dtor.
>>>
>>> I implemented this in Sema::CheckConstructor and
>>> Sema::CheckDestructor, but i can not get the body of them when there
>>> is no body with the declaration yet.
>>
>> The idea how I thought this would be implemented was to do the analysis
>> when processing CallExprs as we are building them.  We should know up
>> front when we are about to process a constructor body.  Simply set/clear a
>> Sema flag when processing the constructor body, and your check can consult
>> that flag when processing a CallExpr to see if it needs to do the check.
>> Alternatively, checking the DeclContext may be all that is needed to see
>> if we are currently in a constructor body.
>>
>> The advantage of this approach is that it keeps all the checking local.
>> Walking the AST of a constructor body *after* we have constructed it is
>> slow, and not really a great approach for doing compiler warnings in the
>> frontend (if we can at all help it).
>>
>>> And i also need the function body
>>> of the CallExpr in the ctor/dtor to do recursive analysis.
>>
>> Interesting.  If your goal is to do inter procedural analysis, then this
>> should remain a static analyzer check, or that should be a case caught
>> beyond what a compiler warning should do.  That wasn't clear to me before
>> that this was something you wanted to do.
>>
>> The tradeoffs between compiler and static analyzer warnings are fairly
>> simple:
>>
>> 1) Compiler warnings have higher visibility because all users see them all
>> the time when building their code.
>>
>> 2) The logic behind compiler warnings must be as fast as possible.
>> Inter-procedural analysis is usually a show stopper, and ideally the
>> checks are highly local.  Sema is already "walking" all of the code as it
>> builds the AST, so often the warning logic can be put in key places,
>> essentially where the warning would be issued.
>>
>>
>>
>>
>>
>> --
>> Best regards!
>>
>> Lei Zhang
>
>
 
 
--
Best regards!
 
Lei Zhang
 
<VirtualCallinCtorOrDtor.patch>




--
Best regards!

Lei Zhang

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

VirtualCall.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Ted Kremenek
On Nov 11, 2011, at 12:56 AM, 章磊 wrote:

Hi Ted,

I re-implement this checker again.

This time i introduce several data structure to do DP and the bugreport.
  1. I modified the worklist a bit, now its unit is a <FunctionDecl *, CallExpr *> pair from the caller's function decl to the call expr.
  2. RuntimeStack(vector of CallExpr) to simulate the call graph dynamiclly. This is for report the entire chain of the call path.
  3. A DenseMap from a FunctionDecl to all call paths in its body.
  4. A SmallPtrSet to  Record which functions have be fully handled. A function fully handled means that all function bodies it called have been handled.

Hi Lei,

Thanks for continuing to refine the checker.  I'll make specific comments inline on the patch below.

I could not find a right way for this kind of bug report, it's not path diagnostic and we need to annotate the call expr in the call path. So i simply output the name with " <-- " to show the call-called relationship.

Ironically, the kind of report you are looking for is something that Clang's primary diagnostic system handles well, but the static analyzer's diagnostics does not.  Since this is not a path-sensitive analysis, what essentially you want is a call-tree, represented with a single diagnostic and a collection of "notes".  This is how diagnostics are done in Clang's warnings, but we don't have support for notes in the analyzer diagnostics.

For now, I think constructing a painfully long diagnostic is acceptable, but the long term solution is to enhance analyzer diagnostics to support notes as well.

It will crash on some code because "Name is not a simple identifier"? But if we use the SourceRange, there would not be any crash like this, right?

No, unfortunately, this isn't acceptable, and using a SourceRange doesn't change the situation.  There are plenty of times where the name of a method won't be a simple identifier, especially in C++.  Fortunately, you can just use 'getNameAsString()' instead, and it won't crash.


And i left several things unconsidered.
  1. The memory consume and reclaim.
The checker cannot leak memory, so we need to address this (if necessary).  It looks to me like there aren't any issues, however.  Nothing looks dynamically allocated using the 'new' operator, and the WalkAST object has finite lifetime, so I think everything will get automatically memory managed.
  1. The readabiliy of the code. It's a little hard to read now.
The code can definitely benefit from a bit more refactoring.  For example, error reporting is duplicating in multiple places.

What I'd also like to see is some comments in the 'Execute' method that actually describes the algorithm.  I think that's the main missing piece.

Do you have any advices on these?



More comments inline in the patch:

--- lib/StaticAnalyzer/Checkers/VirtualCallChecker.cpp (revision 0)
+++ lib/StaticAnalyzer/Checkers/VirtualCallChecker.cpp (revision 0)
@@ -0,0 +1,334 @@
+//=======- VirtualCallChecker.cpp - Basic security checks --------*- C++ -*-==//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+//  This file defines a checker that checks virtual function calls during 
+//  construction or destruction.
+//
+//===----------------------------------------------------------------------===//
+
+#include "ClangSACheckers.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/AST/DeclCXX.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
+#include "clang/AST/StmtVisitor.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
+
+using namespace clang;
+using namespace ento;
+
+namespace {
+
+class WalkAST : public StmtVisitor<WalkAST> {
+  BugReporter &BR;
+  AnalysisDeclContext *AC;
+
+  // WorkList using stack.
+  typedef std::pair<const FunctionDecl *, const CallExpr *> TyWorkListUnit;
+  SmallVector<TyWorkListUnit, 20> Stack;

I typically prefer 'WorkListUnitTy' as opposed to 'TyWorkListUnit'.  Similarly, I expected the Stack type to also be typedef'ed, e.g.:

typedef SmallVector<WorkListUnitTy, 20> WorkListTy;
WorkListTy WorkList;

OR

typedef std::pair<const FunctionDecl *, const CallExpr *> WorkListUnit;
typedef SmallVector<WorkListUnit, 20> WorkList;
WorkList WList;

This keeps the type names cleaner and the name of the worklist object succinct.

+
+  // Runtime Stack to record call path.
+  SmallVector<const CallExpr *, 20> RuntimeStack;

Also, consider using a typedef, which ties in with my next comment.  For example:

typedef SmallVector<const CallExpr *, 20> CallChain;
typedef SmallVectorImpl<const CallExpr *> CallChainImpl;
CallChain RuntimeStack;

+
+  // DenseMap from FunctionDecl* to all call paths in its body.
+  llvm::DenseMap<const FunctionDecl *,
+                 SmallVector<SmallVector<const CallExpr *,20>,
+                             20> > VirtualCallsMap;

This is going to result with unnecessary malloc traffic, as any time the DenseMap gets resized all of the SmallVectors will get copied and destroyed.  Instead, try:

 typedef llvm::DenseMap<const FunctionDecl *, CallChainImpl *> CallsMap;
 CallsMap VirtualCallsMap;

This will require dynamically allocating CallChain objects, which will need to be manually destroyed, but this will be way more efficient.

+  
+  // Current working FunctionDecl.
+  const FunctionDecl *Current;

Why is this state even needed?

+
+  // Record which functions have be fully handled. A function fully handled 
+  // means that all function bodies it called have been handled.
+  llvm::SmallPtrSet<const FunctionDecl *, 256> CheckedFunctions;
+

'256' isn't really "small" anymore.  Not sure if this is an actual performance issue, but I'd use a DenseSet instead.  Try:

  llvm::DenseSet<const FunctionDecl *> CheckedFunctions;

+public:
+  WalkAST(BugReporter &br, AnalysisDeclContext *ac) : BR(br), AC(ac) {
+    Current = NULL;
+  }

'Current' can be initialized to NULL in the initializer list as well.

+  
+  bool hasWork() const {
+    return !Stack.empty();
+  }

Looks fine.

+  
+  void push(TyWorkListUnit WLUnit) {
+    Stack.push_back(WLUnit);
+  }

  void push(WorkListUnit WLUnit)

is much cleaner.

+  
+  TyWorkListUnit pop() {
+    assert (!Stack.empty());
+    TyWorkListUnit WLUnit = Stack.back();
+    Stack.pop_back();
+
+    return WLUnit;
+  }

I'd rename this to 'pop_back()'.  'pop()' in most data structures just pops the element without actually returning it.

+
+  void Execute() {

Please add a big comment here, explaining what this method does.

+    while (hasWork()) {
+      TyWorkListUnit WLUnit = pop();
+      const FunctionDecl *FD = WLUnit.second->getDirectCallee();
+      assert(FD && FD->getBody());
+
+      // Pop the RuntimeStack top, until the top is caller of current WLUnit.
+      while (!RuntimeStack.empty()) {
+        // get the top of runtime stack

Minor: please keep the comment style consistent.  Start with a capital letter, and end with a '.'

+        const CallExpr *Top = RuntimeStack.back();
+        if (WLUnit.first == Top->getDirectCallee())
+          break;
+        
+        // When we pop_back a CallExpr from the RuntimeStack, its body has been
+        // fully checked.
+        CheckedFunctions.insert(Top->getDirectCallee());
+
+        RuntimeStack.pop_back();
+      }
+      // After we find where is the top of the RuntimeStack, push the CallExpr
+      // into RuntimeStack.
+      RuntimeStack.push_back(WLUnit.second);
+      
+      // set Current to FD.

Minor: please keep the comment style consistent.  Start with a capital letter, and end with a '.'

+      Current = FD;
+
+      // Visit the body.
+      Visit(FD->getBody());
+    }
+
+    // TODO: cleanup the RuntimeStack.
+    RuntimeStack.clear();

I don't know what this TODO implies.  Can you clarify?  I think if the worklist empties out, and as long as RuntimeStack doesn't contain anything that its destructor can't automatically cleanup, there is nothing to worry about.

+  }
+
+  // Stmt visitor methods.
+  void VisitCallExpr(CallExpr *CE);
+  void VisitCXXMemberCallExpr(CallExpr *CE);
+  void VisitStmt(Stmt *S) { VisitChildren(S); }
+  void VisitChildren(Stmt *S);
+
+  void VisitCXXConstructorDecl(CXXConstructorDecl *Ctor) {
+    if (Ctor && Ctor->getBody()) {
+      Current = Ctor;
+      Visit(Ctor->getBody());
+    }
+    
+    // After visit the body of the CXXConstructorDecl, execute the worklist.
+    Execute();

Why do we need to recursively call 'Execute'?  Won't we already be processing the worklist?

+  }
+
+  void VisitCXXDestructorDecl(CXXDestructorDecl *Dtor) {
+    if (Dtor && Dtor->getBody()){
+      Current = Dtor;
+      Visit(Dtor->getBody());
+    }
+    
+    // After visit the body of the CXXDestructorDecl, execute the worklist.
+    Execute();

Why do we need to recursively call 'Execute'?  Won't we already be processing the worklist?

+  }
+
+  void ReportVirtualCall(const CallExpr *CE, llvm::raw_svector_ostream &os);
+
+};
+} // end anonymous namespace
+
+//===----------------------------------------------------------------------===//
+// AST walking.
+//===----------------------------------------------------------------------===//
+
+void WalkAST::VisitChildren(Stmt *S) {
+  for (Stmt::child_iterator I = S->child_begin(), E = S->child_end(); I!=E; ++I)
+    if (Stmt *child = *I)
+      Visit(child);
+}
+
+void WalkAST::VisitCallExpr(CallExpr *CE) {
+  VisitChildren(CE);
+  
+  // Get the callee.
+  const FunctionDecl *FD = CE->getDirectCallee();
+  if (!FD)
+    return;
+  
+  // If FD is already fully checked, no need to push the body into worklist
+  // again.
+  if (CheckedFunctions.count(FD)) {
+    
+    // get the runtime stack depth.

Minor: please keep the comment style consistent.  Start with a capital letter, and end with a '.'

+    unsigned StackDepth = RuntimeStack.size();
+    
+    // If there are virtual calls in the body of FD, we need to generate new
+    // bug reports. Otherwise we will do nothing.
+    if (VirtualCallsMap.count(FD)) {
+      for (unsigned i = 0; i <  VirtualCallsMap[FD].size(); i++) {
+
+        // Update virtual call paths of current function.
+        VirtualCallsMap[Current].push_back(VirtualCallsMap[FD][i]);
+        VirtualCallsMap[Current][VirtualCallsMap[Current].size()-1].
+          push_back(CE);


This is a bit gross and really inefficient.  Please hoist the VirtualCallsMap[FD].size() out of the loop, or better yet, do a find for the entry so you don't lazily create entries that don't need to exist (you avoid this with the call to 'count', but this double lookup is a bit wasteful).  This is a bunch of hashtable lookups to the same entry, when is really inefficient.  Consider:

   // If there are virtual calls in the body of FD, we need to generate new
  // bug reports. Otherwise we will do nothing.
  CallsMap::iterator I = VirtualCallsMap.find(FD);
  if (I == VirtualCallsMap.end())
   return;

  assert(I->second);
  CallChainImpl &Chain = *I->second;

  for (Chain::iterator it = Chain.begin(), et = Chain.end(); it != et; ++it) {
     // Update virtual call paths of current function.
     CallChainImpl *&CurrentChain = VirtualCallsMap[Current];
     if (!CurrentChain)
     CurrentChain = new CallChain();
     CurrentChain->push_back(*it);
     CurrentChain->back().push_back(CE);

This is much more efficient.

+        
+        // Generate the call path string for bug report.
+        llvm::SmallString<100> buf;
+        llvm::raw_svector_ostream os(buf);
+        
+        os << "\n" << VirtualCallsMap[FD][i][0]->getDirectCallee()->getName();
+        for (unsigned j = 1; j < VirtualCallsMap[FD][i].size(); i++)
+          os << " <-- "
+             << VirtualCallsMap[FD][i][j]->getDirectCallee()->getName();
+        
+        os << " <-- " << CE->getDirectCallee()->getName();
+        for (unsigned k = 1; k <= StackDepth; k++)
+          // R[i] = RuntimeStack[StackDepth-i]->getCallee()->getSourceRange();
+          os << " <-- " 
+             << RuntimeStack[StackDepth-k]->getDirectCallee()->getName();
+        
+        os << "\n" ;

This string generation logic looks identical to the logic in VisitCXXMemberCallExpr.  Why can't this just be refactored into ReportVirtualCall, or into a separate method which generates the string in the SmallString which we then just pass to ReportVirtualCall as a StringRef?  We shouldn't be copy-pasting string generation logic, as it means if we change it we need to change it in multiple places.

There is also a ton of redundant lookups to VirtualCallsMap.  Those aren't free.  While strictly speaking not a performance issue here (since it occurs when generating a bug report), we generally like to enforce the same kind of good performance conventions everywhere in the codebase to encourage best practice.  Please clean it up using the same kind of tricks I used just above.

+        
+        ReportVirtualCall(VirtualCallsMap[FD][i][0], os);

No need to do yet another lookup to VirtualCallsMap[FD].  We already have the CurrentChain object.

+      }
+    }
+    
+    return;
+  }
+  
+  if (FD->getBody()) 
+    push(std::make_pair(Current, CE));

Looks good.

+}
+
+void WalkAST::VisitCXXMemberCallExpr(CallExpr *CE) {
+  VisitChildren(CE);
+
+  // If there is a qualifier to this CallExpr, return.
+  if (MemberExpr *CME = dyn_cast<MemberExpr>(CE->getCallee())) {
+    if (CME->getQualifier())
+      return;
+  }
+
+  // Get the callee.
+  const CXXMethodDecl *MD = dyn_cast<CXXMethodDecl>(CE->getDirectCallee());
+  if (!MD)
+    return;
+  
+  // get the runtime stack depth.
+  unsigned StackDepth = RuntimeStack.size();
+
+  if (MD->isVirtual()) {
+
+    VirtualCallsMap[Current].
+      push_back(llvm::SmallVector<const CallExpr *, 20>(1, CE));

Do not create container objects in this way.  This implicitly results in a temporary object getting created, and then copied.  Instead:

  CallChainImpl *Chain = new CallChain()l
  Chain->push_back(CE);
  VirtualCallsMap[Current] = Chain;

+
+    llvm::SmallString<100> buf;
+    llvm::raw_svector_ostream os(buf);
+    
+    os << "\n" << CE->getDirectCallee()->getName();
+    for (unsigned i = 1; i <= StackDepth; i++) 
+      // R[i] = RuntimeStack[StackDepth-i]->getCallee()->getSourceRange();
+      os << " <-- " << RuntimeStack[StackDepth-i]->getDirectCallee()->getName();
+    
+    os << "\n" ;
+
+    ReportVirtualCall(CE, os);
+
+    // Do not check the body.
+    return;
+  }
+
+  // If the MethodDecl is already fully checked, no need to push the body into 
+  // worklist again.
+  if (CheckedFunctions.count(MD)) {
+    // If there are virtual calls in the body of the MethodDecl, we need to 
+    // generate new bug reports. Otherwise we should do nothing.
+    if (VirtualCallsMap.count(MD)) {
+      for (unsigned i = 0; i <  VirtualCallsMap[MD].size(); i++) {
+
+        VirtualCallsMap[Current].push_back(VirtualCallsMap[MD][i]);
+        VirtualCallsMap[Current][VirtualCallsMap[Current].size()-1].
+          push_back(CE);
+        
+        llvm::SmallString<100> buf;
+        llvm::raw_svector_ostream os(buf);
+        
+        os << "\n" << VirtualCallsMap[MD][i][0]->getDirectCallee()->getName();
+        for (unsigned j = 1; j < VirtualCallsMap[MD][i].size(); i++)
+          os << " <-- "
+             << VirtualCallsMap[MD][i][j]->getDirectCallee()->getName();
+
+        os << " <-- " << CE->getDirectCallee()->getName();
+        for (unsigned k = 1; k <= StackDepth; k++)
+          // R[i] = RuntimeStack[StackDepth-i]->getCallee()->getSourceRange();
+          os << " <-- " 
+             << RuntimeStack[StackDepth-k]->getDirectCallee()->getName();
+        
+        os << "\n" ;
+        
+        ReportVirtualCall(VirtualCallsMap[MD][i][0], os);

This logic is practically identical to the logic in VisitCallExpr().  This can be completely refactored to much simpler and avoid the duplication.

+      }
+    }
+    
+    return;
+  }
+
+  // Push the body into the worklist.
+  if (MD->getBody())
+    push(std::make_pair(Current, CE));
+}
+
+void WalkAST::ReportVirtualCall(const CallExpr *CE,
+                                llvm::raw_svector_ostream &os) {
+  PathDiagnosticLocation CELoc =
+    PathDiagnosticLocation::createBegin(CE, BR.getSourceManager(), AC);
+  SourceRange R = CE->getCallee()->getSourceRange();
+  
+  // Get the callee.
+  const CXXMethodDecl *MD = dyn_cast<CXXMethodDecl>(CE->getDirectCallee());
+  
+  if (MD->isPure()) {
+    os <<  "Call pure virtual functions during construction or "
+       << "destruction may leads undefined behaviour";
+    BR.EmitBasicReport("Call pure virtual function during construction or "
+                       "Destruction",
+                       "Cplusplus",
+                       os.str(), CELoc, &R, 1);
+    return;
+  }
+  else {
+    os << "\n" << "Call virtual functions during construction or "
+       << "destruction will never go to a more derived class";
+    BR.EmitBasicReport("Call virtual function during construction or "
+                       "Destruction",
+                       "Cplusplus",
+                       os.str(), CELoc, &R, 1);
+    return;
+  }
+}
+
+//===----------------------------------------------------------------------===//
+// VirtualCallChecker
+//===----------------------------------------------------------------------===//
+
+namespace {
+class VirtualCallChecker : public Checker<check::ASTDecl<CXXRecordDecl> > {
+public:
+  void checkASTDecl(const CXXRecordDecl *RD, AnalysisManager& mgr,
+                    BugReporter &BR) const {
+    WalkAST walker(BR, mgr.getAnalysisDeclContext(RD));
+
+    // Check the constructors.
+    for (CXXRecordDecl::ctor_iterator I = RD->ctor_begin(), E = RD->ctor_end();
+         I != E; ++I) {
+      if (I->isCopyOrMoveConstructor())
+        continue;
+      walker.VisitCXXConstructorDecl(*I);
+    }
+
+    // Check the destructor.
+    CXXDestructorDecl *DD = RD->getDestructor();
+    walker.VisitCXXDestructorDecl(DD);
+  }
+};
+}
+
+void ento::registerVirtualCallChecker(CheckerManager &mgr) {
+  mgr.registerChecker<VirtualCallChecker>();
+}

I think my main high level concern, beyond the code duplication, is that the algorithm seems a bit hard to follow.  For example, I see two code paths in VisitCallExpr(), one for the case where the body of the callee has been analyzed before, and another one where it has not.  Error reporting seems duplicated in a bunch of places, which is likely because the logic isn't quite factored the way it should be.

Stepping back, I think it's worth re-evaluating the overall algorithm.  Here's an abstraction that I think will help.

Essentially we are building a virtual function callgraph.  Once we have the callgraph, we can determine if any of the constructors/destructors can call a virtual function.  Here's a simple decoupling of the algorithm:

(1) Do a DFS worklist algorithm to build the callgraph.  This is basically the walk you are doing now, but you don't have to do any error reporting.  You also don't need to keep track of the current call stack; all you care about is the current function you are analyzing, and the functions it calls.  The worklist just consists of enqueueing FunctionDecls, dequeing them and walking their bodies looking for calls.  When you see a call, you add the called function to the set of functions called by that function.  Of course we only want to restrict ourselves to calls on the 'this' object.

(2) Once you have the (sparse) callgraph, you can iterate over all the constructors and destructors (you can cache a list of them on the side while doing step 1).  For each one, do a BFS over the callgraph, looking for virtual functions.  The BFS can be done as a worklist.  The advantage of using a BFS is you just report the shortest call chain from the constructor/destructor to the virtual function.  For each constructor/destructor, only report calling a specific virtual function once.

(2a) As a refinement to (2), you can cache when certain functions *never* call a virtual function, thus pruning the search.  This is an optional step to improve performance.

(2b) Error reporting comes down to just reporting a path through the callgraph.  Thus error reporting just becomes centralized in one place.  No code duplication.

The nice thing about this approach is you get a really nice separation of concerns.  You decouple to problem of finding the call graph (which conceptually could be extended to incorporate path-sensitive information, etc.) from the process of finding a bug (a path through the callgraph from constructor to virtual function).  You also decouple the string generation from the logic of finding bugs.

I hope my comments are not discouraging.  I think you are on the right track.  I think taking a step back and looking at how the algorithm could be restructured would be really beneficial to cleaning up the code and also potentially making the checker more powerful in the long run.  I also think you should pay a bit more attention to the accesses to containers.  While doing redundant hash table lookups on a error reporting case is not so bad, having redundant lookups in the fast path of the checker really adds unnecessary sluggishness to your code.

I hope this helps.

Cheers,
Ted


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Ted Kremenek
On Nov 22, 2011, at 10:11 PM, Ted Kremenek wrote:

Stepping back, I think it's worth re-evaluating the overall algorithm.  Here's an abstraction that I think will help.

Essentially we are building a virtual function callgraph.  Once we have the callgraph, we can determine if any of the constructors/destructors can call a virtual function.  Here's a simple decoupling of the algorithm:

(1) Do a DFS worklist algorithm to build the callgraph.  This is basically the walk you are doing now, but you don't have to do any error reporting.  You also don't need to keep track of the current call stack; all you care about is the current function you are analyzing, and the functions it calls.  The worklist just consists of enqueueing FunctionDecls, dequeing them and walking their bodies looking for calls.  When you see a call, you add the called function to the set of functions called by that function.  Of course we only want to restrict ourselves to calls on the 'this' object.

(2) Once you have the (sparse) callgraph, you can iterate over all the constructors and destructors (you can cache a list of them on the side while doing step 1).  For each one, do a BFS over the callgraph, looking for virtual functions.  The BFS can be done as a worklist.  The advantage of using a BFS is you just report the shortest call chain from the constructor/destructor to the virtual function.  For each constructor/destructor, only report calling a specific virtual function once.

(2a) As a refinement to (2), you can cache when certain functions *never* call a virtual function, thus pruning the search.  This is an optional step to improve performance.

(2b) Error reporting comes down to just reporting a path through the callgraph.  Thus error reporting just becomes centralized in one place.  No code duplication.

The nice thing about this approach is you get a really nice separation of concerns.  You decouple to problem of finding the call graph (which conceptually could be extended to incorporate path-sensitive information, etc.) from the process of finding a bug (a path through the callgraph from constructor to virtual function).  You also decouple the string generation from the logic of finding bugs.

I hope my comments are not discouraging.  I think you are on the right track.  I think taking a step back and looking at how the algorithm could be restructured would be really beneficial to cleaning up the code and also potentially making the checker more powerful in the long run.  I also think you should pay a bit more attention to the accesses to containers.  While doing redundant hash table lookups on a error reporting case is not so bad, having redundant lookups in the fast path of the checker really adds unnecessary sluggishness to your code.

I hope this helps.

Thinking about this more, I think this strategy, while generally, is more complicated then you need for a first go around.

Here is a much simpler approach, which basically matches what you are already doing:

(1) Starting from each constructor/destructor, do a DFS, using a worklist.  The worklist just has a chain of FunctionDecls.

  (a) Define 'Enqueue' to enqueue a method to the worklist.  This..

    (i) adds the method to the stack (push_back())
    (ii) adds the method to a DenseMap that records two states: "PreVisit" and "PostVisit".  Enqueue marks them as PreVisit. 

  (b) When processing a method from the worklist:

     (i) dequeue the item from the end of the stack, but don't remove it yet from the stack
     
     IF the function is in the PreVisit state:

     (ii) walk its body:
          - If we find a call to a virtual function, emit a warning by simply passing the complete stack + the virtual function called to our error reporting routine.
          - If we find a call to a non-virtual function, add it to the worklist IFF it is not already in the PreVisit or PostVisit state.
     (iii) mark the decl as being in the PostVisit state

     (iv) continue processing the worklist

    IF the function is in the PostVisit state:

    (v) pop it off the worklist.

The value of the PreVisit versus PostVisit is that we always keep a full call chain stack available.

This should provide a nice decoupling between the search and the error reporting, while completely unrolling the recursion.  As a further optimization, we can potential recording as part of step (v) whether or not a function is never transitively calls a virtual method, and thus doesn't need to be considered as part of any further search.

I think this should be a vast simplification of your existing logic, and also is much simpler than the algorithm I proposed.

What do you think?

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Lei Zhang
Hi Ted,

Thank you very much for your reply and all your advises.

Sorry for this late reply, i am busy on my job this period.

I think the last approach is a good choice, it's simple and clear.

But i have two minor problems.

  1. When we do (b)(ii), we may find many calls not already in the PreVisit or PostVisit state, then we will add them to the stack. So the complete stack is not a  call stack, right? Then we may need a iterator  point to the "call stack" top?
  2. In your former mail, you mentioned that "For each constructor/destructor, only report calling a specific virtual function once." IMO, we may need to report all the calls to a specific virtual function, these information may help the users to find out where is the exact place in the call path to fix these problems. But it may seems to noisy, so we may need a new bug report mechanism to make this kind of report clear.
After all, I will re-implement this patch lately after this busy period.

在 2011年11月23日 下午2:41,Ted Kremenek <[hidden email]>写道:
On Nov 22, 2011, at 10:11 PM, Ted Kremenek wrote:

Stepping back, I think it's worth re-evaluating the overall algorithm.  Here's an abstraction that I think will help.

Essentially we are building a virtual function callgraph.  Once we have the callgraph, we can determine if any of the constructors/destructors can call a virtual function.  Here's a simple decoupling of the algorithm:

(1) Do a DFS worklist algorithm to build the callgraph.  This is basically the walk you are doing now, but you don't have to do any error reporting.  You also don't need to keep track of the current call stack; all you care about is the current function you are analyzing, and the functions it calls.  The worklist just consists of enqueueing FunctionDecls, dequeing them and walking their bodies looking for calls.  When you see a call, you add the called function to the set of functions called by that function.  Of course we only want to restrict ourselves to calls on the 'this' object.

(2) Once you have the (sparse) callgraph, you can iterate over all the constructors and destructors (you can cache a list of them on the side while doing step 1).  For each one, do a BFS over the callgraph, looking for virtual functions.  The BFS can be done as a worklist.  The advantage of using a BFS is you just report the shortest call chain from the constructor/destructor to the virtual function.  For each constructor/destructor, only report calling a specific virtual function once.

(2a) As a refinement to (2), you can cache when certain functions *never* call a virtual function, thus pruning the search.  This is an optional step to improve performance.

(2b) Error reporting comes down to just reporting a path through the callgraph.  Thus error reporting just becomes centralized in one place.  No code duplication.

The nice thing about this approach is you get a really nice separation of concerns.  You decouple to problem of finding the call graph (which conceptually could be extended to incorporate path-sensitive information, etc.) from the process of finding a bug (a path through the callgraph from constructor to virtual function).  You also decouple the string generation from the logic of finding bugs.

I hope my comments are not discouraging.  I think you are on the right track.  I think taking a step back and looking at how the algorithm could be restructured would be really beneficial to cleaning up the code and also potentially making the checker more powerful in the long run.  I also think you should pay a bit more attention to the accesses to containers.  While doing redundant hash table lookups on a error reporting case is not so bad, having redundant lookups in the fast path of the checker really adds unnecessary sluggishness to your code.

I hope this helps.

Thinking about this more, I think this strategy, while generally, is more complicated then you need for a first go around.

Here is a much simpler approach, which basically matches what you are already doing:

(1) Starting from each constructor/destructor, do a DFS, using a worklist.  The worklist just has a chain of FunctionDecls.

  (a) Define 'Enqueue' to enqueue a method to the worklist.  This..

    (i) adds the method to the stack (push_back())
    (ii) adds the method to a DenseMap that records two states: "PreVisit" and "PostVisit".  Enqueue marks them as PreVisit. 

  (b) When processing a method from the worklist:

     (i) dequeue the item from the end of the stack, but don't remove it yet from the stack
     
     IF the function is in the PreVisit state:

     (ii) walk its body:
          - If we find a call to a virtual function, emit a warning by simply passing the complete stack + the virtual function called to our error reporting routine.
          - If we find a call to a non-virtual function, add it to the worklist IFF it is not already in the PreVisit or PostVisit state.
     (iii) mark the decl as being in the PostVisit state

     (iv) continue processing the worklist

    IF the function is in the PostVisit state:

    (v) pop it off the worklist.

The value of the PreVisit versus PostVisit is that we always keep a full call chain stack available.

This should provide a nice decoupling between the search and the error reporting, while completely unrolling the recursion.  As a further optimization, we can potential recording as part of step (v) whether or not a function is never transitively calls a virtual method, and thus doesn't need to be considered as part of any further search.

I think this should be a vast simplification of your existing logic, and also is much simpler than the algorithm I proposed.

What do you think?



--
Best regards!

Lei Zhang

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Ted Kremenek
On Dec 1, 2011, at 7:08 PM, 章磊 wrote:

I think the last approach is a good choice, it's simple and clear.

But i have two minor problems.

  1. When we do (b)(ii), we may find many calls not already in the PreVisit or PostVisit state, then we will add them to the stack. So the complete stack is not a  call stack, right? Then we may need a iterator  point to the "call stack" top?
Assuming the "stack" is implemented as a vector, where we push PreVisit/PostVisit on the end, I think you can recover the correct call stack by only looking at your ancestors that are in the PostVisit state.  You then just skip those in the vector in the PreVisit state, as those represents functions whose bodies still need to be analyzed.  This approach only works if your worklist algorithm is DFS.  The invariant there is that the only functions in the PostVisit state are those guaranteed to be your ancestors in the DFS traversal, while anything in the PreVisit state might be a sibling, a cousin, etc.

I could be missing something.  Shouldn't this work?
  1. In your former mail, you mentioned that "For each constructor/destructor, only report calling a specific virtual function once." IMO, we may need to report all the calls to a specific virtual function, these information may help the users to find out where is the exact place in the call path to fix these problems. But it may seems to noisy, so we may need a new bug report mechanism to make this kind of report clear.
Fair points.  Another way to look at it is that one path gets reported, the user fixes it, and re-runs the analyzer to see if any problems remain.  I'm fine with going for the simpler approach for now of just reporting everything, and see how bad it is in practice.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Lei Zhang


2011/12/10 Ted Kremenek <[hidden email]>
On Dec 1, 2011, at 7:08 PM, 章磊 wrote:

I think the last approach is a good choice, it's simple and clear.

But i have two minor problems.

  1. When we do (b)(ii), we may find many calls not already in the PreVisit or PostVisit state, then we will add them to the stack. So the complete stack is not a  call stack, right? Then we may need a iterator  point to the "call stack" top?
Assuming the "stack" is implemented as a vector, where we push PreVisit/PostVisit on the end, I think you can recover the correct call stack by only looking at your ancestors that are in the PostVisit state.  You then just skip those in the vector in the PreVisit state, as those represents functions whose bodies still need to be analyzed.  This approach only works if your worklist algorithm is DFS.  The invariant there is that the only functions in the PostVisit state are those guaranteed to be your ancestors in the DFS traversal, while anything in the PreVisit state might be a sibling, a cousin, etc.

I could be missing something.  Shouldn't this work?

Sounds great. I will implement this in my later patch.
  1. In your former mail, you mentioned that "For each constructor/destructor, only report calling a specific virtual function once." IMO, we may need to report all the calls to a specific virtual function, these information may help the users to find out where is the exact place in the call path to fix these problems. But it may seems to noisy, so we may need a new bug report mechanism to make this kind of report clear.
Fair points.  Another way to look at it is that one path gets reported, the user fixes it, and re-runs the analyzer to see if any problems remain.  I'm fine with going for the simpler approach for now of just reporting everything, and see how bad it is in practice.



--
Best regards!

Lei Zhang

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Lei Zhang
Hi Ted,

I re-implement this checker as you suggested.

Based on your suggestion, I modified two minor places for the bug
report generation.

   1. Use CallExpr instead of FunctionDecl as the WorkList unit.
   2. Introduced a pointer point to the the CallExpr whose body is
current walking.

I test this patch on some real code base, and it seems to work well.

More comments inline, and looking forward to your reply.

2011/12/12, 章磊 <[hidden email]>:

> 2011/12/10 Ted Kremenek <[hidden email]>
>
>> On Dec 1, 2011, at 7:08 PM, 章磊 wrote:
>>
>> I think the last approach is a good choice, it's simple and clear.
>>
>> But i have two minor problems.
>>
>>
>>    1. When we do (b)(ii), we may find many calls not already in the
>>    PreVisit or PostVisit state, then we will add them to the stack. So
>> the
>>    complete stack is not a  call stack, right? Then we may need a
>> iterator
>>    point to the "call stack" top?
>>
>> Assuming the "stack" is implemented as a vector, where we push
>> PreVisit/PostVisit on the end, I think you can recover the correct call
>> stack by only looking at your ancestors that are in the PostVisit state.
>>  You then just skip those in the vector in the PreVisit state, as those
>> represents functions whose bodies still need to be analyzed.  This
>> approach
>> only works if your worklist algorithm is DFS.  The invariant there is
>> that
>> the only functions in the PostVisit state are those guaranteed to be your
>> ancestors in the DFS traversal, while anything in the PreVisit state
>> might
>> be a sibling, a cousin, etc.
>>
>> I could be missing something.  Shouldn't this work?
>>
>
> Sounds great. I will implement this in my later patch.
>
>>
>>    1. In your former mail, you mentioned that "For each
>>    constructor/destructor, only report calling a specific virtual
>> function
>>    once." IMO, we may need to report all the calls to a specific virtual
>>    function, these information may help the users to find out where is
>> the
>>    exact place in the call path to fix these problems. But it may seems
>> to
>>    noisy, so we may need a new bug report mechanism to make this kind of
>>    report clear.
>>
>> Fair points.  Another way to look at it is that one path gets reported,
>> the user fixes it, and re-runs the analyzer to see if any problems
>> remain.
>>  I'm fine with going for the simpler approach for now of just reporting
>> everything, and see how bad it is in practice.
>>
>
>
>
> --
> Best regards!
>
> Lei Zhang
>

--
Best regards!

Lei Zhang

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

virtualcall.patch (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Check virtual calls in ctor or dtor

Ted Kremenek
Thanks for your patience Lei.  After a break, I was able to get back to reviewing this properly.

Overall, I think this is a great initial version of the checker.  We will still want to refine the diagnostics, but I think the base functionality is good.  I went ahead and checked in this patch (with changes) in r147494.  The overall checker looked good that I thought only a few tweaks were necessary to get it into mainline, so I went and made the changes myself.

Here are the highlights of my changes:

(1) I had Enqueue() do all the checking to see if the method was prescanned or elible for scanning.  No need to duplicate that checking in a bunch of places.  I added a "NotVisited" value to the enum (which gets default constructed) to avoid uses of "count()" in the DenseMap.  This should be slightly more efficient.

(2) For fully resolved calls, e.g. B::foo(), your logic had the checker skipping those calls entirely.  That's not quite right.  They should be scanned like other calls, just not warned about being virtual calls.  I've gone ahead and fixed that.

(3) I doxygenified more of the comments, and cleaned up a bit of the formatting.

I think the next step is to improve the bug reporting.  I think the natural way to do this is to provide a way to issue a BugReport with a collection of PathDiagnostics.  This allows one to generate a bug report without the assistance of BugReporter constructing it for you.

On Dec 25, 2011, at 2:11 AM, 章磊 wrote:

Hi Ted,

I re-implement this checker as you suggested.

Based on your suggestion, I modified two minor places for the bug
report generation.

  1. Use CallExpr instead of FunctionDecl as the WorkList unit.
  2. Introduced a pointer point to the the CallExpr whose body is
current walking.

I test this patch on some real code base, and it seems to work well.

More comments inline, and looking forward to your reply.


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev