Cross Translational Unit Analysis in Clang Static Analyzer

Cross Translational Unit Analysis in Clang Static Analyzer

Brian Cain via cfe-dev

Hi All,

At the EuroLLVM’17 conference we presented our results on a new analysis mode in the Clang Static Analyzer: Cross Translational Unit (CTU) analysis.

See the patch at https://reviews.llvm.org/D30691, which is based on the work of A. Sidorin et al. (http://lists.llvm.org/pipermail/cfe-dev/2015-October/045730.html), but without function summaries and updated to the newest Clang.

The CTU mode allows the analyzer to “inline” calls to functions that are defined in a TU other than the one currently being analyzed, so it can find bugs that span multiple source files.

Without this patch, when the static analyzer engine meets a call to an external function, it cannot reason about the function's return value (it is treated as unknown), and any values reachable through pointers or references passed to the function as arguments are invalidated.

You can find a fully patched Clang 4.0 here (use it with LLVM commit 01609a325b5f85d88e3ab5c7d470409092436cb2):

https://github.com/dkrupp/clang/tree/ctu-master

We have run the analysis on several reasonably sized open source C projects (ffmpeg, curl, vim, openssl, postgresql) and found many additional true positive reports compared to the traditional single-TU mode in all of them.

This suggests that the feature is likely to give new results on other projects as well.

We measured the heap usage, the analysis time and the number of new findings.

You can find the detailed comparison results here:

http://cc.elte.hu/clang-ctu/

In summary, the number of reported bugs is roughly 1.5-5x that of the original single-TU analysis, at the cost of 1.5-5x higher analysis time and 1.5-5x higher maximum heap usage (roughly in proportion to the increase in the number of reported faults).

The design concept is described briefly in this document: http://cc.elte.hu/clang-ctu/eurollvm17/abstract.pdf

If you would like to try this analysis mode on your project, the two new analyzer scripts are described here:

https://github.com/dkrupp/clang/blob/ctu-master/tools/xtu-build-new/readme.md

We would be happy to hear your opinions on and experiences with this feature, and would appreciate your help in reviewing the patch.

Thanks & Regards,
Daniel


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: Cross Translational Unit Analysis in Clang Static Analyzer

Brian Cain via cfe-dev
Hello Daniel & Gabor. Thank you very much for your work!

I saw the patch and most of it looks familiar to me. Unfortunately, I cannot find enough time right now to review it (my own solutions, implemented 2 years ago, need some revisiting too).

I can try to do this review incrementally, in small chunks, if you are OK with that. But it will still take time. Sorry for the inconvenience.


On 31.03.2017 18:28, Dániel Krupp via cfe-dev wrote:

-- 
Best regards,
Aleksei Sidorin,
SRR, Samsung Electronics

Re: Cross Translational Unit Analysis in Clang Static Analyzer

Brian Cain via cfe-dev

Hi Aleksei,

Your review would be very much appreciated at any pace and in any form. :)

Thanks,
Daniel

From: Aleksei Sidorin [mailto:[hidden email]]
Sent: March 31, 2017 18:03
To: Dániel Krupp <[hidden email]>; [hidden email]
Subject: Re: [cfe-dev] Cross Translational Unit Analysis in Clang Static Analyzer

Re: Cross Translational Unit Analysis in Clang Static Analyzer

Brian Cain via cfe-dev
On 2017-03-31 15:28:49, Dániel Krupp via cfe-dev wrote:
> Would be  happy to hear your opinion and experiences with this feature
> and would appreciate your help in reviewing the patch.

Hi all,

thanks again to Dániel, Gábor, Aleksei and everyone for this work. I'm
going to brain-dump my experiences with the CTU analysis here for the
record. I'm also going to try and review patches when I get time.

At Google, we were trying to perform CTU analysis on Magenta, a
microkernel for the new Fuchsia operating system [0,1]. I wrote a
document describing these efforts; most of the document has been
published here [2]. This might help folks who want to get started
running CTU analysis, although it was last updated in December 2016.

We found several bugs that could not be caught by single-translation-unit
analysis. There were bugs on the *caller* side, where a value set in a
called function (in a different TU) caused an error in the caller, and
also bugs on the *callee* side, where a value passed in as a parameter
from a different TU caused a bug in the called function.

During EuroLLVM, it was mentioned that the analysis is truncated at a
call depth of 4. In practice, I found that the analyser read on average
between 15 and 20 ASTs from disk for each function that it analysed, and
never more than about 100. (Note that if every function calls 2 functions
in a different TU, then at a call depth of 4 we must load
2 + 4 + 8 + 16 = 30 ASTs from disk for that function.) It might be
possible to find more bugs by increasing the call depth, though I didn't
experiment with this.
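
The figure in the parenthetical is a geometric sum: with b cross-TU calls per function and a maximum call depth d, the count is b + b^2 + ... + b^d. A minimal Python sketch of that arithmetic follows. The function and parameter names are illustrative only, and it assumes a uniform number of cross-TU calls per function and no reuse of ASTs that were already loaded, neither of which the analyser guarantees.

```python
def asts_loaded(cross_tu_calls_per_function: int, max_call_depth: int) -> int:
    """Count AST loads for one analysed function in the simplified model.

    Assumption: every function calls the same number of functions defined
    in other TUs, and every such call triggers a separate load from disk.
    """
    return sum(
        cross_tu_calls_per_function ** level
        for level in range(1, max_call_depth + 1)
    )


# The case from the message: 2 cross-TU calls per function, call depth 4.
print(asts_loaded(2, 4))  # 2 + 4 + 8 + 16 = 30
```

Under the same assumptions, raising the depth to 5 gives 62 loads, which is one rough way to estimate the cost of a deeper call-depth limit before experimenting with it.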

The main problem I ran into was the incomplete implementation of
ASTImporter.cpp. In particular, whenever the analyser tries to load an
AST node from disk that does not have an implementation in the
ASTImporter, the analyser crashes. So for us, most of the work involved
adding support for the AST nodes that were present in our codebase, but
which were not in the Importer. These were mostly obscure C++
constructs. Note that in some cases, support for those already exists in
Aleksei's patch but not in Gábor's; so it's always worth looking at
Aleksei's patch too.

Note, I'm no longer affiliated with Google (I was just interning there),
but I'm happy to answer whatever questions I can.

[0] https://fuchsia.googlesource.com
[1] https://lwn.net/Articles/718267
[2] https://fuchsia.googlesource.com/docs/+/411f08616d395b02e2d2861c34cace9942dee134/ctu_analysis.md

thanks!

--
Kareem

Re: Cross Translational Unit Analysis in Clang Static Analyzer

Brian Cain via cfe-dev
Hi Daniel,

Thank you for sending the patch! While I think that doing whole-project analysis via “inlining” is not a scalable solution, this prototype could be useful for the community to experiment with. It can also serve as a basis for other two-stage analyses, where we collect some data about functions in the first pass and use it in the second pass.
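
The two-stage idea can be illustrated with a toy model. The Python sketch below is only an illustration under stated assumptions: the `PROJECT` dictionary, the function names and the in-memory index are all hypothetical and do not reflect the patch's actual scripts or file formats. The first pass records which TU defines each function; the second pass uses that index to find the calls in one TU whose definitions live elsewhere.

```python
from typing import Dict, List, Tuple

# Hypothetical input: for each TU, the functions it defines and the
# functions each of them calls.
PROJECT: Dict[str, Dict[str, List[str]]] = {
    "a.c": {"main": ["parse", "log_msg"]},
    "b.c": {"parse": ["log_msg"]},
    "c.c": {"log_msg": []},
}


def collect_definitions(project: Dict[str, Dict[str, List[str]]]) -> Dict[str, str]:
    """Pass 1: map every function name to the TU that defines it."""
    index: Dict[str, str] = {}
    for tu, functions in project.items():
        for name in functions:
            index[name] = tu
    return index


def external_calls(
    project: Dict[str, Dict[str, List[str]]],
    index: Dict[str, str],
    tu: str,
) -> List[Tuple[str, str]]:
    """Pass 2: list (callee, defining TU) for calls that leave the given TU."""
    found: List[Tuple[str, str]] = []
    for callees in project[tu].values():
        for callee in callees:
            home = index.get(callee)
            # Skip calls that stay inside this TU or have no known definition.
            if home is not None and home != tu:
                found.append((callee, home))
    return found


index = collect_definitions(PROJECT)
print(external_calls(PROJECT, index, "a.c"))
```

Splitting the work this way means the second pass never has to scan the whole project to resolve a call; it only consults the index built once in the first pass.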

A side benefit is that this direction exercises the ASTImporter and would benefit other uses of it such as lldb.

I am sure there will be a few comments about the patch itself and it’s important to have the workflow integrated into scan-build, which is our user facing tool.

For those interested in the topic, I recommend watching Gabor’s talk at LLVM Euro 2017 once the video is available:

Thank you!
Anna

Re: Cross Translational Unit Analysis in Clang Static Analyzer

Brian Cain via cfe-dev

Hi Anna,

Thanks for the positive feedback. I am sure this analysis option will be useful for many.

We will consult with Laszlo Nagy on how to integrate the 2-stage analysis into scan-build-py. I assume you did not mean the Perl version of scan-build.

For those who are interested in the latest version of the patch, it has been moved into this repo:

https://github.com/Ericsson/clang/tree/ctu-os

An extended version of the patch, including coverage measurement, is available on this branch:

https://github.com/Ericsson/clang/tree/ctu-master

Thanks in advance for the review,
Daniel

From: [hidden email] [mailto:[hidden email]]
Sent: April 8, 2017 0:04
To: Dániel Krupp <[hidden email]>
Cc: [hidden email]
Subject: Re: [cfe-dev] Cross Translational Unit Analysis in Clang Static Analyzer
