Many .c files as input to scan-build

Дилян Палаузов via cfe-dev
Hello,

The Clang Static Analyzer does a good job on individual source files. But with a single .c/.cpp file as input,
it cannot catch all code paths of a program that has many source files.

In particular, GLib's g_hash_table_new() allocates memory and g_hash_table_destroy() frees it, but scan-build
does not know this and does not check for it.

In other words, scan-build gives different results for the same program, depending on how the source code is split
into files.

One way to solve this is to create a huge .h file that recursively contains all function definitions needed by a
.c/.cpp file, including sources from libraries, and to feed this to scan-build.

It would be easier, however, if scan-build were extended to accept many .c and .cpp files as input, glue them
internally into one, and then analyze that big file.

This would help find problems that are spread across source files.

Regards
  Дилян

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Re: Many .c files as input to scan-build

Artem Dergachev via cfe-dev
The problem you're describing is known as "cross translation unit
analysis". The Static Analyzer is part of Clang, and the primary purpose
of Clang is to compile one translation unit at a time, so the Static
Analyzer inherits the same limitation.

Doing "unity builds" is one way around this problem. This wouldn't scale
to huge projects and it's not that trivial to concatenate all the files,
depending on the build system (the project may use freshly compiled
executables to autogenerate source code for subsequent passes or compile
the same source code for different architectures).
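
To make the "unity build" idea concrete, here is a minimal sketch of generating such a file for a plain C project. The function name make_unity_source and the file names are hypothetical, and the sketch assumes every source can be compiled with the same flags and that no two files define conflicting static functions or macros, which real projects often violate.

```python
from pathlib import Path


def make_unity_source(sources):
    """Return the text of a C file that #includes every given source file.

    Only .c files are accepted; mixing C and C++ in one translation unit
    is out of scope for this sketch.
    """
    lines = ["/* Generated unity file: one translation unit for all sources. */"]
    for src in sources:
        path = Path(src)
        if path.suffix != ".c":
            raise ValueError(f"not a C source file: {src}")
        # as_posix() keeps forward slashes so the #include reads the same on any host.
        lines.append(f'#include "{path.as_posix()}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(make_unity_source(["main.c", "table.c"]), end="")
```

The output could be saved as, say, unity.c and analyzed as a single translation unit; whether that compiles at all depends on the project.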

Note that even if you do a unity build, the time it takes for the Static
Analyzer to perform analysis of a certain quality would grow
non-linearly (in fact, "exponentially" would be way more accurate). Even
though all of the source code is available, making proper use of this
information to achieve analysis quality similar to that of a smaller
codebase would be impossible. You will be paying with loss of coverage:
the analyzer will give up sooner and only find more shallow bugs.
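
A toy model may help show why "exponentially" is the right word. It assumes code with n independent if/else branches in sequence, so 2^n distinct paths, and a fixed budget of paths that can be explored. Both the model and the budget value are illustrative assumptions, not a description of the analyzer's actual heuristics or settings.

```python
def count_paths(num_branches):
    """Number of paths through num_branches independent if/else statements in sequence."""
    return 2 ** num_branches


def coverage(num_branches, budget):
    """Fraction of all paths that can be explored before a fixed budget runs out."""
    total = count_paths(num_branches)
    return min(budget, total) / total


if __name__ == "__main__":
    budget = 1_000_000  # hypothetical cap on explored paths, not an analyzer option
    for n in (10, 20, 30, 40):
        print(f"{n} branches: {count_paths(n)} paths, coverage {coverage(n, budget):.6%}")
```

In this model the budget covers every path for small inputs, and the covered fraction halves with each additional branch once the budget is exceeded.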

There's an effort to perform cross-translation-unit analysis through
ASTImporter - the same facility that supports executing arbitrary
expressions in LLDB. This allows importing only small chunks of the
program as needed without constructing a whole-program AST, but
generally I feel it's not really that much better than unity builds. See
CTU threads on this mailing list. They report success when it comes to
overall usefulness of the Static Analyzer, so I guess it's worth it to
think in that direction, but it's most likely less worth it than using
more expensive and sophisticated but more scalable techniques such as
summary-based analysis.


On 7/11/19 9:56 AM, Дилян Палаузов via cfe-dev wrote:
> [...]

Re: Many .c files as input to scan-build

Дилян Палаузов via cfe-dev
Hello Artem,

Thanks for your answer.

Regarding combining several source files into one that scan-build then analyzes,
https://clang-analyzer.llvm.org/scan-build.html in fact suggests:

> It is also possible to use scan-build to analyze specific files:
> $ scan-build gcc -c t1.c t2.c
> This example causes the files t1.c and t2.c to be analyzed.

My reading is that with this call scan-build generates a single report, the result of merging t1.c and t2.c and
then analyzing the merged input, since the gcc call generates a single file.

Is the size of the input to the analyzer currently inversely proportional to the quality of the results?

I ask this last question because you wrote that for unity builds the analyzer would give up sooner and only find
more shallow bugs.

Regards
  Дилян

On Fri, 2019-07-12 at 16:13 -0700, Artem Dergachev wrote:
> [...]
