JumboSupport: making unity builds easier in Clang


Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
On Tue, Apr 10, 2018 at 9:13 PM, Richard Smith via cfe-dev <[hidden email]> wrote:
On 10 April 2018 at 10:05, Nico Weber via cfe-dev <[hidden email]> wrote:
On Tue, Apr 10, 2018 at 1:01 PM, David Blaikie <[hidden email]> wrote:


On Tue, Apr 10, 2018 at 9:58 AM Nico Weber <[hidden email]> wrote:
On Tue, Apr 10, 2018 at 11:56 AM, David Blaikie <[hidden email]> wrote:


On Tue, Apr 10, 2018 at 8:52 AM Mostyn Bramley-Moore <[hidden email]> wrote:
On Tue, Apr 10, 2018 at 4:27 PM, David Blaikie <[hidden email]> wrote:
I haven't looked at the patches in detail - but generally a jumbo build feels like a bit of a workaround & maybe there are better long-term solutions that might fit into the compiler. A few sort of background questions:

* Have you tried Clang header modules ( https://clang.llvm.org/docs/Modules.html )? (explicit (granted, explicit might only be practical at the moment using Google's internal version of Bazel - but you /might/ get some comparison numbers from a Google Chrome developer) and implicit)
  * The doc talks about maybe disabling jumbo builds for a single target for developer efficiency, with the risk that a header edit would maybe be worse for the developer than the jumbo build - this is where modules would help as well, since it doesn't have this tradeoff property of two different dimensions of "more work" you have to choose from.

There are ways to minimise this: an earlier proprietary jumbo build system used at Opera would detect when you were modifying and rebuilding files, and compile those in "normal" mode.  This gave fast full/clean build times but also short modify+rebuild times.  We have not attempted to implement this in the Chromium jumbo build configuration.

Building that kind of infrastructure seems like a pretty big hammer compared to modularizing the codebase...

Modularizing the codebase doesn't give you the same build time impact, linearizes your build more,

Not sure I follow - it partially linearizes (as you say, due to the module dependency rather than header dependency issue), as does the jumbo build.

The jumbo build just needs to append a bunch of files, that's fast. Compiling a module isn't.

Well, compiling a module is just appending a bunch of headers and compiling them. It's just at a different layer of the graph.
 
and slows down incremental builds.

Compared to a traditional build? I wouldn't think so (I mean, yes, reading/writing modules has some overhead - but also some gains) on average. I'd expect slower builds if you modify a header at the very base of the dependency (the STL), but beyond that I would've thought the reading/writing modules overhead would be saved by reusing modules for infrequently modified files (like the STL).

Say you touch some header foo.h. Previously, you needed to rebuild all cc files including it. Now you need to instead rebuild the module, and since the module has changed you now need to rebuild all cc files using any header in the module, not just the users of foo.h. That's potentially way more cc files.

But say you touch some source file foo.cc. Previously, and with modules, you just need to rebuild that cc file. With a unity build, you now instead need to rebuild the concatenation of that .cc file and a bunch of others. That's also potentially way more cc files. :)

But measurements beat speculation here.

Here's one data point: on a non-ccache, non-distributed build on a fairly high-end machine (20 CPU cores, 40 threads), I built a subset of Chromium (content_shell) in both jumbo and non-jumbo mode.  Then I picked a single source file in a part of the tree that we have previously made jumbo-capable (content/public/renderer/browser_plugin_delegate.cc), touched it, and timed how long the rebuild would take in both jumbo and non-jumbo mode.  The target that this source file belongs to has 16 source files in total, which is smaller than the default jumbo_file_merge_limit value of 50, so rebuilding this one source file in jumbo mode requires that we also rebuild the other 15 source files in the target, and this cannot be done in parallel since they're all in a single jumbo compilation unit.  In other words, this is a moderately bad scenario for jumbo.

The non-jumbo rebuild + relink time on this machine was between 9 and 10 seconds, and the jumbo rebuild + relink time was 23-24 seconds: a little more than double, but still nowhere near "time to grab a coffee while I wait" territory.  This time is easily won back in jumbo mode if you need to rebase on master, or build another target or configuration.

If you find yourself in a modify/rebuild/retest loop in this code, you can try a workflow optimisation mentioned in Daniel Bratell's doc (and earlier in this thread): turn jumbo off for just this target but on for all others, and you only have a one-time overhead of regenerating ninja files (which is quick) plus rebuilding 15 source files once in parallel.  Then you only need to rebuild a single source file each time around the loop.  

I am currently running the same benchmark on a lower-specced machine that is more realistic for many developers: a 4 core / 8 thread workstation.  The test setup is excruciatingly slow to prepare, so I will have to report back tomorrow with the numbers.  I expect the rebuild times to be comparable, since this test cannot make use of multiple CPU cores simultaneously (other than maybe parallel linking).  But the clean-build speedup for this configuration is known to be a big net win in terms of absolute time saved (jumbo builds are something like ~3x faster than non-jumbo builds, which take several hours).

Jumbo builds are not a solution that you should use blindly without confirming that they work for your codebase and workflow, but in some cases they clearly have enormous benefits.

-Mostyn.
 
(wonder what the combination would be like - modularizing headers, and also jumbo-ifying .cpp files together... - whether there's much to be saved in the reading modules part of the work, reading them in fewer times - that gets into some of the ideas of compiler as a service I guess)
 
Even if it wasn't a lot more work to get modules going, it's not completely clear to me that that would address the use case that the people working on the jumbo build have.
 
(maybe still less work - but a lot of work to workaround things & produce some rather quirky behavior (in terms of how the build functions based on looking at exactly how the source files have changed & changing the build action graph depending on that) - but enough that I'd be inclined to reconsider going in the modular direction again)
 
 
* I was going to ask about the lack of parallelism in a jumbo build - but reading the doc I see it's not a 'full' jumbo build, but chunkifying the build - so there's still some/enough parallelism. Cool :)

I have heard rumours of some codebases in the games industry using a single jumbo source file for the entire build, but this is generally considered to be taking things too far and not our intended use case.

Ah, my understanding was that jumbo builds were often/mainly used for optimized builds to get cross-module optimizations (LTO-esque) & so it'd be likely to be the whole program.
 
The size of Chromium's jumbo compilation units is tunable: you can simply #include fewer real source files per jumbo source file; the bigger your build farm is, the smaller you want this number to be.  The optimal setup depends on things like the shape of the dependency graph and the relative costs of the original source files.  IIRC we currently only have a build-wide "jumbo_file_merge_limit" setting, though that might have changed since I last looked (V8 would benefit from per-target tuning, since its source files compile more slowly than most Chromium source files).


-Mostyn.
 
On Tue, Apr 10, 2018 at 5:12 AM Mostyn Bramley-Moore via cfe-dev <[hidden email]> wrote:

Hi,


I am a member of a small group of Chromium developers who are working on adding a unity build[1] setup to Chromium[2], in order to reduce the project's long and ever-increasing compile times.  We're calling these "jumbo" builds, because this term is not as overloaded as "unity".


We're slowly making progress, but find that a lot of our time is spent renaming things in anonymous namespaces; it would be much simpler if it were possible to automatically treat these as if they were file-local.  Jens Widell has put together a proof-of-concept which appears to work reasonably well; it consists of a clang plugin and a small clang patch:

https://github.com/jensl/llvm-project-20170507/tree/wip/jumbo-support/v1

https://github.com/jensl/llvm-project-20170507/commit/a00d5ce3f20bf1c7a41145be8b7a3a478df9935f


After building clang and the plugin, you generate jumbo source files that look like:


jumbo_source_1.cc:


#pragma jumbo

#include "real_source_file_1.cc"

#include "real_source_file_2.cc"

#include "real_source_file_3.cc"


Then, you compile something like this:

clang++ -c jumbo_source_1.cc -Xclang -load -Xclang lib/JumboSupport.so -Xclang -add-plugin -Xclang jumbo-support


The plugin gives unique names[3] to the anonymous namespaces without otherwise changing their semantics, and also #undef's macros defined in each top-level source file before processing the next top-level source file.  That way header files can still define macros that are used in multiple source files in the jumbo translation unit. Collisions between macros defined in header files and names used in other headers and other source files are still possible, but less likely.
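
To make the collision classes concrete, here is a small, self-contained sketch (file names and symbols are invented for illustration, not taken from the Chromium patches) of two real source files that would clash in a plain unity build but are fine with the plugin's renaming and per-file #undef behaviour:

// ---- real_source_file_1.cc ----
#include <cstdio>
#define BUFFER_TAG "reader"             // source-local macro; #undef'd by the plugin after this file
namespace {                             // conceptually becomes __anonymous_1
constexpr int kBufferSize = 4096;
}
void ReadSomething() { std::printf("%s %d\n", BUFFER_TAG, kBufferSize); }

// ---- real_source_file_2.cc ----
#include <cstdio>                       // macros/declarations from shared headers remain usable
namespace {                             // conceptually becomes __anonymous_2
constexpr int kBufferSize = 512;        // a redefinition error in a plain unity build
}
void WriteSomething() { std::printf("%d\n", kBufferSize); }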


To show how much these two changes help, here's a patch to make Chromium's network code build in jumbo mode:

https://chromium-review.googlesource.com/c/chromium/src/+/966523 (+352/-377 lines)


And here's the corresponding patch using the proof-of-concept JumboSupport plugin:

https://chromium-review.googlesource.com/c/chromium/src/+/962062 (+53/-52 lines)


It seems clear that the version using the JumboSupport plugin would require less effort to create, review and merge into the codebase.  We have a few other feature ideas, but these two changes seem to do most of the work for us.


So now we're trying to figure out the best way forward- would a feature like this be welcome to the Clang project?  And if so, how would you recommend that we go about it? We would prefer to do this in a way that does not require a locally patched Clang and could live with building a custom plugin, although implementing this entirely in Clang would be even better.


I've been thinking about ways to get the benefits of unity builds without the semantic changes. With the functionality we introduced for -fmodules-local-submodule-visibility, we have the ability to parse one file, then make it "invisible" and parse another file, skipping all the repeated parts from the two parses, which would give us some (maybe most) of the performance benefit of unity builds without the semantic changes. (This is not quite as good as a unity build: you'd still repeatedly lex and preprocess the files #included into both source files. We could implicitly treat header files with include guards as being "modular" to get the performance back, but then you also get back some of the semantic changes.)
 

Thanks,



-Mostyn.



[1] If you're not familiar with unity builds, the idea is to compile multiple source files per compiler invocation, reducing the overhead of processing header files (which can be surprisingly high).  We do this by taking a list of the source files in a target and generating "jumbo" source files that #include multiple "real" source files, and then we feed these jumbo files to the compiler one at a time.  This way, we don't prevent the usage of valuable build tools like ccache and icecc that only support a single source file on the command line.
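
As a rough sketch of the generation step described above (this is not the actual Chromium tooling, and the helper name is made up), the jumbo file for a chunk of real source files can be produced with something like:

#include <fstream>
#include <string>
#include <vector>

// Writes a jumbo source file that #includes each real source file in the chunk.
void WriteJumboFile(const std::string& jumbo_path,
                    const std::vector<std::string>& real_sources) {
  std::ofstream out(jumbo_path);
  out << "/* Generated jumbo file - do not edit. */\n";
  out << "#pragma jumbo\n";  // only needed with the proposed JumboSupport plugin
  for (const std::string& src : real_sources)
    out << "#include \"" << src << "\"\n";
}

// e.g. WriteJumboFile("jumbo_source_1.cc",
//                     {"real_source_file_1.cc", "real_source_file_2.cc", "real_source_file_3.cc"});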


[2] Daniel Bratell has a summary of our progress jumbo-ifying the Chromium codebase here:

https://docs.google.com/document/d/19jGsZxh7DX8jkAKbL1nYBa5rcByUL2EeidnYsoXfsYQ/edit#


[3] The JumboSupport plugin assigns names to the anonymous namespaces in a given file:  foo::(anonymous namespace)::bar is replaced with a symbol name of the form foo::__anonymous_<number>::bar where <number> is unique to the file within the jumbo translation unit.  Due to the internal linkage of these symbols, <number> does not need to be unique across multiple object files/jumbo source files.


--
Mostyn Bramley-Moore
Vewd Software

Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
On Wed, Apr 11, 2018 at 1:41 AM, Mostyn Bramley-Moore <[hidden email]> wrote:
I am currently running the same benchmark on a lower-specced machine that is more realistic for many developers: a 4 core / 8 thread workstation.  The test setup is excruciatingly slow to prepare, so I will have to report back tomorrow with the numbers.  I expect the rebuild times to be comparable, since this test cannot make use of multiple CPU cores simultaneously (other than maybe parallel linking).  But the clean-build speedup for this configuration is known to be a big net win in terms of absolute time saved (jumbo builds are something like ~3x faster than non-jumbo builds, which take several hours).

Results on my 4c/8t reference machine:
non-jumbo rebuild + relink time: about 7 seconds
jumbo rebuild + relink time: about 18 seconds

So a slightly higher percentage increase than on my larger machine, but a smaller increase in absolute time.

The storage systems in these two machines are wildly different, but I suspect the main factor in this benchmark is core frequency (which is higher on the lower-specced machine).


-Mostyn.
 

Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
On 10 Apr 2018, at 21:28, Daniel Bratell via cfe-dev <[hidden email]> wrote:
>
> I've heard (hearsay, I admit) from profiling that it seems the single largest time consumer in clang is template instantiation, something I assume can't easily be prepared in advance.
>
> One example is chromium's chrome/browser/browser target which is 732 files that normally need 6220 CPU seconds to compile, average 8.5 seconds per file. All combined together gives a single translation unit that takes 400 seconds to compile, a mere 0.54 seconds on average per file. That indicates that about 8 seconds per compiled file is related to the processing of headers.

It sounds as if there are two things here:

1. The time taken to parse the headers
2. The time taken to repeatedly instantiate templates that the linker will then discard

Assuming a command line where all of the relevant source files are provided to the compiler invocation:

Solving the first one is relatively easy if the files have a common prefix (which can be determined by simple string comparison).  Find the common prefix in the source files, build the clang AST, and then do a clone for each compilation unit.  Hopefully, the clone is a lot cheaper than re-parsing (and can ideally share source locations).
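
As a minimal illustration of what such a shared textual prefix looks like (file names invented), the two files below start with a byte-identical run of #includes that simple string comparison can find; the AST for that prefix could be built once and cloned, with only the tails parsed per file:

// ---- parser_util.cc ----
#include <map>
#include <string>
#include <vector>
// ---- end of common prefix ----
int CountKeys(const std::map<std::string, int>& m) {
  return static_cast<int>(m.size());
}

// ---- lexer_util.cc ----
#include <map>
#include <string>
#include <vector>
// ---- end of common prefix ----
std::vector<std::string> Keys(const std::map<std::string, int>& m) {
  std::vector<std::string> out;
  for (const auto& kv : m) out.push_back(kv.first);
  return out;
}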

The second is slightly more difficult, because it relies on sharing parts of the AST across notional compilation units.

To make this work well with incremental builds, ideally you’d spit out all of the common template instantiations into a separate IR file, which could then be used with ThinLTO.  

Personally, I would prefer to have an interface where a build system can invoke clang with all of the files that need building and the degree of parallelism to use and let it share as much state as it wants across builds.  In an ideal world, clang would record which templates have been instantiated in a prior build (or a previous build step in the current build) and avoid any IRGen for them, at the very least.

Old C++ compilers, predating linker support for COMDATs, emitted templates lazily, simply emitting references to them, then parsing the linker errors and generating missing implementations until the linker errors went away.  Modern C++ compilers generate many instantiations of the same templates and then discard most of them.  It would be nice to find an intermediate point, which worked well with ThinLTO, where templates could be emitted once and be available for inlining everywhere.

David


Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev


> Personally, I would prefer to have an interface where a build system can
> invoke clang with all of the files that need building and the degree of
> parallelism to use and let it share as much state as it wants across
> builds.  In an ideal world, clang would record which templates have been
> instantiated in a prior build (or a previous build step in the current
> build) and avoid any IRGen for them, at the very least.

Let me put in a plug for Paul Huggett's work; see his talk from the 2016 LLVM US developers' meeting:
https://llvm.org/devmtg/2016-11/#talk22
He's looking to do something like this with a program-fragment database.
It's obviously not anywhere near production ready but it looks like a pretty
good direction to me.
--paulr


Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
This would have issues with distributed builds, though, right? Unless clang then took on the burden of doing the distribution too, which might be a bit much.

On Wed, Apr 11, 2018 at 12:43 AM David Chisnall via cfe-dev <[hidden email]> wrote:
Personally, I would prefer to have an interface where a build system can invoke clang with all of the files that need building and the degree of parallelism to use and let it share as much state as it wants across builds.  In an ideal world, clang would record which templates have been instantiated in a prior build (or a previous build step in the current build) and avoid any IRGen for them, at the very least.


Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev

If you want to share ASTs (an ephemeral structure), clang would need to do the distributing.  If you want to share IR of instantiated templates, you can use a shared database where clang is much less involved in managing the distribution.  Maybe a hash of the token stream of the template definition, plus the template parameters, would work as the database key?  Then you can pull precompiled IR out of the database (if you want to do optimizations) or make a reference to it (if you're doing LTO).
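
A very rough sketch of the kind of key being speculated about here (the struct, the hash mixing, and all the names are invented, not an existing clang or LLVM API):

#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct InstantiationKey {
  std::string definition_tokens;           // token stream of the template definition
  std::vector<std::string> template_args;  // e.g. {"int", "std::string"}
};

// Combines the definition hash with each template argument (FNV-style mixing).
inline std::uint64_t HashKey(const InstantiationKey& key) {
  std::uint64_t h = std::hash<std::string>{}(key.definition_tokens);
  for (const auto& arg : key.template_args)
    h = (h * 1099511628211ULL) ^ std::hash<std::string>{}(arg);
  return h;
}

// A build-wide database mapping HashKey(...) to precompiled IR (or to a
// reference to it, for the LTO case) would let later compiles skip
// re-instantiating and re-generating the same templates.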

--paulr

 


Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
See also: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Tenseconds.pdf

I started experimenting with a unity build of an LLVM/Clang-sized
proprietary project at my previous employer, and I found the basics
easy to get going. The hard part was massaging the code base to avoid
collisions, as indicated by the work by Mostyn & co.

I left the job before I had a chance to fully evaluate it, but
assuming I'd had something like `#pragma jumbo` to reduce the
friction, it might have been easier to get more data for less effort.

Mostyn/Daniel, do you have any gut feel/data on how much of the
problem a #pragma would solve? I suppose there are still constructs
that `#pragma jumbo` can't help with and that require manual
intervention?

Also, Chromium is hardly a typical codebase; from the little I've looked
at it, it's *extremely* clean and consistent, so it might be interesting
to try this on something else. Maybe LLVM itself would be an
interesting candidate.

- Kim


Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
On Wed, Apr 11, 2018 at 7:53 PM, Kim Gräsman via cfe-dev <[hidden email]> wrote:
See also: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Tenseconds.pdf

I CC'ed Andy in my initial post, but the email bounced.
 
Mostyn/Daniel, do you have any gut feel/data on how much of the
problem a #pragma would solve? I suppose there are still constructs
that `#pragma jumbo` can't help with and that require manual
intervention?

The best side-by-side comparison we have at the moment is the two Chromium patch sets I mentioned: the numbers there match my gut feeling that something like the JumboSupport proof-of-concept could save us about 80% of the effort to jumbo-ify Chromium code.

There are a few other constructs that cause trouble less often, which could be investigated later for diminishing returns.  Automatically popping clang diagnostic warning pragma states is one that came up the other day.  I think I have seen globally scoped typedefs in top-level source files cause trouble (but these are rare).

And there are of course some constructs that I don't think are feasible to fix automatically, e.g. symbols and macros leaked by library headers (which are intentionally leaky); X11 and Windows headers are particularly bad.
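
For example (invented file names, but X11's X.h really does define None as 0L), the leak happens inside a shared header rather than a top-level source file, so per-file #undef'ing cannot help:

// ---- x_event_source.cc ----
#include <X11/Xlib.h>   // X.h: #define None 0L
void HandleXEvent() { /* ... */ }

// ---- fetch_result.cc ----
// Fine as its own translation unit; in a jumbo TU that already included
// Xlib.h above, the preprocessor rewrites None to 0L and this enumerator
// no longer parses.
enum class FetchResult { None, Partial, Complete };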
 
Also, Chromium is hardly a typical codebase; from the little I've looked
at it, it's *extremely* clean and consistent, so it might be interesting
to try this on something else. Maybe LLVM itself would be an
interesting candidate.

I don't have much experience with CMake, but I see a few references to CMake unity build helpers on the web (if anyone has tips, feel free to ping me off-list).  If it would be useful I can try to put together a small experiment with a subset of LLVM or Clang.


-Mostyn.
 

Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
The rates differ between initially adding jumbo support and maintaining it.

When first preparing the code for jumbo there are several groups of changes necessary. Some of them are just that the code initially did something wrong that is suddenly detected in jumbo builds; some are that the same constant/function name is used in many files: kBufferSize, kIconSize, kSecondsPerMinute, GetThingWithNullCheck(), that kind of thing.

In the initial cleanup, I think name collisions (the kind of problem the suggested clang support would help with) account for about 60-80% of the work, and the experiment with /net in Chromium supports that estimate.

After the initial cleanup, the new problems that appear seem to be of the "duplicate symbol name" kind to a much higher degree, maybe 90%.

So if those rough estimates are correct, it would make it 4 times as easy to implement something like jumbo, and 10 times as easy to maintain, and it would mean that developers can keep using the short common names they have become accustomed to.

It would also hide some code problems that jumbo right now exposes, such as copy/pasted code, but if we can live with those today, we can probably survive with them a while longer and leave it to other tools to find such problems.

/Daniel

(My notes from adding jumbo to a part of the code with 1000+ files; the items marked with a * would probably have been unnecessary if clang had had this support:
----
* 20.5 patches to rename something
* 11.5 patches to remove duplicate code
2 fixes to bad forward declarations
1 removal of "using namespace" (not allowed by the coding standard)
1 fix to ambiguity between ::prefs and ::metric::prefs
1 fix to clash with X11 headers
3 fixes to clashes with Windows headers
* 3 changes to inline trivial code/constants
1 case of bind.h finding Bind being called the wrong way thanks to access  
to more type information
1 removal of dead code
1 patch to add include guards
)


--
/* Opera Software, Linköping, Sweden: CEST (UTC+2) */
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev
On Wed, Apr 11, 2018 at 8:52 PM, Mostyn Bramley-Moore <[hidden email]> wrote:
On Wed, Apr 11, 2018 at 7:53 PM, Kim Gräsman via cfe-dev <[hidden email]> wrote:
See also: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Tenseconds.pdf

I CC'ed Andy in my initial post, but the email bounced.
 
I started experimenting with a unity build of an LLVM/Clang-sized
proprietary project at my previous employer, and I found the basics
easy to get going. The hard part was massaging the code base to avoid
collisions, as indicated by the work by Mostyn & co.

I left the job before I had a chance to fully evaluate it, but
assuming I'd had something like `#pragma jumbo` to reduce the
friction, it might have been easier to get more data for less effort.

Mostyn/Daniel, do you have any gut feel/data on how much of the
problem a #pragma would solve? I suppose there are still constructs
that `#pragma jumbo` can't help with, that requires manual
intervention?

The best side-by-side comparison that we have at the moment is the two Chromium patch sets I mentioned; the numbers there match my gut feeling that something like the JumboSupport proof-of-concept could save us about 80% of the effort to jumbo-ify Chromium code.

There are a few other constructs that cause trouble less often, which could be investigated later, with diminishing returns.  Automatically popping clang diagnostic warning pragma state between files is one idea that came up the other day.  I think I have seen globally scoped typedefs in top-level source files cause trouble (but these are rare).
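
To make the diagnostic pragma case concrete, here is roughly what the problem looks like (file and warning names are just for illustration); a push/ignore without a matching pop only affects the rest of that file when it is compiled alone, but in a jumbo translation unit the suppressed state carries over into the next file:

  // real_source_file_1.cc
  #pragma clang diagnostic push
  #pragma clang diagnostic ignored "-Wunused-function"
  static void UnusedHelper() {}
  // no matching "#pragma clang diagnostic pop" before the end of the file

  // real_source_file_2.cc
  // when concatenated after real_source_file_1.cc, -Wunused-function is
  // still suppressed here, so genuinely unused functions go unreported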

And there are of course some constructs that I don't think are feasible to fix automatically, e.g. symbols and macros leaked by library headers (which are intentionally leaky); X11 and Windows headers are particularly bad.
 
Also, Chromium is hardly a typical codebase, the little I've looked at
it, it's *extremely* clean and consistent, so it might be interesting
to try this on something else. Maybe LLVM itself would be an
interesting candidate.

I don't have much experience with CMake, but I see a few references to CMake unity build helpers on the web (if anyone has tips, feel free to ping me off-list).  If it would be useful I can try to put together a small experiment with a subset of LLVM or Clang.
 
I decided to take a look at the clangSema target, and see what kind of difference the JumboSupport PoC would make.  Instead of digging into CMake, I just wrote some small shell scripts to build this target in the various modes.

Without JumboSupport, I had to rename a couple of static functions (isGlobalVar, and getDepthAndIndex in a couple of places) and rename a struct (PartialSpecMatchResult) that was inside an anonymous namespace.  Alternatively, you could decide to refactor and share the same implementations.  I also excluded two source files from the jumbo compilation unit, due to clashes caused by a file being intentionally #include'd multiple times (alternatively, you could sprinkle some #undef's around to make this work).

With JumboSupport, instead of renaming the static functions I just moved them into anonymous namespaces, and I excluded the same two source files, which #include some .def files multiple times, for the same reasons as above.  I did not need to do anything about the PartialSpecMatchResult structs, since they were already inside anonymous namespaces (at least one of them was; I did not need to check the other).
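
For illustration (with a made-up helper, not the actual clangSema functions), the two kinds of fix look like this:

  // Two .cc files in the same jumbo translation unit both define a helper
  // with the same name and internal linkage.

  // Fix without JumboSupport: rename one of them.
  static int CountUses(int x) { return x; }           // first file
  static int CountUsesForLookup(int x) { return x; }  // second file, renamed

  // Fix with JumboSupport: keep the name and use an anonymous namespace,
  // which the plugin treats as file-local within the jumbo unit.
  namespace {
  int CountUses(int x) { return x; }
  }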

Of these two patches, the JumboSupport version was easier to produce, and I believe it would require less effort to review: there would be no debate about what to rename things to, or about whether the code should be refactored and how.  I think that anonymous namespaces should generally be preferred over static functions, and JumboSupport makes anonymous namespaces even more useful: it makes them behave the way that many developers (incorrectly) assume they work.

Note that we don't claim that jumbo builds make sense for all codebases, and I'm not sure whether they would make sense for Clang/LLVM.  But JumboSupport did appear to help in this tiny experiment.

-Mostyn.

 
- Kim

On Wed, Apr 11, 2018 at 7:08 PM, via cfe-dev <[hidden email]> wrote:
> If you want to share ASTs (an ephemeral structure) clang would need to do
> the distributing.  If you want to share IR of instantiated templates, you
> can do a shared database where clang is much less involved in managing the
> distribution.  Say the database key can be maybe a hash of the token stream
> of the template definition would work?  plus the template parameters.  Then
> you can pull precompiled IR out of the database (if you want to do
> optimizations) or make a reference to it (if you're doing LTO).
>
> --paulr
>
>
>
> From: cfe-dev [mailto:[hidden email]] On Behalf Of David
> Blaikie via cfe-dev
> Sent: Wednesday, April 11, 2018 11:09 AM
> To: David Chisnall
> Cc: Bruce Dawson; Daniel Cheng; [hidden email];
> [hidden email]; Daniel Bratell; Jens Widell
> Subject: Re: [cfe-dev] JumboSupport: making unity builds easier in Clang
>
>
>
> This would have issues with distributed builds, though, right? Unless clang
> then took on the burden of doing the distribution too, which might be a bit
> much.
>
> On Wed, Apr 11, 2018 at 12:43 AM David Chisnall via cfe-dev
> <[hidden email]> wrote:
>
> On 10 Apr 2018, at 21:28, Daniel Bratell via cfe-dev
> <[hidden email]> wrote:
>>
>> I've heard (hearsay, I admit) from profiling that it seems the single
>> largest time consumer in clang is template instantiation, something I assume
>> can't easily be prepared in advance.
>>
>> One example is chromium's chrome/browser/browser target which is 732 files
>> that normally need 6220 CPU seconds to compile, average 8,5 seconds per
>> file. All combined together gives a single translation unit that takes 400
>> seconds to compile, a mere 0.54 seconds on average per file. That indicates
>> that about 8 seconds per compiled file is related to the processing of
>> headers.
>
> It sounds as if there are two things here:
>
> 1. The time taken to parse the headers
> 2. The time taken to repeatedly instantiate templates that the linker will
> then discard
>
> Assuming a command line where all of the relevant source files are provided
> to the compiler invocation:
>
> Solving the first one is relatively easy if the files have a common prefix
> (which can be determined by simple string comparison).  Find the common
> prefix in the source files, build the clang AST, and then do a clone for
> each compilation unit.  Hopefully, the clone is a lot cheaper than
> re-parsing (and can ideally share source locations).
>
> The second is slightly more difficult, because it relies on sharing parts of
> the AST across notional compilation units.
>
> To make this work well with incremental builds, ideally you’d spit out all
> of the common template instantiations into a separate IR file, which could
> then be used with ThinLTO.
>
> Personally, I would prefer to have an interface where a build system can
> invoke clang with all of the files that need building and the degree of
> parallelism to use and let it share as much state as it wants across builds.
> In an ideal world, clang would record which templates have been instantiated
> in a prior build (or a previous build step in the current build) and avoid
> any IRGen for them, at the very least.
>
> Old C++ compilers, predating linker support for COMDATs, emitted templates
> lazily, simply emitting references to them, then parsing the linker errors
> and generating missing implementations until the linker errors went away.
> Modern C++ compilers generate many instantiations of the same templates and
> then discard most of them.  It would be nice to find an intermediate point,
> which worked well with ThinLTO, where templates could be emitted once and be
> available for inlining everywhere.
>
> David
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



--
Mostyn Bramley-Moore
Vewd Software




_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev
Hans, Richard, and I spent some more time discussing this today, and we came to the conclusion that this could absolutely be built partially with existing modules functionality. In this case, by "module" I'm not referring to a chunk of serialized AST, I'm just referring to the in-memory data structures that clang uses to control name lookup.

The idea is that each .cpp file can be its own module, and all headers would be part of a global module. Each .cpp file is only allowed to look up names in the global module. My understanding is that this is where -fmodules-local-submodule-visibility comes into play, although I'm not clear on the details. This symbol hiding is the first part of what jumbo needs, and it's actually implemented similarly to the way it was done in the JumboSupport patch on GitHub. It's basically filtering out declarations that aren't supposed to be visible during name lookup.

The second part is avoiding name mangling collisions. It seemed pretty simple to us to extend both name manglers to include a unique module id in the names of all internal-linkage symbols, so 'static int f() { return 42; }' becomes _ZL1fv.1 (add .1, .2, etc.). c++filt already knows how to demangle those, so that will just work. This wouldn't break any existing users, because after all, these are things with internal linkage; the names shouldn't matter as long as they look nice in the debugger.
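
To make that concrete, a small sketch (the per-file suffix numbering is illustrative):

  // a.cc: 'static int f()' currently mangles to _ZL1fv.
  static int f() { return 42; }   // with the proposed suffix: _ZL1fv.1

  // b.cc: this would also mangle to _ZL1fv today.
  static int f() { return 43; }   // with the proposed suffix: _ZL1fv.2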

The last thing is to make it so that all included headers not listed in the jumbo file (or perhaps on the command line) are in one global module. We weren't able to find a way to express this today with module maps, but I don't think it would be too hard to do.

---

We also discussed how we could, in the long run, get the compile-time benefits of jumbo builds without the semantic changes. The basic idea is that every "modular header", i.e. a header that parses successfully by itself with only command-line macros defined, could be its own module. Again, we're not talking about AST serialization, just changing name lookup rules. It's just a module for name lookup purposes. In order for this to work, all code needs to follow very strict include-what-you-use rules: transitive includes wouldn't be visible from indirect users of a header. Obviously, we are not in this world today, but it's one we could work towards.
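
As a rough illustration of what "modular header" means here (names made up), the header has to pull in everything it uses itself:

  // widget.h: parses on its own because it includes what it uses;
  // it must not rely on some earlier header having pulled in <string>.
  #ifndef WIDGET_H_
  #define WIDGET_H_

  #include <string>

  struct Widget {
    std::string name;
  };

  #endif  // WIDGET_H_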

Once the codebase follows IWYU, then it shouldn't matter (barring bugs, of which I'm sure there will be many) what the jumbo factor is. Ignoring resource exhaustion, a build that succeeds with a jumbo factor of 50 should also succeed with a jumbo factor of 1. Devs can work locally with jumbo and not worry about forgetting includes that they happen to get transitively.

On Tue, Apr 10, 2018 at 5:12 AM Mostyn Bramley-Moore via cfe-dev <[hidden email]> wrote:

Hi,


I am a member of a small group of Chromium developers who are working on adding a unity build[1] setup to Chromium[2], in order to reduce the project's long and ever-increasing compile times.  We're calling these "jumbo" builds, because this term is not as overloaded as "unity".


We're slowly making progress, but find that a lot of our time is spent renaming things in anonymous namespaces; it would be much simpler if it were possible to automatically treat these as if they were file-local.  Jens Widell has put together a proof-of-concept which appears to work reasonably well; it consists of a clang plugin and a small clang patch:

https://github.com/jensl/llvm-project-20170507/tree/wip/jumbo-support/v1

https://github.com/jensl/llvm-project-20170507/commit/a00d5ce3f20bf1c7a41145be8b7a3a478df9935f


After building clang and the plugin, you generate jumbo source files that look like:


jumbo_source_1.cc:


#pragma jumbo

#include "real_source_file_1.cc"

#include "real_source_file_2.cc"

#include "real_source_file_3.cc"


Then, you compile something like this:

clang++ -c jumbo_source_1.cc -Xclang -load -Xclang lib/JumboSupport.so -Xclang -add-plugin -Xclang jumbo-support


The plugin gives unique names[3] to the anonymous namespaces without otherwise changing their semantics, and also #undef's macros defined in each top-level source file before processing the next top-level source file.  That way header files can still define macros that are used in multiple source files in the jumbo translation unit. Collisions between macros defined in header files and names used in other headers and other source files are still possible, but less likely.
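
As an example of the macro handling (macro name made up):

  // real_source_file_1.cc
  #define TRACE_CATEGORY "net"
  // ... uses TRACE_CATEGORY ...

  // real_source_file_2.cc
  #define TRACE_CATEGORY "http"
  // Fine on its own, but in a plain jumbo translation unit this triggers a
  // macro-redefinition warning.  The plugin #undef's TRACE_CATEGORY (defined
  // in a top-level source file) before this file is processed, so both
  // files keep working unmodified.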


To show how much these two changes help, here's a patch to make Chromium's network code build in jumbo mode:

https://chromium-review.googlesource.com/c/chromium/src/+/966523 (+352/-377 lines)


And here's the corresponding patch using the proof-of-concept JumboSupport plugin:

https://chromium-review.googlesource.com/c/chromium/src/+/962062 (+53/-52 lines)


It seems clear that the version using the JumboSupport plugin would require less effort to create, review and merge into the codebase.  We have a few other feature ideas, but these two changes seem to do most of the work for us.


So now we're trying to figure out the best way forward: would a feature like this be welcome in the Clang project?  And if so, how would you recommend that we go about it? We would prefer to do this in a way that does not require a locally patched Clang, and we could live with building a custom plugin, although implementing this entirely in Clang would be even better.


Thanks,



-Mostyn.



[1] If you're not familiar with unity builds, the idea is to compile multiple source files per compiler invocation, reducing the overhead of processing header files (which can be surprisingly high).  We do this by taking a list of the source files in a target and generating "jumbo" source files that #include multiple "real" source files, and then we feed these jumbo files to the compiler one at a time.  This way, we don't prevent the usage of valuable build tools like ccache and icecc that only support a single source file on the command line.


[2] Daniel Bratell has a summary of our progress jumbo-ifying the Chromium codebase here:

https://docs.google.com/document/d/19jGsZxh7DX8jkAKbL1nYBa5rcByUL2EeidnYsoXfsYQ/edit#


[3] The JumboSupport plugin assigns names to the anonymous namespaces in a given file:  foo::(anonymous namespace)::bar is replaced with a symbol name of the form foo::__anonymous_<number>::bar where <number> is unique to the file within the jumbo translation unit.  Due to the internal linkage of these symbols, <number> does not need to be unique across multiple object files/jumbo source files.
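
For example (the numbers are whatever the plugin happens to assign):

  // real_source_file_1.cc: foo::(anonymous namespace)::bar is emitted as if
  // it were foo::__anonymous_1::bar
  namespace foo {
  namespace {
  void bar() {}
  }
  }

  // real_source_file_2.cc: its foo::(anonymous namespace)::bar becomes
  // foo::__anonymous_2::bar, so the two definitions no longer clash in the
  // jumbo translation unit.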


--
Mostyn Bramley-Moore
Vewd Software
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev

This assumes the exported interface for any header is directly defined in that header, not deferred to another implementation-detail header.

I have a vague memory that some C++ library interfaces are actually available via more than one header; possibly the stream stuff?  If I'm wrong, fine; if I'm right, this approach still needs some fine tuning.

--paulr

 

We also discussed how we could, in the long run, get the compile time benefits of jumbo builds without the semantic changes. The basic idea is that every "modular header", i.e. a header that can successfully parse by itself with only command line macros defined, could be its own module. Again, we're not talking about AST serialization, just changing name lookup rules. It's just a module for name lookup purposes. In order for this to work, all code needs to follow very strict include-what-you-use rules: transitive includes wouldn't be visible from indirect users of a header. Obviously, we are not in this world today, but it's one we could work towards.

 


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev
On Fri, Apr 27, 2018 at 1:23 AM, Reid Kleckner <[hidden email]> wrote:

> Hans, Richard, and I spent some more time discussing this today, and we came
> to the conclusion that this could absolutely be built partially with
> existing modules functionality. In this case, by "module" I'm not referring
> to a chunk of serialized AST, I'm just referring to the in-memory data
> structures that clang uses to control name lookup.
>
> The idea is that each .cpp file can be its own module, and all headers would
> be part of a global module. Each .cpp file is only allowed to look up names
> in the global module. My understanding is that this is where
> -fmodules-local-submodules-visibility comes into play, although I'm not
> clear on the details. This symbol hiding is the first part of what jumbo
> needs, and it's actually implemented similarly to the way it was done in the
> JumboSupport patch on github. It's basically filtering out declarations that
> aren't supposed to be visible during name lookup.

So, I've been playing around a bit. (Keep in mind here that I'm very
new to the clang code base, and even newer when it comes to anything
to do with modules.)

First off, general discussion:

I realize that the modules feature is the existing clang feature best
suited to being extended for unity building, rather than adding
something completely new. But given that unity building could be seen
as something of a hack, applied to "legacy" code bases not yet taking
advantage of the shiny future where modules are usable... is it a good
idea to add complexity to the modules feature? Rather than treating
unity building support as a separate small feature, that is?

It seems the module feature can be tricked into doing some seemingly
useful things for unity building already (see below) but I have no
idea how much more would be needed. And I also don't know if "abusing"
a compiler feature this way is good long-term. Will this break as the
modules feature is being developed in clang, because we depended on
the internals of it? Will developing the modules feature in clang
become more difficult because it is being abused by the Chromium
project to do unity building?

(These are honest questions; I'm not familiar enough with any of this
to claim to know the answers.)


Experiment report:

My test case is `test.cc`, which includes `test1.cc` and `test2.cc`,
where `test1.cc` and `test2.cc` each define a function named `f` in
the anonymous namespace, and define a function in the global namespace
(with different names) that calls `f`. Normally, this of course fails
to compile due to conflicting definitions of `f`. They also contain
conflicting definitions of a macro, and both define `enum Foo { FOO
};` in the anonymous namespace, which of course also normally leads to
warnings/errors.
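
Roughly, the test files look something like this (the real ones
differ slightly):

  // test1.cc
  #define CONFLICTING_MACRO 1
  namespace {
  enum Foo { FOO };
  void f() { /* ... */ }
  }
  void first() { f(); }

  // test2.cc
  #define CONFLICTING_MACRO 2  // macro conflict with test1.cc
  namespace {
  enum Foo { FOO };            // normally a redefinition error in a jumbo TU
  void f() { /* ... */ }       // likewise
  }
  void second() { f(); }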

In my test, I'm calling clang with `-Xclang
-fmodules-local-submodule-visibility -fmodule-name=test
-fmodule-map-file=test.modulemap`.

My `test.modulemap` contains

 module test {
  module test1 {
   header "test1.cc"
  }
  module test2 {
   header "test2.cc"
  }
 }

And in `test.cc`, I've surrounded the includes with `#pragma clang
module begin test.testX`/`#pragma clang module end`.
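
So `test.cc` ends up looking something like this:

 // test.cc
 #pragma clang module begin test.test1
 #include "test1.cc"
 #pragma clang module end

 #pragma clang module begin test.test2
 #include "test2.cc"
 #pragma clang module end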

So far, all of this seems doable in Chromium's jumbo mechanism; adding
compiler arguments is fine, and the module map file could be generated
alongside the source file that includes the real source files. And
generating some pragmas there is fine, of course.

The test result is that in `test2.cc`, lookup of `f` fails.
Specifically, `error: use of undeclared identifier 'f'`. But there are
no complaints about conflicting declarations.

However, if I expand this minimal test case a bit, by including a
common header (with an include guard) that declares a function
(`print()`) that both `test1.cc` and `test2.cc` call, things seem to
fall apart a bit. (But you did note that includes needed to be
handled.)

I then get errors like

  In file included from test.cc:15:
 ./test2.cc:12:3: error: declaration of 'print' must be imported from
module 'test.test1' before it is required
    print("second f()");
    ^
  ./test.h:4:6: note: previous declaration is here
  void print(const char*);
       ^

and

  In file included from test.cc:15:
  ./test2.cc:22:3: warning: ambiguous use of internal linkage
declaration 'f' defined in multiple modules
[-Wmodules-ambiguous-internal-linkage]
    f();
    ^
  ./test1.cc:11:6: note: declared here in module 'test.test1'
  void f() {
       ^
  ./test2.cc:11:6: note: declared here in module 'test.test2'
  void f() {
       ^

Oddly enough, `f` went from being undeclared in `test2.cc` to now
having an ambiguous declaration. But maybe I just triggered an
"earlier" error that hid that one.


> The second part is avoiding name mangling collisions. It seemed pretty
> simple to us to extend both name manglers to include a unique module id in
> the names of all internal linkage symbols, so 'static int f() { return 42;
> }' becomes _ZL1fv.1 (add .1, .2, etc). c++filt already knows how to demangle
> those, so that will just work. This wouldn't break any existing users,
> because after all, these are things with internal linkage, the names
> shouldn't matter as long as they look nice in the debugger.

That seems like a nicer approach than mine, for sure.

--
Jens
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
Hi all,

I'd like to summarize this thread (so far) and try to agree on a way forward.

First, we had an interesting discussion on unity building in general,
or perhaps rather other, more "modern", ways to achieve the same or
similar effects. In the short term, this discussion is mostly
academic; the fact is that the Chromium project already has a unity
build configuration as described at the start of this thread, and it's
in "production" use. We're proposing these changes to Clang to reduce
an existing maintenance cost.

Second, we've had two principal proposed ways to support unity builds in Clang:

1) The "custom" but ultimately pretty trivial, in terms of changes to
the core Clang code, way implemented by my proof-of-concept patch.

2) A thus-far unimplemented way built on top of Clang's support
for C++ modules.

I'm not qualified to tell how much work (2) would be, how big the
changes to Clang would be, or how much of a maintenance burden this
would be on the Clang project, but it sounds more complicated to me,
so I'm inclined to think that (2) > (1) in all of those metrics.

I will also not be able to implement (2), because of the inevitably
limited nature of time.

So, if I don't receive push-back here (e.g. someone committing to work
on (2)), I'll proceed to submit patches for (1).

--
Jens
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
Hi Jens,

On Fri, May 18, 2018 at 9:17 AM, Jens Widell via cfe-dev
<[hidden email]> wrote:

> Hi all,
>
> I'd like to summarize this thread (so far) and try to agree on a way forward.
>
> First, we had an interesting discussion on unity building in general,
> or perhaps rather other, more "modern", ways to achieve the same or
> similar effects. In the short term, this discussion is mostly
> academic; the fact is that the Chromium project already has a unity
> build configuration as described at the start of this thread, and it's
> in "production" use. We're proposing these changes to Clang to reduce
> an existing maintenance cost.
>
> Second, we've had two principal proposed ways to support unity builds in Clang:
>
> 1) The "custom" but ultimately pretty trivial, in terms of changes to
> the core Clang code, way implemented by my proof-of-concept patch.
>
> 2) An thus-far nowhere implemented way built on top of Clang's support
> for C++ modules.
>
> I'm not qualified to tell how much work (2) would be, how big the
> changes to Clang would be, or how much of a maintenance burden this
> would be on the Clang project, but it sounds more complicated to me,
> so I'm inclined to think that (2) > (1) in all of those metrics.
>
> I will also not be able to implement (2), because of the inevitably
> limited nature of time.
>
> So, if I don't receive push-back here (e.g. someone committing to work
> on (2)), I'll proceed to submit patches for (1).

My concern is whether (1) would carry its own weight. Even though it's
fairly simple, would it be useful enough for projects besides Chromium
that it warrants the complexity of having it in the tree? Are there
other problems it would need to handle, like explicit template
instantiation declarations and definitions that weren't intended to
end up in the same translation unit? I feel there could be lots of
problems.

But what if it's possible to implement this in a very simple and
clean way (one that also handles static functions and variables the
same as anonymous namespaces) and that works well for many projects?
If so, it seems hard to argue against it.

For Chromium, part of me hopes we could do something better though.
Maybe using proper modules, or maybe just by restructuring parts of
Blink.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
On Tue, May 22, 2018 at 2:04 PM, Hans Wennborg <[hidden email]> wrote:

> Hi Jens,
>
> On Fri, May 18, 2018 at 9:17 AM, Jens Widell via cfe-dev
> <[hidden email]> wrote:
>> Hi all,
>>
>> I'd like to summarize this thread (so far) and try to agree on a way forward.
>>
>> First, we had an interesting discussion on unity building in general,
>> or perhaps rather other, more "modern", ways to achieve the same or
>> similar effects. In the short term, this discussion is mostly
>> academic; the fact is that the Chromium project already has a unity
>> build configuration as described at the start of this thread, and it's
>> in "production" use. We're proposing these changes to Clang to reduce
>> an existing maintenance cost.
>>
>> Second, we've had two principal proposed ways to support unity builds in Clang:
>>
>> 1) The "custom" but ultimately pretty trivial, in terms of changes to
>> the core Clang code, way implemented by my proof-of-concept patch.
>>
>> 2) An thus-far nowhere implemented way built on top of Clang's support
>> for C++ modules.
>>
>> I'm not qualified to tell how much work (2) would be, how big the
>> changes to Clang would be, or how much of a maintenance burden this
>> would be on the Clang project, but it sounds more complicated to me,
>> so I'm inclined to think that (2) > (1) in all of those metrics.
>>
>> I will also not be able to implement (2), because of the inevitably
>> limited nature of time.
>>
>> So, if I don't receive push-back here (e.g. someone committing to work
>> on (2)), I'll proceed to submit patches for (1).
>
> My concern is whether (1) would carry its own weight. Even though it's
> fairly simple, would it be useful enough for projects besides Chromium
> that it warrants the complexity of having it in the tree? Are there
> other problems it would need to handle, like explicit template
> instantiation declarations and definitions that weren't intended to
> end up in the same translation unit.. I feel there could be lots of
> problems.

So, we/I never really intended for the Clang extension to completely
fix all issues that are or can be caused by combining source files
into a single compilation unit. Sure, that would be awesome to have,
but the intention was to address some major annoyances and thus make
the unity build configuration less burdensome to support.

A solution, any solution, that addresses all possible issues may well
be too complex to have in the Clang source tree. I don't have such a
solution to look at, and never intended to produce one, so I can't
really say.

I don't really know about other projects. Chromium certainly isn't the
first project to use unity builds, but I don't know whether other
projects would find a Clang extension useful, or if they've worked
around the issues in other ways. The Chromium code base is in certain
ways rather painful to apply unity builds to; that pain would not
necessarily apply to all code bases, but it surely isn't particularly
unique either.


> But if it's possible to implement this in a very simple and clean way
> (that also handles static functions and variables the same as
> anonymous namespaces) that works well for many projects? If so it
> seems hard to argue against it.

I think my proof-of-concept is fairly simple and clean, in particular
the parts that are in Clang rather than in the plugin.

It does not handle static functions or variables. Would that be a
specific requirement for this extension to be acceptable? Would
anything else?

> For Chromium, part of me hopes we could do something better though.
> Maybe using proper modules, or maybe just by restructuring parts of
> Blink.

And I'm of course in no way against that.

But we've had a huge compilation time issue for a long while. And
we've been using a very effective solution to address that issue for
quite some time. At this point we're dealing with the maintenance cost
of the existing solution. Any discussion about alternative solutions
ought to happen in parallel with discussions about improving the
existing solution, I think.

--
Jens
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
On Tue, May 22, 2018 at 4:19 PM, Jens Widell via cfe-dev <[hidden email]> wrote:
On Tue, May 22, 2018 at 2:04 PM, Hans Wennborg <[hidden email]> wrote:
> Hi Jens,
>
> On Fri, May 18, 2018 at 9:17 AM, Jens Widell via cfe-dev
> <[hidden email]> wrote:
>> Hi all,
>>
>> I'd like to summarize this thread (so far) and try to agree on a way forward.
>>
>> First, we had an interesting discussion on unity building in general,
>> or perhaps rather other, more "modern", ways to achieve the same or
>> similar effects. In the short term, this discussion is mostly
>> academic; the fact is that the Chromium project already has a unity
>> build configuration as described at the start of this thread, and it's
>> in "production" use. We're proposing these changes to Clang to reduce
>> an existing maintenance cost.
>>
>> Second, we've had two principal proposed ways to support unity builds in Clang:
>>
>> 1) The "custom" but ultimately pretty trivial, in terms of changes to
>> the core Clang code, way implemented by my proof-of-concept patch.
>>
>> 2) An thus-far nowhere implemented way built on top of Clang's support
>> for C++ modules.
>>
>> I'm not qualified to tell how much work (2) would be, how big the
>> changes to Clang would be, or how much of a maintenance burden this
>> would be on the Clang project, but it sounds more complicated to me,
>> so I'm inclined to think that (2) > (1) in all of those metrics.
>>
>> I will also not be able to implement (2), because of the inevitably
>> limited nature of time.
>>
>> So, if I don't receive push-back here (e.g. someone committing to work
>> on (2)), I'll proceed to submit patches for (1).
>
> My concern is whether (1) would carry its own weight. Even though it's
> fairly simple, would it be useful enough for projects besides Chromium
> that it warrants the complexity of having it in the tree? Are there
> other problems it would need to handle, like explicit template
> instantiation declarations and definitions that weren't intended to
> end up in the same translation unit.. I feel there could be lots of
> problems.

So, we/I never really intended for the Clang extension to completely
fix all issues that are or can be caused by combining source files
into a single compilation unit. Sure, that would be awesome to have,
but the intention was something that addressed some major annoyances,
and thus makes the unity build configuration less burdensome to
support.

The complexity of a solution, any solution, that addresses all
possible issues may well be unfeasible to have in the Clang source
tree. I don't have such a solution to look at, and never intended to
produce one, so I can't really say.

I don't really know about other projects. Chromium certainly isn't the
first project to use unity builds, but I don't know whether other
projects would find a Clang extension useful, or if they've worked
around the issues in other ways. The Chromium code base is in certain
ways rather painful to apply unity builds to, which would not
necessarily apply to all code bases, but also surely isn't
particularly unique.

Anecdotally, unity builds seem to be used most often on proprietary codebases (e.g. in the games industry), so they're somewhat hidden.  Reid mentioned previous discussions with Ubisoft(?) earlier in this thread; perhaps we can dig up some contacts and reach out to them?
 
> But if it's possible to implement this in a very simple and clean way
> (that also handles static functions and variables the same as
> anonymous namespaces) that works well for many projects? If so it
> seems hard to argue against it.

I think my proof-of-concept is fairly simple and clean, in particular
the parts that's in Clang and not the plugin.

It does not handle static functions or variables. Would that be a
specific requirement for this extension to be acceptable? Would
anything else?

> For Chromium, part of me hopes we could do something better though.
> Maybe using proper modules, or maybe just by restructuring parts of
> Blink.

And I'm of course in no way against that.

But we've had a huge compilation time issue for a long while. And
we've been using a very effective solution to address that issue for
quite some time. At this point we're dealing with the maintenance cost
of the existing solution. Any discussion about alternative solutions
ought to happen in parallel with discussions about improving the
existing solution, I think.

Migrating a large project like Chromium to modules sounds interesting, but it would be an incredible amount of work with a relatively high risk of failure.  I have been looking for field reports of teams using Clang's C++ modules but have only been able to find relatively small experiments that I would not like to extrapolate too far from.  And our existing build tools (ccache, icecc, etc.) are unlikely to work, so there's a large productivity hump to get over before we would ever be able to see any benefit.


-Mostyn.

--
Jens
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



--
Mostyn Bramley-Moore
Vewd Software

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: JumboSupport: making unity builds easier in Clang

Matthieu Brucher via cfe-dev
In reply to this post by Matthieu Brucher via cfe-dev
On Tue, May 22, 2018 at 4:19 PM, Jens Widell <[hidden email]> wrote:

> On Tue, May 22, 2018 at 2:04 PM, Hans Wennborg <[hidden email]> wrote:
>> Hi Jens,
>>
>> On Fri, May 18, 2018 at 9:17 AM, Jens Widell via cfe-dev
>> <[hidden email]> wrote:
>>> Hi all,
>>>
>>> I'd like to summarize this thread (so far) and try to agree on a way forward.
>>>
>>> First, we had an interesting discussion on unity building in general,
>>> or perhaps rather other, more "modern", ways to achieve the same or
>>> similar effects. In the short term, this discussion is mostly
>>> academic; the fact is that the Chromium project already has a unity
>>> build configuration as described at the start of this thread, and it's
>>> in "production" use. We're proposing these changes to Clang to reduce
>>> an existing maintenance cost.
>>>
>>> Second, we've had two principal proposed ways to support unity builds in Clang:
>>>
>>> 1) The "custom" but ultimately pretty trivial, in terms of changes to
>>> the core Clang code, way implemented by my proof-of-concept patch.
>>>
>>> 2) An thus-far nowhere implemented way built on top of Clang's support
>>> for C++ modules.
>>>
>>> I'm not qualified to tell how much work (2) would be, how big the
>>> changes to Clang would be, or how much of a maintenance burden this
>>> would be on the Clang project, but it sounds more complicated to me,
>>> so I'm inclined to think that (2) > (1) in all of those metrics.
>>>
>>> I will also not be able to implement (2), because of the inevitably
>>> limited nature of time.
>>>
>>> So, if I don't receive push-back here (e.g. someone committing to work
>>> on (2)), I'll proceed to submit patches for (1).
>>
>> My concern is whether (1) would carry its own weight. Even though it's
>> fairly simple, would it be useful enough for projects besides Chromium
>> that it warrants the complexity of having it in the tree? Are there
>> other problems it would need to handle, like explicit template
>> instantiation declarations and definitions that weren't intended to
>> end up in the same translation unit.. I feel there could be lots of
>> problems.
>
> So, we/I never really intended for the Clang extension to completely
> fix all issues that are or can be caused by combining source files
> into a single compilation unit. Sure, that would be awesome to have,
> but the intention was something that addressed some major annoyances,
> and thus makes the unity build configuration less burdensome to
> support.

Right, and that's fine for a plugin, but then it becomes hard to
motivate changes to Clang itself.

I'm starting to think that maybe one way forward is trying to figure
out ways to do what you need to do from the plugin without changing
Clang itself. After all, it has very good access to the AST and other
Clang internals.

For example, instead of introducing NamespaceDecl::IsDisabled to hide
the names, maybe the plugin could set Decl::IdentifierNamespace to
zero on the decls that should be hidden. (That's not a public member,
but perhaps it's possible to get around that obstacle.)

The mangling problem is obviously harder to solve from the plugin, but
perhaps one way would be, after each "main" file is finished, to walk
the LLVM Module and rename internal symbols to something file-specific.
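
Something along these lines, as a very rough sketch (the function name
and where it gets hooked in are made up; it would have to run after
IRGen for each "main" file, before the next file's code is emitted
into the same module):

 #include <string>
 #include "llvm/IR/Module.h"

 // Append a per-file suffix to internal-linkage symbols so that the
 // next "main" file's internals can't collide with this one's.
 static void suffixInternalSymbols(llvm::Module &M, unsigned FileIndex) {
   std::string Suffix = "." + std::to_string(FileIndex);
   for (llvm::Function &F : M.functions())
     if (F.hasInternalLinkage())
       F.setName(F.getName().str() + Suffix);
   for (llvm::GlobalVariable &GV : M.globals())
     if (GV.hasInternalLinkage())
       GV.setName(GV.getName().str() + Suffix);
 }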

These are hacks of course, but if they're effective maybe that's OK
until some more sophisticated solution is ready.

 - Hans
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev