Separate preprocess and compile: hack or feature?

Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
Hi,

In the build system I am working on we are looking at always performing
the preprocessing and then C/C++ compilation as two separate clang/clang++
invocations. The main reason is support for distributed compilation but
see here[1] for other reasons.
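
For concreteness, here is a minimal sketch of the two invocations, written as a Python helper that only builds the argument lists (nothing is executed). The function name, the default `clang` driver, and the `.i`/`.ii` output naming are illustrative assumptions and not part of any build system; `-E`, `-c`, and `-o` are the usual driver options.

```python
from pathlib import PurePosixPath

def two_step_commands(source, flags=(), driver="clang"):
    """Return (preprocess_argv, compile_argv) for one translation unit."""
    src = PurePosixPath(source)
    # Conventional suffixes: .i for preprocessed C, .ii for preprocessed C++.
    pp_suffix = ".i" if src.suffix == ".c" else ".ii"
    preprocessed = str(src.with_suffix(pp_suffix))
    obj = str(src.with_suffix(".o"))
    # The same flags go to both steps so that they see the same configuration.
    preprocess = [driver, *flags, "-E", str(src), "-o", preprocessed]
    compile_ = [driver, *flags, "-c", preprocessed, "-o", obj]
    return preprocess, compile_

pp, cc = two_step_commands("hello.c", ["-O2"])
print(" ".join(pp))  # clang -O2 -E hello.c -o hello.i
print(" ".join(cc))  # clang -O2 -c hello.i -o hello.o
```

In a distributed setup, only the second command would presumably need to run on the remote host.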

I realize that tools like ccache/distcc have been relying on this for
a while (though see the 'direct' mode in ccache and 'pump' in distcc).
However, some compilers apparently do not support this (for example,
VC; see the above link for details).

So I wonder, in the context of Clang, if this is just a hack that
happens to work "for now" or if this is a feature that is expected
to continue to work?

Also, has anyone seen/heard of any real-world issues with compiling
preprocessed source code?

[1] https://www.reddit.com/r/cpp/comments/6abi99/rfc_issues_with_separate_preprocess_and_compile/


Thanks,
Boris
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
On Thu, May 11, 2017 at 11:05:43AM +0200, Boris Kolpackov via cfe-dev wrote:
> In the build system I am working on we are looking at always performing
> the preprocessing and then C/C++ compilation as two separate clang/clang++
> invocations. The main reason is support for distributed compilation but
> see here[1] for other reasons.

It is strongly recommended to *not* separate them. A lot of warnings are
sensitive to macros, i.e., they will not trigger for patterns created by
macro use. A very basic example is

  if (FOO(x))

which will not warn, but if FOO(x) expands to (x) as recommended, you get

  if ((x))

which will get a warning for double parentheses without an assignment. There
is the option of using the rewrite mode (-E -frewrite-includes), which
is somewhat of a compromise.
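
As a toy illustration of the point above (not how a real preprocessor works), the following Python sketch performs the textual expansion by hand. The helper name and the single-identifier restriction are assumptions made for brevity. Once the text has been expanded, nothing in it records that the inner parentheses came from a macro.

```python
import re

def expand_identity_macro(line, name):
    """Expand NAME(arg) to (arg), as `#define NAME(x) (x)` would."""
    # Toy restriction: one identifier-like argument, no nested parentheses.
    pattern = re.compile(r"\b" + re.escape(name) + r"\((\w+)\)")
    return pattern.sub(r"(\1)", line)

original = "if (FOO(x))"
expanded = expand_identity_macro(original, "FOO")
print(original)  # what the compiler sees in a combined run: if (FOO(x))
print(expanded)  # what it sees after a separate -E step:    if ((x))
```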

Joerg

Re: Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
Hi Joerg,

Joerg Sonnenberger writes:

> There is the option of using the rewrite mode (-E -frewrite-includes),
> which is somewhat of a compromise.

Thanks, this looks similar to GCC's -fdirectives-only except that one
also handles #ifdef, etc.

Do you know if there is a way to achieve something similar in Clang? That
is, remove fragments that will be preprocessed out.

Thanks,
Boris

Re: Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
Most distributed build systems I know about end up writing their own custom preprocessor to very quickly discover which .h files are included by a cc file (you don't need a full preprocessor for getting just that, and so you can be faster than clang -E), and then send .h and .cc files to the server based on content hashes, so that you don't need to send the full preprocessed text, but can send source files before preprocessing. https://github.com/facebookarchive/warp was a somewhat recent example of this (but also, as you say, pump mode, and proprietary systems). (Your thread mentions that you do this for -M / /showIncludes, but you can just do this as part of regular compilation – not sure why you need this in a separate process?)

So while this doesn't answer your question, I'd expect that you won't need it, eventually :-)
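
A very rough sketch of the idea, assuming a regex-based scan and SHA-256 content keys (both are illustrative choices, not what Warp or pump mode actually do). It ignores #if/#ifdef, macro-computed includes, and include search paths, which is exactly where a real implementation gets complicated.

```python
import hashlib
import re

# Matches `#include <x>` and `#include "x"`, allowing whitespace around '#'.
INCLUDE_RE = re.compile(
    r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"]+)[>"]', re.MULTILINE
)

def scan_includes(text):
    """Return the header names textually included by `text`, in order."""
    return INCLUDE_RE.findall(text)

def content_key(data):
    """Return a content hash that a server could use to skip known files."""
    return hashlib.sha256(data).hexdigest()

source = '#include <stdio.h>\n#  include "util.h"\nint main(void) { return 0; }\n'
print(scan_includes(source))  # ['stdio.h', 'util.h']
print(content_key(source.encode())[:12])
```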


Re: Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
Hi Nico,

Nico Weber <[hidden email]> writes:

> Most distributed build systems I know about end up writing their own custom
> preprocessor to very quickly discover which .h files are included by a cc
> file (you don't need a full preprocessor for getting just that, and so you
> can be faster than clang -E), and then send .h and .cc files to the server
> based on content hashes, so that you don't need to send the full
> preprocessed text, but can send source files before preprocessing.
> https://github.com/facebookarchive/warp was a somewhat recent example of
> this (but also, as you say, pump mode, and proprietary systems).

One property that these build systems rely on is a very controlled
environment (e.g., single compiler, all hosts have exactly the same
headers, etc). I would much rather trade some speed for using standard
and robust tooling.

Also, I saw it mentioned (I think in the pump's documentation) that
local preprocessing is a lot less of an issue on modern hardware. I
bet SSDs made quite a difference.


> Your thread mentions that you do this for -M / /showIncludes, but
> you can just do this as part of regular compilation – not sure why
> you need this in a separate process?

We do it this way to handle auto-generated headers.


Thanks for the feedback,
Boris

Re: Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
On Thu, May 11, 2017 at 10:42 AM, Boris Kolpackov via cfe-dev <[hidden email]> wrote:

> Also, I saw it mentioned (I think in the pump's documentation) that
> local preprocessing is a lot less of an issue on modern hardware. I
> bet SSDs made quite a difference.

It's still an issue, because you will end up sending the pre-processed file over the network. Time has shown that the transitive include closure of a C++ file scales linearly with the size of the codebase, so the bigger the project, the more time you spend sending 10MB .ii files over the wire.

As you say, pre-processing is more robust than trying to send each header individually, set them up on the remote builder, and cache them, but it does leave performance on the table.


Re: Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
As the maintainer of icecream, I have no interest in maintaining a separate preprocessor. It seems like a nightmare to maintain all the special cases. I also see that modules are coming in a future C++ standard, which I would have to understand (and then modules 2 after that, which might or might not be compatible enough for me).

Now, if Clang is willing to maintain a fast preprocessor that runs quickly and spits out a list of files that I need to package up for my distributed build, I'm interested. (I figure you already have to maintain a working preprocessor, which means a lot of potential bugs are fixed, but I realize this would mean a number of special cases, and I don't know if you want to maintain that.)

From: cfe-dev [[hidden email]] On Behalf Of Nico Weber via cfe-dev
Sent: Thursday, May 11, 2017 12:15 PM
To: Boris Kolpackov <[hidden email]>
Cc: cfe-dev <[hidden email]>
Subject: Re: [cfe-dev] Separate preprocess and compile: hack or feature?

 

> Most distributed build systems I know about end up writing their own custom preprocessor to very quickly discover which .h files are included by a cc file (you don't need a full preprocessor for getting just that, and so you can be faster than clang -E) [...]

 


Re: Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
On Thu, 11 May 2017 02:05:43 -0700, Boris Kolpackov via cfe-dev <[hidden email]> wrote:

> Also, has anyone seen/heard of any real-world issues with compiling
> preprocessed source code?


I think the short version of my answer is: There are pitfalls, but it may work well enough for your purposes. You may want to give your users the option to combine the preprocess and compile into a single step.

In theory, having separate preprocess and compile steps should work. A preprocessed C file is just like a non-preprocessed C file that happens not to use any preprocessor features. The C preprocessor is also used for other purposes than preprocessing C code. For example, on Unix-like systems, it is not uncommon to run assembly programs through the C preprocessor. So there is reason to believe that the C preprocessor will continue to be available to run separate from the C compiler, and that the C compiler will continue to grok files that come out of the C preprocessor. Similarly for C++.

Others have already pointed out some cases where things aren't quite that clean. I would like to add that, in my short experience working on Warp, I found that there is a lot of interdependency between the preprocessor and the compiler and the flags that are being passed to the compiler. For example, compilers like to define version macros and sometimes feature-test macros. Other macros end up being defined based on flags passed to the compiler. For example passing -mavx to gcc causes __AVX__ to be defined. So if you want to separate the preprocess step from the compile step, you have to make sure that everything that affects the preprocessor output matches between the preprocessor invocation and the compiler invocation.
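
One way to guard against this kind of mismatch is to compare the flag sets of the two invocations. The sketch below is only an illustration: the prefix list is a conservative guess at which options can influence predefined macros (for example, -mavx defines __AVX__, -O levels define __OPTIMIZE__, and -std= changes __cplusplus), and the function names are hypothetical.

```python
# Assumption: any flag starting with one of these prefixes *might* change
# the preprocessor's output. The list is a guess and is not exhaustive.
MACRO_AFFECTING_PREFIXES = ("-D", "-U", "-m", "-std=", "-O", "-f")

def macro_affecting(flags):
    """Return the subset of flags that could influence macro definitions."""
    return {f for f in flags if f.startswith(MACRO_AFFECTING_PREFIXES)}

def mismatched_flags(preprocess_flags, compile_flags):
    """Return macro-affecting flags present in only one of the two steps."""
    return macro_affecting(preprocess_flags) ^ macro_affecting(compile_flags)

# -mavx was passed to the preprocess step but forgotten for the compile step.
print(sorted(mismatched_flags(["-O2", "-mavx", "-Wall"], ["-O2", "-Wall"])))
# ['-mavx']
```

A simpler design is to build both command lines from one flag list, so that the two steps cannot drift apart.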

Bob



Re: Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
Hi Reid,

Reid Kleckner <[hidden email]> writes:

> It's still an issue, because you will end up sending the pre-processed file
> over the network. Time has shown that the transitive include closure of a
> C++ file scales linearly with the size of the codebase, so the bigger the
> project, the more time you spend sending 10MB .ii files over the wire.

True. You can probably get a 5x reduction by compressing it with something
cheap like lzo, which gives ~50 .ii files/sec over a 1Gbps link.
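
The back-of-the-envelope arithmetic can be spelled out as follows, assuming decimal units (1 Gbps = 10^9 bits/s, 10MB = 10^7 bytes) and ignoring protocol overhead and compression time. The function name is made up for illustration. The ideal figure comes out a little above the ~50 files/sec estimate, which leaves some room for that overhead.

```python
def files_per_second(link_bits_per_sec, file_bytes, compression_ratio):
    """Ideal number of compressed files a link can carry per second."""
    link_bytes_per_sec = link_bits_per_sec / 8
    compressed_bytes = file_bytes / compression_ratio
    return link_bytes_per_sec / compressed_bytes

# 10MB .ii file, 5x compression, 1Gbps link.
print(files_per_second(1e9, 10e6, 5))  # 62.5
```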

Also, isn't there the same problem with getting the object files shipped
back? Here are some quick numbers I got from one of the "heavier" TUs in
build2:

target.i      5MB (-E -frewrite-includes)
target.i.lzo  1MB

target.o      3MB
target.o.lzo  1MB

Thanks,
Boris

Re: Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
Hi Bob,

Bob Haarman <[hidden email]> writes:

> You may want to give your users the option to combine the preprocess
> and compile into a single step.

Yes, that's the current plan. The question is whether it should be on
or off by default. I think we will start with off and see what happens.


> So if you want to separate the preprocess step from the compile step,
> you have to make sure that everything that affects the preprocessor
> output matches between the preprocessor invocation and the compiler
> invocation.

Right, that's one of the main reasons we really don't want to go the
custom preprocessor route.

Boris

Re: Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
On 12 May 2017, at 07:39, Boris Kolpackov via cfe-dev <[hidden email]> wrote:
>
> Bob Haarman <[hidden email]> writes:
>
>> You may want to give your users the option to combine the preprocess
>> and compile into a single step.
>
> Yes, that's the current plan. The question is whether it should be on
> or off by default. I think we will start with off and see what happens.

There are a couple of other interesting points in the design space.  For builds without debug info, you can ship the preprocessed output and re-run it only if you get some compiler warnings.  This has the nice effect that code that compiles without warnings will compile faster, which is a nice incentive.
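
The control flow of that first option might look like the sketch below. The two callables are placeholders for however a build system runs the remote compile of preprocessed output and the local compile of the original source; their names and the `(object, diagnostics)` return shape are assumptions for illustration.

```python
def compile_with_fallback(compile_preprocessed, compile_original):
    """Try the preprocessed (distributable) path first.

    Each callable returns (object_bytes, diagnostics_text). If the fast path
    produced any diagnostics, recompile from the original source so that the
    diagnostics the user sees are macro-aware.
    """
    obj, diagnostics = compile_preprocessed()
    if not diagnostics:
        return obj, diagnostics, "preprocessed"
    obj, diagnostics = compile_original()
    return obj, diagnostics, "original"

# Stub compilers standing in for real invocations.
clean = lambda: (b"obj", "")
noisy = lambda: (b"obj", "warning: something")
print(compile_with_fallback(clean, noisy)[2])  # preprocessed
print(compile_with_fallback(noisy, clean)[2])  # original
```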

The other is to run the preprocessor twice.  A quick test with a trivial Objective-C file that includes a huge number of headers[1] took around 0.35 seconds to compile at -O2 and around 0.18 seconds to preprocess with -E -MMD -MD (which spits out the full dependency list).  For any nontrivial source file, the difference between these two is likely to be much larger: if the cost of preprocessing is sufficiently small, then running it twice may cost less than the speedup you get from distribution.
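
The dependency list is emitted in Makefile rule syntax, so turning it into a list of files to ship is mostly string handling. The parser below is a simplified sketch with a hypothetical function name: it handles backslash-newline continuations but not escaped spaces in paths or Windows drive letters.

```python
def parse_depfile(text):
    """Return the prerequisites listed in Makefile-style dependency output."""
    # Join backslash-newline continuations into one logical line.
    joined = text.replace("\\\n", " ")
    deps = []
    for line in joined.splitlines():
        if ":" not in line:
            continue
        # Everything after the first ':' is the prerequisite list.
        _target, _, prerequisites = line.partition(":")
        deps.extend(prerequisites.split())
    return deps

example = "main.o: main.c \\\n  util.h config.h\n"
print(parse_depfile(example))  # ['main.c', 'util.h', 'config.h']
```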

You might also look at the bmake meta mode work from Juniper, which uses a kernel module to track filesystem accesses to give a complete list of everything that a particular file depends on (including shared libraries linked into bits of the toolchain).  A few people in the FreeBSD packaging team have been exploring using Capsicum to explicitly limit the files that the compiler can access and lazily pull them to the target system on demand (and to avoid accidental dependencies).  If you have enough compile processes per node that some are CPU bound while the others are waiting for the network then this may be a better solution.

David

[1] Cocoa.h is huge:
#include <Cocoa/Cocoa.h>

int main(void)
{
        return 0;
}


Re: Separate preprocess and compile: hack or feature?

Lang Hames via cfe-dev
Hi David,

David Chisnall <[hidden email]> writes:

> There are a couple of other interesting points in the design space. For
> builds without debug info, you can ship the preprocessed output and
> re-run it only if you get some compiler warnings. This has the nice
> effect that code that compiles without warnings will compile faster,
> which is a nice incentive.

Interesting idea, though I believe one of the issues is that you no
longer get warnings if you compile the preprocessed output.


> The other is to run the preprocessor twice.

Not sure how this helps. Are you talking about discovering the
included header set and shipping it along with the source (and somehow
recreating the filesystem hierarchy on the remote so that everything
gets included properly)?


> You might also look at the bmake meta mode work from Juniper, which
> uses a kernel module to track filesystem accesses to give a complete
> list of everything that a particular file depends on (including shared
> libraries linked into bits of the toolchain). A few people in the
> FreeBSD packaging team have been exploring using Capsicum to explicitly
> limit the files that the compiler can access and lazily pull them to
> the target system on demand (and to avoid accidental dependencies).

While this is an interesting idea, all of it will be very platform/compiler specific.
I am trying hard to avoid that.


Thanks,
Boris