distcc implementation


distcc implementation

Mike Miller
Hi,

I'm currently taking a class in Parallel Systems. A major part of the class is a project. I asked on #llvm for project ideas, and Chris Lattner suggested a clang/llvm based distcc implementation. This was previously suggested for a GSoC project, but (according to Chris) has not been implemented yet. I pitched the idea to my professor, and he thought it would be a good project. 

Now, I'm investigating what it would take to build such an implementation. I'm curious about the following:

1. How hard will it be to navigate the LLVM/clang codebase having very little compiler domain knowledge?

2. What stages of the compilation are worth parallelizing (at least for a first step)?

3. Will it be feasible to implement a basic distcc implementation in 1-2 months? There should be 4 or so people working on the project, but none of us have significant compiler domain knowledge. If not, is there a subset of the problem that's worth working on?

4. Are there any examples of code (preferably in real-world projects) which would lend themselves to parallel compilation which come to mind? At the end of the project, we'll need to document the performance of our work, so I'd like to be thinking about how we'd create (good) presentable results along the way.

5. Where should I start? :). Obviously this is a pretty large undertaking, but is there any documentation that I should look at? Any particular source files that would be relevant?

Any other comments about the project would also be greatly appreciated!

Thanks!
Mike

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Re: distcc implementation

Renato Golin
Hi Mike,

Though I can't answer all your questions, I can start with the basic
ones, given my short experience building an LLVM compiler from
scratch.

> 1. How hard will it be to navigate the LLVM/clang codebase having very
> little compiler domain knowledge?

I knew very little about compilers when I started, and after a month or
so of hacking on LLVM I knew more about compilers than I would have by
reading a book on them. LLVM is very forgiving to newcomers: it has
asserts all over the place and extensive documentation (tutorials,
examples, doxygen). I found it very easy to work with LLVM even when I
knew nothing about compilers or LLVM itself.


> 2. What stages of the compilation are worth parallelizing(at least for a
> first step)?

I'm not an expert, but I think most compilers only parallelize
"off-line" (each step runs separately and independently): compile all
the C files, then link them. You can't link without compiling all the
sources, you can't compile without parsing the whole file, and you
can't parse without running the preprocessor over everything. So it's
very unlikely you'll be able to parallelize the pipeline itself in a
way that's really different from running "make -j4" with compilation
and linking as separate steps.

Some compilers (like MS's, AFAIK) keep a "database" of symbols, so
they can incrementally compile and link without redoing the whole run.
But that's a big undertaking. Also, MS controls the development
environment (Visual Studio), so they can do whatever they want. With
open source compilers, it's very hard to force any particular build
system on users.

You could try to create a system that holds all external symbols (a
big, indexed object file) and is partially updated (instead of
re-created) every time a file is compiled. That would avoid long link
times and would be simple to accommodate in most legacy build systems.
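Renato's partially-updated symbol index could be sketched as a toy model like this (the `SymbolIndex` class and its layout are invented purely for illustration; a real implementation would store object-file sections, not Python dicts):

```python
import hashlib

class SymbolIndex:
    """Toy model of a persistent, per-file symbol index: when one source
    file is recompiled, only that file's entries are replaced, so the
    link step never rebuilds the whole table from scratch."""

    def __init__(self):
        self.by_file = {}  # source file -> {symbol name: definition hash}

    def update(self, filename, symbols):
        # Replace only this file's entries; all other files stay untouched.
        self.by_file[filename] = {
            name: hashlib.sha256(body.encode()).hexdigest()
            for name, body in symbols.items()
        }

    def lookup(self, name):
        # Resolve a symbol across all indexed files.
        for filename, syms in self.by_file.items():
            if name in syms:
                return filename
        return None
```

The point of the sketch is the incremental `update`: recompiling `a.c` touches only `a.c`'s slot in the index, which is what would keep linkage cheap.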


> 3. Will it be feasible to implement a basic distcc implementation in 1-2
> months? There should be 4 or so people working on the project, but none of
> us have significant compiler domain knowledge. If not, is there a subset of
> the problem that's worth working on?

If no one has compiler expertise, I'd expect each of you to spend
around a month learning the basics. That wouldn't leave much time for
the rest of the project... and more people won't speed up each
person's learning. You'd be better off picking a subset of the problem.


> 4. Are there any examples of code(preferably in real-world projects) which
> would lend themselves to parallel compilation which come to mind? At the end
> of the project, we'll need to document the performance of our work, so I'd
> like to be thinking about how we'd create (good) presentable results along
> the way.

I would compare your results against the same non-parallel LLVM run
multiple times with "make -j"; otherwise there is no meaningful point
of comparison...


> 5. Where should I start? :). Obviously this is a pretty large undertaking,
> but is there any documentation that I should look at? Any particular source
> files that would be relevant?

No idea. llvm/lib/Linker maybe?

cheers,
--renato

Re: distcc implementation

David Chisnall
In reply to this post by Mike Miller
Hi Mike,


On 13 Feb 2010, at 06:09, Mike Miller wrote:

> Now, I'm investigating what it would take to build such an implementation. I'm curious about the following:
>
> 1. How hard will it be to navigate the LLVM/clang codebase having very little compiler domain knowledge?

Clang is a very easy codebase to get to grips with.  I think it took me about a week from starting to read the code to getting my first patch accepted, and that's not particularly unusual.  

> 2. What stages of the compilation are worth parallelizing(at least for a first step)?

As I recall, distcc runs the preprocessor on one machine and then ships the preprocessed code to the others.  That means Amdahl's law starts to bite at around 8 nodes (I think - see Chris's slides on preprocessing time for the real numbers).

There was some talk about parallelising the preprocessing too.  This would require farming out all of the header files to each of the nodes.  This isn't quite as bad as it seems: you can cache the files at the distribution end and send a set of timestamps for the system headers along with the new source files, so that headers don't need to be re-requested if they have not been modified.
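The timestamp-based header cache David describes could be sketched like this (a minimal model; `headers_to_resend` and the mtime-keyed cache layout are invented for illustration, not anything in clang or distcc):

```python
import os

def headers_to_resend(header_paths, remote_mtimes):
    """Decide which headers actually need shipping to a build node.
    `remote_mtimes` is the node's cache: path -> the mtime it last
    received.  Only headers the node has never seen, or whose local
    mtime has changed, are sent again; the rest are served from the
    node's local cache."""
    stale = []
    for path in header_paths:
        local_mtime = os.path.getmtime(path)
        if remote_mtimes.get(path) != local_mtime:
            stale.append(path)
    return stale
```

In a real protocol you would compare content hashes rather than raw mtimes to survive clock skew, but the shape of the exchange is the same.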

After preprocessing, the building of the AST, IR generation, and optimisation are all trivial to parallelise, as they are largely independent.  Just ship the preprocessed code to another node and have it run all of these steps.  Until LLVM does native machine code emission, you probably don't want to be generating the binary on the remote node even without LTO, so just ship the (optimised) IR back for linking.
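The farm-out step above might look roughly like this as a scheduling sketch (`compile_on_node` is a stand-in for shipping preprocessed source to a node and getting optimised IR back; nothing here is real clang API):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

def compile_on_node(node, unit):
    # Stand-in for: send preprocessed source `unit` to `node`, run AST
    # building / IR generation / optimisation there, receive IR back.
    return f"{node}:ir({unit})"

def distribute(units, nodes):
    """Round-robin each preprocessed translation unit to a build node;
    the results (optimised IR) come back for the final, local link."""
    assignments = list(zip(cycle(nodes), units))
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(lambda nu: compile_on_node(*nu), assignments))
```

Because each translation unit is independent after preprocessing, this stage parallelises with no coordination beyond collecting the results for the link.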

Link-time optimisations probably can't easily be parallelised, because they need to run once all of the other compilation steps have run, and the same is true of linking.  Of course, if you're using a parallel make, you can maybe ship some of these off to different machines (e.g., when building clang, link each of the modules on a separate machine and then only do the final link of the clang tool on one machine.)  This is a bit beyond the scope of distcc, however.

On some systems, particularly compiling Objective-C on OS X, I've noticed that the process creation and tear-down time for the compiler is actually the bottleneck for performance in a lot of cases.  With clang's architecture, you could probably get some performance improvements by keeping the dist-clang processes around and just passing them new data to compile, rather than keeping on spawning new ones.

> 3. Will it be feasible to implement a basic distcc implementation in 1-2 months? There should be 4 or so people working on the project, but none of us have significant compiler domain knowledge. If not, is there a subset of the problem that's worth working on?

It's a pretty open-ended problem.  I think it's probably possible to achieve something useful in 1-2 months, with scope for future improvements.

> 4. Are there any examples of code(preferably in real-world projects) which would lend themselves to parallel compilation which come to mind? At the end of the project, we'll need to document the performance of our work, so I'd like to be thinking about how we'd create (good) presentable results along the way.

Clang and LLVM come to mind.  Big codebase, lots of separate files.  

> 5. Where should I start? :). Obviously this is a pretty large undertaking, but is there any documentation that I should look at? Any particular source files that would be relevant?

I'd start by taking a look in the clang driver.  For this project, you can treat most of the compiler as a black box.  The important thing is getting the source code in and the compiled code out.  You might consider extending the SourceManager stuff in Basic to allow fetching headers over the network (and implement a cache coherency protocol to let these be invalidated if they are modified on the controlling node) if you want to distribute the preprocessing.

Beyond that, you want to make sure that you are constructing an instance of the compiler classes on the remote machine that has the same options (LangOpts mainly) set as the local one would.  If you are doing something more clever than just invoking clang over ssh, you might also want to provide a new DiagnosticClient that reports errors and warnings on the control node.

David

-- Sent from my Cray X1



Re: distcc implementation

Holger Schurig
In reply to this post by Mike Miller
> 2. What stages of the compilation are worth parallelizing (at
> least for a first step)?

There are benchmarks out there that show how much time the compiler
spends in each phase (preprocessing, parsing, code generation).

You should also spend some time understanding how distcc (or
ccache) for gcc works. AFAIK it goes this way: the source code
is run through the preprocessor on the source machine. The
preprocessor reads all the *.h files on the source machine and
generates one huge file. The benefit: the other machines that
help compile the code don't need the same headers installed;
they just get one file to process. They parse it, compile it,
and produce a *.o file, which gets transferred back to the
source machine; once all the *.o files have arrived, it can do
the linking.

Please note that distcc 3 adds a new mode where you don't
preprocess the sources. This makes the distribution process
faster, but you need identical system headers on all boxes. It's
an optional mode, though. See http://distcc.org for more info.


ccache works similarly: it computes a hash over the preprocessed
code and stores the resulting *.o in a database with that hash
as the key. If a *.o with the same hash already exists, it hands
that *.o file straight back, short-cutting the
parsing/code-generation.
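A minimal sketch of that hashing scheme (the `ObjectCache` class is invented for illustration; the real ccache also hashes the compiler version, among other things):

```python
import hashlib

class ObjectCache:
    """ccache-style cache: key object files by a hash of the
    preprocessed source plus the compiler flags, so identical inputs
    skip parsing and code generation entirely."""

    def __init__(self):
        self.store = {}  # hash key -> cached object bytes

    def key(self, preprocessed_source, flags):
        h = hashlib.sha256()
        h.update(preprocessed_source.encode())
        h.update(" ".join(flags).encode())
        return h.hexdigest()

    def get_or_compile(self, preprocessed_source, flags, compile_fn):
        k = self.key(preprocessed_source, flags)
        if k not in self.store:  # cache miss: actually compile
            self.store[k] = compile_fn(preprocessed_source, flags)
        return self.store[k]
```

Note that the flags must be part of the key: the same source compiled with -O2 and -O0 produces different objects, so they cannot share a cache entry.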


> 4. Are there any examples of code(preferably in real-world
> projects) which would lend themselves to parallel compilation
> which come to mind?

Almost any "big" source code base. If you have a small code base
with only 4 *.c files, it's hardly worth going through distcc,
but with 1000 *.c files it makes a difference :-)   E.g.
compiling LLVM itself with distcc can greatly speed up the
build, and the same is true for Qt, some KDE programs, Mozilla,
OpenOffice, etc.

I use ccache and distcc also when cross-compiling, with the
OpenEmbedded.org build environment.


> 5. Where should I start? :). Obviously this is a pretty large
> undertaking, but is there any documentation that I should look
> at? Any particular source files that would be relevant?

I'd reuse most of distcc's work, e.g. learn about their
protocol.

Then I would start with the preprocessed (non-pump) method.
Learn where you can intercept the preprocessed stream. That
should be easy enough, because there's a compiler switch that
does exactly this.

Now you need to intercept this preprocessed output, transport it
to the remote site, and compile it there. For this you'll need
to write an llvm-distcc daemon. You also need to transport the
*.o back. The real distcc has already found solutions for all of
this, so you don't need to re-invent the wheel.

The "driver" on the local box could simply block while the remote
end compiles, so someone with 10 remote boxes (or 5 dual-core
remote boxes) can just run "make -j10".
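That blocking-driver behaviour can be modelled with a semaphore (a sketch only; the `SlotPool` class is invented for illustration):

```python
import threading

class SlotPool:
    """Blocking pool of remote build slots: each local driver process
    acquires a slot before shipping a job and blocks until one is free,
    so "make -jN" naturally throttles itself to N remote cores."""

    def __init__(self, n_slots):
        self.slots = threading.Semaphore(n_slots)

    def run_remote(self, job):
        self.slots.acquire()  # block until a remote core is free
        try:
            return f"compiled:{job}"  # stand-in for the remote compile
        finally:
            self.slots.release()
```

This keeps the distribution logic out of the build system entirely: make just sees N compiler invocations that happen to take a while.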


Hey, but the fun of such a project is to make a plan by yourself.
Otherwise it's dumb coding of other people's ideas :-)


--
http://www.holgerschurig.de