Intros, C++ modules, and Facebook

Intros, C++ modules, and Facebook

Don Hinton via cfe-dev
Hi cfe-dev,

My name is Louis Brandy and I work at Facebook. I’ve begun working on
getting clang modules set up in our C++ codebase. Mostly the purpose of
this email is just to introduce myself and let people know what we’re
doing and our motivation, but I’ve also brought a handful of newbie
questions. I’ve gotten the basic integrations into the build system and
have some core projects building modularly. To get this far, I hacked
together a highly unprincipled set of module maps for glibc and libstdc++.
I’m at the point, now, where I need “real” module maps for our std/system
headers.

First, is there any prior art re: glibc and libstdc++ module maps? I don’t
want to repeat any work that’s already been done, and my google-fu failed.

Second, I’m interested in the workflow of actually incrementally adding
module maps to a large codebase. I do understand the need to start at the
bottom but I’m worried about proper coverage, and then prioritizing what
to do next. In particular, I find myself really wanting a “summary” of
what #includes did and did not magically become imports so I can use that
to make sure 1) I’ve not missed anything “below” and 2) to prioritize what
to do next (by e.g. aggregating over a build the most textually included
headers). I don’t think such a diagnostic/remark exists? I’ve not looked
too deeply, yet, at clang-modularize, so perhaps my answers lie over
there?

On a final note, it’s been remarkably easy to get modules up and running
so kudos to everyone who’s gotten it this far.

-Louis

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: Intros, C++ modules, and Facebook

Don Hinton via cfe-dev
Hello,

I couldn't find existing module maps for the libraries you mentioned, but you might be able to reuse parts of the modulemap for libc++:

    libcxx/include/module.modulemap

I'm interested in your second question as well. There's a line in "CompilerInstance::loadModule" that updates LastModuleImportLoc. Do you think it'd be worthwhile to dump the module name there, to get an idea of what's been loaded?

I've played around with the "modularize" utility but its results aren't always usable. If you have the time, I'd love to read a writeup about modularizing large codebases.

vedant


Re: Intros, C++ modules, and Facebook

Don Hinton via cfe-dev

On Fri, Oct 23, 2015 at 10:28 AM, Louis Brandy via cfe-dev <[hidden email]> wrote:
> Hi cfe-dev,
>
> My name is Louis Brandy and I work at Facebook. I’ve begun working on
> getting clang modules set up in our C++ codebase. Mostly the purpose of
> this email is just to introduce myself and let people know what we’re
> doing and our motivation, but I’ve also brought a handful of newbie
> questions. I’ve gotten the basic integrations into the build system and
> have some core projects building modularly. To get this far, I hacked
> together a highly unprincipled set of module maps for glibc and libstdc++.
> I’m at the point, now, where I need “real” module maps for our std/system
> headers.
>
> First, is there any prior art re: glibc and libstdc++ module maps? I don’t
> want to repeat any work that’s already been done, and my google-fu failed.

Richard, I remember in the past we talked and you sent me your glibc module map and small patches. Any chance you could attach your latest ones here? Looking back in my email, you said that you didn't have a libstdc++ module map. Is that still the case?

Louis, I'm going to be setting up an LLVM/Clang buildbot (running Linux) that uses modules for building LLVM itself next week or so, so I'll definitely keep you up to date.
 

> Second, I’m interested in the workflow of actually incrementally adding
> module maps to a large codebase. I do understand the need to start at the
> bottom but I’m worried about proper coverage, and then prioritizing what
> to do next. In particular, I find myself really wanting a “summary” of
> what #includes did and did not magically become imports so I can use that
> to make sure 1) I’ve not missed anything “below” and 2) to prioritize what
> to do next (by e.g. aggregating over a build the most textually included
> headers).

When I first went to investigate how much time is spent in which header, I placed some DTrace probes inside of clang and aggregated the time spent textually within a file across compiler invocations. See the thread "Some DTrace probes for measuring per-file time." (http://lists.llvm.org/pipermail/cfe-dev/2015-April/042334.html)
The raw data that comes out of that DTrace script is a list of pairs {"/path/to/file", total time spent in this file across all compiler invocations}.
I then looked at the data in this Mathematica notebook: https://drive.google.com/file/d/0B8v10qJ6EXRxTWpMTTBnaERQaVU/view?usp=sharing

Note that in that notebook (one of many) I removed the time spent after parsing (basically, codegen time), so the pie chart at the end is a bit deceptive.
I've attached two pie charts that include the time spent after parsing.
The first is a debug build (low optimization). The second is a release build (for a release build, a much larger fraction of time is spent in codegen).

If you don't have DTrace available so that you can directly measure the time, you can probably get a decent idea based on the inclusion counts. One easy way to do this is to tally up files mentioned by the -H option, whose output you can massage. There are also '.d' files, but I forget exactly what we emit into them (we may emit header file names even if we didn't textually touch the header, but only loaded its module).
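
As a rough illustration of the -H tally, here is a minimal Python sketch. It assumes the compiler's stderr has been captured to text and that each header line has the usual -H shape: a run of dots for nesting depth, a space, then the path. The function name and the sample paths are made up for the example.

```python
import re
from collections import Counter

# One -H line: dots (nesting depth), a single space, then the header path.
HEADER_LINE = re.compile(r"^(\.+) (.+)$")

def tally_includes(log_lines):
    """Count how often each header path shows up in -H style output."""
    counts = Counter()
    for line in log_lines:
        match = HEADER_LINE.match(line.rstrip("\n"))
        if match:
            counts[match.group(2)] += 1
    return counts

# Hypothetical sample: stderr of two compiler invocations, concatenated.
sample_log = """\
. /usr/include/stdio.h
.. /usr/include/features.h
. ./foo.h
.. /usr/include/stdio.h
. /usr/include/stdio.h
.. /usr/include/features.h
""".splitlines()

counts = tally_includes(sample_log)
for path, n in counts.most_common():
    print(n, path)
```

Sorting by count gives a first guess at which headers are worth covering with a module map next; it says nothing about how expensive each header is to parse.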

Note that for measuring the time, you need to use a timestamp that is virtualized CPU time. If you use real time then you will spuriously count IO latency and other stuff, which will give wrong results (e.g. the total sum of time will appear much larger than is possible).
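
The difference between the two clocks can be seen in a small Python analogy (this is not the DTrace script from the thread linked above). The function names are invented for the example; time.sleep stands in for I/O latency.

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall_start = time.perf_counter()   # real (wall-clock) time
    cpu_start = time.process_time()    # CPU time charged to this process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def fake_header_parse():
    # Stand-in for handling one header: a blocking wait, then real computation.
    time.sleep(0.2)                      # simulated I/O latency, almost no CPU used
    sum(i * i for i in range(200_000))   # actual CPU work

wall, cpu = measure(fake_header_parse)
print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s")
```

The wall-clock figure includes the time spent blocked, while the CPU figure does not, which is why summing wall-clock times across many invocations overstates the work done.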


> I don’t think such a diagnostic/remark exists? I’ve not looked
> too deeply, yet, at clang-modularize, so perhaps my answers lie over
> there?

We have -Wauto-import which is sort of the opposite of this. Adding the reverse warning "warn me when you included a header but didn't know about it from a module map" could probably be done.

-- Sean Silva
 

> On a final note, it’s been remarkably easy to get modules up and running
> so kudos to everyone who’s gotten it this far.
>
> -Louis

Attachment: LLVM for_zygoloid (default CMake config).png (80K)
Attachment: LLVM release without asserts.png (78K)

Re: Intros, C++ modules, and Facebook

Don Hinton via cfe-dev
On 23/10/15 19:28, Louis Brandy via cfe-dev wrote:

> Second, I’m interested in the workflow of actually incrementally adding
> module maps to a large codebase. I do understand the need to start at the
> bottom but I’m worried about proper coverage, and then prioritizing what
> to do next. In particular, I find myself really wanting a “summary” of
> what #includes did and did not magically become imports so I can use that
> to make sure 1) I’ve not missed anything “below” and 2) to prioritize what
> to do next (by e.g. aggregating over a build the most textually included
> headers). I don’t think such a diagnostic/remark exists? I’ve not looked
> too deeply, yet, at clang-modularize, so perhaps my answers lie over
> there?
AFAIK "clang ... -fmodules ... -E" will tell you which #include was
turned implicitly into an import. From there on, grep could do what you
need ;)
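
For anyone who would rather script the grep step, a minimal Python sketch follows. It assumes the preprocessed output marks each rewritten #include with a comment containing "clang -E: implicit import for"; the syntax around that comment differs between clang versions, so the pattern only keys on the comment. The sample lines and module names are hypothetical.

```python
import re
from collections import Counter

# Assumed marker comment; only the comment itself is matched.
IMPLICIT_IMPORT = re.compile(r"/\*\s*clang -E: implicit import for\s+(.*?)\s*\*/")

def implicit_imports(preprocessed_text):
    """Count the headers that the preprocessor reports as implicit imports."""
    found = Counter()
    for match in IMPLICIT_IMPORT.finditer(preprocessed_text):
        target = match.group(1)
        # Drop a leading "#include " and any surrounding quotes or angle brackets.
        target = re.sub(r"^#\s*include\s+", "", target).strip('"<>')
        found[target] += 1
    return found

# Hypothetical output lines in two different styles.
sample = '''\
@import std.vector; /* clang -E: implicit import for "/usr/include/c++/v1/vector" */
int x;
#pragma clang module import libc.stdio /* clang -E: implicit import for #include <stdio.h> */
'''

found = implicit_imports(sample)
for header, n in found.most_common():
    print(n, header)
```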

Vassil

Re: Intros, C++ modules, and Facebook

Don Hinton via cfe-dev
On Sun, Oct 25, 2015 at 2:52 AM Sean Silva via cfe-dev <[hidden email]> wrote:
On Fri, Oct 23, 2015 at 10:28 AM, Louis Brandy via cfe-dev <[hidden email]> wrote:
> Second, I’m interested in the workflow of actually incrementally adding
> module maps to a large codebase. I do understand the need to start at the
> bottom

FYI: you don't need to do a full bottom-up rollout; standard libraries are of course the first priority (libc, standard C++ libs, your own base libs), but after that, we do support a middle-out approach.
 
> but I’m worried about proper coverage, and then prioritizing what
> to do next. In particular, I find myself really wanting a “summary” of
> what #includes did and did not magically become imports so I can use that
> to make sure 1) I’ve not missed anything “below” and 2) to prioritize what
> to do next (by e.g. aggregating over a build the most textually included
> headers).


Re: Intros, C++ modules, and Facebook

Don Hinton via cfe-dev

From: Sean Silva <[hidden email]>
Date: Friday, October 23, 2015 at 9:35 PM
To: Louis Brandy <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, Richard Smith <[hidden email]>
Subject: Re: [cfe-dev] Intros, C++ modules, and Facebook


> If you don't have DTrace available so that you can directly measure the time, you can probably get a decent idea based on the inclusion counts. One easy way to do this is to tally up files mentioned by the -H option, whose output you can massage. There are also '.d' files, but I forget exactly what we emit into them (we may emit header file names even if we didn't textually touch the header, but only loaded its module).

Spent some time playing with the different options today and -H actually does approximately what I want (telling me which headers are being textually included). It doesn't seem to emit headers that are pulled from a module, though it will emit them during the module build itself (at least with -Rmodule-build). So if I do a clean build leaving the module cache intact, I'll only get the textually included headers.

The dep files appear to include all headers, even those pulled in modularly, but they also include the module maps, so in theory with some parsing I could work it out from there as well. I think I'll try to get by with -H for now and see where it leads.
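
For reference, the dep-file route could look roughly like the Python sketch below. It assumes a Makefile-style .d file with backslash line continuations and no escaped spaces or colons in paths, that module maps can be recognized by a ".modulemap" suffix or a "module.map" basename, and that the set of textually included headers (for example gathered from -H) spells paths the same way the dep file does. All function names and sample paths are hypothetical.

```python
import os

def parse_depfile(text):
    """Return the prerequisite paths from a Makefile-style dependency file."""
    joined = text.replace("\\\n", " ")   # undo backslash line continuations
    _, _, prerequisites = joined.partition(":")
    return prerequisites.split()

def classify(dep_paths, textual_headers):
    """Split dependencies into module maps, textual includes, and the rest."""
    textual = set(textual_headers)
    buckets = {"module_maps": [], "textual": [], "other": []}
    for path in dep_paths:
        if path.endswith(".modulemap") or os.path.basename(path) == "module.map":
            buckets["module_maps"].append(path)
        elif path in textual:
            buckets["textual"].append(path)
        else:
            # Headers that came in through a module, plus the main source file.
            buckets["other"].append(path)
    return buckets

# Hypothetical dep file and -H result for a single translation unit.
sample_depfile = r"""foo.o: foo.cpp \
  /usr/include/module.modulemap \
  /usr/include/stdio.h \
  ./foo.h
"""
buckets = classify(parse_depfile(sample_depfile), textual_headers={"./foo.h"})
for name, paths in buckets.items():
    print(name, paths)
```

The "other" bucket is only a candidate list of modularly loaded headers, since it also contains the main source file and anything whose path spelling differs between the two sources.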

-Louis
