Modules TS: binary module interface dependencies

Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
I am trying to understand how Clang's Modules TS will work so that we
have a general-enough model in the build system.

Consider these two modules and a consumer:

// module core
export module core;
export void f (int);

// module extra
export module extra;
import core;
export inline void g (int x) {f (x);}

// consumer
import extra;
int main () {g (1);}

Currently, when compiling the consumer (with -fmodules-ts), Clang only
requires the binary module interface (BMI) for extra. In contrast, VC
and GCC require both extra and core (note: even though core is not
re-exported from extra).

Clang's model is definitely more desirable from the build system's
perspective, especially for distributed compilation. So I wonder if this
is accidental and will change in the future or if this is something that
Clang is committed to, so to speak?

Here is a more interesting variant of the extra module that highlights
some of the issues to consider:

// module extra
export module extra;
import core;
export template <typename T> void g (T x) {f (x);}

Now f() can only be resolved (via ADL) when g() is instantiated.
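
To illustrate the instantiation-time lookup, here is a minimal sketch
written without modules so that it compiles with any C++11 compiler.
The namespace n, the type S, this overload of f(), and call_g() are all
made up for illustration. A class type is used as the argument because
ADL works through the namespaces associated with the argument types.

```cpp
// g() calls f() on a dependent argument, so the call can only be
// resolved when g() is instantiated. No f() is declared at this point.
template <typename T>
int g (T x) {return f (x);}

namespace n
{
  struct S {};
  int f (S) {return 42;} // Declared after g(), found via ADL.
}

// Instantiates g<n::S>, which is where f() gets resolved.
int call_g () {return g (n::S {});}
```

In the modules case the compiler similarly needs the declarations from
core to be available when g() is instantiated in the consumer.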

Thanks,
Boris
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev


On Tue, Jun 27, 2017 at 2:53 AM Boris Kolpackov via cfe-dev <[hidden email]> wrote:
> I am trying to understand how Clang's Modules TS will work so that we
> have a general-enough model in the build system.
>
> Consider these two modules and a consumer:
>
> // module core
> export module core;
> export void f (int);
>
> // module extra
> export module extra;
> import core;
> export inline void g (int x) {f (x);}
>
> // consumer
> import extra;
> int main () {g (1);}
>
> Currently, when compiling the consumer (with -fmodules-ts), Clang only
> requires the binary module interface (BMI) for extra. In contrast, VC
> and GCC require both extra and core (note: even though core is not
> re-exported from extra).
>
> Clang's model is definitely more desirable from the build system's
> perspective, especially for distributed compilation. So I wonder if this
> is accidental and will change in the future or if this is something that
> Clang is committed to, so to speak?

I believe this behavior is intentional: after deploying pre-TS modules at Google, the number of files that had to be passed without this feature turned out to be problematic (possibly because of command line length limits; I am not sure which particular constraint it was running up against).

CC'd Richard to correct/clarify/etc.
 


Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
On 27 June 2017 at 02:53, Boris Kolpackov via cfe-dev <[hidden email]> wrote:
> I am trying to understand how Clang's Modules TS will work so that we
> have a general-enough model in the build system.
>
> Consider these two modules and a consumer:
>
> // module core
> export module core;
> export void f (int);
>
> // module extra
> export module extra;
> import core;
> export inline void g (int x) {f (x);}
>
> // consumer
> import extra;
> int main () {g (1);}
>
> Currently, when compiling the consumer (with -fmodules-ts), Clang only
> requires the binary module interface (BMI) for extra. In contrast, VC
> and GCC require both extra and core (note: even though core is not
> re-exported from extra).

Assuming by BMI you mean the .pcm file, this is not true. Clang requires both the core and extra .pcm files in this case. However, we found it extremely impractical to explicitly pass all such .pcm files to a compilation (and indeed on large projects doing so caused us to hit command line length limits and generally produce *highly* unwieldy command lines), so we do not require .pcm's that are reachable through the dependencies of another .pcm file to be explicitly passed to the compiler -- each .pcm stores names and relative paths to its dependencies, and we load those dependencies as part of loading the .pcm itself.
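
A rough model of this transitive loading, for build system authors who need to know which files must be readable: the sketch below is not Clang code. The dep_table type stands in for the dependency records stored inside the .pcm files, and all names in it are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Stand-in for the records stored inside .pcm files:
// file path -> paths of the .pcm files it was built against.
using dep_table = std::map<std::string, std::vector<std::string>>;

// Add `file` and everything reachable from it to `out`.
static void collect (const dep_table& t,
                     const std::string& file,
                     std::set<std::string>& out)
{
  if (!out.insert (file).second)
    return; // Already visited.

  auto i (t.find (file));
  if (i != t.end ())
    for (const std::string& d: i->second)
      collect (t, d, out);
}

// Every file that must be available when only `file` is passed
// explicitly to the compiler.
std::set<std::string> required_files (const dep_table& t,
                                      const std::string& file)
{
  std::set<std::string> r;
  collect (t, file, r);
  return r;
}
```

For the example above, passing only extra.pcm still means that both extra.pcm and core.pcm have to be present at the recorded locations.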
 

Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
Boris Kolpackov via cfe-dev wrote:

> I am trying to understand how Clang's Modules TS will work so that we
> have a general-enough model in the build system.

You can find some more attempts to understand the impact of modules on
build systems here:

 https://groups.google.com/a/isocpp.org/forum/?fromgroups#!topic/modules/sDIYoU8Uljw

Maybe it's useful to you.

Thanks,

Steve.




Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
Richard Smith <[hidden email]> writes:

> Assuming by BMI you mean the .pcm file, this is not true. Clang requires
> both the core and extra .pcm files in this case. However, we found it
> extremely impractical to explicitly pass all such .pcm files to a
> compilation (and indeed on large projects doing so caused us to hit command
> line length limits and generally produce *highly* unwieldy command lines),
> so we do not require .pcm's that are reachable through the dependencies of
> another .pcm file to be explicitly passed to the compiler -- each .pcm
> stores names and relative paths to its dependencies, and we load those
> dependencies as part of loading the .pcm itself.

Got (and tested) it, thanks. I suppose there is no reason for you to
deviate from this once you support module re-export (export import M;)
even though, in a sense, re-export is as-if injecting an implicit import
into the consumer's translation unit?

One thing I noticed is that there is no way to override this embedded
path, at least not with -fmodule-file. This could be useful for
distributed compilation since otherwise the build system will have
to recreate the directory structure on the remote host.

Would there be interest in having a low-level option that specifies
the exact module name to module .pcm mapping and, perhaps, a second
one that can read such mappings from a file? They will then override
module file references in .pcm's.

Thanks,
Boris

Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
On 28 June 2017 at 00:55, Boris Kolpackov <[hidden email]> wrote:
> Richard Smith <[hidden email]> writes:
>
> > Assuming by BMI you mean the .pcm file, this is not true. Clang requires
> > both the core and extra .pcm files in this case. However, we found it
> > extremely impractical to explicitly pass all such .pcm files to a
> > compilation (and indeed on large projects doing so caused us to hit command
> > line length limits and generally produce *highly* unwieldy command lines),
> > so we do not require .pcm's that are reachable through the dependencies of
> > another .pcm file to be explicitly passed to the compiler -- each .pcm
> > stores names and relative paths to its dependencies, and we load those
> > dependencies as part of loading the .pcm itself.
>
> Got (and tested) it, thanks. I suppose there is no reason for you to
> deviate from this once you support module re-export (export import M;)
> even though, in a sense, re-export is as-if injecting an implicit import
> into the consumer's translation unit?

Right. From the point of view of a user of the re-exporting module, they don't depend on M, so they should not need to specify a .pcm file for M.

> One thing I noticed is that there is no way to override this embedded
> path, at least not with -fmodule-file. This could be useful for
> distributed compilation since otherwise the build system will have
> to recreate the directory structure on the remote host.

So far, we've not seen this be a problem in practice across the (small) number of build systems where we've implemented support for explicit module builds. If this is a problem for your build system, we can certainly look at adding support for overriding this.

> Would there be interest in having a low-level option that specifies
> the exact module name to module .pcm mapping and, perhaps, a second
> one that can read such mappings from a file? They will then override
> module file references in .pcm's.

I don't think we need a mapping mechanism; giving us the module files on the command line in topological order should suffice. If we've already been handed a module file for module X, and then we load a module file for module Y that depends on X, we can simply ignore the path specified in Y's .pcm and just use the existing X .pcm. (We'd still perform the check that the X .pcm is the same as the one that Y was built against in this case.)

We could actually build the topological ordering ourselves, but that would require a two-pass approach for loading .pcm files; passing this burden on to the build system seems like the better tradeoff.
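
As an illustration of what the build system would have to do, here is a sketch of producing such an ordering with a depth-first traversal that also drops duplicates. It assumes the build system already has the import graph in memory; dep_graph, visit() and load_order() are hypothetical names, not anything in Clang.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Module name -> names of the modules it imports.
using dep_graph = std::map<std::string, std::vector<std::string>>;

static void visit (const dep_graph& g,
                   const std::string& m,
                   std::set<std::string>& seen,
                   std::vector<std::string>& order)
{
  if (!seen.insert (m).second)
    return; // Already emitted (or being processed): skip duplicates.

  auto i (g.find (m));
  if (i != g.end ())
    for (const std::string& d: i->second)
      visit (g, d, seen, order);

  order.push_back (m); // All dependencies are already in the list.
}

// Return the modules needed by `imports` so that every module comes
// after all the modules it depends on.
std::vector<std::string> load_order (const dep_graph& g,
                                     const std::vector<std::string>& imports)
{
  std::set<std::string> seen;
  std::vector<std::string> order;

  for (const std::string& m: imports)
    visit (g, m, seen, order);

  return order;
}
```

With extra and util both importing core, load_order (g, {"extra", "util"}) yields core, extra, util, which is the order in which the module files would then be passed on the command line.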


Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
Richard Smith <[hidden email]> writes:

> I don't think we need a mapping mechanism; giving us the module files on
> the command line in topological order should suffice. If we've already been
> handed a module file for module X, and then we load a module file for
> module Y that depends on X, we can simply ignore the path specified in Y's
> .pcm and just use the existing X .pcm. (We'd still perform the check that
> the X .pcm is the same as the one that Y was built against in this case.)

I've done some testing and this is not how it works today. Perhaps you
meant it in the "could be done this way" sense.

BTW, I've also tested moving the entire build directory somewhere else
to check if .pcm's store relative paths to each other. This does not
appear to work either:

fatal error: malformed or corrupted AST file: 'SourceLocation remap refers to unknown module, cannot find core.pcm'

On the more fundamental level, this still poses a problem if the build
system needs to re-map all the .pcm files (e.g., for a distributed build):
we will still have crazy-long command lines and may hit the command line
limits. So, at a minimum, we seem to need a way to load the list of modules
from a file.

Now, for why we may want a mapping, not just a list of .pcm's: if the list
of .pcm's is stored in a file, then chances are some build systems will
opt to have one file per project (or some similar granularity) rather
than per translation unit. Which means not all listed .pcm's will be
needed during every compilation. If it's only a list of .pcm's, then
Clang will have to at least read each file, which seems like a waste.


> We could actually build the topological ordering ourselves, but that would
> require a two-pass approach for loading .pcm files; passing this burden on
> to the build system seems like the better tradeoff.

I agree. Though requiring a sorted list of modules doesn't make the build
system's life any easier, especially if it wants to weed out duplicates
(to keep the command line as tidy as possible) and not allocate any
extra memory while doing it.

Boris

Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
On 29 Jun 2017 12:13 am, "Boris Kolpackov" <[hidden email]> wrote:
> Richard Smith <[hidden email]> writes:
>
> > I don't think we need a mapping mechanism; giving us the module files on
> > the command line in topological order should suffice. If we've already been
> > handed a module file for module X, and then we load a module file for
> > module Y that depends on X, we can simply ignore the path specified in Y's
> > .pcm and just use the existing X .pcm. (We'd still perform the check that
> > the X .pcm is the same as the one that Y was built against in this case.)
>
> I've done some testing and this is not how it works today. Perhaps you
> meant it in the "could be done this way" sense.

Yes, this is a "we can" rather than a "we already do". Sorry that wasn't clear.

> BTW, I've also tested moving the entire build directory somewhere else
> to check if .pcm's store relative paths to each other. This does not
> appear to work either:
>
> fatal error: malformed or corrupted AST file: 'SourceLocation remap refers to unknown module, cannot find core.pcm'

We've talked about making this kind of relocation easier by allowing the module source directory and the build directory to be relocated independently (right now you need to relocate everything together -- sources, .pcm's, working directory).

> On the more fundamental level, this still poses a problem if the build
> system needs to re-map all the .pcm files (e.g., for a distributed build):
> we will still have crazy-long command lines and may hit the command line
> limits. So, at a minimum, we seem to need a way to load the list of modules
> from a file.

Clang does support specifying @file on the command line to take arguments from a file, which should at least evade the command line length limit.
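
For example, a build system could generate the contents of such a file along the following lines. This is only a sketch: response_file_text() is a hypothetical helper, and it assumes the paths contain no whitespace, quotes, or backslashes, since those would need escaping in a response file.

```cpp
#include <string>
#include <vector>

// Produce the text of a response file that passes each .pcm via
// -fmodule-file, one argument per line.
std::string response_file_text (const std::vector<std::string>& pcms)
{
  std::string r;

  for (const std::string& p: pcms)
  {
    r += "-fmodule-file=";
    r += p;
    r += '\n';
  }

  return r;
}
```

The text would be written to a file, say modules.rsp (the name is arbitrary), and the compiler invoked with @modules.rsp in place of the individual options.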

> Now, for why we may want a mapping, not just a list of .pcm's: if the list
> of .pcm's is stored in a file, then chances are some build systems will
> opt to have one file per project (or some similar granularity) rather
> than per translation unit. Which means not all listed .pcm's will be
> needed during every compilation. If it's only a list of .pcm's, then
> Clang will have to at least read each file, which seems like a waste.

True. At that point I think you'd be better off with a directory of .pcm files following a naming convention rather than providing the compiler with a (potentially very large) set of mappings (and we already support something like that). But allowing an explicit mapping to be specified would also be fine if people would actually use that facility.

> > We could actually build the topological ordering ourselves, but that would
> > require a two-pass approach for loading .pcm files; passing this burden on
> > to the build system seems like the better tradeoff.
>
> I agree. Though requiring a sorted list of modules doesn't make the build
> system's life any easier, especially if it wants to weed out duplicates
> (to keep the command line as tidy as possible) and not allocate any
> extra memory while doing it.

Granted. Our design right now is pretty strongly tied to having loaded all dependency modules before loading a dependent module, though, so we need that complexity somewhere.


Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
Richard Smith <[hidden email]> writes:

> We've talked about making this kind of relocation easier by allowing the
> module source directory and the build directory to be relocated
> independently (right now you need to relocate everything together --
> sources, .pcm's, working directory).

That's what I did and got the above-mentioned error.


> At that point I think you'd be better off with a directory of .pcm files
> following a naming convention rather than providing the compiler with
> a (potentially very large) set of mappings (and we already support
> something like that).

Here is a concrete scenario I am thinking about: I want to implement
distributed compilation that supports modules. Which means that
besides the translation unit itself, the build system also needs
to ship .pcm's of all the modules that this TU imports (transitively).

In itself, this is not a problem: the build system needs to make sure
that these .pcm's are all up-to-date before it can invoke the compiler.
So it has to know the paths to all the .pcm's which, in case of build2,
are spread out across various project directories (since we try to re-
use already compiled .pcm from projects that we import).

For distributed compilation we want to minimize the amount of stuff we
copy back and forth so it makes sense to cache .pcm's on the build
slaves (the same .pcm is likely to be used by multiple TUs). So on
the build slave I would store a list of .pcm files, their hashes,
and their module names. Since the same module can be compiled with
different options and result in a different .pcm/hash, I would use
the hash as the file name to store .pcm's on the slave (i.e., content-
addressable storage).
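
To make the naming scheme concrete, here is a small sketch. It uses
64-bit FNV-1a purely as a simple stand-in for a real content hash (an
actual implementation would want a cryptographic hash), and the
function names as well as the modules/ prefix layout are assumptions
for illustration only.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// 64-bit FNV-1a over the file contents. Not collision-resistant; used
// here only to keep the sketch self-contained.
std::uint64_t fnv1a (const std::string& data)
{
  std::uint64_t h (14695981039346656037ULL); // FNV offset basis.

  for (unsigned char c: data)
  {
    h ^= c;
    h *= 1099511628211ULL; // FNV prime.
  }

  return h;
}

// Name under which a .pcm with the given contents is cached:
// modules/<16 hex digits>.pcm
std::string cache_name (const std::string& pcm_contents)
{
  char buf[17];
  std::snprintf (buf, sizeof (buf), "%016llx",
                 static_cast<unsigned long long> (fnv1a (pcm_contents)));

  return std::string ("modules/") + buf + ".pcm";
}
```

Two .pcm's for the same module compiled with different options have
different contents and so end up under different names.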

With this pretty straightforward setup, when the time comes to compile
a TU, all I need is to somehow communicate to the compiler the
mapping of module names to these hash-named .pcm's. If there were
a way to provide this mapping in a file, I would be all set.

With the directory approach, I would need to create a temporary
directory and populate it with appropriately-named symlinks (or
copies in case of Windows) of .pcm files. While not particularly
hard, it sure feels unnecessary. I would definitely try to avoid
doing this for local compilations which means I will have two
different ways of invoking the compiler depending on whether it
is remote or local. And it is still not clear to me how this will
override embedded .pcm references.


> But allowing an explicit mapping to be specified would also be fine
> if people would actually use that facility.

I will use it in build2. And I am willing to try to implement it.


> Our design right now is pretty strongly tied to having loaded all
> dependency modules before loading a dependent module, though, so
> we need that complexity somewhere.

I don't think we will need it with the mapping approach: we will have
a map of module names to file names, probably in HeaderSearchOptions
next to PrebuiltModulePaths -- in a sense it will be another module
search mechanism that will be tried before prebuilt paths (in
HeaderSearch::getModuleFileName()).

This map will be populated before we actually load any modules so
the order in which one specifies the mapping is not important
(except for overriding). I will probably need to add some extra
code to consult this map when resolving embedded .pcm references,
though.
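
Roughly, the lookup order I have in mind looks like the sketch below.
This is not Clang code: module_search and find_module_file() are
hypothetical, the <dir>/<name>.pcm naming convention for the prebuilt
directories is an assumption, and the file existence check is passed in
as a function so that the sketch does not touch the file system.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

struct module_search
{
  std::map<std::string, std::string> mapping; // Module name -> .pcm file.
  std::vector<std::string> prebuilt_dirs;     // Directories of <name>.pcm.
};

// Return the .pcm file for module `name` or an empty string if there
// is none. The explicit mapping is tried before the prebuilt paths.
std::string find_module_file (
  const module_search& s,
  const std::string& name,
  const std::function<bool (const std::string&)>& exists)
{
  auto i (s.mapping.find (name));
  if (i != s.mapping.end ())
    return i->second; // Not checked for existence until actually loaded.

  for (const std::string& d: s.prebuilt_dirs)
  {
    std::string f (d + "/" + name + ".pcm");
    if (exists (f))
      return f;
  }

  return std::string ();
}
```

Since the whole map is filled in before any lookup happens, the order
of the entries only matters when the same module is specified twice.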

And we could also keep updating this map when loading modules via
other means (e.g., with -fmodule-file) which will give us the
override behavior we discussed earlier (I won't need this
functionality in build2 but could implement it if others think
it would be useful).

If this sounds reasonable, I can give it a go.

Thanks,
Boris

Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
On 29 June 2017 at 04:46, Boris Kolpackov <[hidden email]> wrote:
> Richard Smith <[hidden email]> writes:
>
> > We've talked about making this kind of relocation easier by allowing the
> > module source directory and the build directory to be relocated
> > independently (right now you need to relocate everything together --
> > sources, .pcm's, working directory).
>
> That's what I did and got the above-mentioned error.

Hmm, that could well be a bug, then. Do you by any chance have steps to reproduce this?

> > At that point I think you'd be better off with a directory of .pcm files
> > following a naming convention rather than providing the compiler with
> > a (potentially very large) set of mappings (and we already support
> > something like that).
>
> Here is a concrete scenario I am thinking about: I want to implement
> distributed compilation that supports modules. Which means that
> besides the translation unit itself, the build system also needs
> to ship .pcm's of all the modules that this TU imports (transitively).
>
> In itself, this is not a problem: the build system needs to make sure
> that these .pcm's are all up-to-date before it can invoke the compiler.
> So it has to know the paths to all the .pcm's which, in case of build2,
> are spread out across various project directories (since we try to re-
> use already compiled .pcm from projects that we import).
>
> For distributed compilation we want to minimize the amount of stuff we
> copy back and forth so it makes sense to cache .pcm's on the build
> slaves (the same .pcm is likely to be used by multiple TUs). So on
> the build slave I would store a list of .pcm files, their hashes,
> and their module names. Since the same module can be compiled with
> different options and result in a different .pcm/hash, I would use
> the hash as the file name to store .pcm's on the slave (i.e., content-
> addressable storage).
>
> With this pretty straightforward setup, when the time comes to compile
> a TU, all I need is to somehow communicate to the compiler the
> mapping of module names to these hash-named .pcm's. If there were
> a way to provide this mapping in a file, I would be all set.

For what it's worth, this setup with named symlinks (whose names are stable across all builds) is how our (Google's) internal build system handles this.

> With the directory approach, I would need to create a temporary
> directory and populate it with appropriately-named symlinks (or
> copies in case of Windows) of .pcm files. While not particularly
> hard, it sure feels unnecessary. I would definitely try to avoid
> doing this for local compilations which means I will have two
> different ways of invoking the compiler depending on whether it
> is remote or local.

Because you don't use the content-addressed system locally? What we do is to use symlinks for remote compilations and just put the files in the "right" places locally, so the file system looks the same either way.
 
> And it is still not clear to me how this will
> override embedded .pcm references.

I don't think it would, but if the paths to dependencies are always the same, you shouldn't need to override any of those references.

> > But allowing an explicit mapping to be specified would also be fine
> > if people would actually use that facility.
>
> I will use it in build2. And I am willing to try to implement it.

OK :)

> > Our design right now is pretty strongly tied to having loaded all
> > dependency modules before loading a dependent module, though, so
> > we need that complexity somewhere.
>
> I don't think we will need it with the mapping approach: we will have
> a map of module names to file names, probably in HeaderSearchOptions
> next to PrebuiltModulePaths -- in a sense it will be another module
> search mechanism that will be tried before prebuilt paths (in
> HeaderSearch::getModuleFileName()).
>
> This map will be populated before we actually load any modules so
> the order in which one specifies the mapping is not important
> (except for overriding). I will probably need to add some extra
> code to consult this map when resolving embedded .pcm references,
> though.
>
> And we could also keep updating this map when loading modules via
> other means (e.g., with -fmodule-file) which will give us the
> override behavior we discussed earlier (I won't need this
> functionality in build2 but could implement it if others think
> it would be useful).
>
> If this sounds reasonable, I can give it a go.

Sure. I think my only remaining concerns are:

1) this is likely to end up with a set of command line arguments that grows linearly with the total number of modules in the project, and you're likely to find the build system needs or wants to prune the list down to just the dependencies anyway
2) we can't do any validation that the command line arguments are reasonable if the corresponding module is not used (we don't want to stat a large number of .pcm files if most of them are not going to be used, and definitely don't want to read the file header to find if it names the right module)

I don't think (2) is really a big deal, though, since we'll get at least a "file not found" error if the module is actually used by the compilation. And (1) is ultimately your problem as the build system maintainer, not ours. ;-)


Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
Richard Smith <[hidden email]> writes:

> Do you by any chance have steps to reproduce this?

This is with 5.0.0-svn305177-1~exp1 (trunk):

mkdir /tmp/test
cd /tmp/test

cat >core.mxx <<EOF
export module core;
export void f ();
EOF

cat >extra.mxx <<EOF
export module extra;
import core;
EOF

cat >driver.cxx <<EOF
import extra;
int main () {}
EOF

clang++-5.0 -std=c++1z -fmodules-ts -o core.pcm --precompile -Xclang -fmodules-embed-all-files -Xclang -fmodules-codegen -Xclang -fmodules-debuginfo -x c++-module core.mxx
clang++-5.0 -std=c++1z -fmodules-ts -fmodule-file=core.pcm -o extra.pcm --precompile -Xclang -fmodules-embed-all-files -Xclang -fmodules-codegen -Xclang -fmodules-debuginfo -x c++-module extra.mxx
clang++-5.0 -std=c++1z -fmodules-ts -fmodule-file=extra.pcm -o driver.o -c driver.cxx

cd ..
mv test ~/
cd ~/test

clang++-5.0 -std=c++1z -fmodules-ts -fmodule-file=extra.pcm -o driver.o -c driver.cxx
fatal error: module file '/tmp/test/core.pcm' not found: module file not found
note: imported by module 'extra' in 'extra.pcm'

Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
Richard Smith <[hidden email]> writes:

> Because you don't use the content-addressed system locally? What we do is
> to use symlinks for remote compilations and just put the files in the
> "right" places locally, so the file system looks the same either way.

Locally we use things as arranged by the user. For example, if project
bar uses libfoo that contains libfoo/foo.pcm, then we will (try to)
use this libfoo/foo.pcm where the user built it.

On the build slave, however, we may be building things for multiple
projects simultaneously and each may have its own foo.pcm. So here
we will call it something like modules/93255[...]5f6db7.pcm.

If I am able to specify the module-name to module-file mapping, I
will be doing essentially the same thing locally and remotely:

-fmodule-blah=foo=libfoo/foo.pcm

-fmodule-blah=foo=modules/93255[...]5f6db7.pcm


> 1) this is likely to end up with a set of command line arguments that grows
> linearly with the total number of modules in the project, and you're likely
> to find the build system needs or wants to prune the list down to just the
> dependencies anyway
> 2) we can't do any validation that the command line arguments are
> reasonable if the corresponding module is not used (we don't want to stat a
> large number of .pcm files if most of them are not going to be used, and
> definitely don't want to read the file header to find if it names the right
> module)
>
> I don't think (2) is really a big deal, though, since we'll get at least a
> "file not found" error if the module is actually used by the compilation.

Agree. It will either be detected at some point or it will be harmless.


> And (1) is ultimately your problem as the build system maintainer, not
> ours. ;-)

Agree. Also, my plan is to have two options: one to specify the mapping
on the command line (one entry at a time) and the other to read it from
a file. So the file option will help build systems that, for example,
want to specify a single (and potentially large) mapping file per project
or some such.
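
To give an idea of what reading such a file could involve, here is a
sketch of a parser. The file format is not settled anywhere in this
thread, so the one <name>=<file> entry per line layout is purely an
assumption, as is the function name parse_module_map().

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Parse <name>=<file> entries, one per line. Blank lines are skipped
// and a later entry for the same module overrides an earlier one.
std::map<std::string, std::string> parse_module_map (std::istream& is)
{
  std::map<std::string, std::string> r;

  for (std::string l; std::getline (is, l); )
  {
    if (l.empty ())
      continue;

    // Split at the first '=' so that the file name may contain '='.
    std::string::size_type p (l.find ('='));

    if (p == std::string::npos || p == 0 || p + 1 == l.size ())
      throw std::invalid_argument ("invalid module mapping: " + l);

    r[l.substr (0, p)] = l.substr (p + 1);
  }

  return r;
}

// Convenience overload for text that is already in memory.
std::map<std::string, std::string> parse_module_map (const std::string& text)
{
  std::istringstream is (text);
  return parse_module_map (is);
}
```

The same splitting at the first '=' would apply to the value of the
command line variant, for example foo=libfoo/foo.pcm.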

Which brings me to the most difficult part: choosing option names that
everyone likes ;-). And, BTW, I am hoping to implement the same in GCC
with the same names.

So we are looking for two options, one to specify a mapping entry and
the other to specify a mapping file with multiple entries:

-fmodule-blah=<name>=<file> | -fmodule-blah-blah=<file>

Here is what I came up with:

(1) -fmodule=      | -fmodule-map=
(2) -fmodule-map=  | -fmodule-map-file=
(3) -fmodule-loc=  | -fmodule-loc-file=
(4) -fmodmap=      | -fmodmap-file=

1. While nice and short, the use of -fmodule might be too close to
   -fmodules. On the other hand, these options will normally be used
   by build systems (the user will just use -fmodule-file) so probably
   not a major issue.

2. These are nice except -fmodule-map-file is already used. One way
   to resolve this would be to "overload" -fmodule-map-file to mean
   something different in the -fmodules-ts mode. Though I suspect its
   current meaning could be useful even in -fmodules-ts.

3. This is an attempt at using something other than 'map'. It has a
   nice property of suggesting that specifying these options doesn't
   actually cause the modules to be loaded.

4. Another play on the 'map' theme. I think it will be hard to sell
   to the GCC folks since they don't have the -fmodule-map-file issue.

Any preferences/suggestions? My favorite is (1).

Thanks,
Boris

Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
On 30 June 2017 at 06:39, Boris Kolpackov <[hidden email]> wrote:
> Richard Smith <[hidden email]> writes:
>
> > Because you don't use the content-addressed system locally? What we do is
> > to use symlinks for remote compilations and just put the files in the
> > "right" places locally, so the file system looks the same either way.
>
> Locally we use things as arranged by the user. For example, if project
> bar uses libfoo that contains libfoo/foo.pcm, then we will (try to)
> use this libfoo/foo.pcm where the user built it.
>
> On the build slave, however, we may be building things for multiple
> projects simultaneously and each may have its own foo.pcm. So here
> we will call it something like modules/93255[...]5f6db7.pcm.
>
> If I am able to specify the module-name to module-file mapping, I
> will be doing essentially the same thing locally and remotely:
>
> -fmodule-blah=foo=libfoo/foo.pcm
>
> -fmodule-blah=foo=modules/93255[...]5f6db7.pcm
>
>
> > 1) this is likely to end up with a set of command line arguments that grows
> > linearly with the total number of modules in the project, and you're likely
> > to find the build system needs or wants to prune the list down to just the
> > dependencies anyway
> > 2) we can't do any validation that the command line arguments are
> > reasonable if the corresponding module is not used (we don't want to stat a
> > large number of .pcm files if most of them are not going to be used, and
> > definitely don't want to read the file header to find if it names the right
> > module)
> >
> > I don't think (2) is really a big deal, though, since we'll get at least a
> > "file not found" error if the module is actually used by the compilation.
>
> Agree. It will either be detected at some point or it will be harmless.
>
>
> > And (1) is ultimately your problem as the build system maintainer, not
> > ours. ;-)
>
> Agree. Also, my plan is to have two options: one to specify the mapping
> on the command line (one entry at a time) and the other to read it from
> a file. So the file option will help build systems that, for example,
> want to specify a single (and potentially large) mapping file per project
> or some such.

Which brings me to the most difficult part: choosing option names that
everyone likes ;-). And, BTW, I am hoping to implement the same in GCC
with the same names.

So we are looking for two options, one to specify a mapping entry and
the other to specify a mapping file with multiple entries:

-fmodule-blah=<name>=<file> | -fmodule-blah-blah=<file>

Here is what I came up with:

(1) -fmodule=      | -fmodule-map=
(2) -fmodule-map=  | -fmodule-map-file=
(3) -fmodule-loc=  | -fmodule-loc-file=
(4) -fmodmap=      | -fmodmap-file=

1. While nice and short, the use of -fmodule might be too close to
   -fmodules. On the other hand, these options will normally be used
   by build systems (the user will just use -fmodule-file) so probably
   not a major issue.

2. These are nice except -fmodule-map-file is already used. One way
   to resolve this would be to "overload" -fmodule-map-file to mean
   something different in the -fmodules-ts mode. Though I suspect its
   current meaning could be useful even in -fmodules-ts.

3. This is an attempt at using something other than 'map'. It has a
   nice property of suggesting that specifying these options doesn't
   actually cause the modules to be loaded.

4. Another play on the 'map' theme. I think it will be hard to sell
   to the GCC folks since they don't have the -fmodule-map-file issue.

Any preferences/suggestions? My favorite is (1).

-fmodule= is a little too nonspecific for my tastes; I'd expect this to do what clang's -fmodule-name= does (that is, specify the name of the current module) before I'd expect it to specify an external module file's path.

How about something like -fmodule-file-<name>=path?


Re: Modules TS: binary module interface dependencies

Xin Wang via cfe-dev
Richard Smith <[hidden email]> writes:

> > -fmodule-blah=<name>=<file> | -fmodule-blah-blah=<file>
> >
> > Here is what I came up with:
> >
> > (1) -fmodule=      | -fmodule-map=
> > (2) -fmodule-map=  | -fmodule-map-file=
> > (3) -fmodule-loc=  | -fmodule-loc-file=
> > (4) -fmodmap=      | -fmodmap-file=
> >
> > 1. While nice and short, the use of -fmodule might be too close to
> >    -fmodules. On the other hand, these options will normally be used
> >    by build systems (the user will just use -fmodule-file) so probably
> >    not a major issue.
> >
> > 2. These are nice except -fmodule-map-file is already used. One way
> >    to resolve this would be to "overload" -fmodule-map-file to mean
> >    something different in the -fmodules-ts mode. Though I suspect its
> >    current meaning could be useful even in -fmodules-ts.
> >
> > 3. This is an attempt at using something other than 'map'. It has a
> >    nice property of suggesting that specifying these options doesn't
> >    actually cause the modules to be loaded.
> >
> > 4. Another play on the 'map' theme. I think it will be hard to sell
> >    to the GCC folks since they don't have the -fmodule-map-file issue.
> >
>
> -fmodule= is a little too nonspecific for my tastes; I'd expect this to do
> what clang's -fmodule-name= does (that is, specify the name of the current
> module) before I'd expect it to specify an external module file's path.

On the other hand, -fmodule=<name>=<file> describes the module completely
(name and .pcm) while -fmodule-name and -fmodule-file are sub-components
(though in slightly different contexts). But I agree, -fmodule is probably
too terse.


> How about something like -fmodule-file-<name>=path?

Is this really -fmodule-file-<name> (as in -fmodule-file-foo.core=core.pcm)
or was it supposed to be '=' (as in -fmodule-file=[<name>=]<file>)?

I think the former is too unconventional and will be hard to support
in most option parsers (I know for sure GCC will be a pain).
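
To illustrate the parsing concern, here is a sketch of what recognizing
the former spelling involves. This is not GCC or Clang code, and
match_embedded_name is a hypothetical helper. Because <name> is part of
the option spelling, an exact lookup of the text before '=' does not
work and the parser has to match a prefix instead.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical: recognize -fmodule-file-<name>=<file> and return the
// (name, file) pair. Returns nothing for any other argument, including
// the existing -fmodule-file=<file> spelling.
std::optional<std::pair<std::string, std::string>>
match_embedded_name (const std::string& arg)
{
  const std::string prefix ("-fmodule-file-");

  // The option name is not fixed, so only the prefix can be compared.
  if (arg.compare (0, prefix.size (), prefix) != 0)
    return std::nullopt;

  // Everything between the prefix and the first '=' is the module name.
  std::string::size_type p (arg.find ('=', prefix.size ()));
  if (p == std::string::npos || p == prefix.size ())
    return std::nullopt;

  return std::make_pair (arg.substr (prefix.size (), p - prefix.size ()),
                         arg.substr (p + 1));
}
```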

I like the latter, that is, "extend" -fmodule-file with optional module
name. The semantics, as I understand it, will be a bit different though:
-fmodule-file=<file> will cause the module to be loaded while
-fmodule-file=<name>=<file> only makes the location of the module known.
But I don't think the difference will be observable by the end user (i.e.,
loading a module that is not imported does not change anything)?
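
As a rough illustration of the extended form, the option value could be
split at the first '='. This is only a sketch: module_file and
parse_module_file are hypothetical names, not Clang code, and it assumes
that a plain <file> value does not itself contain '='.

```cpp
#include <optional>
#include <string>

// Hypothetical representation of one -fmodule-file value.
struct module_file
{
  std::optional<std::string> name; // Present only in the <name>=<file> form.
  std::string file;

  // Per the semantics described above: the plain <file> form causes the
  // module to be loaded, while <name>=<file> only records its location.
  bool load_eagerly () const {return !name;}
};

// Split "[<name>=]<file>" at the first '=' (assumption: a value without
// a module name contains no '=').
module_file
parse_module_file (const std::string& v)
{
  std::string::size_type p (v.find ('='));
  if (p == std::string::npos)
    return module_file {std::nullopt, v};
  return module_file {v.substr (0, p), v.substr (p + 1)};
}
```

Under these assumptions core.pcm would be loaded right away, while
foo.core=core.pcm would only be consulted if foo.core is imported.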

If we go with -fmodule-file=[<name>=]<file> then the second option will
naturally be -fmodule-file-map=<file>. I like it.

Boris