[RFC] C++20 modules dependency discovery


Gavin Cui via cfe-dev
C++20 is coming and we need to decide how clang will handle dependency discovery for modules.  In the following, module means compiled C++20 module interface unit, and I will use header unit to refer to the thing generated by a clang module map.

There are two different modes we care about when it comes to module dependencies:  implicit and explicit.

Implicit Modules
================

For implicit modules the build system doesn’t know anything about them, and thus can’t care about any intermediate files.  It needs to know about every source file that, if changed, should cause a rebuild of this translation unit.

For this case clang needs to output the full transitive set of dependencies, excluding any intermediate temporaries.  We can’t get that full set without at least preprocessing every module transitively referenced, which means that `-E -MD` should fail if it can’t find a module or header unit.

Explicit Modules
================

For explicit modules we only need to know the direct dependencies, as the build system will handle the transitive set.
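The difference between the two modes can be sketched as a graph walk. This is only an illustration under assumed inputs: the import graph and its names are made up, and a real implicit-mode scan would report source files rather than module names.

```python
# Hypothetical import graph: each unit maps to the modules it imports directly.
# The names are invented for illustration only.
DIRECT_IMPORTS = {
    "main.cpp": ["app"],
    "app": ["net", "util"],
    "net": ["util"],
    "util": [],
}

def direct_deps(unit):
    """Explicit mode: report only the imports named in `unit` itself."""
    return sorted(DIRECT_IMPORTS[unit])

def transitive_deps(unit):
    """Implicit mode: report everything reachable from `unit`."""
    seen = set()
    stack = list(DIRECT_IMPORTS[unit])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DIRECT_IMPORTS[dep])
    return sorted(seen)
```

In explicit mode the build system would stitch the per-file direct lists together itself; in implicit mode the compiler has to do the whole walk, which is why it must be able to find every module along the way.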

For preprocessing we still need to import header units (but only their preprocessor state), but not normal modules.  For this case it’s ok if `-E -MD` fails to find a module.  But it does still need to be able to find header units and module maps.  Additionally the normal Make output syntax is not sufficient to represent the needed information unless the driver decides how modules and header units should be built and where intermediate files should go.  There’s currently a json format working its way through the tooling subgroup of the standards committee that I think we should adopt for this.

I think we need separate modes in clang for these along with support for scanning through header units without actually building a clang module for them. clang-scan-deps will make use of the explicit mode.  The question I have is how should we select this mode, and what clang options do we need to add?

Proposal
========

As a rough idea I propose the following:

* `-M?` means output the json format which can correctly represent dependencies on a module for which we don’t know what the final file path will be.
* `clang++ -std=c++20 -E -MD -fimplicit-header-units` should implicitly find header unit sources, but not modules (as we've not given it any way to look up how to build modules).
    * This means that the dep file will contain a bunch of `.h`s, `.modulemap`s, and any `.pcm`s explicitly listed on the command line.
    * This also means erroring on unknown imported modules as we don't know what to put in the dep file for them.
* `clang++ -std=c++20 -E -MD -fimplicit-header-units -fimplicit-module-lookup=?`  should do the same as the above, except that it does know how to find modules, and should list all of the transitive dependencies of any modules it finds.
* `clang++ -std=c++20 -E -MD` should fail if it hits a module or header unit, and should never do implicit lookup.
* `clang++ -std=c++20 -E -M?` should scan through header units without actually building clang modules for them (to get the macros it needs), and should note all module imports.
    * This means that the dep file will contain only `.h`s that it includes, and use the json representation of header units and modules.
    * It will also be shallow, with only direct dependencies.

Additionally, we should (eventually) make:

`$ clang++ -std=c++20 a.cpp b.cpp c.cpp a.cppm -o program`

Work without a build system, even in the presence of modules.  To do this we will need to prescan the files to determine the module dependencies between them and then build them in dependency order.  This does mean adding a (simple) build system to the driver (maybe [llbuild](https://github.com/apple/swift-llbuild)?), but I think it’s worth it to make simple cases simple.  It may also make sense to actually push this work out to a real build system.  For example have clang write a temporary ninja file and invoke ninja to perform the build.
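As a rough sketch of the prescan-then-order idea, assuming a deliberately naive scanner: the regular expressions below ignore the preprocessor, comments, header units, and partitions, and `build_order` is a hypothetical helper, not anything in the clang driver.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Naive patterns: no preprocessing, no comments, no header units, no partitions.
PROVIDES = re.compile(r"^\s*export\s+module\s+([\w.]+)\s*;", re.MULTILINE)
IMPORTS = re.compile(r"^\s*(?:export\s+)?import\s+([\w.]+)\s*;", re.MULTILINE)

def build_order(sources):
    """Order files so every module interface precedes the files importing it.

    `sources` maps a file name to its text (a hypothetical input shape).
    """
    provider = {}
    for name, text in sources.items():
        for module in PROVIDES.findall(text):
            provider[module] = name
    sorter = TopologicalSorter()
    for name, text in sources.items():
        deps = [provider[m] for m in IMPORTS.findall(text)
                if m in provider and provider[m] != name]
        sorter.add(name, *deps)
    # static_order() raises graphlib.CycleError on circular imports.
    return list(sorter.static_order())
```

Imports of modules that none of the given files provide are silently skipped here; a driver would presumably have to diagnose them or look them up elsewhere.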

- Michael Spencer


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: [RFC] C++20 modules dependency discovery

Gavin Cui via cfe-dev


On 8/12/19 8:37 PM, Michael Spencer via cfe-dev wrote:
> For preprocessing we still need to import header units (but only their preprocessor state), but not normal modules.  For this case it’s ok if `-E -MD` fails to find a module.  But it does still need to be able to find header units and module maps.  Additionally the normal Make output syntax is not sufficient to represent the needed information unless the driver decides how modules and header units should be built and where intermediate files should go.  There’s currently a json format working its way through the tooling subgroup of the standards committee that I think we should adopt for this.


I don't object to supporting the json format, but are there defaults that would make sense? Maybe using the preprocessor state implied by the current command-line options and putting intermediate files / interface files in the current directory, or in TMDIR/.clang/<hash of path>, or something else? We'd need defaults for your `-M?` below anyway?

Also, does finding a module involve matching a cppm file with compatible preprocessor state, or is it just by name?



> Additionally, we should (eventually) make:
>
> `$ clang++ -std=c++20 a.cpp b.cpp c.cpp a.cppm -o program`
>
> Work without a build system, even in the presence of modules.  To do this we will need to prescan the files to determine the module dependencies between them and then build them in dependency order.  This does mean adding a (simple) build system to the driver (maybe [llbuild](https://github.com/apple/swift-llbuild)?), but I think it’s worth it to make simple cases simple.  It may also make sense to actually push this work out to a real build system.  For example have clang write a temporary ninja file and invoke ninja to perform the build.


In the name of making simple cases simple, trying to hand this off to an external build system seems fragile and, perhaps, over complicated. Performing a topological sort of the inputs with their dependencies and processing in that order seems relatively straightforward.

Thanks again,

Hal



-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: [RFC] C++20 modules dependency discovery

Gavin Cui via cfe-dev
On Tue, Aug 13, 2019 at 1:52 AM Finkel, Hal J. <[hidden email]> wrote:



> I don't object to supporting the json format, but are there defaults that would make sense? Maybe using the preprocessor state implied by the current command-line options and putting intermediate files / interface files in the current directory, or in TMDIR/.clang/<hash of path>, or something else? We'd need defaults for your `-M?` below anyway?

The json format doesn't include pcm paths.  It just says which source files provide which modules, and what modules and header units each source file imports.  It's up to the build system to construct an actual build.  The other issue with -MD is that I believe tools that use `.d` files wouldn't even be able to handle a `.d` that included actual commands.
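To make the division of labour concrete, here is a minimal sketch of what a build system might do with such records. The field names `source`, `provides`, and `requires` are assumptions for illustration and are not taken from the paper's actual schema.

```python
def link_records(records):
    """Turn per-file scan records into file-to-file dependency edges.

    Each record is assumed to look like
    {"source": str, "provides": [module names], "requires": [module names]}.
    Returns (edges, missing): edges maps a source to the sources it must wait
    for; missing maps a source to required modules that nobody provides.
    """
    provider = {}
    for rec in records:
        for module in rec["provides"]:
            provider[module] = rec["source"]
    edges, missing = {}, {}
    for rec in records:
        src = rec["source"]
        edges[src] = sorted({provider[m] for m in rec["requires"] if m in provider})
        missing[src] = sorted({m for m in rec["requires"] if m not in provider})
    return edges, missing
```

Nothing in the records names a `.pcm`; where the compiled module ends up is decided by whoever consumes the edges.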
 

> Also, does finding a module involve matching a cppm file with compatible preprocessor state, or is it just by name?

It's just by name.  The assumption here is that you have a compilation database or similar and thus know the command line options passed to every source file.
 




> In the name of making simple cases simple, trying to hand this off to an external build system seems fragile and, perhaps, over complicated. Performing a topological sort of the inputs with their dependencies and processing in that order seems relatively straightforward.


Generating a Ninja file is pretty trivial, but it may well end up being simpler to just run the build in the driver.  My near-term goals don't really involve solving this problem; it's just an important use case for dependency discovery.



Thanks for the feedback,

- Michael Spencer



Re: [RFC] C++20 modules dependency discovery

Gavin Cui via cfe-dev
This is likely going to be a bit weird since I just subscribed and don't
have the original email(s) to reply to, so apologies if my
reconstruction is incorrect.

On Mon, Aug 12, 2019 at 18:37:05 PDT, Michael Spencer wrote:
> For explicit modules we only need to know the direct dependencies, as the
> build system will handle the transitive set.

Correct. Though `import` statements in `#include` files still need to be
mentioned.

> For preprocessing we still need to import header units (but only their
> preprocessor state), but not normal modules.  For this case it’s ok if `-E
> -MD` fails to find a module.  But it does still need to be able to find
> header units and module maps.  Additionally the normal Make output syntax
> is not sufficient to represent the needed information unless the driver
> decides how modules and header units should be built and where intermediate
> files should go.  There’s currently a json format working its way through
> the tooling subgroup of the standards committee that I think we should
> adopt for this.
>
> I think we need separate modes in clang for these along with support for
> scanning through header units without actually building a clang module for
> them. clang-scan-deps will make use of the explicit mode.  The question I
> have is how should we select this mode, and what clang options do we need
> to add?
>
> Proposal
> ========
>
> As a rough idea I propose the following:
>
> * `-M?` means output the json format which can correctly represent
> dependencies on a module for which we don’t know what the final file path
> will be.

[ I'm the author of the paper specifying the mentioned format. ]

For my GCC patch, I've spelled the flags for the output in the following
way:

  - `-fdep-format=trtbd`: Necessary to support creating old format
    versions (the "trtbd" part is in search of a much better name :) ).
  - `-fdep-output=<PATH>`: The path that will be passed to the `-o` flag
    when compiling the TU being scanned. This is needed to hook up which
    scan result goes with which compilation rule (it can't be associated
    with the source because a single source path may be compiled
    multiple times within a build; the output object file does need to
    be unique however).
  - `-fdep-file=<PATH>`: Where to write the output for the format.

I avoided the `-M` flag family because that means "make". This is not
make syntax, so it doesn't belong there. In addition, the existing `-M`
flags are still useful because the "should I rerun this rule" logic for
the scan step itself can be satisfied with the `-M` flags here.
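A small illustration of why the scan result is keyed by the object file rather than the source. The rules below are invented; the only point is that grouping by source is ambiguous while grouping by output is not.

```python
# Hypothetical compile rules: the same source is built twice with different flags.
RULES = [
    {"source": "log.cpp", "output": "debug/log.o", "flags": ["-DDEBUG"]},
    {"source": "log.cpp", "output": "release/log.o", "flags": ["-O2"]},
]

def index_by(rules, key):
    """Group rules by `key`; a group larger than one means the key is ambiguous."""
    groups = {}
    for rule in rules:
        groups.setdefault(rule[key], []).append(rule)
    return groups
```

With these rules, `index_by(RULES, "source")` puts both entries under `log.cpp`, whereas `index_by(RULES, "output")` yields one rule per key, so a scan result tagged with the output path can be matched to exactly one compilation.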

> * `clang++ -std=c++20 -E -MD -fimplicit-header-units` should implicitly
> find header unit sources, but not modules (as we've not given it any way to
> look up how to build modules).
>     * This means that the dep file will contain a bunch of `.h`s,
> `.modulemap`s, and any `.pcm`s explicitly listed on the command line.
>     * This also means erroring on unknown imported modules as we don't know
> what to put in the dep file for them.

Sounds reasonable. Matching GCC's output for them might be a viable
option, but that is going to make not-make parsers of the `.d` files
choke (since that output involves appending to make variables).

> * `clang++ -std=c++20 -E -MD -fimplicit-header-units
> -fimplicit-module-lookup=?`  should do the same as the above, except that
> it does know how to find modules, and should list all of the transitive
> dependencies of any modules it finds.
> * `clang++ -std=c++20 -E -MD` should fail if it hits a module or header
> unit, and should never do implicit lookup.
> * `clang++ -std=c++20 -E -M?` should scan through header units without
> actually building clang modules for them (to get the macros it needs), and
> should note all module imports.
>     * This means that the dep file will contain only `.h`s that it
> includes, and use the json representation of header units and modules.
>     * It will also be shallow, with only direct dependencies.

Sounds good.

> Additionally, we should (eventually) make:
>
> `$ clang++ -std=c++20 a.cpp b.cpp c.cpp a.cppm -o program`
>
> Work without a build system, even in the presence of modules.  To do this
> we will need to prescan the files to determine the module dependencies
> between them and then build them in dependency order.  This does mean
> adding a (simple) build system to the driver (maybe [llbuild](
> https://github.com/apple/swift-llbuild)?), but I think it’s worth it to
> make simple cases simple.  It may also make sense to actually push this
> work out to a real build system.  For example have clang write a temporary
> ninja file and invoke ninja to perform the build.

This sounds like what a Meson developer is expecting in this blog post:

    https://nibblestew.blogspot.com/2019/08/building-c-modules-take-n1.html

I don't know how "simple" they're able to force their compilation model
into what would be provided here. I'm also not sure how much a nested
ninja would be appreciated (there's no notion of a jobserver for
ninja-under-ninja to propagate things like `-l` or `-j` flags down).
Pool information may also be useful there. There is a patchset for
ninja-under-make to obey jobserver information though, but that doesn't
help Meson at all.

On Tue, Aug 13, 2019 at 02:08:42 PDT, Michael Spencer wrote:
> On Tue, Aug 13, 2019 at  01:52:46 PDT, Finkel, Hal J. wrote:
> > I don't object to supporting the json format, but are there defaults
> > that would make sense? Maybe using the preprocessor state implied by
> > the current command-line options and putting intermediate files /
> > interface files in the current directory, or in
> > TMDIR/.clang/<hash of path>, or something else? We'd need defaults
> > for your `-M?` below anyway?

I think that defaults for the `-M?` (or `-fdep-*`) flags are unnecessary.
The flags are only really meaningful to a build system sophisticated
enough to understand module dependencies anyways, so just requiring at
least `-fdep-format=` and `-fdep-file=` to be set sounds OK to me
(`-fdep-output=` being unset means the build tool knows what it's
doing I guess). I suppose `-fdep-file=` could have a default too, but
that sounds like a build system being too trusting of cross-version
compatibility to me.

> The json format doesn't include pcm paths.

It doesn't require them, but there is a slot for the scan tool to say
something. In CMake's implementation, I take the filename of the pcm
path placed there, but relocate it to a target-specific directory. If it
is missing, I create my own filepath based on the logical name of the
module. This is communicated to the actual build by creating a file for
GCC's module mapper to locate it (which is used for import and export
locations). If clang wants a response file, that can be done too (with
the flag just being spelled as `@` instead of `-fmodule-mapper=`).
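The path selection described above could look roughly like this. It is a sketch, not CMake's code: `bmi_path`, the `.pcm` suffix default, and the rule for turning a partition name into a file name are all assumptions.

```python
from pathlib import PurePosixPath

def bmi_path(target_dir, logical_name, suggested=None, suffix=".pcm"):
    """Pick where a compiled module should live for one build target.

    If the scanner suggested a path, keep only its file name and relocate it
    into `target_dir`; otherwise derive a file name from the logical module
    name. Replacing ':' with '-' for partitions is an invented convention.
    """
    if suggested:
        filename = PurePosixPath(suggested).name
    else:
        filename = logical_name.replace(":", "-") + suffix
    return str(PurePosixPath(target_dir) / filename)
```

The chosen path would then be handed to the compiler through whatever mapping mechanism it accepts, such as the module mapper file or response file mentioned above.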

> It just says which source
> files provide which modules, and what modules and header units each
> source file imports.  It's up to the build system to construct an actual
> build.

Yep.

> The other issue with -MD is that I believe tools that use `.d`
> files wouldn't even be able to handle a `.d` that included actual
> commands.

Correct. Ninja tries to handle the barest of syntax for these files
(basically what is seen in the wild).

> > Also, does finding a module involve matching a cppm file with
> > compatible preprocessor state, or is it just by name?
> >
> It's just by name.  The assumption here is that you have a compilation
> database or similar and thus know the command line options passed to
> every source file.

In CMake, mismatched preprocessor state is expected to be detected by
the compiler (something like "-D flags change the interpretation of the
BMI") or linker (as `_ITERATOR_DEBUG_LEVEL` is handled in Windows).

--Ben

Re: [RFC] C++20 modules dependency discovery

Gavin Cui via cfe-dev
On Tue, Aug 13, 2019 at 1:33 PM Ben Boeckel <[hidden email]> wrote:

> [ I'm the author of the paper specifying the mentioned format. ]
>
> For my GCC patch, I've spelled the flags for the output in the following
> way:
>
>   - `-fdep-format=trtbd`: Necessary to support creating old format
>     versions (the "trtbd" part is in search of a much better name :) ).
>   - `-fdep-output=<PATH>`: The path that will be passed to the `-o` flag
>     when compiling the TU being scanned. This is needed to hook up which
>     scan result goes with which compilation rule (it can't be associated
>     with the source because a single source path may be compiled
>     multiple times within a build; the output object file does need to
>     be unique however).
>   - `-fdep-file=<PATH>`: Where to write the output for the format.
>
> I avoided the `-M` flag family because that means "make". This is not
> make syntax, so it doesn't belong there. In addition, the existing `-M`
> flags are still useful because the "should I rerun this rule" logic for
> the scan step itself can be satisfied with the `-M` flags here.

This is not something I had considered.  I agree it's highly useful to be able to not rescan if nothing changed.  It's also important that clang uses the same flags as gcc here.  Have you heard from the GCC devs on your GCC patch?
 

> > * `clang++ -std=c++20 -E -MD -fimplicit-header-units` should implicitly
> > find header unit sources, but not modules (as we've not given it any way to
> > look up how to build modules).
> >     * This means that the dep file will contain a bunch of `.h`s,
> > `.modulemap`s, and any `.pcm`s explicitly listed on the command line.
> >     * This also means erroring on unknown imported modules as we don't know
> > what to put in the dep file for them.
>
> Sounds reasonable. Matching GCC's output for them might be a viable
> option, but that is going to make not-make parsers of the `.d` files
> choke (since that output involves appending to make variables).

What output do you do for GCC?
 

> > Additionally, we should (eventually) make:
> >
> > `$ clang++ -std=c++20 a.cpp b.cpp c.cpp a.cppm -o program`
> >
> > Work without a build system, even in the presence of modules.  To do this
> > we will need to prescan the files to determine the module dependencies
> > between them and then build them in dependency order.  This does mean
> > adding a (simple) build system to the driver (maybe [llbuild](
> > https://github.com/apple/swift-llbuild)?), but I think it’s worth it to
> > make simple cases simple.  It may also make sense to actually push this
> > work out to a real build system.  For example have clang write a temporary
> > ninja file and invoke ninja to perform the build.
>
> This sounds like what a Meson developer is expecting in this blog post:
>
>     https://nibblestew.blogspot.com/2019/08/building-c-modules-take-n1.html

It seems similar, but the intent isn't really for "real" builds.  It's just to support simple cases so that step one of using C++ isn't setting up a build system.
 

> > The other issue with -MD is that I believe tools that use `.d`
> > files wouldn't even be able to handle a `.d` that included actual
> > commands.
>
> Correct. Ninja tries to handle the barest of syntax for these files
> (basically what is seen in the wild).

This makes me think we really shouldn't even try to do that then.

- Michael Spencer
 

> > Also, does finding a module involve matching a cppm file with
> > compatible preprocessor state, or is it just by name?
> >
> It's just by name.  The assumption here is that you have a compilation
> database or similar and thus know the command line options passed to
> every source file.

In CMake, mismatched preprocessor state is expected to be detected by
the compiler (something like "-D flags change the interpretation of the
BMI") or linker (as `_ITERATOR_DEBUG_LEVEL` is handled in Windows).

--Ben

 

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: [RFC] C++20 modules dependency discovery

Gavin Cui via cfe-dev
On Tue, Aug 13, 2019 at 13:49:47 -0700, Michael Spencer wrote:

> On Tue, Aug 13, 2019 at 1:33 PM Ben Boeckel <[hidden email]> wrote:
> > I avoided the `-M` flag family because that means "make". This is not
> > make syntax, so it doesn't belong there. In addition, the existing `-M`
> > flags are still useful because the "should I rerun this rule" logic for
> > the scan step itself can be satisfied with the `-M` flags here.
>
> This is not something I had considered.  I agree it's highly useful to be
> able to not rescan if nothing changed.  It's also important that clang uses
> the same flags as gcc here, have you heard from the GCC devs on your GCC
> patch?

Nathan wants to wait for the TR before merging it to his branch. I can
send the patch as an RFC to the GCC list I suppose. Should I CC you?

> > Sounds reasonable. Matching GCC's output for them might be a viable
> > option, but that is going to make not-make parsers of the `.d` files
> > choke (since that output involves appending to make variables).
>
> What output do you do for GCC?

If modules are enabled and `-fdep-format=` is specified, it is basically
just a list of paths read because of `#include` directives.
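
To show how little a non-make reader of such a file needs to understand when the output stays in that bare form, here is a deliberately minimal sketch. It is not Ninja's actual parser, and the function name `parse_depfile` is just for illustration; it assumes a single rule, backslash-newline continuations, and paths without spaces or colons.

```python
def parse_depfile(text):
    """Parse a bare `target: dep dep ...` depfile into (target, deps).

    Only backslash-newline continuations and whitespace-separated paths
    are handled; escaped spaces, multiple rules, and make variables are
    out of scope for this sketch.
    """
    joined = text.replace("\\\n", " ")
    target, sep, rest = joined.partition(":")
    if not sep:
        raise ValueError("expected 'target: deps'")
    return target.strip(), rest.split()


example = "main.o: main.cpp \\\n  include/a.h \\\n  include/b.h\n"
target, deps = parse_depfile(example)
```

Anything beyond this, such as lines that append to make variables, would simply not be understood by a reader this simple.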

> > This sounds like what a Meson developer is expecting in this blog post:
> >
> > https://nibblestew.blogspot.com/2019/08/building-c-modules-take-n1.html
>
> It seems similar, but the intent isn't really for "real" builds.  It's just
> to support simple cases so that step one of using C++ isn't setting up a
> build system.

I'm aware that it would really be a simplified build model compared to
what is possible today. Discussion on Reddit was a little heated if
you're interested, but I feel like we were mostly talking on different
levels (me wanting to support what is possible with the IS, others just
wanting to support some idealized C++ build model):

    https://www.reddit.com/r/cpp/comments/cn6osf/building_c_modules_take_n1/

I don't think the "just have the compiler do the hard part" approach is
viable, because module deps between targets still need to be wired up, and
doing things naively means your builds end up more entangled than you
really want them to be. But other build systems can choose an easier
problem than the one CMake ends up solving. I just don't think expecting
compilers to do all the heavy lifting with module deps is a viable
solution for the wider C++ community.

> > > The other issue with -MD is that I believe tools that use `.d`
> > > files wouldn't even be able to handle a `.d` that included actual
> > > commands.
> >
> > Correct. Ninja tries to handle the barest of syntax for these files
> > (basically what is seen in the wild).
>
> This makes me think we really shouldn't even try to do that then.

Agreed.

--Ben

Re: [RFC] C++20 modules dependency discovery

Gavin Cui via cfe-dev
In reply to this post by Gavin Cui via cfe-dev
On Mon, Aug 12, 2019 at 6:37 PM Michael Spencer via cfe-dev
<[hidden email]> wrote:

>
> C++20 is coming and we need to decide how clang will handle dependency discovery for modules.  In the following, module means compiled C++20 module interface unit, and I will use header unit to refer to the thing generated by a clang module map.
>
> There are two different modes we care about when it comes to module dependencies:  implicit and explicit.
>
> Implicit Modules
> ================
>
> For implicit modules the build system doesn’t know anything about them, and thus can’t care about any intermediate files.  It needs to know about all source files that if changed should cause a rebuild of this translation unit.
>
> For this case clang needs to output the full transitive set of dependencies, excluding any intermediate temporaries.  This also means that we can’t get the full set of dependencies without actually at least preprocessing every module transitively referenced.  This means that `-E -MD` should fail if it can’t find a module or header unit.
>
> Explicit Modules
> ================
>
> For explicit modules we only need to know the direct dependencies, as the build system will handle the transitive set.
>
> For preprocessing we still need to import header units (but only their preprocessor state), but not normal modules.  For this case it’s ok if `-E -MD` fails to find a module.  But it does still need to be able to find header units and module maps.  Additionally the normal Make output syntax is not sufficient to represent the needed information unless the driver decides how modules and header units should be built and where intermediate files should go.  There’s currently a json format working its way through the tooling subgroup of the standards committee that I think we should adopt for this.
>
> I think we need separate modes in clang for these along with support for scanning through header units without actually building a clang module for them. clang-scan-deps will make use of the explicit mode.  The question I have is how should we select this mode, and what clang options do we need to add?
>
> Proposal
> ========
>
> As a rough idea I propose the following:
>
> * `-M?` means output the json format which can correctly represent dependencies on a module for which we don’t know what the final file path will be.
> * `clang++ -std=c++20 -E -MD -fimplicit-header-units` should implicitly find header unit sources, but not modules (as we've not given it any way to look up how to build modules).
>     * This means that the dep file will contain a bunch of `.h`s, `.modulemap`s, and any `.pcm`s explicitly listed on the command line.
>     * This also means erroring on unknown imported modules as we don't know what to put in the dep file for them.
> * `clang++ -std=c++20 -E -MD -fimplicit-header-units -fimplicit-module-lookup=?`  should do the same as the above, except that it does know how to find modules, and should list all of the transitive dependencies of any modules it finds.
> * `clang++ -std=c++20 -E -MD` should fail if it hits a module or header unit, and should never do implicit lookup.
> * `clang++ -std=c++20 -E -M?` should scan through header units without actually building clang modules for them (to get the macros it needs), and should note all module imports.
>     * This means that the dep file will contain only `.h`s that it includes, and use the json representation of header units and modules.
>     * It will also be shallow, with only direct dependencies.

Very nice break down of the different things we can get!

This might be a good opportunity to use more descriptive names than the
`-M<>` family. I really like the `-fdep-*` approach pointed out by Ben; it
opens up the opportunity for nice customization. As an alternative
approach on top of your idea, we could introduce a `-fdep-mode=<mode>`
option, such as:

... -std=c++20 -E -fdep-mode=headerunit  # `... -std=c++20 -E -MD -fimplicit-header-units`
... -std=c++20 -E -fdep-mode=transitive  # `... -std=c++20 -E -MD -fimplicit-header-units -fimplicit-module-lookup=?`
... -std=c++20 -E -fdep-mode=shallow     # `... -std=c++20 -E -M?`

WDYT?

Ben, how do you express different modes in GCC? Does it have more than
one currently?

> Additionally, we should (eventually) make:
>
> `$ clang++ -std=c++20 a.cpp b.cpp c.cpp a.cppm -o program`
>
> Work without a build system, even in the presence of modules.  To do this we will need to prescan the files to determine the module dependencies between them and then build them in dependency order.  This does mean adding a (simple) build system to the driver (maybe [llbuild](https://github.com/apple/swift-llbuild)?), but I think it’s worth it to make simple cases simple.  It may also make sense to actually push this work out to a real build system.  For example have clang write a temporary ninja file and invoke ninja to perform the build.
>
> - Michael Spencer



--
Bruno Cardoso Lopes
http://www.brunocardoso.cc

Re: [RFC] C++20 modules dependency discovery

Gavin Cui via cfe-dev
In reply to this post by Gavin Cui via cfe-dev
On Tue, Aug 13, 2019 at 2:02 PM Ben Boeckel via cfe-dev
<[hidden email]> wrote:

>
> On Tue, Aug 13, 2019 at 13:49:47 -0700, Michael Spencer wrote:
> > On Tue, Aug 13, 2019 at 1:33 PM Ben Boeckel <[hidden email]> wrote:
> > > I avoided the `-M` flag family because that means "make". This is not
> > > make syntax, so it doesn't belong there. In addition, the existing `-M`
> > > flags are still useful because the "should I rerun this rule" logic for
> > > the scan step itself can be satisfied with the `-M` flags here.
> >
> > This is not something I had considered.  I agree it's highly useful to be
> > able to not rescan if nothing changed.  It's also important that clang uses
> > the same flags as gcc here, have you heard from the GCC devs on your GCC
> > patch?
>
> Nathan wants to wait for the TR before merging it to his branch. I can
> send the patch as an RFC to the GCC list I suppose. Should I CC you?
>
> > > Sounds reasonable. Matching GCC's output for them might be a viable
> > > option, but that is going to make not-make parsers of the `.d` files
> > > choke (since that output involves appending to make variables).
> >
> > What output do you do for GCC?
>
> If modules are enabled and `-fdep-format=` is specified, it is basically
> just a list of paths read because of `#include` directives.
>
> > > This sounds like what a Meson developer is expecting in this blog post:
> > >
> > > https://nibblestew.blogspot.com/2019/08/building-c-modules-take-n1.html
> >
> > It seems similar, but the intent isn't really for "real" builds.  It's just
> > to support simple cases so that step one of using C++ isn't setting up a
> > build system.
>
> I'm aware that it would really be a simplified build model compared to
> what is possible today. Discussion on Reddit was a little heated if
> you're interested, but I feel like we were mostly talking on different
> levels (me wanting to support what is possible with the IS, others just
> wanting to support some idealized C++ build model):
>
>     https://www.reddit.com/r/cpp/comments/cn6osf/building_c_modules_take_n1/
>
> I don't think the "just have the compiler do the hard part" approach is
> viable, because module deps between targets still need to be wired up, and
> doing things naively means your builds end up more entangled than you
> really want them to be. But other build systems can choose an easier
> problem than the one CMake ends up solving. I just don't think expecting
> compilers to do all the heavy lifting with module deps is a viable
> solution for the wider C++ community.

It probably won't scale, but it has educational value, and it seems to be
in scope for implementations (including clang) to be able to do it
(though I'm not really sure how viable it is).

If we had a mode like `$ clang++ -std=c++20 a.cpp b.cpp c.cpp a.cppm
-o program`, we could still add something like `-fdep-mode=transitive`
and get the additional output we would get when dep scanning. Such
output would be useful to at least bulletproof an implicit modules
implementation by replaying it using a build system.

>
> > > > The other issue with -MD is that I believe tools that use `.d`
> > > > files wouldn't even be able to handle a `.d` that included actual
> > > > commands.
> > >
> > > Correct. Ninja tries to handle the barest of syntax for these files
> > > (basically what is seen in the wild).
> >
> > This makes me think we really shouldn't even try to do that then.
>
> Agreed.
>
> --Ben



--
Bruno Cardoso Lopes
http://www.brunocardoso.cc

Re: [RFC] C++20 modules dependency discovery

Gavin Cui via cfe-dev
In reply to this post by Gavin Cui via cfe-dev
On Thu, Aug 15, 2019 at 17:27:16 -0700, Bruno Cardoso Lopes via cfe-dev wrote:
> Ben, how do you express different modes in GCC? Does it have more than
> one currently?

It just has the one. It's just that `-std=c++2a` (and `-fmodules-ts` for
now) modifies logic done by `-E`. It was implemented by hooking into the
infrastructure that exists rather than reworking it.

--Ben