C++20 module protocol

C++20 module protocol

Nathan Sidwell via cfe-dev
Hi,
these files are the GCC implementation of the p1184 (wg21.link/p1184)
protocol.  Although part of GCC, they are entirely authored by me, so I
hereby relicense[*] them under the Apache-2.0 with LLVM exception
license, in the hope they may be useful in Clang's implementation.  I
also append the current documentation.

Iain and I are discussing whether a separate upstream project, from
whence both GCC and Clang can sync, may be the best approach.

nathan

[*] Contributions to the FSF give back to the contributor a license to
that code, allowing them to relicense as desired.

--
Nathan Sidwell

@node C++ Module Mapper
@subsection Module Mapper
@cindex C++ Module Mapper

A module mapper provides a line-based server or file that the
compiler queries to determine the mapping between module names and CMI
files.  It is also used to build CMIs on demand.  A mapper may be
specified with the @option{-fmodule-mapper=@var{val}} option or
@env{CXX_MODULE_MAPPER} environment variable.  The value may have
one of the following forms:

@table @gcctabopt

@item @r{[}@var{hostname}@r{]}:@var{port}@r{[}?@var{ident}@r{]}
An optional hostname and a numeric port number to connect to.  If the
hostname is omitted, the loopback address is used.  If the hostname
corresponds to multiple IPv6 addresses, these are tried in turn, until
one is successful.  If your host lacks IPv6, this form is
non-functional.  If you must use IPv4, @emph{get with the 21st century},
or failing that use @option{-fmodule-mapper='|ncat @var{ipv4host}
@var{port}'}.

@item =@var{socket}@r{[}?@var{ident}@r{]}
A local domain socket.  If your host lacks local domain sockets, this
form is non-functional.

@item |@var{program}@r{[}?@var{ident}@r{]} @r{[}@var{args...}@r{]}
A program to spawn, and communicate with on its stdin/stdout streams.
Your @env{PATH} environment variable is searched for the program.
Arguments are separated by space characters (it is not possible for
one of the arguments delivered to the program to contain a space).

@item <>@r{[}?@var{ident}@r{]}
@item <>@var{fdinout}@r{[}?@var{ident}@r{]}
@item <@var{fdin}>@var{fdout}@r{[}?@var{ident}@r{]}
File descriptors to communicate over.  The first form, @option{<>},
communicates over stdin and stdout.  The second form specifies a
bidirectional file descriptor and the last form allows specifying
two independent descriptors.  Note that other compiler options might
cause the compiler to read stdin or write stdout.

@item @var{file}@r{[}?@var{ident}@r{]}
A mapping file consisting of space-separated module-name, filename
pairs, one per line.  Only the mappings for the direct imports and any
module export name need be provided.  If other mappings are provided,
they override those stored in any imported CMI files.  A repository
root may be specified in the mapping file by using @samp{$root} as the
module name in the first active line.

@end table
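
To make the forms above concrete, here is a small Python sketch that classifies a mapper value.  It is not GCC's parser: the name classify_mapper, the tuples it returns, and the rule that anything ending in a colon and digits is the network form are all assumptions made for illustration.

```python
import re

def classify_mapper(val):
    """Split a mapper value into (kind, details, ident).  Illustrative only."""
    # The optional ?ident suffixes the first word of the value only.
    first, _, rest = val.partition(" ")
    first, _, ident = first.partition("?")
    ident = ident or None
    if first.startswith("|"):
        # Program to spawn; arguments are split on spaces, no quoting.
        return "program", [first[1:]] + rest.split(), ident
    if first.startswith("="):
        return "socket", first[1:], ident
    m = re.fullmatch(r"<(\d*)>(\d*)", first)
    if m:
        fdin, fdout = m.groups()
        if not fdin:
            # "<>" means stdin/stdout, "<>N" means one bidirectional descriptor.
            pair = (int(fdout), int(fdout)) if fdout else (0, 1)
        elif fdout:
            pair = (int(fdin), int(fdout))
        else:
            raise ValueError("unsupported descriptor form: " + first)
        return "fds", pair, ident
    m = re.fullmatch(r"(.*):(\d+)", first)
    if m:
        # An omitted hostname (None here) stands for the loopback address.
        return "network", (m.group(1) or None, int(m.group(2))), ident
    return "file", first, ident
```

For example, classify_mapper("localhost:1234?me") yields the network form with ident "me", while classify_mapper("modules.map") falls through to the mapping-file form.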

As shown, an optional @var{ident} may suffix the first word of the
option, indicated by a @samp{?} prefix.  The value is used in the
initial handshake with the module server, or to specify a prefix on
mapping file lines.  In the server case, the main source file name is
used if no @var{ident} is specified.  In the file case, all non-blank
lines are significant, unless a value is specified, in which case only
lines beginning with @var{ident} are significant.  The @var{ident}
must be separated by whitespace from the module name.  Be aware that
@samp{<}, @samp{>}, @samp{?}  and @samp{|} characters are often
significant to the shell, and therefore may need quoting.
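
The mapping-file rules can be sketched in a few lines of Python.  This is one reading of the description above, not the GCC implementation: the function name parse_mapping is hypothetical, lines that lack a filename are simply skipped, and surrounding whitespace is stripped from filenames.

```python
def parse_mapping(text, ident=None):
    """Return (root, {module: filename}) from mapping-file text."""
    root = None
    mapping = {}
    first_active = True
    for line in text.splitlines():
        words = line.split(None, 1)
        if not words:
            continue  # blank lines are never significant
        if ident is not None:
            # With an ident, only lines beginning with it are significant.
            if words[0] != ident or len(words) < 2:
                continue
            words = words[1].split(None, 1)
        if len(words) < 2:
            continue  # assumption: ignore lines without a filename
        module, filename = words[0], words[1].strip()
        if first_active and module == "$root":
            root = filename  # repository root, first active line only
        else:
            mapping[module] = filename
        first_active = False
    return root, mapping
```

Called with ident="b", a file containing both "a foo a/foo.gcm" and "b foo b/foo.gcm" yields only the second mapping, which is how one file can serve several compilations.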

The mapper is connected to or loaded lazily, when the first module
mapping is required.  The networking protocols are only supported on
hosts that provide networking.  If no mapper is specified, a default is
provided.

Messages consist of whitespace-separated tokens and a possible final
filename.  As filenames are the last item on a line, they may contain
embedded or trailing spaces without difficulty (they cannot begin with
a space).  All non-ASCII characters are expected to be UTF-8 encoded.  Each
line is terminated by a line-feed (@code{0xa}) character.  The server
should accept and respond to the following commands:

@table @gcctabopt

@item DONE @var{module}
The compilation has completed the interface of @var{module}.  There is
no response.  It is now safe to read the generated CMI.  Note that the
compilation may not have completed the object-file generation of the
interface unit.

@item EXPORT @var{module}
The compilation is of a module interface unit, and will generate a CMI
for @var{module}.  The response should be @samp{OK @var{cmipath}}.

@item HELLO @var{ver} @var{kind} @var{ident}
This is the first command.  It informs the server of the name of the
source being compiled.  Response is either @samp{HELLO
@var{ver} @var{agent} @var{repopath}}, or @samp{ERROR @var{msg}}.

@item IMPORT @var{module}
A query for an import (including for a module implementation unit).
The response is @samp{OK @var{cmipath}} to indicate a CMI file.  If
the request is not fulfillable, the response is @samp{ERROR
[@var{msg}]}.  Usually an error response will cause compilation to
terminate.

@item INCLUDE @var{header}
A @code{#include} directive for @var{header} is about to be processed.
The response informs the compiler how to treat the inclusion.  A
response of @samp{TEXT} causes textual inclusion.  A response of
@samp{IMPORT} causes importation as a header unit and a subsequent
@samp{IMPORT} query will then be forthcoming.

@end table

It is recommended that any unrecognized command causes an @samp{ERROR}
response with a suitable message.
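
As a rough illustration of the command set, the following Python sketch dispatches one request line to a response.  It is not the attached mapper-server.cc: the class name Mapper, the agent string "sketch-mapper", the in-memory tables, and the choice to name a newly exported CMI after its module are assumptions.

```python
class Mapper:
    """Toy request handler for the commands listed above."""

    def __init__(self, repo, cmis, header_units=()):
        self.repo = repo                       # repository root sent in HELLO
        self.cmis = dict(cmis)                 # module name -> CMI path
        self.header_units = set(header_units)  # headers to import, not include
        self.done = []                         # modules whose CMI is complete

    def handle(self, line):
        """Return the response line, or None when no response is sent."""
        words = line.split(None, 1)
        if not words:
            return "ERROR empty request"
        cmd = words[0]
        arg = words[1] if len(words) > 1 else ""
        if cmd == "HELLO":
            fields = arg.split(None, 2)  # ver, kind, ident
            if len(fields) < 3 or fields[0] != "0":
                return "ERROR unsupported HELLO"
            return f"HELLO 0 sketch-mapper {self.repo}"
        if cmd == "EXPORT":
            # Assumption: an unknown module gets a CMI named after it.
            return "OK " + self.cmis.setdefault(arg, arg + ".gcm")
        if cmd == "IMPORT":
            path = self.cmis.get(arg)
            return f"OK {path}" if path else f"ERROR no CMI known for {arg}"
        if cmd == "INCLUDE":
            return "IMPORT" if arg in self.header_units else "TEXT"
        if cmd == "DONE":
            self.done.append(arg)
            return None
        return f"ERROR unrecognized command {cmd}"
```

A real server would also have to build missing CMIs on demand and wait for DONE before answering an IMPORT of a module that is still being compiled; the sketch leaves that out.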

Requests and responses may be batched.  If a request line begins with
a @samp{+} character, before waiting for a response another request
should be made.  That too may begin with @samp{+}.  The final request
of the batch should begin with @samp{-}, and may be empty.  Similarly,
responses may be batched in response to a set of batched
requests.  Each non-ultimate line of a batched response begins with a
@samp{+}.  The final line should begin with @samp{-}, and may
otherwise be empty.  Responses to a batched request are in request
order.  Servers should not commence responses until all requests of a
batch have been received.  There may be a fixed-capacity pipe between
client and server, and sending responses before the client has started
reading could result in deadlock.
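
The batching rules can be sketched as a pair of Python helpers plus a responder that drains the whole batch before replying.  The names encode_batch, read_batch and respond are hypothetical, and the sketch assumes the handler returns exactly one response line per request.

```python
def encode_batch(messages):
    """Frame messages: '+' on every line but the last, '-' on the last."""
    if not messages:
        return ""
    if len(messages) == 1:
        return messages[0] + "\n"  # a lone message needs no prefix
    head = "".join("+" + m + "\n" for m in messages[:-1])
    return head + "-" + messages[-1] + "\n"

def read_batch(lines):
    """Collect one batch from an iterator of lines without newlines."""
    batch = []
    for line in lines:
        if line.startswith("+"):
            batch.append(line[1:])  # more lines follow
            continue
        if line.startswith("-"):
            if line[1:]:
                batch.append(line[1:])  # the final line may be empty
        else:
            batch.append(line)  # unprefixed line: a batch of one
        break
    return batch

def respond(lines, handler):
    """Read every request of the batch first, then answer in request order."""
    requests = read_batch(lines)
    return encode_batch([handler(r) for r in requests])
```

Collecting the requests before producing any output is what keeps a fixed-capacity pipe from filling while the client is still writing.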

The following metavariables were used:

@table @gcctabopt

@item @var{cmipath}
Pathname of a CMI file.

@item @var{from}
The source path of the file containing the import or include.

@item @var{module}
A module name.  Header unit names are absolute pathnames, or
pathnames prefixed with @samp{./}.  Header units are resolved using
the include path.

@item @var{msg}
A human readable message.  This may contain whitespace.

@item @var{ident}
An identity provided when invoking the compiler.  This may be helpful
to distinguish different connections to a common server.

@item @var{ver}
A numeric version number, currently 0.

@end table

A project-specific mapper is expected to be provided by the build
system that invokes the compiler.  It is not expected that a
general-purpose server is provided for all compilations.  As such, the
server will know the build configuration, the compiler it invoked, and
the environment (such as working directory) in which that is
operating.  As it may parallelize builds, several compilations may
connect to the same socket.

When delivering paths to the compiler, paths relative to a
repository-root directory should be used.  The server informs the
compiler of this root in the initial handshake, using a path relative
to the compiler's working directory, or an absolute one.  Compilers
may embed the path of a direct import CMI file into an output CMI.
This path will be relative to the repository.  Such a path reduces the
server traffic, but requires the build system to recreate the same
directory structure within the repository across a parallelized build
system.

The default mapper generates CMI files in a @samp{gcm.cache}
directory.  CMI files have a @samp{.gcm} suffix.  The module unit name
is used directly to provide the basename.  Header units construct a
relative path using the underlying header file name.  If the path is
already relative, a @samp{!} directory is prepended.  Internal
@samp{..} components are translated to @samp{!!}.  No attempt is made
to canonicalize these filenames beyond that done by the preprocessor's
include search algorithm, as in general it is ambiguous when symbolic
links are present.
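
The naming scheme of the default mapper might be sketched as follows.  This is only one reading of the paragraph above, not GCC's code: the function name default_cmi_path is hypothetical, and the sketch assumes a header unit is recognised by a leading "/" or "./" and that the leading "." of a relative name is what becomes the "!" directory.

```python
import posixpath

def default_cmi_path(name, cache_dir="gcm.cache"):
    """Guess the CMI path for a module or header-unit name."""
    if name.startswith("/"):
        # Absolute header: reuse its path underneath the cache directory.
        rel = name.lstrip("/")
    elif name.startswith("./"):
        # Relative header: a "!" directory replaces the leading ".".
        rel = "!" + name[1:]
    else:
        # Named module: the module name is the basename.
        return posixpath.join(cache_dir, name + ".gcm")
    # Internal ".." components are translated to "!!".
    parts = ["!!" if part == ".." else part for part in rel.split("/")]
    return posixpath.join(cache_dir, "/".join(parts) + ".gcm")
```

Under these assumptions a module named foo maps to gcm.cache/foo.gcm, and the header unit ./inc/../config.h maps to gcm.cache/!/inc/!!/config.h.gcm.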

The mapper protocol was published as ``A Module Mapper''
@uref{http://wg21.link/p1184}.  It is intended that build systems will
provide their own mappers.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Attachments:
mapper-client.cc (17K)
mapper-client.h (3K)
mapper-server.cc (41K)
Re: C++20 module protocol

Fangrui Song via cfe-dev
Thanks for designing/working on/contributing this! (I think it's a rather neat thing & do hope it takes off/becomes adopted as the solution for this complicated new compiler surface area required by C++20 modules)

On Tue, May 19, 2020 at 12:48 AM Nathan Sidwell via cfe-dev <[hidden email]> wrote:
Re: C++20 module protocol

Fangrui Song via cfe-dev
In reply to this post by Fangrui Song via cfe-dev
Very nice Nathan!  Thank you for helping to foster cross-compiler collaboration,

-Chris

> On May 18, 2020, at 8:36 AM, Nathan Sidwell via cfe-dev <[hidden email]> wrote:

Re: C++20 module protocol

Fangrui Song via cfe-dev
In reply to this post by Fangrui Song via cfe-dev
This is pretty cool, thanks for sharing Nathan!

On Tue, May 19, 2020 at 12:48 AM Nathan Sidwell via cfe-dev
<[hidden email]> wrote:

>
> Hi,
> these files are the GCC implementation of the p1184 (wg21.link/p1184)
> protocol.  Although part of GCC, they are entirely authored by me, so I
> hereby relicense[*] them under the Apache-2.0 with LLVM exception
> license, in the hope they may be useful in Clang's implementation.  I
> also append the current documentation.
>
> Iain and I are discussing whether a separate upstream project, from
> whence both GCC and Clang can sync, may be the best approach.
>
> nathan
>
> [*] Contributions to the FSF give back to the contributor a license to
> that code, allowing them to relicense as desired.
>
> --
> Nathan Sidwell
>
> @node C++ Module Mapper
> @subsection Module Mapper
> @cindex C++ Module Mapper
>
> A module mapper provides a line-based server or file that the
> compiler queries to determine the mapping between module names and CMI
> files.  It is also used to build CMIs on demand.  A mapper may be
> specified with the @option{-fmodule-mapper=@var{val}} option or
> @env{CXX_MODULE_MAPPER} environment variable.  The value may have
> one of the following forms:
>
> @table @gcctabopt
>
> @item @r{[}@var{hostname}@r{]}:@var{port}@r{[}?@var{ident}@r{]}
> An optional hostname and a numeric port number to connect to.  If the
> hostname is omitted, the loopback address is used.  If the hostname
> corresponds to multiple IPV6 addresses, these are tried in turn, until
> one is successful.  If your host lacks ipv6, this form is
> non-functional.  If you must use ipv4 @emph{get with the 21st century},
> or failing that use @option{-fmodule-mapper='|ncat @var{ipv4host}
> @var{port}'}.
>
> @item =@var{socket}@r{[}?@var{ident}@r{]}
> A local domain socket.  If your host lacks local domain sockets, this
> form is non-functional.
>
> @item |@var{program}@r{[}?@var{ident}@r{]} @r{[}@var{args...}@r{]}
> A program to spawn, and communicate with on its stdin/stdout streams.
> Your @var{PATH} environment variable is searched for the program.
> Arguments are separated by space characters, (it is not possible for
> one of the arguments delivered to the program to contain a space).
>
> @item <>@r{[}?@var{ident}@r{]}
> @item <>@var{fdinout}@r{[}?@var{ident}@r{]}
> @item <@var{fdin}>@var{fdout}@r{[}?@var{ident}@r{]}
> File descriptors to communicate over.  The first form, @option{<>},
> communicates over stdin and stdout.  The second form specifies a
> bidirectional file descriptor and the last form allows specifying
> two independent descriptors.  Note that other compiler options might
> cause the compiler to read stdin or write stdout.
>
> @item @var{file}@r{[}?@var{ident}@r{]}
> A mapping file consisting of space-separated module-name, filename
> pairs, one per line.  Only the mappings for the direct imports and any
> module export name need be provided.  If other mappings are provided,
> they override those stored in any imported CMI files.  A repository
> root may be specified in the mapping file by using @samp{$root} as the
> module name in the first active line.
>
> @end table
>
> As shown, an optional @var{ident} may suffix the first word of the
> option, indicated by a @samp{?} prefix.  The value is used in the
> initial handshake with the module server, or to specify a prefix on
> mapping file lines.  In the server case, the main source file name is
> used if no @var{ident} is specified.  In the file case, all non-blank
> lines are significant, unless a value is specified, in which case only
> lines beginning with @var{ident} are significant.  The @var{ident}
> must be separated by whitespace from the module name.  Be aware that
> @samp{<}, @samp{>}, @samp{?}  and @samp{|} characters are often
> significant to the shell, and therefore may need quoting.
>
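
The mapping-file rules (module-name/filename pairs, an optional `$root` entry on the first active line, and the `ident` prefix filter) could be read along these lines. This is only a sketch: `parse_mapping_file` is a hypothetical helper, and silently skipping malformed lines is my assumption.

```python
def parse_mapping_file(text, ident=None):
    """Return (root, {module: filename}) from mapping-file text."""
    root, mapping, first_active = None, {}, True
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts:
            continue                      # blank line
        if ident is not None:
            # Only lines whose first word is the ident are significant.
            if parts[0] != ident or len(parts) < 2:
                continue
            parts = parts[1].split(None, 1)
        if len(parts) != 2:
            continue                      # assumption: skip malformed lines
        module, filename = parts
        if first_active and module == "$root":
            root = filename               # repository root, first active line only
        else:
            mapping[module] = filename
        first_active = False
    return root, mapping
```

Several compilations can then share one file, each selecting its own lines by passing a different `ident`.
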
> The mapper is connected to or loaded lazily, when the first module
> mapping is required.  The networking protocols are only supported on
> hosts that provide networking.  If no mapper is specified, a default is
> provided.
>
> Messages consist of whitespace-separated tokens and a possible final
> filename.  As filenames are the last item on a line, they may contain
> embedded or trailing spaces without difficulty (they cannot begin with
> a space).  All non-ASCII characters are expected to be UTF-8 encoded.  Each
> line is terminated by a line-feed (@code{0xa}) character.  The server
> should accept and respond to the following commands:
>
> @table @gcctabopt
>
> @item DONE @var{module}
> The compilation has completed the interface of @var{module}.  There is
> no response.  It is now safe to read the generated CMI.  Note that the
> compilation may not have completed the object-file generation of the
> interface unit.
>
> @item EXPORT @var{module}
> The compilation is of a module interface unit, and will generate a CMI
> for @var{module}.  The response should be @samp{OK @var{cmipath}}.
>
> @item HELLO @var{ver} @var{kind} @var{ident}
> This is the first command.  It informs the server of the name of the
> source being compiled.  Response is either @samp{HELLO
> @var{ver} @var{agent} @var{repopath}}, or @samp{ERROR @var{msg}}.
>
> @item IMPORT @var{module}
> A query for an import (including for a module implementation unit).
> The response is @samp{OK @var{cmipath}} to indicate a CMI file.  If
> the request is not fulfillable, the response is @samp{ERROR
> [@var{msg}]}.  Usually an error response will cause compilation to
> terminate.
>
> @item INCLUDE @var{header}
> A @code{#include} directive for @var{header} is about to be processed.
> The response informs the compiler how to treat the inclusion.  A
> response of @samp{TEXT} causes textual inclusion.  A response of
> @samp{IMPORT} causes importation as a header unit and a subsequent
> @samp{IMPORT} query will then be forthcoming.
>
> @end table
>
> It is recommended that any unrecognized command cause an @samp{ERROR}
> response with a suitable message.
>
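
For what it's worth, the command set above is small enough that a toy request handler fits in a few lines. The sketch below is not the GCC implementation: the `Mapper` class, the `sketch-mapper` agent string, and the `<module>.gcm` naming are placeholders of mine, the `kind` field is accepted without interpretation, and header-unit names would need a proper path translation in practice.

```python
class Mapper:
    """Toy per-line request handler for the commands listed above."""

    def __init__(self, repo="gcm.cache", header_units=()):
        self.repo = repo
        self.header_units = set(header_units)  # headers to import, not include
        self.done = set()

    def handle(self, line):
        """Return the response line, or None when no response is sent."""
        cmd, _, arg = line.partition(" ")
        if cmd == "HELLO":
            # ver kind ident; ident comes last, so it may contain spaces.
            fields = arg.split(None, 2)
            if len(fields) < 3 or fields[0] != "0":
                return "ERROR unsupported handshake"
            return f"HELLO 0 sketch-mapper {self.repo}"
        if cmd in ("IMPORT", "EXPORT") and arg:
            return f"OK {arg}.gcm"             # placeholder CMI naming
        if cmd == "DONE" and arg:
            self.done.add(arg)                 # the CMI is now safe to read
            return None
        if cmd == "INCLUDE" and arg:
            return "IMPORT" if arg in self.header_units else "TEXT"
        return f"ERROR unrecognized command '{cmd}'"
```

A real server would read lines from the socket or pipe, call something like `handle` on each, and write back every non-`None` result followed by a line feed.
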
> Requests and responses may be batched.  If a request line begins with
> a @samp{+} character, before waiting for a response another request
> should be made.  That too may begin with @samp{+}.  The final request
> of the batch should begin with @samp{-}, and may be empty.  Similarly,
> responses to a set of batched requests may themselves be batched.
> Each non-ultimate line of a batched response begins with a
> @samp{+}.  The final line should begin with @samp{-}, and may
> otherwise be empty.  Responses to a batched request are in request
> order.  Servers should not commence responses until all requests of a
> batch have been received.  There may be a fixed-capacity pipe between
> client and server, and sending responses before the client has started
> reading could result in deadlock.
>
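
The batching rules can be sketched as a pair of helpers: one that gathers a whole batch before anything is answered (which is what avoids the pipe deadlock), and one that marks up the replies. `split_batch` and `format_batch` are hypothetical names, and how a response-less command such as `DONE` behaves inside a batch is not covered here because the text does not say.

```python
def split_batch(lines):
    """Collect the request lines of one batch (or a single unbatched request)."""
    requests = []
    for line in lines:
        marker, body = line[:1], line[1:]
        if marker == "+":
            requests.append(body)      # more requests follow before any reply
        elif marker == "-":
            if body:                   # the closing line may be empty
                requests.append(body)
            return requests
        elif requests:
            raise ValueError("unmarked line inside a batch")
        else:
            return [line]              # ordinary, unbatched request
    raise ValueError("input ended before the batch was closed")


def format_batch(responses):
    """Prefix all but the last response with '+', and the last with '-'."""
    if not responses:
        return ["-"]
    return ["+" + r for r in responses[:-1]] + ["-" + responses[-1]]
```

Responses are produced in request order only after `split_batch` has returned, so the server never writes while the client is still sending.
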
> The following metavariables were used:
>
> @table @gcctabopt
>
> @item @var{cmipath}
> Pathname of a CMI file.
>
> @item @var{from}
> The source path of the file containing the import or include.
>
> @item @var{module}
> A module name.  Header unit names are absolute pathnames, or
> pathnames prefixed with @samp{./}.  Header units are resolved using
> the include path.
>
> @item @var{msg}
> A human readable message.  This may contain whitespace.
>
> @item @var{ident}
> An identity provided when invoking the compiler.  This may be helpful
> to distinguish different connections to a common server.
>
> @item @var{ver}
> A numeric version number, currently 0.
>
> @end table
>
> A project-specific mapper is expected to be provided by the build
> system that invokes the compiler.  It is not expected that a
> general-purpose server is provided for all compilations.  As such, the
> server will know the build configuration, the compiler it invoked, and
> the environment (such as working directory) in which that is
> operating.  As it may parallelize builds, several compilations may
> connect to the same socket.
>
> When delivering paths to the compiler, paths relative to a
> repository-root directory should be used.  The server informs the
> compiler of this root in the initial handshake, using a path relative
> to the compiler's working directory, or an absolute one.  Compilers
> may embed the path of a direct import CMI file into an output CMI.
> This path will be relative to the repository.  Such a path reduces the
> server traffic, but requires the build system to recreate the same
> directory structure within the repository across a parallelized build
> system.
>
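
In other words, a CMI path from the server is interpreted against the repository root, which is itself either absolute or relative to the compiler's working directory. A minimal sketch of that resolution, assuming POSIX-style paths and a hypothetical helper name `resolve_cmi`:

```python
import posixpath

def resolve_cmi(cwd, repo_root, cmipath):
    """Resolve a repository-relative CMI path to a full path."""
    # posixpath.join discards cwd when repo_root is already absolute.
    return posixpath.normpath(posixpath.join(cwd, repo_root, cmipath))
```

So a root of `gcm.cache` in a build directory `/build` places `foo.gcm` at `/build/gcm.cache/foo.gcm`, while an absolute root such as `/repo` ignores the working directory.
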
> The default mapper generates CMI files in a @samp{gcm.cache}
> directory.  CMI files have a @samp{.gcm} suffix.  The module unit name
> is used directly to provide the basename.  Header units construct a
> relative path using the underlying header file name.  If the path is
> already relative, a @samp{!} directory is prepended.  Internal
> @samp{..} components are translated to @samp{!!}.  No attempt is made
> to canonicalize these filenames beyond that done by the preprocessor's
> include search algorithm, as in general it is ambiguous when symbolic
> links are present.
>
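
My reading of the default naming scheme, as a sketch only: `default_cmi_path` is a made-up helper, and details the text leaves open (for instance whether the leading `./` of a relative header name is dropped once `!` is prepended) are guesses on my part.

```python
import posixpath

def default_cmi_path(name):
    """Map a module or header-unit name to a path under gcm.cache."""
    if name.startswith("/"):
        rel = name.lstrip("/")             # absolute header: make it relative
    elif name.startswith("./"):
        rel = "!/" + name[2:]              # relative header: prepend a ! directory
    else:
        # Named module: the name is used directly as the basename.
        return posixpath.join("gcm.cache", name + ".gcm")
    # Internal .. components become !!; nothing else is canonicalized.
    parts = ["!!" if p == ".." else p for p in rel.split("/")]
    return posixpath.join("gcm.cache", *parts) + ".gcm"
```

Under these assumptions a module `hello` maps to `gcm.cache/hello.gcm`, and the header unit `./sub/../cfg.h` maps to `gcm.cache/!/sub/!!/cfg.h.gcm`.
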
> The mapper protocol was published as ``A Module Mapper''
> @uref{http://wg21.link/p1184}.  It is intended that build systems will
> provide their own mappers.
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



--
Bruno Cardoso Lopes
http://www.brunocardoso.cc