Controlling instantiation of templates from PCH


Lubos Lunak-2

 Hello,

 I'm working on a Clang patch that can make C++ builds noticeably faster in
some setups by allowing control over how templates are instantiated, but I
have some problems finishing it and need advice.

 Background: I am a LibreOffice developer. With MSVC, enabling precompiled
headers saves ~2/3 of the build time, e.g. for LO Calc, but with Clang it
saves only ~10%. Moreover, with MSVC a larger PCH saves more time, but this
is not so with Clang; in fact larger PCHs often make things slower.

 The recent -ftime-trace feature allowed me to investigate this, and it turns
out that the time saved by having to parse less is outweighed by having to
instantiate (many) more templates. You can see -ftime-trace graphs at
http://llunak.blogspot.com/2019/05/why-precompiled-headers-do-not-improve.html
(1st row - no PCH, 2nd row - small PCH, 3rd row - large PCH); the .json files
are at http://ge.tt/7RHeLHw2 if somebody wants to see them.

 Specifically, the time is spent in Sema::PerformPendingInstantiations() and
Sema::InstantiateFunctionDefinition(). The vast majority of the
instantiations come from the PCH itself. This means that the work is repeated
for every TU using the PCH, and it is also useless work, as the linker will
discard all but one copy of each instantiation.

 My WIP patch implements a new option to avoid that. The idea is that all
sources using the PCH will be built with -fpch-template-instantiation=skip,
which will prevent Sema::InstantiateFunctionDefinition() from actually
instantiating templates coming from the PCH if they would be unneeded
duplicates (note that this means almost all PCH template instantiations in
the case of a developer build with -O0 -g, which is my primary use case).
Then one extra source file is built with -fpch-template-instantiation=force,
which will provide the single copy of the instantiations. I assume that this
is similar to how MSVC manages to have much better gains with PCH: the .obj
created during PCH creation presumably contains the single instantiations.
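
 For readers unfamiliar with the idea, standard C++ already offers a manual
mechanism of the same shape: an explicit instantiation declaration ("extern
template") tells a TU not to instantiate, and a single explicit instantiation
definition elsewhere provides the one copy. The sketch below is only an
analogy, not what the patch does internally; Buffer and demo() are
hypothetical names, and both halves are shown in one file purely so that the
snippet is self-contained.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class template standing in for something that lives in a
// shared (precompiled) header. The members are defined out of class so that
// they are not implicitly inline.
template <typename T>
struct Buffer {
    void push(T v);
    std::size_t size() const;
    std::vector<T> data;
};

template <typename T>
void Buffer<T>::push(T v) { data.push_back(v); }

template <typename T>
std::size_t Buffer<T>::size() const { return data.size(); }

// What every ordinary TU would see (the "skip" side of the analogy):
// an explicit instantiation declaration, which suppresses implicit
// instantiation of the non-inline members in that TU.
extern template struct Buffer<char>;

// What exactly one TU would contain (the "force" side of the analogy):
// an explicit instantiation definition, which emits the single copy.
template struct Buffer<char>;

// Uses Buffer<char> without instantiating its members a second time.
std::size_t demo() {
    Buffer<char> b;
    b.push('x');
    b.push('y');
    return b.size();
}
```

 The difference is that "extern template" has to be written by hand for every
specialization, while the option described above is meant to apply to
whatever the PCH happens to instantiate.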

 In the -ftime-trace graphs linked above, the 4th row is large PCH with my
patch. The compilation time saved by this is 50% and 60% for the two examples
(and I think moving some templates into the PCH might get it to 70-75% for
the second file).

 As I said, I have some problems that prevent the patch from being fully
usable, so in order to finish it, could somebody help me with the following:

- I don't understand how it is controlled which kind of ctor/dtor is emitted
(complete ctor vs base ctor, i.e. C1 vs C2 type in the Itanium ABI). I get
many undefined references because the TU built with instances does not have
both types, yet other TUs refer to them. How can I force both of them to be
emitted?

- I have an undefined reference to one template function that should be
included in the TU with instances, but it isn't. The Sema part instantiates
it and I could track it as far as getting generated in Codegen, but then I'm
lost. I assume that it gets discarded because something in codegen or llvm
considers it unused. Is there a place like that and where is it? Are there
other places in codegen/llvm where I could check to see why this function
doesn't get generated in the object file?

- In Sema::InstantiateFunctionDefinition() the code for extern templates still
instantiates a function if it has getContainedAutoType(), so my code should
probably also check that. But I'm not sure what that is (is that 'auto foo()
{ return 1; }' ?) or why that would need an instance in every TU.

- I used BENIGN_ENUM_LANGOPT because Clang otherwise complains that the PCH is
used with a different option than what it was generated with, which is
necessary in this case, but I'm not sure if this is the correct handling of
the option.

- Is there a simple rule for what decides that a template needs to be
instantiated? As far as I can tell, even using a template as a class member
or having an inline member function manipulating it doesn't trigger it. When
I mentioned moving some templates into the PCH in order to get possible 70%
savings, I actually don't know how to cause an instantiation from the PCH,
even though the templates and their uses are included there.

 Thank you.

--
 Lubos Lunak

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Attachment: pch-instantiate-templates.patch (10K)

Re: Controlling instantiation of templates from PCH

Richard Smith via cfe-dev
This seems like a nice idea, and has a lot in common with our existing "modular codegen" mode, which does largely the same thing but for PCMs rather than PCHs. I'd hope we could share a lot of the implementation between the two features.

+David Blaikie, who implemented modular codegen and might be able to advise as to the best way to integrate similar functionality into our PCH support.

On Sat, 25 May 2019 at 12:32, Lubos Lunak via cfe-dev <[hidden email]> wrote:

> [...]

Re: Controlling instantiation of templates from PCH

David Blaikie via cfe-dev
Thanks Richard - yeah sounds pretty similar though I'm a bit confused about what's happening in this case, in part because I know next to nothing about how Clang's PCH works (& especially how it differs from PCM/modules).

Richard: why would any module or PCH cause a subsequent compilation to perform more pending instantiations? (I would've thought/my understanding was that nothing in the module would be used if it wasn't referenced from the source file, so why would a pch cause more pending instantiations?)

Lubos: Could you provide a small standalone example of this increase in pending instantiations, so it's a bit easier for me to understand the kind of code & what's happening?
You mentioned in the blog post that the use of a PCH causes more functions to be emitted into the final object file (than if a PCH had not been used, and the source remained the same), possibly including functions that are totally unused by the object file. Those are situations that would be very surprising to me. (Again, I'm especially interested in comparing the non-PCH with the PCH case here, rather than the Clang PCH with the VS PCH situation.)

On Sat, May 25, 2019 at 6:38 PM Richard Smith <[hidden email]> wrote:
> [...]

Re: Controlling instantiation of templates from PCH

Richard Smith via cfe-dev
On Sun, 26 May 2019 at 13:26, David Blaikie via cfe-dev <[hidden email]> wrote:
> [...]
> Richard: why would any module or PCH cause a subsequent compilation to
> perform more pending instantiations? (I would've thought/my understanding
> was that nothing in the module would be used if it wasn't referenced from
> the source file, so why would a pch cause more pending instantiations?)

Our design philosophy for modules and preamble precompilation is for a compilation using a precompiled header / preamble to behave identically to a compilation that parsed the header rather than using a precompiled form. So we don't perform end-of-translation-unit template instantiation at the end of a precompiled header, and instead perform the instantiation (and emit all the instantiated definitions and likewise all definitions of all used inline functions in the PCH) in all consumers of the PCH.
 
Re: Controlling instantiation of templates from PCH

David Blaikie via cfe-dev


On Sun, May 26, 2019 at 2:36 PM Richard Smith <[hidden email]> wrote:
> [...] we don't perform end-of-translation-unit template instantiation at
> the end of a precompiled header, and instead perform the instantiation
> [...] in all consumers of the PCH.

OK - thanks. That makes sense.

Though do you know if/how any of this could account for /more/ time spent with pending instantiations with a PCH than without? That assumes the same headers are included, and perhaps that's where the assumption is incorrect/flawed: perhaps in Lubos's case the PCH is being added in addition to the headers used in the non-PCH build, rather than instead of them. And this shouldn't ever result in more/different bits in the object file (assuming there's nothing with external linkage* in the PCH), right?

* no doubt more nuanced than that, but at least rough idea
 
 
Re: Controlling instantiation of templates from PCH

Lubos Lunak-2

 Let me merge answers to various parts into one mail:

On Sunday 26 of May 2019, Richard Smith wrote:
> This seems like a nice idea, and has a lot in common with our existing
> "modular codegen" mode, which does largely the same thing but for PCMs
> rather than PCHs. I'd hope we could share a lot of the implementation
> between the two features.

 The core of my patch is basically an if statement in
Sema::InstantiateFunctionDefinition() that decides whether to bail out and
avoid doing something that'd eventually get thrown away anyway, so speaking
of sharing implementation is probably a bit of a stretch. Unless you mean
that the problem should rather be handled by changing how PCHs work
internally, which I have no idea about.
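
 To make the shape of that decision concrete, here is a toy model of it as a
plain predicate. It is not code from the patch or from Clang: the enum, the
struct and every field name are hypothetical, and the deduced-return-type
exception is only the one suspected earlier in the thread.

```cpp
// Hypothetical stand-in for the three values of the proposed option.
enum class PchTemplateInstantiation { Default, Skip, Force };

// Hypothetical summary of one pending function instantiation.
struct PendingInstantiation {
    bool comesFromPch;          // requested by code inside the PCH
    bool hasDeducedReturnType;  // body is needed to learn the type
};

// Returns true if this TU should do the instantiation itself.
bool shouldInstantiateHere(const PendingInstantiation& p,
                           PchTemplateInstantiation mode) {
    // Default and Force both instantiate everything that is pending.
    if (mode != PchTemplateInstantiation::Skip)
        return true;
    // Instantiations requested by the TU's own code are always done.
    if (!p.comesFromPch)
        return true;
    // Assumed exception: the body must be seen to deduce the return type.
    if (p.hasDeducedReturnType)
        return true;
    // Otherwise bail out: the TU built in Force mode provides the copy.
    return false;
}
```

 In this model the only TU that pays for the PCH's instantiations is the one
built in Force mode; every other TU returns early for them.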

On Monday 27 of May 2019, David Blaikie via cfe-dev wrote:
> Though do you know if/how any of this could account for /more/ time spent
> with pending instantiations with a PCH than without? (assuming the same
> headers are included - and that's perhaps where the assumption is
> incorrect/flawed, perhaps in Lubos's case the PCH is being added in
> addition to the headers used in the non-PCH build, rather than instead of)

 That assumption is indeed incorrect. E.g. for libsclo we have
precompiled_sc.hxx, which contains everything that makes sense to be in PCH
for the library. It is used as -include-pch precompiled_sc.hxx.pch, but
only if PCH is enabled. That's the reasonable way to use it: except with MSVC,
PCHs are not considered worth it and so are not enabled by default, and it
doesn't make sense to include more than necessary in the non-PCH case. So the
increase in time spent in PerformPendingInstantiations() is (AFAICT) caused
solely by the PCH bringing in more stuff. Presumably if we used
precompiled_sc.hxx unconditionally we'd always have this cost.

On Sunday 26 of May 2019, David Blaikie wrote:
> Lubos: Could you provide a small standalone example of this increase in
> pending instantiations so it's a bit easier for me to understand the kind
> of code & what's happening?

$ cat a.cpp
// empty source file
$ cat a.h
#include <vector>
struct F
    {
    std::vector< char > c;
    int size() { return c.size(); }
    };
$ ... make Clang print info in Sema::InstantiateFunctionDefinition() ...
$ clang++ -Wall -c a.cpp
[nothing]
$ clang++ -Wall -c a.cpp -include a.h
INST:std::_Hash_impl::hash
INST:std::_Hash_impl::hash
INST:std::vector<char, std::allocator<char> >::size

 You get the same with -include-pch a.h.pch. Also, I'm on Linux, so this is
libstdc++, but I assume it'll be similar with libc++.

 The important thing to note is that the resulting object file is the same in
both cases: the instantiated functions eventually get thrown away. But Clang
has to spend time processing them, and as you can see in the -ftime-trace
graphs that can get very costly.

--
 Lubos Lunak

Re: Controlling instantiation of templates from PCH

Nathan Ridge via cfe-dev


On Mon, May 27, 2019 at 6:29 AM Lubos Lunak <[hidden email]> wrote:
> That assumption is indeed incorrect. [...] So the increase in time spent in
> PerformPendingInstantiations() is (AFAICT) caused solely by the PCH bringing
> in more stuff. Presumably if we used precompiled_sc.hxx unconditionally we'd
> always have this cost.

Ah, OK. Good to understand.

> [...]
>
> The important thing to note is that the resulting object file is the same in
> both cases, the instantiated functions eventually get thrown away. But Clang
> has to spend time processing that, and as you can see in the -ftime-trace
> graphs that can get very costly.

OK.

So I'm not sure I understand this comment:

"And, if you look carefully, 4 seconds more to generate code, most of it for those templates. And after the compiler spends all this time on templates in all the source files, it gets all passed to the linker, which will shrug and then throw most of it away (and that will too take a load of time, if you still happen to use the BFD linker instead of gold/lld with -gsplit-dwarf -Wl,--gdb-index). What a marvel."

What extra code generation occurred with the PCH? Any change in generated code with a PCH would surprise me.

- Dave
 


Re: Controlling instantiation of templates from PCH

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev


On Sun, May 26, 2019 at 2:36 PM Richard Smith <[hidden email]> wrote:
On Sun, 26 May 2019 at 13:26, David Blaikie via cfe-dev <[hidden email]> wrote:
Thanks Richard - yeah sounds pretty similar though I'm a bit confused about what's happening in this case, in part because I know next to nothing about how Clang's PCH works (& especially how it differs from PCM/modules).

Richard: why would any module or PCH cause a subsequent compilation to perform more pending instantiations? (I would've thought/my understanding was that nothing in the module would be used if it wasn't referenced from the source file, so why would a pch cause more pending instantiations?)

Our design philosophy for modules and preamble precompilation is for a compilation using a precompiled header / preamble to behave identically to a compilation that parsed the header rather than using a precompiled form. So we don't perform end-of-translation-unit template instantiation at the end of a precompiled header, and instead perform the instantiation (and emit all the instantiated definitions and likewise all definitions of all used inline functions in the PCH) in all consumers of the PCH.

What would happen if we didn't perform pending instantiations that came from the module (i.e. were already pending there)? We'd miss some error messages in the code that uses the module. But if we instantiated the templates while building the module, would that be OK? What if we did them only in the modular code generation?

- Dave
 
 
Lubos: Could you provide a small standalone example of this increase in pending instantiations so it's a bit easier for me to understand the kind of code & what's happening?
You mentioned in the blog post that the use of a PCH causes more functions to be emitted into the final object file (than if a PCH had not been used, and the source remained the same). Especially the possibility of functions being emitted into the object file that are totally unused by the object file. (again, I'm especially interested in comparing the non-PCH with the PCH case here, rather than the Clang PCH with the VS PCH situation) - those are situations that would be very surprising to me.

On Sat, May 25, 2019 at 6:38 PM Richard Smith <[hidden email]> wrote:
This seems like a nice idea, and has a lot in common with our existing "modular codegen" mode, which does largely the same thing but for PCMs rather than PCHs. I'd hope we could share a lot of the implementation between the two features.

+David Blaikie, who implemented modular codegen and might be able to advise as to the best way to integrate similar functionality into our PCH support.


Re: Controlling instantiation of templates from PCH

Lubos Lunak-2
In reply to this post by Nathan Ridge via cfe-dev
On Tuesday 28 of May 2019, David Blaikie wrote:

> So I'm not sure I understand this comment:
>
> "And, if you look carefully, 4 seconds more to generate code, most of it
> for those templates. And after the compiler spends all this time on
> templates in all the source files, it gets all passed to the linker, which
> will shrug and then throw most of it away (and that will too take a load of
> time, if you still happen to use the BFD linker instead of gold/lld
> <https://lists.freedesktop.org/archives/libreoffice/2018-July/080484.html>
>  with -gsplit-dwarf -Wl,--gdb-index
> <https://lists.freedesktop.org/archives/libreoffice/2018-June/080437.html>)
>. What a marvel."
>
> What extra code generation occurred with the PCH? Any change in generated
> code with a PCH would surprise me.


 If I understand it correctly, the small testcase from me means that adding a
PCH generally does not change the resulting object file, it only makes Clang
spend more time processing something it throws away as unused at some point in
the later stages of creating the object file, so there's no extra code
generation caused by the PCH. So, to make it more clear what I meant there,
it's more like saying that there's a missed opportunity:

- Let's say that I have a library built from a.cpp and b.cpp, and both those
sources use std::vector< int >. As in, they really use it, so both a.o and
b.o end up with weak copies of std::vector< int > code.
- That seems to be basically inevitable with the normal non-PCH code, as the
Clang instance compiling a.cpp cannot know that std::vector< int > code will
be also present in b.o, and so both compiling a.cpp and b.cpp results in
generating std::vector< int >, even though we can clearly see it's
unnecessary.
- I say it's basically inevitable in the non-PCH case, because I don't know a
reasonable way to avoid that in practice. There is extern template, which
would work in this minimal testcase, but for a real-world large codebase I
find that impractical, tedious and what not (please correct me if I'm wrong
and there is a reasonable way, but beware that I've already tried that and
decided that writing a compiler patch was an easier way of going about it).
- However, in the PCH case, both Clang instances do know that they share all
the template instantiations from the PCH. And that's where my patch steps in:
-fpch-template-instantiation=force tells one instance "take care of it
all" and -fpch-template-instantiation=skip tells all the other
instances "don't bother with those, somebody else will take care of that". So
all but one of the Clang instances can skip all those numerous
Sema::InstantiateFunctionDefinition() calls and also code generation for all of
those instances that actually are used in that TU.
- To put it differently, you can also view -fpch-template-instantiation=skip
as automatic extern template for whatever is used by the PCH,
and -fpch-template-instantiation=force as explicit instantiation for it,
where all the hassle of extern template is replaced by just putting all the
template stuff in the PCH. (To be precise, it's not exactly like explicit
instantiation, because it involves only what is instantiated by the PCH, but
if wanted that can be handled by actually explicitly instantiating in the
PCH, without having to bother with the extern template stuff).

--
 Lubos Lunak

Re: Controlling instantiation of templates from PCH

Nathan Ridge via cfe-dev
In reply to this post by Lubos Lunak-2
That's a cool observation!

A question independent to the other discussions happening on this thread: Since you're comparing build times between MSVC and clang, do you use clang-cl in Windows builds? The clang-cl / cl.exe PCH flags (/Yc, /Yu) would allow implementing your suggested optimization without a need for any new driver flags.

This seems similar to doing http://blog.llvm.org/2018/11/30-faster-windows-builds-with-clang-cl_14.html for all inlines, not just for dllexported ones.


Re: Controlling instantiation of templates from PCH

Lubos Lunak-2
On Tuesday 28 of May 2019, Nico Weber wrote:
> That's a cool observation!
>
> A question independent to the other discussions happening on this thread:
> Since you're comparing build times between MSVC and clang, do you use
> clang-cl in Windows builds?

 We do have support for clang-cl AFAIK, but I don't know how much it's used.
The release binaries are built with MSVC and I think most developers are on
Unix-likes anyway. I've never used clang-cl myself.

> The clang-cl / cl.exe PCH flags (/Yc, /Yu)
> would allow implementing your suggested optimization without a need for any
> new driver flags.

 /Yc /Yu could act that way without extra flags, but I think they don't. I've
just skimmed over the sources, so I may be mistaken, but it seems to me
the /Yc mode is not different there. Unless /Yc somehow already instantiates
everything in the PCH and avoids such instantiations in TUs using the PCH,
there is still going to be the cost of Sema::PerformPendingInstantiations()
doing something that's not needed. Remember that this is actually about
improving the build time, not necessarily the build result.

> This seems similar to doing
> http://blog.llvm.org/2018/11/30-faster-windows-builds-with-clang-cl_14.html
> for all inlines, not just for dllexported ones.

 I think that's different. That one is like -fvisibility-inlines-hidden, which
only causes inlines not to be exported. But they will still be processed.



--
 Lubos Lunak

Re: Controlling instantiation of templates from PCH

Nathan Ridge via cfe-dev
On 28/05/2019 21:38, Lubos Lunak via cfe-dev wrote:
> On Tuesday 28 of May 2019, Nico Weber wrote:
>> A question independent to the other discussions happening on this thread:
>> Since you're comparing build times between MSVC and clang, do you use
>> clang-cl in Windows builds?
>
>   We do have support for clang-cl AFAIK, but I don't know how much it's used.
> The release binaries are built with MSVC and I think most developers are on
> Unix-likes anyway. I've never used clang-cl myself.

(I use clang-cl, but only to run LO's Clang plugin also on Windows, to
catch issues in Windows-specific LO code.  My build explicitly disables
use of PCH, for one because clang-cl didn't support it yet back when I
set this up, for another because my fear would be that PCH negatively
affects the quality of the plugin diagnostics by increasing the set of
included files, and for yet another because build times are not that
relevant for my occasional clang-cl builds, anyway.)

Re: Controlling instantiation of templates from PCH

Nathan Ridge via cfe-dev
In reply to this post by Lubos Lunak-2
On Tue, May 28, 2019 at 9:38 PM Lubos Lunak via cfe-dev
<[hidden email]> wrote:

>
> On Tuesday 28 of May 2019, Nico Weber wrote:
> > That's a cool observation!
> >
> > A question independent to the other discussions happening on this thread:
> > Since you're comparing build times between MSVC and clang, do you use
> > clang-cl in Windows builds?
>
>  We do have support for clang-cl AFAIK, but I don't know how much it's used.
> The release binaries are built with MSVC and I think most developers are on
> Unix-likes anyway. I've never used clang-cl myself.
>
> > The clang-cl / cl.exe PCH flags (/Yc, /Yu)
> > would allow implementing your suggested optimization without a need for any
> > new driver flags.
>
>  /Yc /Yu could act that way without extra flags, but I think they don't. I've
> just skimmed over the sources, so I may be mistaken, but it seems to me
> the /Yc mode is not different there. Unless /Yc somehow already instantiates
> everything in the PCH and avoids such instantiations in TUs using the PCH,
> there is still going to be the cost of Sema::PerformPendingInstantiations()
> doing something that's not needed. Remember that this is actually about
> improving the build time, not necessarily the build result.
>
> > This seems similar to doing
> > http://blog.llvm.org/2018/11/30-faster-windows-builds-with-clang-cl_14.html
> > for all inlines, not just for dllexported ones.
>
>  I think that's different. That one is like -fvisibility-inlines-hidden, which
> only causes inlines not to be exported. But they will still be processed.

I think this is more in the vein of
http://llvm.org/viewvc/llvm-project?view=revision&revision=335466

We could know that the definitions of the instantiated templates are
in the PCH .obj file, and could hopefully skip it in other users of
the PCH. And that's probably how MSVC does it.

Re: Controlling instantiation of templates from PCH

Lubos Lunak-2
On Wednesday 29 of May 2019, Hans Wennborg wrote:
> On Tue, May 28, 2019 at 9:38 PM Lubos Lunak via cfe-dev
> >  /Yc /Yu could act that way without extra flags, but I think they don't.
> > I've just skimmed over the sources, so I may be mistaken, but it seems to
> > me the /Yc mode is not different there. Unless /Yc somehow already
> > instantiates everything in the PCH and avoids such instantiations in TUs
> > using the PCH, there is still going to be the cost of
> > Sema::PerformPendingInstantiations() doing something that's not needed.
> > Remember that this is actually about improving the build time, not
> > necessarily the build result.
...
> I think this is more in the vein of
> http://llvm.org/viewvc/llvm-project?view=revision&revision=335466
>
> We could know that the definitions of the instantiated templates are
> in the PCH .obj file, and could hopefully skip it in other users of
> the PCH. And that's probably how MSVC does it.


 That commit is based on the same idea as mine, but it's not exactly the same.
The point is that ASTContext::DeclMustBeEmitted() comes only after
Sema::InstantiateFunctionDefinition(). So your commit should result in
smaller object files and save the work of generating code for instantiations
from PCH in every TU, but I do not see anything that'd affect
Sema::InstantiateFunctionDefinition(), which is where the main cost of build
time is in my case. Moreover your patch always emits decls if they're
referenced, so it passes on the possibility to skip emitting those that are
already in the PCH's object file. I think your patch as it is in practice
makes a difference only for (dll)export-ed decls and nothing else, am I
getting that right?

 But seeing that commit it makes sense to me to use the same base mechanism
for deciding if these optimizations can be done. Am I getting it right
that -building-pch-with-obj is really just a flag that gets written to
the .pch file? Then I think I can map my intended use to that easily
(use -building-pch-with-obj when creating the .pch just to add the flag, use
-building-pch-with-obj -include-pch for an empty source to create the
accompanying .o and then remaining files use the .pch normally).

--
 Lubos Lunak

Re: Controlling instantiation of templates from PCH

Hans Wennborg via cfe-dev
On Wed, May 29, 2019 at 1:18 PM Lubos Lunak <[hidden email]> wrote:

>
> On Wednesday 29 of May 2019, Hans Wennborg wrote:
> > On Tue, May 28, 2019 at 9:38 PM Lubos Lunak via cfe-dev
> > >  /Yc /Yu could act that way without extra flags, but I think they don't.
> > > I've just skimmed over the sources, so I may be mistaken, but it seems to
> > > me the /Yc mode is not different there. Unless /Yc somehow already
> > > instantiates everything in the PCH and avoids such instantiations in TUs
> > > using the PCH, there is still going to be the cost of
> > > Sema::PerformPendingInstantiations() doing something that's not needed.
> > > Remember that this is actually about improving the build time, not
> > > necessarily the build result.
> ...
> > I think this is more in the vein of
> > http://llvm.org/viewvc/llvm-project?view=revision&revision=335466
> >
> > We could know that the definitions of the instantiated templates are
> > in the PCH .obj file, and could hopefully skip it in other users of
> > the PCH. And that's probably how MSVC does it.
>
>
>  That commit is based on the same idea as mine, but it's not exactly the same.
> The point is that ASTContext::DeclMustBeEmitted() comes only after
> Sema::InstantiateFunctionDefinition(). So your commit should result in
> smaller object files and save the work of generating code for instantiations
> from PCH in every TU, but I do not see anything that'd affect
> Sema::InstantiateFunctionDefinition(), which is where the main cost of build
> time is in my case. Moreover your patch always emits decls if they're
> referenced, so it passes on the possibility to skip emitting those that are
> already in the PCH's object file. I think your patch as it is in practice
> makes a difference only for (dll)export-ed decls and nothing else, am I
> getting that right?

Yes, exactly. I'm not saying it's the same, only that it's a similar
direction: emitting some things in the PCH .obj file only, saving some
work in the other translation units.

>  But seeing that commit it makes sense to me to use the same base mechanism
> for deciding if these optimizations can be done. Am I getting it right
> that -building-pch-with-obj is really just a flag that gets written to
> the .pch file?

Yes.

> Then I think I can map my intended use to that easily
> (use -building-pch-with-obj when creating the .pch just to add the flag, use
> -building-pch-with-obj -include-pch for empty source to create the
> accompanying .o and then remaining files use the .pch normally).

Yup, that's what's happening behind the scenes with clang-cl's /Yc and /Yu.

Re: Controlling instantiation of templates from PCH

David Blaikie via cfe-dev
In reply to this post by Lubos Lunak-2


On Tue, May 28, 2019 at 2:31 AM Lubos Lunak <[hidden email]> wrote:
On Tuesday 28 of May 2019, David Blaikie wrote:
> So I'm not sure I understand this comment:
>
> "And, if you look carefully, 4 seconds more to generate code, most of it
> for those templates. And after the compiler spends all this time on
> templates in all the source files, it gets all passed to the linker, which
> will shrug and then throw most of it away (and that will too take a load of
> time, if you still happen to use the BFD linker instead of gold/lld
> <https://lists.freedesktop.org/archives/libreoffice/2018-July/080484.html>
>  with -gsplit-dwarf -Wl,--gdb-index
> <https://lists.freedesktop.org/archives/libreoffice/2018-June/080437.html>)
>. What a marvel."
>
> What extra code generation occurred with the PCH? Any change in generated
> code with a PCH would surprise me.


 If I understand it correctly, my small testcase means that adding a PCH
generally does not change the resulting object file; it only makes Clang
spend more time processing something it throws away as unused at some point in
the later stages of creating the object file, so there's no extra code
generation caused by the PCH. So, to make clearer what I meant there, it's
more like saying that there's a missed opportunity:

- Let's say that I have a library built from a.cpp and b.cpp, and both those
sources use std::vector< int >. As in, they really use it, so both a.o and
b.o end up with weak copies of std::vector< int > code.
- That seems to be basically inevitable with the normal non-PCH code, as the
Clang instance compiling a.cpp cannot know that std::vector< int > code will
be also present in b.o, and so both compiling a.cpp and b.cpp results in
generating std::vector< int >, even though we can clearly see it's
unnecessary.
- I say it's basically inevitable in the non-PCH case, because I don't know a
reasonable way to avoid that in practice. There is extern template, which
would work in this minimal testcase, but for a real-world large codebase I
find that impractical, tedious and what not (please correct if I'm wrong and
there is a reasonable way, but beware that I've already tried that and
decided that writing a compiler patch was an easier way of going about it).
- However, in the PCH case, both Clang instances do know that they share all
the template instantiations from the PCH. And that's where my patch steps in:
-fpch-template-instantiation=force tells one instance "take care of it
all" and -fpch-template-instantiation=skip tells all the other
instances "don't bother with those, somebody else will take care of that". So
all but one Clang instance can skip all those numerous
Sema::InstantiateFunctionDefinition() calls and also code generation for all
of those instantiations that actually are used in that TU.
- To put it differently, you can also view -fpch-template-instantiation=skip
as automatic extern template for whatever is used by the PCH,
and -fpch-template-instantiation=force as explicit instantiation for it,
where all the hassle of extern template is replaced by just putting all the
template stuff in the PCH. (To be precise, it's not exactly like explicit
instantiation, because it involves only what is instantiated by the PCH, but
if wanted that can be handled by actually explicitly instantiating in the
PCH, without having to bother with the extern template stuff).
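
For readers less familiar with the manual mechanism being compared against here, the following is a minimal sketch of extern template in standard C++. The Box template and the file names in the comments are made up for illustration; in a real project the three commented sections would live in a shared header, in one chosen source file, and in every other source file respectively. The proposed options aim for a similar effect automatically for whatever the PCH instantiates.

```cpp
#include <cassert>

// --- shared header (hypothetical box.hxx) ---
template <typename T>
struct Box {
    T value;
    T doubled() const;
};

// Defined out of class, so it is not implicitly inline.
template <typename T>
T Box<T>::doubled() const { return value + value; }

// Explicit instantiation declaration: Box<int> is instantiated in
// another translation unit, so users of this header need not
// instantiate its non-inline members themselves.
extern template struct Box<int>;

// --- exactly one source file (hypothetical box.cxx) ---
// Explicit instantiation definition: the one place that emits Box<int>.
template struct Box<int>;

// --- any other source file ---
int useBox(int v) {
    Box<int> b{v};
    return b.doubled();
}

// Small self-check for the sketch.
void checkBox() { assert(useBox(21) == 42); }
```

The tedious part in a large codebase is maintaining one such declaration/definition pair per instantiated type, which is the hassle the message above wants to avoid.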

Ah, OK. Was this an indirect benefit of the feature/patch you created - or did you specifically code for that in addition to moving the pending instantiations out to the separate PCH processing stage, rather than in every compilation that uses the PCH - or did it come out as a happy coincidence?

In any case, talking to Richard Smith about all this, here's some things:

* Clang header modules don't have the pending instantiation performance problem described here - because they handle the pending instantiations at the end of building the module, rather than in every consumer.
* It's possible moving PCH to the modules semantics might be valid in general, or good enough to put behind a flag. (doing this in general would of course be easier, code-wise - fewer supported code paths, etc)
* Moving the pending instantiation processing to the end of the PCH would make PCH generation a little slower, but given a project would only have one PCH that might not be a huge problem.
* In addition to that, we could support -fmodules-codegen/debuginfo - which would implement the "building an object from the PCH" step you've described using existing infrastructure in Clang (& that would then include other non-template inline functions, so it'd be a bit broader)
* Then we could potentially do something more like what you're proposing here - if modules-codegen is used, defer pending instantiations from the initial module/PCH creation step, to the module/PCH-to-object step, to speed up module/PCH generation & unblock the downstream compilations that use it

- Dave



Re: Controlling instantiation of templates from PCH

Lubos Lunak-2
On Wednesday 29 of May 2019, David Blaikie wrote:

> On Tue, May 28, 2019 at 2:31 AM Lubos Lunak <[hidden email]> wrote:
> > - To put it differently, you can also view
> > -fpch-template-instantiation=skip
> > as automatic extern template for whatever is used by the PCH,
> > and -fpch-template-instantiation=force as explicit instantiation for it,
> > where all the hassle of extern template is replaced by just putting all
> > the
> > template stuff in the PCH. (To be precise, it's not exactly like explicit
> > instantiation, because it involves only what is instantiated by the PCH,
> > but
> > if wanted that can be handled by actually explicitly instantiating in the
> > PCH, without having to bother with the extern template stuff).
>
> Ah, OK. Was this an indirect benefit of the feature/patch you created - or
> did you specifically code for that in addition to moving the pending
> instantiations out to the separate PCH processing stage, rather than in
> every compilation that uses the PCH - or did it come out as a happy
> coincidence?


 I'm not sure what you're referring to exactly with 'this' and 'that'. My
patch does not move pending instantiations anywhere, it just avoids them in
all but one compilation using the PCH. So whatever template you instantiate
in the PCH, all those compilations don't have to spend any time on it.


> In any case, talking to Richard Smith about all this, here's some things:
>
> * Clang header modules don't have the pending instantiation performance
> problem described here - because they handle the pending instantiations at
> the end of building the module, rather than in every consumer.
> * It's possible moving PCH to the modules semantics might be valid in
> general, or good enough to put behind a flag. (doing this in general would
> of course be easier, code-wise - fewer supported code paths, etc)

 Two questions come to mind here:
- is it reasonably ready for use?
- how much work would it be to use it?

 I tried Clang modules after they were mentioned in the first reply, and I got
the impression that they need preparation for every header file used by the
project, even external ones. Unless that can be automated, I don't quite see
that happening for something the size and complexity of LibreOffice (we don't
manually create our headers-to-become-PCHs either). And an unofficial
tongue-in-cheek motto of LibreOffice is "proudly breaking your toolchain
since 1985", so unless modules are reasonably usable, we'll run into all the
bugs there and nobody will want to use it.

 If doing this is good technically, long-term, fine, do it. But I'd like to
have something that works this summer, and my rather simple patch can deliver
that.

> * Moving the pending instantiation processing to the end of the PCH would
> make PCH generation a little slower, but given a project would only have
> one PCH that might not be a huge problem.

 I think that would be very well worth it.

> * In addition to that, we could support -fmodules-codegen/debuginfo - which
> would implement the "building an object from the PCH" step you've described
> using existing infrastructure in Clang (& that would then include other
> non-template inline functions, so it'd be a bit broader)
> * Then we could potentially do something more like what you're proposing
> here - if modules-codegen is used, defer pending instantiations from the
> initial module/PCH creation step, to the module/PCH-to-object step, to
> speed up module/PCH generation & unblock the downstream compilations that
> use it

 That could work for me too. And I'd be probably willing to help, if my Clang
skill would be up to that. How much time/work would this be?

--
 Lubos Lunak

Re: Controlling instantiation of templates from PCH

Lubos Lunak-2
On Thursday 30 of May 2019, Lubos Lunak via cfe-dev wrote:
> On Wednesday 29 of May 2019, David Blaikie wrote:
> > In any case, talking to Richard Smith about all this, here's some things:
> >
> > * Clang header modules don't have the pending instantiation performance
> > problem described here - because they handle the pending instantiations
> > at the end of building the module, rather than in every consumer.
> > * It's possible moving PCH to the modules semantics


 Oh, wait, "semantics". I originally understood this as a kind of "do not use
PCHs, use modules", but I might have misunderstood. If this actually means
that PCHs should internally handle templates etc. the same way modules do,
but to the outside world they would still keep looking like normal PCHs, then
yeah, sure, as long as it works.


> > might be valid in
> > general, or good enough to put behind a flag. (doing this in general
> > would of course be easier, code-wise - fewer supported code paths, etc)
>
>  Two questions come to mind here:
> - is it reasonably ready for use?
> - how much work would it be to use it?
>
>  I tried Clang modules after they were mentioned in the first reply, and I
> got the impression that they need preparation for every header file used by
> the project, even external ones. Unless that can be automated, I don't
> quite see that happening for something the size and complexity of
> LibreOffice (we don't manually create our headers-to-become-PCHs either).
> And an unofficial tongue-in-cheek motto of LibreOffice is "proudly
> breaking your toolchain since 1985", so unless modules are reasonably
> usable, we'll run into all the bugs there and nobody will want to use it.
>
>  If doing this is good technically, long-term, fine, do it. But I'd like to
> have something that works this summer, and my rather simple patch can
> deliver that.


 And so this part would be irrelevant in that case I hope?

> > * Moving the pending instantiation processing to the end of the PCH would
> > make PCH generation a little slower, but given a project would only have
> > one PCH that might not be a huge problem.
>
>  I think that would be very well worth it.


 This should be the same either way.

> > * In addition to that, we could support -fmodules-codegen/debuginfo -
> > which would implement the "building an object from the PCH" step you've
> > described using existing infrastructure in Clang (& that would then
> > include other non-template inline functions, so it'd be a bit broader)
> > * Then we could potentially do something more like what you're proposing
> > here - if modules-codegen is used, defer pending instantiations from the
> > initial module/PCH creation step, to the module/PCH-to-object step, to
> > speed up module/PCH generation & unblock the downstream compilations that
> > use it
>
>  That could work for me too. And I'd be probably willing to help, if my
> Clang skill would be up to that. How much time/work would this be?


 This part supported my understanding as "use modules instead of PCHs", so I'm
not sure which interpretation you meant, but if that's not the correct one,
then I think these flags are not really needed, at least for PCHs. The step of
creating an object file accompanying the PCH can be easily achieved
using "clang++ -c empty.cpp -include-pch whatever.h.pch -Xclang
-building-pch-with-obj". Internally the BuildingPCHWithObjectFile flag can be
used for controlling special handling such as the need to force emitting the
shared code, and doing it as another separate step would help with build
parallelization. So, in fact, this way there would be no need to move the
processing of pending instantiations to the end of PCH if
BuildingPCHWithObjectFile is set.

--
 Lubos Lunak

Re: Controlling instantiation of templates from PCH

David Blaikie via cfe-dev


On Thu, May 30, 2019 at 6:04 AM Lubos Lunak <[hidden email]> wrote:
On Thursday 30 of May 2019, Lubos Lunak via cfe-dev wrote:
> On Wednesday 29 of May 2019, David Blaikie wrote:
> > In any case, talking to Richard Smith about all this, here's some things:
> >
> > * Clang header modules don't have the pending instantiation performance
> > problem described here - because they handle the pending instantiations
> > at the end of building the module, rather than in every consumer.
> > * It's possible moving PCH to the modules semantics


 Oh, wait, "semantics". I originally understood this as a kind of "do not use
PCHs, use modules", but I might have misunderstood. If this actually means
that PCHs should internally handle templates etc. the same way modules do,
but to the outside world they would still keep looking like normal PCHs, then
yeah, sure, as long as it works.

Right - that's what I was suggesting. Sorry for the confusion.
 
> > might be valid in
> > general, or good enough to put behind a flag. (doing this in general
> > would of course be easier, code-wise - fewer supported code paths, etc)
>
>  Two questions come to mind here:
> - is it reasonably ready for use?
> - how much work would it be to use it?
>
>  I tried Clang modules after they were mentioned in the first reply, and I
> got the impression that they need preparation for every header file used by
> the project, even external ones. Unless that can be automated, I don't
> quite see that happening for something the size and complexity of
> LibreOffice (we don't manually create our headers-to-become-PCHs either).
> And an unofficial tongue-in-cheek motto of LibreOffice is "proudly
> breaking your toolchain since 1985", so unless modules are reasonably
> usable, we'll run into all the bugs there and nobody will want to use it.
>
>  If doing this is good technically, long-term, fine, do it. But I'd like to
> have something that works this summer, and my rather simple patch can
> deliver that.

 And so this part would be irrelevant in that case I hope?

Right. (I mean, a separate question is whether you'd want to use modules - but yes, at the very least it does involve standardizing your headers on "well behaved" sort of restraints (basically "you can include the header anywhere, any time, and it always behaves the same way" - so not having headers that depend on macros locally defined to different values in different translation units, etc) - but yeah, it's a lot more work than the PCH situation)
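
To make the "well behaved" constraint concrete, here is a minimal sketch of the kind of header that would not satisfy it. LIMIT_OVERRIDE, limit() and the file name are hypothetical, chosen only for illustration; the point is that the same header text means different things depending on what the including file defined beforehand, which a module built once cannot reproduce.

```cpp
#include <cassert>

// --- hypothetical limits.hxx ---
// The including translation unit may define LIMIT_OVERRIDE before the
// #include, so the meaning of this header depends on who includes it.
#ifdef LIMIT_OVERRIDE
inline int limit() { return LIMIT_OVERRIDE; }
#else
inline int limit() { return 100; }
#endif

// --- a source file that does not define the macro ---
int clampToLimit(int v) { return v > limit() ? limit() : v; }

// Small self-check for the sketch.
void checkClamp() {
    assert(clampToLimit(250) == 100);
    assert(clampToLimit(7) == 7);
}
```

A source file that did define LIMIT_OVERRIDE before including the header would get a different limit() from the same text.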
 
> > * Moving the pending instantiation processing to the end of the PCH would
> > make PCH generation a little slower, but given a project would only have
> > one PCH that might not be a huge problem.
>
>  I think that would be very well worth it.

 This should be the same either way.

Yeah, I think I'd misunderstood your proposal - I had assumed there was a separation between PCH generation and the PCH->Object step (& in that latter step, the pending instantiations would be done). Sounds like your .h->PCH step also generates the object?
 
> > * In addition to that, we could support -fmodules-codegen/debuginfo -
> > which would implement the "building an object from the PCH" step you've
> > described using existing infrastructure in Clang (& that would then
> > include other non-template inline functions, so it'd be a bit broader)
> > * Then we could potentially do something more like what you're proposing
> > here - if modules-codegen is used, defer pending instantiations from the
> > initial module/PCH creation step, to the module/PCH-to-object step, to
> > speed up module/PCH generation & unblock the downstream compilations that
> > use it
>
>  That could work for me too. And I'd be probably willing to help, if my
> Clang skill would be up to that. How much time/work would this be?


 This part supported my understanding as "use modules instead of PCHs", so I'm
not sure which interpretation you meant, but if that's not the correct one,
then I think these flags are not really needed, at least for PCHs. The step of
creating an object file accompanying the PCH can be easily achieved
using "clang++ -c empty.cpp -include-pch whatever.h.pch -Xclang
-building-pch-with-obj". Internally the BuildingPCHWithObjectFile flag can be
used for controlling special handling such as the need to force emitting the
shared code, and doing it as another separate step would help with build
parallelization. So, in fact, this way there would be no need to move the
processing of pending instantiations to the end of PCH if
BuildingPCHWithObjectFile is set.

I didn't realize there was already a building-pch-with-obj option - I see now that there is. Sorry for the confusion there.

But I still suspect whatever that implements isn't quite modules-codegen, but I could be wrong.
 

--
 Lubos Lunak


Re: Controlling instantiation of templates from PCH

Lubos Lunak-2
On Friday 31 of May 2019, David Blaikie wrote:

> On Thu, May 30, 2019 at 6:04 AM Lubos Lunak <[hidden email]> wrote:
> > >  Two questions come to mind here:
> > > - is it reasonably ready for use?
> > > - how much work would it be to use it?
> > >
> > >  I tried Clang modules after they were mentioned in the first reply,
> > > and I got the impression that they need preparation for every header
> > > file used by the project, even external ones. Unless that can be
> > > automated, I don't
> > > quite see that happening for something the size and complexity of
> > > LibreOffice (we don't manually create our headers-to-become-PCHs
> > > either). And an unofficial tongue-in-cheek motto of LibreOffice is
> > > "proudly breaking your toolchain since 1985", so unless modules are
> > > reasonably usable, we'll run into all the bugs there and nobody will
> > > want to use it.
> > >
> > >  If doing this is good technically, long-term, fine, do it. But I'd
> > > like to
> > > have something that works this summer, and my rather simple patch can
> > > deliver that.
> >
> >  And so this part would be irrelevant in that case I hope?
>
> Right. (I mean, a separate question is whether you'd want to use modules -
> but yes, at the very least it does involve standardizing your headers on
> "well behaved" sort of restraints (basically "you can include the header
> anywhere, any time, and it always behaves the same way" - so not having
> headers that depend on macros locally defined to different values in
> different translation units, etc) - but yeah, it's a lot more work than the
> PCH situation)


 We can try modules eventually, but as said above, I expect the switch to
them could be quite some work, and there's also the question of how
build tools like ccache and icecream would cope with modules, so it's
unlikely for now.

>
> > > > * Moving the pending instantiation processing to the end of the PCH
> > > > would make PCH generation a little slower, but given a project would
> > > > only have one PCH that might not be a huge problem.
> > >
> > >  I think that would be very well worth it.
> >
> >  This should be the same either way.


 Just to make it clear, this means "this should be worth it for either case of
whether I understood correctly or not that we should use modules". However
I'm later basically contradicting this by saying that if the build mode is
switched to generate an object file to accompany the PCH then this shouldn't
be done.

> Yeah, I think I'd misunderstood your proposal - I had assumed there was a
> separation between PCH generation and the PCH->Object step (& in that
> latter step, the pending instantiations would be done). Sounds like your
> .h->PCH step also generates the object?


 My current plan is still to have a separate PCH->object step, but the idea
now is to use -building-pch-with-obj. So the build steps now should be:

# generate PCH and mark it as having accompanying .o
clang++ -c precompiled.hxx -o precompiled.pch -Xclang -building-pch-with-obj
# generate that .o
clang++ -c empty.cxx -include-pch precompiled.pch -Xclang -building-pch-with-obj
# compile the rest
clang++ -c whatever.cxx -include-pch precompiled.pch

 So the first step will mark the PCH, the second step will generate all the
shared code, and the remaining steps will see the PCH as marked and will skip
generating things again. I like this separation better than /Yu merging the
first two steps into one: here each step generates just one output and
compilations depending on the PCH can already run alongside generating the
PCH's object. And keeping instantiating PCH templates the way it is now
instead of moving it to the end of PCH creation would mean that work is done
only in the second step, so the PCH generation wouldn't get slower.

 And this could be easily later extended to e.g. non-template inline functions
that get out-of-line copies in debug mode:
- Are we using a marked PCH and -building-pch-with-obj is set? => generate
shared copies
- Are we using a marked PCH but -building-pch-with-obj is not set? => skip
generation
- The PCH is not marked as having an object? => work normally
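
The three cases above amount to a small decision table. The sketch below restates it as standalone C++; PchAction and decidePchAction are hypothetical names for illustration and are not Clang's actual code. The two inputs stand for the mark stored in the PCH and for whether -building-pch-with-obj is set for the current compilation.

```cpp
#include <cassert>

// What a compilation does with shareable code coming from the PCH.
enum class PchAction { GenerateShared, Skip, Normal };

// pchMarked:          the PCH was created with -building-pch-with-obj
// buildingPchWithObj: the current compilation has -building-pch-with-obj set
constexpr PchAction decidePchAction(bool pchMarked, bool buildingPchWithObj) {
    return !pchMarked           ? PchAction::Normal          // no accompanying object
           : buildingPchWithObj ? PchAction::GenerateShared  // the one TU emitting copies
                                : PchAction::Skip;           // rely on the PCH's object
}

// Small self-check covering the cases listed above.
void checkDecisions() {
    assert(decidePchAction(true, true) == PchAction::GenerateShared);
    assert(decidePchAction(true, false) == PchAction::Skip);
    assert(decidePchAction(false, false) == PchAction::Normal);
}
```

Under this reading, an unmarked PCH always means normal behaviour regardless of the second input.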

 I don't know how this relates to modules, but for PCHs I expect this should
work fine.

> But I still suspect whatever that implements isn't quite modules-codegen,
> but I could be wrong.


 I still don't quite get how modules relate to my patch, so I can't really
comment on this.

--
 Lubos Lunak