RFC: Adding constructor homing to clang's limited debug info mode


Amy Huang via cfe-dev
Recently I added a new level to clang's debug info, -debug-info-kind=constructor, based on Reid's constructor type homing idea. Since classes typically live in header files and have no natural "home", class type information is emitted in every translation unit where the type is required to be complete, which results in a lot of duplicate debug info. Constructor type homing attempts to reduce this duplication by emitting debug info for a class only where its constructor is emitted. It relies on the assumption that if complete debug info for a class is needed, the class must be constructed somewhere in the program. Currently this applies only to classes with nontrivial, user-defined constructors. For classes with no constructors, or with only trivial constructors, nothing changes, and debug info is still emitted everywhere. This RFC proposes making constructor homing part of clang's limited debug info level.
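
To make the rule concrete, here is a minimal single-file sketch of the two cases described above. The names `Widget`, `Point`, `nameLength` and `demo` are hypothetical and only for illustration; in a real project `Widget` would be declared in a header and its constructor defined in exactly one .cpp file. The compile command in the comment is an assumption about how the cc1 option is passed through the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Assumed invocation (cc1 option forwarded with -Xclang):
//   clang++ -g -Xclang -debug-info-kind=constructor -c widget.cpp

// Hypothetical class with a nontrivial, user-defined constructor.
// With constructor homing, its complete debug description is expected only
// in the translation unit that emits Widget::Widget.
class Widget {
public:
  explicit Widget(std::string name);  // declared here, defined out of line
  const std::string &name() const { return name_; }

private:
  std::string name_;
};

// Hypothetical struct with only implicit, trivial constructors.
// Per the RFC, nothing changes for a type like this.
struct Point {
  int x;
  int y;
};

// The out-of-line constructor definition. Whichever translation unit
// contains this would act as the "home" for Widget's debug info.
Widget::Widget(std::string name) : name_(std::move(name)) {}

// Code like this, in other translation units, uses Widget without emitting
// its constructor, so those units would only need a declaration of the type.
std::size_t nameLength(const Widget &w) { return w.name().size(); }

// Compile-time view of the property the RFC keys on.
static_assert(!std::is_trivially_constructible<Widget, std::string>::value,
              "Widget has a nontrivial, user-defined constructor");
static_assert(std::is_trivially_default_constructible<Point>::value,
              "Point has only trivial constructors");

// Small self-check that exercises both types.
void demo() {
  Widget w("button");
  assert(nameLength(w) == 6);
  Point p{1, 2};
  assert(p.x + p.y == 3);
}
```

The static assertions only show which bucket each type falls into; whether a given object file actually carries the full type description still has to be checked by inspecting its debug info.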

Currently clang's limited debug info mode already has some optimizations for limiting the amount of class debug info; for example, we emit debug info for dynamic classes only when the vtable is emitted, since the program will fail to link if the vtable is not provided.

Link to the original patch adding `-debug-info-kind=constructor`: https://reviews.llvm.org/D72427
Patch for the proposed change: https://reviews.llvm.org/D79147

Some numbers from the LLVM build
This is a comparison I did on my machine of total object file size in Clang debug builds, with and without constructor homing (-debug-info-kind=limited vs. constructor). In general it seems to reduce the total object file size by 30-50%.
on Linux
before: 9345 MB
after: 4553 MB

on Windows
before: 6096 MB
after: 3979 MB
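
For reference, the percentage reductions implied by these figures can be worked out with a few lines of C++. This is only a sketch of the arithmetic; `percentReduction` and `printReport` are made-up names.

```cpp
#include <cassert>
#include <cstdio>

// Percentage reduction when going from `before` to `after` (both in MB).
double percentReduction(double before, double after) {
  assert(before > 0.0);
  return (before - after) / before * 100.0;
}

// Prints the reductions for the two builds quoted above.
void printReport() {
  std::printf("Linux:   %.1f%%\n", percentReduction(9345.0, 4553.0));  // about 51.3%
  std::printf("Windows: %.1f%%\n", percentReduction(6096.0, 3979.0));  // about 34.7%
}
```

So the Linux build shrinks by roughly half and the Windows build by roughly a third.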

We also enabled this in Chromium a few months ago, and saw a similar change in object / split dwarf file size.

Testing
There isn't a very comprehensive way that I know of to test debug info. I tested a few things:
- Ran the LLDB test suite with this change as part of -debug-info-kind=limited. This caught an edge case with constexpr constructors. After fixing this, the LLDB test suite now passes with this mode (minus one test case that happens to fail, and is updated in the proposed patch).
- Compared clang.pdb files in a Windows LLVM debug build, and looked at the list of types that are no longer complete with the constructor homing change. We looked into some of these types, and they are constructed in functions that aren't used in clang.exe. In this case, it makes sense to not emit complete debug information.
- Enabled this in Chromium, and we haven't yet received any bug reports about debug info.

We talked about running the GDB test suite with Clang in this mode, but the GDB test suite doesn't currently pass with Clang. Triaging GDB test suite failures is probably more than a month of work, so we don't plan to pursue it.

Any feedback is welcome! Especially looking for opinions on whether this should be the default for limited debug info, or if more testing is preferred, what that might look like.

-Amy

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: RFC: Adding constructor homing to clang's limited debug info mode

David Blaikie via cfe-dev
Hey Amy - thanks for the awesome work on this. I'm really excited about it and hope to see this behavior rolled into debug-info-kind=limited soon.

Mechanically speaking, I'm guessing we might want to initially change the default debug-info-kind to 'constructor' (on platforms where the default is already 'limited'), then wait a while, perhaps for an LLVM release, before removing 'constructor' and rolling the behavior into 'limited', which would remove the differentiation and the extra code needed to handle it.

As for testing:

The GDB test suite doesn't really have a "pass/fail" kind of mentality; so far as I know, the best you can do is "does it fail more". But honestly, even when I was running the GDB test suite against Clang years ago, the first thing I did was add -fstandalone-debug (/-fno-limit-debug-info), because the suite wasn't really designed to cope with some of the optimizations that flag disables. There are currently 3 optimizations in that bucket:
- complete type: if the type is only used for pointers that are never dereferenced, only emit a declaration of the type even if a definition is available
- vtable homing
- template explicit instantiation decl/def homing

GCC implements vtable homing but not the other two, and the first one trips up at least several GDB tests.
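
As a rough illustration of the three existing optimizations listed above, the sketch below shows the kind of source pattern each one applies to. All names (`Opaque`, `passThrough`, `Shape`, `Box`, `demo`) are hypothetical, and everything is squeezed into one file for brevity; in practice the pieces marked "home" would sit in a single .cpp file while other files only see the declarations.

```cpp
#include <cassert>

// (1) Complete type: Opaque is only handled through a pointer that is never
// dereferenced in passThrough, so a declaration of the type is enough for
// code like this, even though the definition is visible.
struct Opaque {
  int payload;
};
Opaque *passThrough(Opaque *p) { return p; }

// (2) vtable homing: Shape is a dynamic class. Its first non-inline virtual
// function, area(), is defined out of line below, and the translation unit
// containing that definition is where the vtable is emitted.
struct Shape {
  virtual ~Shape() = default;
  virtual int area() const;
  int side = 2;
};
int Shape::area() const { return side * side; }  // the vtable's "home"

// (3) Explicit instantiation decl/def homing: the extern declaration says
// "an instantiation exists elsewhere"; the definition is the "home".
template <typename T>
struct Box {
  T value;
  T get() const { return value; }
};
extern template struct Box<int>;  // explicit instantiation declaration
template struct Box<int>;         // explicit instantiation definition

// Small self-check so the snippet can be exercised.
void demo() {
  Opaque o{7};
  assert(passThrough(&o) == &o);
  Shape s;
  assert(s.area() == 4);
  Box<int> b{5};
  assert(b.get() == 5);
}
```

In each case there is a single place the program cannot link or run without, and that is what lets the debug info be emitted only there.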

Rumeet ([hidden email]) might be able to help you get data from the GDB test suite, since we run it internally for LLVM release validation.

> Compared clang.pdb files in a Windows LLVM debug build, and looked at the list of types that are no longer complete with the constructor homing change. We looked into some of these types, and they are constructed in functions that aren't used in clang.exe. In this case, it makes sense to not emit complete debug information.

Would it be possible to compare the types from the object files rather than the final linked binary? There's still a risk that some of the missing types would be in unused libraries that never even needed to be built or linked into the final clang binary, but it might reduce the effect of the linker's reachability stripping of object files and make the comparison close enough to put in a spreadsheet, providing a more definitive "these are all the types that were no longer emitted and here's why they weren't/why that's OK" rather than a sampling. (Admittedly, it's always still a bit of a sampling: testing clang, rather than some/all other binaries, etc.)

Re: RFC: Adding constructor homing to clang's limited debug info mode

Lubos Lunak via cfe-dev
On Thursday 07 of May 2020, Amy Huang via cfe-dev wrote:
> Recently I added a level to clang's debug info
> (-debug-info-kind=constructor) based on Reid's constructor type homing
> idea.
...
> Any feedback is welcome! Especially looking for opinions on whether this
> should be the default for limited debug info, or if more testing is
> preferred, what that might look like.

 FWIW I've been using -debug-info-kind=constructor locally for LibreOffice
development, and I haven't noticed a single problem (and yes, it noticeably
reduces file sizes and build time). Since one of LibreOffice's
tongue-in-cheek mottos is "proudly breaking your toolchain since 1985", I'd
be happy to run any tests with it if you'd be interested in testing the
feature with yet another large C++ codebase.

--
 Lubos Lunak

Re: RFC: Adding constructor homing to clang's limited debug info mode

Amy Huang via cfe-dev

> Would it be possible to compare the types from the object files rather than the final linked binary? There's still a risk that some of the missing types would be in unused libraries that never even needed to be built or linked into the final clang binary, but it might reduce the effect of the linker's reachability stripping of object files and make the comparison close enough to put in a spreadsheet, providing a more definitive "these are all the types that were no longer emitted and here's why they weren't/why that's OK" rather than a sampling. (Admittedly, it's always still a bit of a sampling: testing clang, rather than some/all other binaries, etc.)

That's a good idea, since it does seem like a lot of the types are just not linked into the clang binary. I'll try it and report back.

Also, thanks for the pointers about the GDB test suite!
