Re: [llvm-dev] RFC: Adding a code size analysis tool

Ben Craig via cfe-dev

Something that I’ve been looking to do for a while now is to do this at the .o level, and have something to combine the per .o results as well. I’ve wanted to do that to figure out where I can speed up builds that are overly slow because of redundant template instantiations.

You might also consider a view that goes across templates and across namespaces. It can be useful to see that X% of your code is in std::map instantiations (for example). This seems similar to how you have inheritance covered. I’ve also wanted to find ways to visualize the opposite… where I have class Foo and I want to see its total cost, including the size of std::vector<Foo>.

From: llvm-dev <[hidden email]> On Behalf Of Jake Ehrlich via llvm-dev
Sent: Friday, September 28, 2018 3:51 PM
To: Vedant Kumar <[hidden email]>
Cc: Apple Inc. <[hidden email]>
Subject: Re: [llvm-dev] RFC: Adding a code size analysis tool

Fantastic! I have been looking at creating a tool that a) only spits out actionable size reductions (preferably with a specific action specified) and b) only analyzes the size of allocated sections. The other deficiency I've seen with bloaty is speed and scaling. It's very hard to get bloaty to analyze across a large system of interdependent shared libraries. You can add me as a reviewer to any changes as I would very much like to see such a tool exist.

> Unlike bloaty, this tool focuses exclusively on the text segment.

I'd like to see support for everything within PT_LOAD segments, not just the executable parts. Everything else you've said is basically what I wanted.

On Wed, Sep 26, 2018 at 12:03 PM Vedant Kumar via llvm-dev <[hidden email]> wrote:

Hello,

I worked on a code size analysis tool for a 'week of code' project and think
that it might be useful enough to upstream.

The tool is inspired by bloaty (https://github.com/google/bloaty), but tries to
do more to attribute code size in actionable ways.

For example, it can calculate how many bytes inlined instances of a function
added to a binary. In its diff mode, it can show how much more aggressively a
function was inlined compared to a baseline. This can be useful when you're,
say, trying to figure out why firmware compiled by a new compiler is just a few
bytes over the size limit imposed by your embedded device :). In this case,
extra information about inlining can help inform a decision to either tweak the
inliner's cost model or to judiciously add a few `noinline` attributes. (Note
that if you're willing to recompile & write a few SQL queries, optimization
remarks can give you similar information, albeit at the IR level.)

As another example, this code size tool can attribute code size to semantically
interesting groups of code, like C++/Swift classes, or files. In the diff mode,
you can see how the code size of a class/file grew compared to a baseline. The
tool understands inheritance, so you can also see interesting high-level trends.
E.g., `clang::Sema` grew more than `llvm::Pass` between clang-6 and clang-7.

Unlike bloaty, this tool focuses exclusively on the text segment. Also unlike
bloaty, it uses LLVM's DWARF parser instead of rolling its own. The tool is
currently implemented as a sub-tool of llvm-dwarfdump.

To get size information about a program, you do:

  llvm-dwarfdump size-info -baseline <object> -stats-dir <dir>

This emits four *.stats files into <dir>, each containing a distinct 'view' into
the code groups in <object>. There's a file view, a function view, a class view,
and an inlining view. Each view is sorted by code size, so you can see the
largest functions/classes/etc immediately.

The *.stats files are just human-readable text files. As it happens, they use
the flamegraph format (http://brendangregg.com/flamegraphs.html). This makes it
easy to visualize any view as a flamegraph. (If you haven't seen one before,
it's a hierarchical visualization where the width of each entry corresponds to
its frequency (or in this case size).)

To look at code growth between two programs, you'd do:

  llvm-dwarfdump size-info -baseline <object> -target <object> -stats-dir <dir>

Similarly, this emits four 'view' files into <dir>, but with a *.diffstats
suffix. The format is the same.

Pending Work
------------

I think the main piece of work the tool needs is better testing. Currently
there's just a single end-to-end test in clang. It might be better to check in
a few binaries so we can check that the tool reports sizes correctly.

Also, it may turn out that folks are interested in different ways of visualizing
size data. While the textual format of flamegraphs is really convenient for
humans to read, the graphs themselves do make more sense when the underlying
data have a frequentist interpretation. If there's enough interest I can explore
using an alternative format for visualization, e.g.:

  http://neugierig.org/software/chromium/bloat/
  https://github.com/evmar/webtreemap

(Thanks JF for pointing these out!)

Here's a link to the source code:

  https://github.com/vedantk/llvm-project/tree/sizeinfo 

Selected Examples
-----------------

Here are a few interesting snippets from a comparison of clang-6 vs. clang-7.

First, let's take a look at the function view diffstat. Here are the 10
functions which grew in size the most. On the left hand side, you'll see the
demangled function name. The *change* in code size in bytes is reported on the
right hand side (only positive changes are reported).

  clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316
  ProcessDeclAttribute([snip]) [function] 125893
  llvm::AArch64InstPrinter::printAliasInstr([snip]) [function] 105133
  llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133
  ParseCodeGenArgs([snip]) [function] 64692
  unswitchNontrivialInvariants([snip]) [function] 40180
  getAttrKind([snip]) [function] 35811
  clang::DumpCompilerOptionsAction::ExecuteAction() [function] 32417
  llvm::UpgradeIntrinsicCall([snip]) [function] 30239
  bool llvm::InstructionSelector::executeMatchTable<(anonymous namespace)::ARMInstructionSelector const, [snip]) const [function] 29352


Next, let's look at the file view diffstat. This can be useful because it goes
beyond simply identifying the files which grew the most. It actually describes
which *functions* grew the most in those files, creating more opportunities to
do something about the code growth.

  lib/Target/X86/X86ISelLowering.cpp [file];combineX86ShuffleChain([snip]) [function] 24864
  lib/Target/X86/X86ISelLowering.cpp [file];combineMul([snip]) [function] 14907
  lib/Target/X86/X86ISelLowering.cpp [file];combineStore([snip]) [function] 12220
  ...
  tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::CheckCompareOperands([snip]) [function] 16024
  tools/clang/lib/Sema/SemaExpr.cpp [file];diagnoseTautologicalComparison([snip]) [function] 1740
  tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::ActOnNumericConstant([snip]) [function] 1436
  tools/clang/lib/Sema/SemaExpr.cpp [file];checkThreeWayNarrowingConversion([snip]) [function] 1356
  tools/clang/lib/Sema/SemaExpr.cpp [file];CheckIdentityFieldAssignment([snip]) [function] 1280
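Since each line in a view is a ';'-separated stack followed by a byte count, the entries above can be rolled up with a few lines of scripting. The following is only a rough sketch, not part of the tool: the function names are made up for illustration, and it assumes the size is always the last space-separated token on a line.

```python
from collections import defaultdict


def parse_folded_line(line):
    """Split one folded-stack line into (frames, size).

    Assumption: the size is the last space-separated token, and everything
    before it is a ';'-separated list of frames, outermost first.
    """
    stack, _, size = line.strip().rpartition(" ")
    return stack.split(";"), int(size)


def rollup_by_outermost_frame(lines):
    """Sum sizes per outermost frame (the file, in the file view)."""
    totals = defaultdict(int)
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped == "...":
            continue  # skip blank lines and elisions
        frames, size = parse_folded_line(stripped)
        totals[frames[0]] += size
    return dict(totals)


# Lines copied from the file view diffstat above.
sample = [
    "lib/Target/X86/X86ISelLowering.cpp [file];combineX86ShuffleChain([snip]) [function] 24864",
    "lib/Target/X86/X86ISelLowering.cpp [file];combineMul([snip]) [function] 14907",
    "lib/Target/X86/X86ISelLowering.cpp [file];combineStore([snip]) [function] 12220",
    "  ...",
    "tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::CheckCompareOperands([snip]) [function] 16024",
]

totals = rollup_by_outermost_frame(sample)
for frame, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{total:>8}  {frame}")
```

The same parser should work on any of the four views, since they share one format; only the meaning of the outermost frame changes.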


The class view diffstat is a bit different because it has more levels of
nesting than the other views, due to inheritance. This might help give a sense
of the high-level changes in a program, but may also be less actionable.

  clang::Sema [class];clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316
  clang::Sema [class];clang::Sema::CheckHexagonBuiltinArgument([snip]) [function] 24156
  clang::Sema [class];clang::Sema::ActOnTag([snip]) [function] 22373
  ...
  llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133
  llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printInstruction([snip]) [function] 5824
  ...
  llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::X86SpeculativeLoadHardeningPass [class];(anonymous namespace)::X86SpeculativeLoadHardeningPass::checkAllLoads(llvm::MachineFunction&) [function] 19287
  ...
  llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::MachineLICMBase [class];(anonymous namespace)::MachineLICMBase::runOnMachineFunction(llvm::MachineFunction&) [function] 20343

Here's a link to a flamegraph of the class view diffstat (warning: it's big):

  http://net.vedantk.com/static/llvm/swift-clang-4.2-vs-5.0.class-view.diffstats.svg

Finally, here are a few interesting entries from the inlining view diffstat. As
with all of the other views, the right hand side still shows code growth in
bytes. For a given inlining target, this size is computed by diffing the sum of
PC range lengths from all DW_TAG_inlined_subroutines referring to that target.
This allows the size tool to attribute code size to an inlining target even
when the inlined code is not contiguous in the caller.

  llvm::raw_ostream::operator<<(char const*) [inlining-target] 66720
  llvm::MCRegisterClass::contains(unsigned int) const [inlining-target] 64161
  llvm::StringRef::StringRef(char const*) [inlining-target] 39262
  llvm::MCInst::getOperand(unsigned int) const [inlining-target] 33268
  clang::CodeCompletionResult::~CodeCompletionResult() [inlining-target] 25763
  llvm::operator+(llvm::Twine const&, llvm::Twine const&) [inlining-target] 25525
  clang::ASTImporter::Import(clang::SourceLocation) [inlining-target] 21096
  clang::Sema::Diag(clang::SourceLocation, unsigned int) [inlining-target] 20898
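To make the inlining computation concrete, here is a minimal sketch of "sum the PC range lengths per target, then diff". It is not the tool's code: the input shape (a list of `(target_name, ranges)` pairs, one per DW_TAG_inlined_subroutine) and the function names are assumptions for illustration, the addresses are made up, and it ignores the question of nested inlined instances, which a real implementation would need to decide how to count.

```python
from collections import defaultdict


def inlined_bytes_per_target(inlined_subroutines):
    """Sum PC range lengths for each inlining target.

    `inlined_subroutines` is an iterable of (target_name, ranges) pairs,
    where `ranges` is a list of half-open (low_pc, high_pc) pairs. A
    non-contiguous inlined instance simply carries more than one range.
    """
    totals = defaultdict(int)
    for target, ranges in inlined_subroutines:
        totals[target] += sum(high - low for low, high in ranges)
    return dict(totals)


def inlining_growth(baseline, target):
    """Return (name, growth) pairs with positive growth, largest first."""
    before = inlined_bytes_per_target(baseline)
    after = inlined_bytes_per_target(target)
    growth = {name: size - before.get(name, 0) for name, size in after.items()}
    return sorted(
        ((name, delta) for name, delta in growth.items() if delta > 0),
        key=lambda item: -item[1],
    )


# Hypothetical data: one inlined instance in the baseline, two in the target,
# the second of which is split across two address ranges.
name = "llvm::StringRef::StringRef(char const*)"
baseline = [(name, [(0x1000, 0x1010)])]
target = [
    (name, [(0x1000, 0x1010)]),
    (name, [(0x2000, 0x2008), (0x2020, 0x2030)]),
]

print(inlining_growth(baseline, target))
```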

Feedback & questions welcome!

thanks,
vedant
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Vedant Kumar via cfe-dev
Hi Ben,

On Oct 1, 2018, at 12:48 PM, Ben Craig <[hidden email]> wrote:

> Something that I’ve been looking to do for a while now is to do this at the .o level, and have something to combine the per .o results as well. I’ve wanted to do that to figure out where I can speed up builds that are overly slow because of redundant template instantiations.
>
> You might also consider a view that goes across templates and across namespaces. It can be useful to see that X% of your code is in std::map instantiations (for example). This seems similar to how you have inheritance covered. I’ve also wanted to find ways to visualize the opposite… where I have class Foo and I want to see its total cost, including the size of std::vector<Foo>.

These are great ideas. DWARF might not provide enough information to generate these views (it doesn’t explicitly describe the types which parameterize classes, or the names of un-specialized templates). But it should be possible to piece some of this together by parsing type names.

vedant
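As a rough illustration of piecing a cross-template view together by parsing type names, the sketch below drops template arguments from demangled names and sums sizes per template. Everything here is hypothetical (the function names, the input dictionary, the sizes), and the bracket counting is deliberately naive: it would mishandle names containing `operator<`, `operator<<` or `operator->`, so treat it as a starting point only.

```python
from collections import defaultdict


def strip_template_args(name):
    """Drop every balanced <...> group from a demangled type name.

    Naive by design: assumes '<' and '>' only appear as template brackets.
    """
    out = []
    depth = 0
    for ch in name:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def size_by_template(sizes):
    """Sum per-instantiation sizes under the un-specialized template name."""
    totals = defaultdict(int)
    for name, size in sizes.items():
        totals[strip_template_args(name)] += size
    return dict(totals)


# Hypothetical per-class sizes in bytes, keyed by demangled name.
sizes = {
    "std::map<int, Foo>": 1200,
    "std::map<std::string, std::vector<Foo> >": 3400,
    "std::vector<Foo>": 800,
}

by_template = size_by_template(sizes)
total = sum(sizes.values())
for template, size in sorted(by_template.items(), key=lambda kv: -kv[1]):
    print(f"{template}: {size} bytes ({100 * size / total:.1f}%)")
```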

 

Zachary Turner via cfe-dev
Will this only be strictly for binary size, or can we use it for memory size too?

One thing I implemented in llvm-pdbutil, kind of as a side exercise to see if it found anything useful, was a padding detector. It turns out it's really annoyingly difficult to reconstruct an exact class layout from debug info, but I think it's about 85% correct now (although for now it only works on Windows until the native high-level PDB access API is complete; currently only the native low-level API is complete). It will allow you to sort all classes by amount of padding or percentage of class size attributable to padding. We shaved a couple of percent off of V8's memory usage with this tool.

Granted, it's better to have the compiler detect this if possible, but we're talking about a tool that can be run on an arbitrary executable not necessarily built with a compiler we control.

BTW, even though DWARF doesn't describe types which parameterize templates, it does give you mangled names, so you should be able to reconstruct those types from the mangled names.
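For readers unfamiliar with the padding detector idea mentioned above, here is a minimal sketch of the core arithmetic. It is not how llvm-pdbutil does it: the function name and input shape are assumptions, and it skips exactly the cases that make real layout reconstruction hard (bitfields, base classes, vtable pointers, unions), taking only a class size and a list of member (offset, size) pairs.

```python
def padding_bytes(class_size, members):
    """Return the number of bytes in a class not covered by any member.

    `members` is a list of (offset, size) pairs in bytes. Overlapping
    members are tolerated by tracking the furthest byte covered so far.
    """
    covered_end = 0
    padding = 0
    for offset, size in sorted(members):
        if offset > covered_end:
            padding += offset - covered_end  # gap before this member
        covered_end = max(covered_end, offset + size)
    if class_size > covered_end:
        padding += class_size - covered_end  # tail padding
    return padding


# struct { char a; int b; char c; } with a typical layout where int is
# 4 bytes and 4-byte aligned: members at offsets 0, 4 and 8, total size 12.
members = [(0, 1), (4, 4), (8, 1)]
pad = padding_bytes(12, members)
print(f"{pad} of 12 bytes are padding ({100 * pad / 12:.0f}%)")
```

Sorting classes by `pad` or by `pad / class_size` would give the two orderings described in the message.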

On Mon, Oct 1, 2018 at 2:25 PM Vedant Kumar via cfe-dev <[hidden email]> wrote:
Hi Ben,

On Oct 1, 2018, at 12:48 PM, Ben Craig <[hidden email]> wrote:

Something that I’ve been looking to do for a while now is to do this at the .o level, and have something to combine the per .o results as well.  I’ve wanted to do that to figure out where I can speed up builds that are overly slow because of redundant template instantiations.
 
You might also consider a view that goes across templates and across namespaces.  It can be useful to see that X% of your code is in std::map instantiations (for example).  This seems similar to how you have inheritance covered.  I’ve also wanted to find ways to visualize the opposite… where I have class Foo and I want to see its total cost, including the size of std::vector<Foo>.

These are great ideas. DWARF might not provide enough information to generate these views (it doesn’t explicitly describe the types which parameterize classes, or the names of un-specialized templates). But it should be possible to piece some of this together by parsing type names.

vedant

 
From: llvm-dev <[hidden email]> On Behalf Of Jake Ehrlich via llvm-dev
Sent: Friday, September 28, 2018 3:51 PM
To: Vedant Kumar <[hidden email]>
Cc: Apple Inc. <[hidden email]>
Subject: Re: [llvm-dev] RFC: Adding a code size analysis tool
 
Fantastic! I have been looking at creating a tool that a) only spits out actionable size reductions (preferably with a specific action should be specified) and b) only analyzes the size of allocated sections. The other deficiency I've seen with bloaty is speed and scaling. It's very hard to get bloaty to analyze across a large system of interdependent shared libraries. You can add me as a reviewer to any changes as I would very much like to see such a tool exist.

Unlike bloaty, this tool focuses exclusively on the text segment.

I'd like to see support for everything within PT_LOAD segments, not just the executable parts. Everything else you've said is basically what I wanted.
 
On Wed, Sep 26, 2018 at 12:03 PM Vedant Kumar via llvm-dev <[hidden email]> wrote:
Hello,

I worked on a code size analysis tool for a 'week of code' project and think
that it might be useful enough to upstream.

The tool is inspired by bloaty (https://github.com/google/bloaty), but tries to
do more to attribute code size in actionable ways.

For example, it can calculate how many bytes inlined instances of a function
added to a binary. In its diff mode, it can show how much more aggressively a
function was inlined compared to a baseline. This can be useful when you're,
say, trying to figure out why firmware compiled by a new compiler is just a few
bytes over the size limit imposed by your embedded device :). In this case,
extra information about inlining can help inform a decision to either tweak the
inliner's cost model or to judiciously add a few `noinline` attributes. (Note
that if you're willing to recompile & write a few SQL queries, optimization
remarks can give you similar information, albeit at the IR level.)

As another example, this code size tool can attribute code size to semantically
interesting groups of code, like C++/Swift classes, or files. In the diff mode,
you can see how the code size of a class/file grew compared to a baseline. The
tool understands inheritance, so you can also see interesting high-level trends.
E.g `clang::Sema` grew more than `llvm::Pass` between clang-6 and clang-7.

Unlike bloaty, this tool focuses exclusively on the text segment. Also unlike
bloaty, it uses LLVM's DWARF parser instead of rolling its own. The tool is
currently implemented as a sub-tool of llvm-dwarfdump.

To get size information about a program, you do:

  llvm-dwarfdump size-info -baseline <object> -stats-dir <dir>

This emits four *.stats files into <dir>, each containing a distinct 'view' into
the code groups in <object>. There's a file view, a function view, a class view,
and an inlining view. Each view is sorted by code size, so you can see the
largest functions/classes/etc immediately.

The *.stats files are just human-readable text files. As it happens, they use
the flamegraph format (http://brendangregg.com/flamegraphs.html). This makes it
easy to visualize any view as a flamegraph. (If you haven't seen one before,
it's a hierarchical visualization where the width of each entry corresponds to
its frequency (or in this case size).)

To look at code growth between two programs, you'd do:

  llvm-dwarfdump size-info -baseline <object> -target <object> -stats-dir <dir>

Similarly, this emits four 'view' files into <dir>, but with a *.diffstats
suffix. The format is the same.

Pending Work
------------

I think the main piece of work the tool needs is better testing. Currently
there's just a single end-to-end test in clang. It might be better to check in
a few binaries so we can check that the tool reports sizes correctly.

Also, it may turn out that folks are interested in different ways of visualizing
size data. While the textual format of flamegraphs is really convenient for
humans to read, the graphs themselves do make more sense when the underlying
data have a frequentist interpretation. If there's enough interest I can explore
using an alternative format for visualization, e.g:

  http://neugierig.org/software/chromium/bloat/
  https://github.com/evmar/webtreemap

(Thanks JF for pointing these out!)

Here's a link to the source code:

  https://github.com/vedantk/llvm-project/tree/sizeinfo  

Selected Examples
-----------------

Here are a few interesting snippets from a comparison of clang-6 vs. clang-7.

First, let's take a look at the function view diffstat. Here are the 10
functions which grew in size the most. On the left hand side, you'll see the
demangled function name. The *change* in code size in bytes is reported on the
right hand side (only positive changes are reported).

  clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316
  ProcessDeclAttribute([snip]) [function] 125893
  llvm::AArch64InstPrinter::printAliasInstr([snip]) [function] 105133
  llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133
  ParseCodeGenArgs([snip]) [function] 64692
  unswitchNontrivialInvariants([snip]) [function] 40180
  getAttrKind([snip]) [function] 35811
  clang::DumpCompilerOptionsAction::ExecuteAction() [function] 32417
  llvm::UpgradeIntrinsicCall([snip]) [function] 30239
  bool llvm::InstructionSelector::executeMatchTable<(anonymous namespace)::ARMInstructionSelector const, [snip]) const [function] 29352


Next, let's look at the file view diffstat. This can be useful because it goes
beyond simply identifying the files which grew the most. It actually describes
which *functions* grew the most in those files, creating more opportunites to
do something about the code growth.

  lib/Target/X86/X86ISelLowering.cpp [file];combineX86ShuffleChain([snip]) [function] 24864
  lib/Target/X86/X86ISelLowering.cpp [file];combineMul([snip]) [function] 14907
  lib/Target/X86/X86ISelLowering.cpp [file];combineStore([snip]) [function] 12220
  ...
  tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::CheckCompareOperands([snip]) [function] 16024
  tools/clang/lib/Sema/SemaExpr.cpp [file];diagnoseTautologicalComparison([snip]) [function] 1740
  tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::ActOnNumericConstant([snip]) [function] 1436
  tools/clang/lib/Sema/SemaExpr.cpp [file];checkThreeWayNarrowingConversion([snip]) [function] 1356
  tools/clang/lib/Sema/SemaExpr.cpp [file];CheckIdentityFieldAssignment([snip]) [function] 1280


The class view diffstat is a bit different because it has more levels of
nesting than the other views, due to inheritance. This might help give a sense
for the high-level changes in a program, but may also be less actionable.

  clang::Sema [class];clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316
  clang::Sema [class];clang::Sema::CheckHexagonBuiltinArgument([snip]) [function] 24156
  clang::Sema [class];clang::Sema::ActOnTag([snip]) [function] 22373
  ...
  llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133
  llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printInstruction([snip]) [function] 5824
  ...
  llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::X86SpeculativeLoadHardeningPass [class];(anonymous namespace)::X86SpeculativeLoadHardeningPass::checkAllLoads(llvm::MachineFunction&) [function] 19287
  ...
  llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::MachineLICMBase [class];(anonymous namespace)::MachineLICMBase::runOnMachineFunction(llvm::MachineFunction&) [function] 20343

Here's a link to a flamegraph of the class view diffstat (warning: it's big):

  http://net.vedantk.com/static/llvm/swift-clang-4.2-vs-5.0.class-view.diffstats.svg

Finally, here are a few interesting entries from the inlining view diffstat. As
with all of the other views, the right hand side still shows code growth in
bytes. For a given inlining target, this size is computed by diffing the sum of
PC range lengths from all DW_TAG_inlined_subroutines referring to that target.
This allows the size tool to attribute code size to an inlining target even
when the inlined code is not contiguous in the caller.

  llvm::raw_ostream::operator<<(char const*) [inlining-target] 66720
  llvm::MCRegisterClass::contains(unsigned int) const [inlining-target] 64161
  llvm::StringRef::StringRef(char const*) [inlining-target] 39262
  llvm::MCInst::getOperand(unsigned int) const [inlining-target] 33268
  clang::CodeCompletionResult::~CodeCompletionResult() [inlining-target] 25763
  llvm::operator+(llvm::Twine const&, llvm::Twine const&) [inlining-target] 25525
  clang::ASTImporter::Import(clang::SourceLocation) [inlining-target] 21096
  clang::Sema::Diag(clang::SourceLocation, unsigned int) [inlining-target] 20898

Feedback & questions welcome!

thanks,
vedant
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


Re: [llvm-dev] RFC: Adding a code size analysis tool

Yvan Roux via cfe-dev
On Oct 1, 2018, at 3:09 PM, Zachary Turner <[hidden email]> wrote:

Will this only be strictly for binary size, or can we use it for memory size too?

This could be a good fit. I’m not sure what would be helpful beyond a padding detector (which, btw, DWARF’s DW_AT_data_member_location might help with?). DWARF does describe the size of each class/struct, but I’m skeptical that surfacing that on its own would be very helpful.


One thing I implemented in llvm-pdbutil, as a side exercise to see whether it would find anything useful, was a padding detector.  It turns out it's annoyingly difficult to reconstruct an exact class layout from debug info, but I think it's about 85% correct now (although for now it only works on Windows until the native high-level PDB access API is complete -- currently only the native low-level API is complete).  It will allow you to sort all classes by amount of padding or by percentage of class size attributable to padding.  We shaved a couple of percent off of V8's memory usage with this tool.

This is really neat.
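
To make the padding idea concrete, here is a minimal sketch in Python. It is not how llvm-pdbutil does it; it assumes you have already extracted the total size of a struct and an (offset, size) pair for each data member (e.g. from DW_AT_byte_size and DW_AT_data_member_location), and it ignores the hard parts mentioned above: base classes, vtable pointers, bitfields and unions. The function name and the sample layout are made up for illustration.

```python
def padding_bytes(total_size, members):
    """Count bytes of a struct not covered by any data member.

    members is a list of (offset, size) pairs, one per data member.
    This is a simplification: bitfields, base classes and vtable
    pointers are not modelled.
    """
    padding = 0
    cursor = 0  # first byte not yet accounted for
    for offset, size in sorted(members):
        if offset > cursor:
            padding += offset - cursor  # gap before this member
        cursor = max(cursor, offset + size)
    # Tail padding between the last member and the end of the struct.
    return padding + max(0, total_size - cursor)


# Hypothetical layout of: struct S { char a; int b; char c; };
# with sizeof(S) == 12 on a typical ABI.
s_padding = padding_bytes(12, [(0, 1), (4, 4), (8, 1)])
s_fraction = s_padding / 12
```

Sorting classes by `s_padding` or by `s_fraction` would give the two orderings described above.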


Granted, it's better to have the compiler detect this if possible, but we're talking about a tool that can be run on an arbitrary executable not necessarily built with a compiler we control.

BTW, even though DWARF doesn't describe types which parameterize templates, it does give you mangled names, so you should be able to reconstruct those types from the mangled names.

Right.

vedant


On Mon, Oct 1, 2018 at 2:25 PM Vedant Kumar via cfe-dev <[hidden email]> wrote:
Hi Ben,

On Oct 1, 2018, at 12:48 PM, Ben Craig <[hidden email]> wrote:

Something that I’ve been looking to do for a while now is to do this at the .o level, and have something to combine the per .o results as well.  I’ve wanted to do that to figure out where I can speed up builds that are overly slow because of redundant template instantiations.
 
You might also consider a view that goes across templates and across namespaces.  It can be useful to see that X% of your code is in std::map instantiations (for example).  This seems similar to how you have inheritance covered.  I’ve also wanted to find ways to visualize the opposite… where I have class Foo and I want to see its total cost, including the size of std::vector<Foo>.

These are great ideas. DWARF might not provide enough information to generate these views (it doesn’t explicitly describe the types which parameterize classes, or the names of un-specialized templates). But it should be possible to piece some of this together by parsing type names.
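
As a rough illustration of the "parsing type names" idea, here is a sketch in Python that strips template arguments from a demangled name and then groups sizes by the resulting un-specialized name. Everything here is an assumption made for the example: the helper names, the sample sizes, and the approach of counting angle brackets, which will mis-handle names containing `operator<`, `operator<<` or `operator->`.

```python
from collections import defaultdict


def template_name(demangled):
    """Drop everything inside angle brackets.

    'std::map<int, Foo>::find' becomes 'std::map::find'. Bracket
    counting is naive and breaks on operator<, operator<< and
    operator->.
    """
    out = []
    depth = 0
    for ch in demangled:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def group_by_template(sizes):
    """Sum code sizes over all instantiations of the same template."""
    totals = defaultdict(int)
    for name, size in sizes.items():
        totals[template_name(name)] += size
    return dict(totals)


# Made-up sizes, in bytes.
sizes = {
    "std::vector<Foo>::push_back": 120,
    "std::vector<Bar>::push_back": 80,
    "Foo::run": 40,
}
by_template = group_by_template(sizes)
```

The opposite view (the total cost of `Foo`, including `std::vector<Foo>`) would need the template arguments kept and indexed instead of discarded.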

vedant

 
From: llvm-dev <[hidden email]> On Behalf Of Jake Ehrlich via llvm-dev
Sent: Friday, September 28, 2018 3:51 PM
To: Vedant Kumar <[hidden email]>
Cc: Apple Inc. <[hidden email]>
Subject: Re: [llvm-dev] RFC: Adding a code size analysis tool
 
Fantastic! I have been looking at creating a tool that a) only spits out actionable size reductions (preferably with a specific action specified) and b) only analyzes the size of allocated sections. The other deficiency I've seen with bloaty is speed and scaling. It's very hard to get bloaty to analyze across a large system of interdependent shared libraries. You can add me as a reviewer to any changes as I would very much like to see such a tool exist.

Unlike bloaty, this tool focuses exclusively on the text segment.

I'd like to see support for everything within PT_LOAD segments, not just the executable parts. Everything else you've said is basically what I wanted.
 
On Wed, Sep 26, 2018 at 12:03 PM Vedant Kumar via llvm-dev <[hidden email]> wrote:
Hello,

I worked on a code size analysis tool for a 'week of code' project and think
that it might be useful enough to upstream.

The tool is inspired by bloaty (https://github.com/google/bloaty), but tries to
do more to attribute code size in actionable ways.

For example, it can calculate how many bytes inlined instances of a function
added to a binary. In its diff mode, it can show how much more aggressively a
function was inlined compared to a baseline. This can be useful when you're,
say, trying to figure out why firmware compiled by a new compiler is just a few
bytes over the size limit imposed by your embedded device :). In this case,
extra information about inlining can help inform a decision to either tweak the
inliner's cost model or to judiciously add a few `noinline` attributes. (Note
that if you're willing to recompile & write a few SQL queries, optimization
remarks can give you similar information, albeit at the IR level.)

As another example, this code size tool can attribute code size to semantically
interesting groups of code, like C++/Swift classes, or files. In the diff mode,
you can see how the code size of a class/file grew compared to a baseline. The
tool understands inheritance, so you can also see interesting high-level trends.
E.g., `clang::Sema` grew more than `llvm::Pass` between clang-6 and clang-7.

Unlike bloaty, this tool focuses exclusively on the text segment. Also unlike
bloaty, it uses LLVM's DWARF parser instead of rolling its own. The tool is
currently implemented as a sub-tool of llvm-dwarfdump.

To get size information about a program, you do:

  llvm-dwarfdump size-info -baseline <object> -stats-dir <dir>

This emits four *.stats files into <dir>, each containing a distinct 'view' into
the code groups in <object>. There's a file view, a function view, a class view,
and an inlining view. Each view is sorted by code size, so you can see the
largest functions/classes/etc immediately.

The *.stats files are just human-readable text files. As it happens, they use
the flamegraph format (http://brendangregg.com/flamegraphs.html). This makes it
easy to visualize any view as a flamegraph. (If you haven't seen one before,
it's a hierarchical visualization where the width of each entry corresponds to
its frequency (or in this case size).)
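
Because the files are plain folded-stack text, they are easy to post-process. Here is a minimal Python sketch that parses lines of the form `frame;frame;... <bytes>` and sums sizes over the outermost frame. It assumes that frames are separated by `;`, that the size is the last space-separated token, and that no frame name contains a `;`. The function names are made up for the example; the sample lines are copied from the file view shown under Selected Examples.

```python
from collections import defaultdict


def parse_stats(text):
    """Parse folded-stack lines into (frames, size) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "...":
            continue
        # Frame names contain spaces, so split on the last space only.
        stack, _, size = line.rpartition(" ")
        entries.append((stack.split(";"), int(size)))
    return entries


def rollup(entries, depth=1):
    """Sum sizes over the first `depth` frames of each stack."""
    totals = defaultdict(int)
    for frames, size in entries:
        totals[";".join(frames[:depth])] += size
    return dict(totals)


SAMPLE = """\
lib/Target/X86/X86ISelLowering.cpp [file];combineX86ShuffleChain([snip]) [function] 24864
lib/Target/X86/X86ISelLowering.cpp [file];combineMul([snip]) [function] 14907
tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::CheckCompareOperands([snip]) [function] 16024
"""

per_file = rollup(parse_stats(SAMPLE))
```

The same `rollup` applied to a class view with `depth=1` would give a total per root base class, which is roughly what the width of the bottom row of a flamegraph shows.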

To look at code growth between two programs, you'd do:

  llvm-dwarfdump size-info -baseline <object> -target <object> -stats-dir <dir>

Similarly, this emits four 'view' files into <dir>, but with a *.diffstats
suffix. The format is the same.

Pending Work
------------

I think the main piece of work the tool needs is better testing. Currently
there's just a single end-to-end test in clang. It might be better to check in
a few binaries so we can check that the tool reports sizes correctly.

Also, it may turn out that folks are interested in different ways of visualizing
size data. While the textual format of flamegraphs is really convenient for
humans to read, the graphs themselves do make more sense when the underlying
data have a frequentist interpretation. If there's enough interest I can explore
using an alternative format for visualization, e.g.:

  http://neugierig.org/software/chromium/bloat/
  https://github.com/evmar/webtreemap

(Thanks JF for pointing these out!)

Here's a link to the source code:

  https://github.com/vedantk/llvm-project/tree/sizeinfo  

Selected Examples
-----------------

Here are a few interesting snippets from a comparison of clang-6 vs. clang-7.

First, let's take a look at the function view diffstat. Here are the 10
functions which grew in size the most. On the left-hand side, you'll see the
demangled function name. The *change* in code size in bytes is reported on the
right-hand side (only positive changes are reported).

  clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316
  ProcessDeclAttribute([snip]) [function] 125893
  llvm::AArch64InstPrinter::printAliasInstr([snip]) [function] 105133
  llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133
  ParseCodeGenArgs([snip]) [function] 64692
  unswitchNontrivialInvariants([snip]) [function] 40180
  getAttrKind([snip]) [function] 35811
  clang::DumpCompilerOptionsAction::ExecuteAction() [function] 32417
  llvm::UpgradeIntrinsicCall([snip]) [function] 30239
  bool llvm::InstructionSelector::executeMatchTable<(anonymous namespace)::ARMInstructionSelector const, [snip]) const [function] 29352
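
For anyone who wants to reproduce this kind of report from two non-diff views, here is a small Python sketch of the comparison. It is an illustration, not the tool's code: it assumes each view has been loaded into a dict mapping a name to a size in bytes, and it assumes that an entry missing from the baseline counts as growth from zero. The names and numbers are invented.

```python
def growth(baseline, target):
    """Return (name, delta) pairs for entries that grew, largest first.

    Entries that shrank, stayed the same, or disappeared are dropped,
    so only positive changes are reported.
    """
    deltas = {
        name: size - baseline.get(name, 0)
        for name, size in target.items()
    }
    grew = [(name, delta) for name, delta in deltas.items() if delta > 0]
    return sorted(grew, key=lambda item: item[1], reverse=True)


# Invented sizes, in bytes.
baseline = {"f": 100, "g": 50, "h": 10}
target = {"f": 180, "g": 40, "i": 25}

report = growth(baseline, target)
```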


Next, let's look at the file view diffstat. This can be useful because it goes
beyond simply identifying the files which grew the most. It actually describes
which *functions* grew the most in those files, creating more opportunities to
do something about the code growth.

  lib/Target/X86/X86ISelLowering.cpp [file];combineX86ShuffleChain([snip]) [function] 24864
  lib/Target/X86/X86ISelLowering.cpp [file];combineMul([snip]) [function] 14907
  lib/Target/X86/X86ISelLowering.cpp [file];combineStore([snip]) [function] 12220
  ...
  tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::CheckCompareOperands([snip]) [function] 16024
  tools/clang/lib/Sema/SemaExpr.cpp [file];diagnoseTautologicalComparison([snip]) [function] 1740
  tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::ActOnNumericConstant([snip]) [function] 1436
  tools/clang/lib/Sema/SemaExpr.cpp [file];checkThreeWayNarrowingConversion([snip]) [function] 1356
  tools/clang/lib/Sema/SemaExpr.cpp [file];CheckIdentityFieldAssignment([snip]) [function] 1280


The class view diffstat is a bit different because it has more levels of
nesting than the other views, due to inheritance. This might help give a sense
for the high-level changes in a program, but may also be less actionable.

  clang::Sema [class];clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316
  clang::Sema [class];clang::Sema::CheckHexagonBuiltinArgument([snip]) [function] 24156
  clang::Sema [class];clang::Sema::ActOnTag([snip]) [function] 22373
  ...
  llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133
  llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printInstruction([snip]) [function] 5824
  ...
  llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::X86SpeculativeLoadHardeningPass [class];(anonymous namespace)::X86SpeculativeLoadHardeningPass::checkAllLoads(llvm::MachineFunction&) [function] 19287
  ...
  llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::MachineLICMBase [class];(anonymous namespace)::MachineLICMBase::runOnMachineFunction(llvm::MachineFunction&) [function] 20343

Here's a link to a flamegraph of the class view diffstat (warning: it's big):

  http://net.vedantk.com/static/llvm/swift-clang-4.2-vs-5.0.class-view.diffstats.svg

Finally, here are a few interesting entries from the inlining view diffstat. As
with all of the other views, the right-hand side still shows code growth in
bytes. For a given inlining target, this size is computed by diffing the sum of
PC range lengths from all DW_TAG_inlined_subroutines referring to that target.
This allows the size tool to attribute code size to an inlining target even
when the inlined code is not contiguous in the caller.

  llvm::raw_ostream::operator<<(char const*) [inlining-target] 66720
  llvm::MCRegisterClass::contains(unsigned int) const [inlining-target] 64161
  llvm::StringRef::StringRef(char const*) [inlining-target] 39262
  llvm::MCInst::getOperand(unsigned int) const [inlining-target] 33268
  clang::CodeCompletionResult::~CodeCompletionResult() [inlining-target] 25763
  llvm::operator+(llvm::Twine const&, llvm::Twine const&) [inlining-target] 25525
  clang::ASTImporter::Import(clang::SourceLocation) [inlining-target] 21096
  clang::Sema::Diag(clang::SourceLocation, unsigned int) [inlining-target] 20898
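
The computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each inlined subroutine is given as a (target name, list of half-open [low_pc, high_pc) ranges) pair that something else has already pulled out of the DWARF, and an inlining target missing from the baseline counts as growth from zero. The function names, addresses and target name are made up.

```python
from collections import defaultdict


def inlined_bytes(inlined_subroutines):
    """Sum PC range lengths per inlining target.

    Each item is (target_name, [(low_pc, high_pc), ...]). A single
    inlined instance may have several ranges when its code is not
    contiguous in the caller.
    """
    totals = defaultdict(int)
    for target, ranges in inlined_subroutines:
        totals[target] += sum(high - low for low, high in ranges)
    return dict(totals)


def inlining_growth(baseline, target):
    """Diff the per-target sums of two binaries."""
    before = inlined_bytes(baseline)
    after = inlined_bytes(target)
    return {name: after[name] - before.get(name, 0) for name in after}


# Invented inlined instances of one hypothetical target.
old = [("Foo::get()", [(0x10, 0x18)])]
new = [
    ("Foo::get()", [(0x10, 0x18), (0x40, 0x44)]),  # split into two ranges
    ("Foo::get()", [(0x80, 0x90)]),                # a second call site
]

delta = inlining_growth(old, new)
```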

Feedback & questions welcome!

thanks,
vedant