Clang executable sizes and build stats

Dimitry Andric via cfe-dev
Hi all,

I recently did a run where I built clang executables on FreeBSD 12-CURRENT [1], from trunk r250000 (2015-10-11) all through r327700 (2018-03-16), with increments of 100 revisions.  This is mainly meant as an archive, for easily doing bisections, but there are also some interesting statistics.
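
As a rough illustration of how such an archive could be used for bisection, here is a minimal Python sketch. The `is_bad` predicate is hypothetical: it stands in for running the prebuilt clang for a given revision and checking for the regression. The sketch also assumes the behaviour flips exactly once within the range.

```python
# Revisions available in the archive: r250000 through r327700, every 100.
ARCHIVED = list(range(250000, 327700 + 1, 100))

def first_bad(revisions, is_bad):
    """Return the first revision for which is_bad() is true.

    Assumes revisions is sorted, the first entry is good, the last is bad,
    and the behaviour flips exactly once in between.
    """
    lo, hi = 0, len(revisions) - 1   # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]

# Hypothetical regression introduced somewhere after r300123.
def example_is_bad(rev):
    return rev > 300123

culprit = first_bad(ARCHIVED, example_is_bad)
print(f"first bad archived build: r{culprit}")
```

With 778 archived builds this takes about ten probes. It only narrows the culprit down to a window of 100 revisions, which still has to be bisected by building.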

From r250000 through r327700:
* the total (stripped) executable size grew by approximately 43%
* the size of the text segment grew by approximately 41%
* the size of the data segment grew by approximately 61%
* the size of the bss segment grew by approximately 185%
* real build time (on a 32 core system) grew by approximately 60%
* user build time (on a 32 core system) grew by approximately 62%
* maximum resident set size (RSS) grew by approximately 32%
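
To put these percentages in perspective, they can be converted into growth factors and an approximate annualised rate over the measured period. The Python sketch below uses only the dates and percentages quoted above. Treating the growth as steady and compounding is a simplifying assumption.

```python
from datetime import date

# Dates of the first and last revisions, as quoted above.
start, end = date(2015, 10, 11), date(2018, 3, 16)
years = (end - start).days / 365.25

# Approximate growth percentages, as quoted above.
growth_pct = {
    "total size": 43, "text": 41, "data": 61, "bss": 185,
    "real build time": 60, "user build time": 62, "max RSS": 32,
}

def annualised(pct, years):
    """Compound annual growth rate, assuming steady growth over the period."""
    factor = 1 + pct / 100
    return factor ** (1 / years) - 1

for name, pct in growth_pct.items():
    print(f"{name}: x{1 + pct / 100:.2f} overall, "
          f"{annualised(pct, years):.1%} per year")
```

Under that assumption, the 43% growth in total size works out to roughly 16% per year.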

Google spreadsheet with more numbers and some graphs:

https://docs.google.com/spreadsheets/d/e/2PACX-1vSGq1U7j45JNC_bcG4HV3jKOV4WBUPbTSgMMFXd5SD0IEPTAFwWnlU2ysprmnHsNe5WONRCjg8F5mHK/pubhtml

-Dimitry

[1] These were built using the "ninja clang clang-headers" target, followed by "ninja install-clang install-clang-headers".


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: Clang executable sizes and build stats

Greg Bedwell via cfe-dev
Thanks for raising this. This is something we've recently been looking at too at Sony, as over the course of PS4's lifetime so far we've seen our clang executable on Windows approximately double in size, which isn't ideal for things like distributed build systems.  A graph of clang.exe size on our internal staging branch matches yours closely with it being more of a death by a thousand cuts rather than being down to a small number of sudden big-bang changes.  

I did spot one range of about 25 upstream commits in our data where the exe size increased by over 1MB. My prime suspect in that range was a new scheduling model being added to the X86 backend but I've not bisected further to be sure yet.  This would be an interesting case for us as we don't really need to support any models other than Jaguar for our users but don't want to break the LLVM tests, nor introduce loads of private changes to our branch.

I know our test/QA team have been doing some analysis using Bloaty McBloatFace to see exactly where the size is coming from and have produced some really nice visualizations of that data.  They've also been looking at how the MinSizeRel config does on Windows. I think the size savings were decent but I'm not sure of performance numbers, if they have any yet.

I'll ask around at what we have to share once back in the office. 

Thanks for sharing your data!

-Greg



On Sat, 17 Mar 2018 at 12:36, Dimitry Andric via cfe-dev <[hidden email]> wrote:
Hi all,

I recently did a run where I built clang executables on FreeBSD 12-CURRENT [1], from trunk r250000 (2015-10-11) all through r327700 (2018-03-16), with increments of 100 revisions.  This is mainly meant as an archive, for easily doing bisections, but there are also some interesting statistics.

From r250000 through r327700:
* the total (stripped) executable size grew by approximately 43%
* the size of the text segment grew by approximately 41%
* the size of the data segment grew by approximately 61%
* the size of the bss segment grew by approximately 185%
* real build time (on a 32 core system) grew by approximately 60%
* user build time (on a 32 core system) grew by approximately 62%
* maximum resident set size (RSS) grew by approximately 32%

Google spreadsheet with more numbers and some graphs:

https://docs.google.com/spreadsheets/d/e/2PACX-1vSGq1U7j45JNC_bcG4HV3jKOV4WBUPbTSgMMFXd5SD0IEPTAFwWnlU2ysprmnHsNe5WONRCjg8F5mHK/pubhtml

-Dimitry

[1] These were built using the "ninja clang clang-headers" target, followed by "ninja install-clang install-clang-headers".

Re: Clang executable sizes and build stats

Craig Topper via cfe-dev
I'm sure the x86 scheduler models are causing bloat. Every time a single instruction appears on a line by itself like this in a scheduler model:

def: InstRW<[SBWriteResGroup2], (instregex "ANDNPDrr")>;

It causes that instruction to be its own group in the generated output. And it's replicated for each CPU. We should look into making better use of regular expressions, or taking advantage of the fact that InstRW can take a list of instructions. That makes those instructions part of a single group, and the tablegen backend will only split the group if two CPUs have different ports, latency, etc. for instructions within the group.
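
The effect can be illustrated with a toy model in Python. This is not how the TableGen backend is implemented, and the CPU names, resource tuples and all instruction names other than ANDNPDrr are made up for illustration. It only shows why one InstRW per instruction yields one scheduling class per instruction, while a shared group needs splitting only where some CPU treats its members differently.

```python
from collections import defaultdict

# Made-up per-CPU scheduling info: instr -> {cpu: (ports, latency)}.
SCHED_INFO = {
    "ANDNPDrr": {"cpuA": ("p5", 1), "cpuB": ("p015", 1)},
    "ANDNPSrr": {"cpuA": ("p5", 1), "cpuB": ("p015", 1)},
    "ANDPDrr":  {"cpuA": ("p5", 1), "cpuB": ("p015", 1)},
    "ORPDrr":   {"cpuA": ("p5", 1), "cpuB": ("p05", 1)},
}

def classes_one_per_instruction(info):
    """Each instruction listed on its own line becomes its own class."""
    return [[instr] for instr in info]

def classes_grouped(info):
    """Instructions share a class unless some CPU treats them differently."""
    groups = defaultdict(list)
    for instr, per_cpu in info.items():
        signature = tuple(sorted(per_cpu.items()))
        groups[signature].append(instr)
    return list(groups.values())

separate = classes_one_per_instruction(SCHED_INFO)
grouped = classes_grouped(SCHED_INFO)
print(len(separate), "classes when listed separately vs", len(grouped), "when grouped")
```

Since each class is then emitted once per CPU model, fewer classes shrink every per-CPU table.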

~Craig

On Sat, Mar 17, 2018 at 6:26 AM, Greg Bedwell via cfe-dev <[hidden email]> wrote:
Thanks for raising this. This is something we've recently been looking at too at Sony, as over the course of PS4's lifetime so far we've seen our clang executable on Windows approximately double in size, which isn't ideal for things like distributed build systems.  A graph of clang.exe size on our internal staging branch matches yours closely with it being more of a death by a thousand cuts rather than being down to a small number of sudden big-bang changes.  

I did spot one range of about 25 upstream commits in our data where the exe size increased by over 1MB. My prime suspect in that range was a new scheduling model being added to the X86 backend but I've not bisected further to be sure yet.  This would be an interesting case for us as we don't really need to support any models other than Jaguar for our users but don't want to break the LLVM tests, nor introduce loads of private changes to our branch.

I know our test/QA team have been doing some analysis using Bloaty McBloatFace to see exactly where the size is coming from and have produced some really nice visualizations of that data.  They've also been looking at how the MinSizeRel config does on Windows. I think the size savings were decent but I'm not sure of performance numbers, if they have any yet.

I'll ask around at what we have to share once back in the office. 

Thanks for sharing your data!

-Greg



Re: Clang executable sizes and build stats

Andrew Trick via cfe-dev


On Mar 17, 2018, at 4:04 PM, Craig Topper via cfe-dev <[hidden email]> wrote:

I'm sure the x86 scheduler models are causing bloat. Every time a single instruction appears on a line by itself like this in a scheduler model:

def: InstRW<[SBWriteResGroup2], (instregex "ANDNPDrr")>;

It causes that instruction to be its own group in the generated output. And it's replicated for each CPU. We should look into making better use of regular expressions, or taking advantage of the fact that InstRW can take a list of instructions. That makes those instructions part of a single group, and the tablegen backend will only split the group if two CPUs have different ports, latency, etc. for instructions within the group.

~Craig

The tables themselves are compact. There’s actually a lot of complexity spent on compacting the resource and latency tables. But, yes, there are 5k+ entries per cpu, roughly 28 bytes each. However, if you're looking at a debug build, the tables will be huge. The scheduling class names are much bigger than the data.
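
As a back-of-the-envelope check of those figures in Python, taking "5k+" as 5000 entries (a lower bound) and 28 bytes per entry:

```python
ENTRIES_PER_CPU = 5000   # "5k+" entries, taken as a lower bound
BYTES_PER_ENTRY = 28     # roughly 28 bytes per entry

def table_bytes(num_cpus, entries=ENTRIES_PER_CPU, entry_size=BYTES_PER_ENTRY):
    """Approximate size of the scheduling class tables for num_cpus models."""
    return num_cpus * entries * entry_size

per_cpu = table_bytes(1)
print(f"{per_cpu} bytes (~{per_cpu / 1024:.0f} KiB) per CPU model")
```

So each additional CPU model with a table of this size adds on the order of 140 KB in a release build.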

-Andy


Re: Clang executable sizes and build stats

Craig Topper via cfe-dev
I just knocked ~400k off the size of the x86 scheduler tables by reducing from 5k+ entries to 2k+ entries per cpu.

~Craig

On Tue, Mar 20, 2018 at 6:34 PM, Andrew Trick <[hidden email]> wrote:


The tables themselves are compact. There’s actually a lot of complexity spent on compacting the resource and latency tables. But, yes, there are 5k+ entries per cpu, roughly 28 bytes each. However, if you're looking at a debug build, the tables will be huge. The scheduling class names are much bigger than the data.

-Andy

Re: Clang executable sizes and build stats

Greg Bedwell via cfe-dev
Thanks.  We see that reduction too in our clang.exe builds on Windows.  Nice to see that we can make at least some difference!

-Greg

On 22 March 2018 at 04:28, Craig Topper <[hidden email]> wrote:
I just knocked ~400k off the size of the x86 scheduler tables by reducing from 5k+ entries to 2k+ entries per cpu.

~Craig
