[RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

[RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
Hello!

TL;DR: I have made a Clang/LLVM code change that adds "-ftime-trace" option, that produces Chrome Tracing format output. Would like comments on whether that is a good idea or not, or perhaps someone else is already doing this. My current (WIP) patch in github PR format is here https://github.com/aras-p/llvm-project-20170507/pull/2 -- with images and trace output files attached.

Longer version:

Current implementation of "-ftime-report" has several issues, particularly when I'm just a "user" of the compiler:

- it outputs a lot of information (almost half of it is duplicated since clang 7.0),
- a lot of that information is things that only compiler developers would know about,
- has quite large overhead, I've seen it make compile times take 1.5x longer,
- has very little information about "frontend" part (preprocessing, parsing, instantiation, C++ modules),
- the things it reports are only "summaries", i.e. "how much time it took to do work X in total". e.g. it can tell that "inlining all functions took X seconds", but in case there was just one super slow function to inline among
thousands, it will not tell which one was the slow one.

I have written a blog post about this (as well as lack of "good" time reporting tools in Visual Studio and gcc) recently, http://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/

At work (Unity game engine), with codebase of similar size to whole of Clang/LLVM (2-3 million lines of code), we had really good experience in adding timeline/flamegraph visualizations to various parts of our "build system". This can tell us which .cpp files were slowest to compile in the whole build, but I also wanted similar tooling for things "inside" single .cpp file compilation.

Thus this attempt at adding a new time trace profiling mode.

I have current changes on github here, https://github.com/aras-p/llvm-project-20170507/pull/2 -- can do a proper "patch" thing via Phabricator if needed.

My current code change does not quite match Clang/LLVM code standards and probably needs some work, but the general approach seems to work. Already found one case of Clang being very slow at parsing some weird recursive macro thingamabob that we had; was causing about 5-8 seconds just to include one header file. I probably would have never found it without this type of visualization. Here it is very clear that among all the things, parsing just that one header file takes almost all the time: https://user-images.githubusercontent.com/348087/51038295-76efb780-15bb-11e9-926f-a6be1ffd03f1.png


Regards,

--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
Non-feedback passing-by remark:
while a command-line switch to enable such timing traces is good, it
should be noted that llvm xray exists,
and it can nicely produce this "Chrome Tracing format output", so i
almost wonder if simply building clang
with that instrumentation would be better overall, if not for the ease
of adoption.

On Sun, Jan 13, 2019 at 3:31 PM Aras Pranckevicius via cfe-dev
<[hidden email]> wrote:

>
> Hello!
>
> TL;DR: I have made a Clang/LLVM code change that adds "-ftime-trace" option, that produces Chrome Tracing format output. Would like comments on whether that is a good idea or not, or perhaps someone else is already doing this. My current (WIP) patch in github PR format is here https://github.com/aras-p/llvm-project-20170507/pull/2 -- with images and trace output files attached.
>
> Longer version:
>
> Current implementation of "-ftime-report" has several issues, particularly when I'm just a "user" of the compiler:
>
> - it outputs a lot of information (almost half of it is duplicated since clang 7.0),
> - a lot of that information is things that only compiler developers would know about,
> - has quite large overhead, I've seen it make compile times take 1.5x longer,
> - has very little information about "frontend" part (preprocessing, parsing, instantiation, C++ modules),
> - the things it reports are only "summaries", i.e. "how much time it took to do work X in total". e.g. it can tell that "inlining all functions took X seconds", but in case there was just one super slow function to inline among
> thousands, it will not tell which one was the slow one.
>
> I have written a blog post about this (as well as lack of "good" time reporting tools in Visual Studio and gcc) recently, http://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/
>
> At work (Unity game engine), with codebase of similar size to whole of Clang/LLVM (2-3 million lines of code), we had really good experience in adding timeline/flamegraph visualizations to various parts of our "build system". This can tell us which .cpp files were slowest to compile in the whole build, but I also wanted similar tooling for things "inside" single .cpp file compilation.
>
> Thus this attempt at adding a new time trace profiling mode.
>
> I have current changes on github here, https://github.com/aras-p/llvm-project-20170507/pull/2 -- can do a proper "patch" thing via Phabricator if needed.
>
> My current code change does not quite match Clang/LLVM code standards and probably needs some work, but the general approach seems to work. Already found one case of Clang being very slow at parsing some weird recursive macro thingamabob that we had; was causing about 5-8 seconds just to include one header file. I probably would have never found it without this type of visualization. Here it is very clear that among all the things, parsing just that one header file takes almost all the time: https://user-images.githubusercontent.com/348087/51038295-76efb780-15bb-11e9-926f-a6be1ffd03f1.png
>
>
> Regards,
>
> --
> Aras Pranckevičius
Roman.

> work: http://unity3d.com
> home: http://aras-p.info
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
> while a command-line switch to enable such timing traces is good, it
> should be noted that llvm xray exists,
> and it can nicely produce this "Chrome Tracing format output", so i
> almost wonder if simply building clang
> with that instrumentation would be better overall, if not for the ease
> of adoption.

I'm not familiar much with Xray.

From a quick glance, it seems to only exist on Linux (my own use case at work is: 50/50 split between Windows & Mac, rarely if ever use Linux -- mostly because the main platforms that we work on are Windows or Mac based).

And then I don't know what's the overhead of having clang with Xray compiled in. Would it be low enough to have it there on by default?

--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
In reply to this post by David Blaikie via cfe-dev
Hi Aras,

Nice work! Visualizing the timeline is a powerful tool, and I'd appreciate having an easy way to annotate how spans of time are spent.
We also have a facility in clangd[1] to record traces and output them in chrome tracing format, and I've also done some performance work with similar tools internally.

Some random thoughts:
 - the distinction between tracing/profiling automatically (stack frames, using xray or a profiler)
 - the timeline view, flame graph, and 


[1]

On Sun, Jan 13, 2019 at 1:31 PM Aras Pranckevicius via cfe-dev <[hidden email]> wrote:
Hello!

TL;DR: I have made a Clang/LLVM code change that adds "-ftime-trace" option, that produces Chrome Tracing format output. Would like comments on whether that is a good idea or not, or perhaps someone else is already doing this. My current (WIP) patch in github PR format is here https://github.com/aras-p/llvm-project-20170507/pull/2 -- with images and trace output files attached.

Longer version:

Current implementation of "-ftime-report" has several issues, particularly when I'm just a "user" of the compiler:

- it outputs a lot of information (almost half of it is duplicated since clang 7.0),
- a lot of that information is things that only compiler developers would know about,
- has quite large overhead, I've seen it make compile times take 1.5x longer,
- has very little information about "frontend" part (preprocessing, parsing, instantiation, C++ modules),
- the things it reports are only "summaries", i.e. "how much time it took to do work X in total". e.g. it can tell that "inlining all functions took X seconds", but in case there was just one super slow function to inline among
thousands, it will not tell which one was the slow one.

I have written a blog post about this (as well as lack of "good" time reporting tools in Visual Studio and gcc) recently, http://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/

At work (Unity game engine), with codebase of similar size to whole of Clang/LLVM (2-3 million lines of code), we had really good experience in adding timeline/flamegraph visualizations to various parts of our "build system". This can tell us which .cpp files were slowest to compile in the whole build, but I also wanted similar tooling for things "inside" single .cpp file compilation.

Thus this attempt at adding a new time trace profiling mode.

I have current changes on github here, https://github.com/aras-p/llvm-project-20170507/pull/2 -- can do a proper "patch" thing via Phabricator if needed.

My current code change does not quite match Clang/LLVM code standards and probably needs some work, but the general approach seems to work. Already found one case of Clang being very slow at parsing some weird recursive macro thingamabob that we had; was causing about 5-8 seconds just to include one header file. I probably would have never found it without this type of visualization. Here it is very clear that among all the things, parsing just that one header file takes almost all the time: https://user-images.githubusercontent.com/348087/51038295-76efb780-15bb-11e9-926f-a6be1ffd03f1.png


Regards,

--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
(Sorry, hit send too early)

On Tue, Jan 15, 2019 at 12:20 PM Sam McCall <[hidden email]> wrote:
Hi Aras,

Nice work! Visualizing the timeline is a powerful tool, and I'd appreciate having an easy way to annotate how spans of time are spent.
We also have a facility in clangd[1] to record traces and output them in chrome tracing format, and I've also done some performance work with similar tools internally.

Some random thoughts:
 - the distinction between tracing/profiling automatically (stack frames, using xray or a profiler) and manually annotating ranges is an important one - it may be possible to use XRay to combine the two.
 - the timeline view, flame graph, and graphviz representations are independently useful, so being able to output something pprof can read is nice
 - for multi-threaded programs, being able to represent events split across or transferred between threads is important, but complicates the model
 - it's been useful to us to plug in other backends (e.g. internal cross-server tracing tools)
 - the chrome tracing format is pretty awkward to work with: bulky, odd semantics, doesn't survive truncation. It'd be nice to have something better.

The clangd implementation probably isn't general purpose enough to use for clang (particularly its approach to threading), but there might be some useful ideas in there.

Cheers, Sam

[1] https://reviews.llvm.org/source/clang-tools-extra/browse/clang-tools-extra/trunk/clangd/Trace.h

On Sun, Jan 13, 2019 at 1:31 PM Aras Pranckevicius via cfe-dev <[hidden email]> wrote:
Hello!

TL;DR: I have made a Clang/LLVM code change that adds "-ftime-trace" option, that produces Chrome Tracing format output. Would like comments on whether that is a good idea or not, or perhaps someone else is already doing this. My current (WIP) patch in github PR format is here https://github.com/aras-p/llvm-project-20170507/pull/2 -- with images and trace output files attached.

Longer version:

Current implementation of "-ftime-report" has several issues, particularly when I'm just a "user" of the compiler:

- it outputs a lot of information (almost half of it is duplicated since clang 7.0),
- a lot of that information is things that only compiler developers would know about,
- has quite large overhead, I've seen it make compile times take 1.5x longer,
- has very little information about "frontend" part (preprocessing, parsing, instantiation, C++ modules),
- the things it reports are only "summaries", i.e. "how much time it took to do work X in total". e.g. it can tell that "inlining all functions took X seconds", but in case there was just one super slow function to inline among
thousands, it will not tell which one was the slow one.

I have written a blog post about this (as well as lack of "good" time reporting tools in Visual Studio and gcc) recently, http://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/

At work (Unity game engine), with codebase of similar size to whole of Clang/LLVM (2-3 million lines of code), we had really good experience in adding timeline/flamegraph visualizations to various parts of our "build system". This can tell us which .cpp files were slowest to compile in the whole build, but I also wanted similar tooling for things "inside" single .cpp file compilation.

Thus this attempt at adding a new time trace profiling mode.

I have current changes on github here, https://github.com/aras-p/llvm-project-20170507/pull/2 -- can do a proper "patch" thing via Phabricator if needed.

My current code change does not quite match Clang/LLVM code standards and probably needs some work, but the general approach seems to work. Already found one case of Clang being very slow at parsing some weird recursive macro thingamabob that we had; was causing about 5-8 seconds just to include one header file. I probably would have never found it without this type of visualization. Here it is very clear that among all the things, parsing just that one header file takes almost all the time: https://user-images.githubusercontent.com/348087/51038295-76efb780-15bb-11e9-926f-a6be1ffd03f1.png


Regards,

--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
(I'm not familar with many/most of llvm/clang tooling, so some of my questions below might not make sense... bear with me)
 
 - the distinction between tracing/profiling automatically (stack frames, using xray or a profiler) and manually annotating ranges is an important one - it may be possible to use XRay to combine the two.

Is Xray as "a general solution for this type of problem" general enough? From a quick glance, it seems to only work on Linux, which would exclude most of clang users who are on other OSes.

 
 - the timeline view, flame graph, and graphviz representations are independently useful, so being able to output something pprof can read is nice

Agreed. I only did Chrome Tracing one because it's trivial to write, and a tool to view it exists on all OSes I care about. What other profiling output formats are "common enough" to have support for? I guess this might be highly industry specific (e.g. I haven't even heard of pprof before -- maybe because my work rarely if ever involves Linux).


 - for multi-threaded programs, being able to represent events split across or transferred between threads is important, but complicates the model

Yeah. I thought about making it support multiple threads, but then seemingly all (or majority) of clang itself is single threaded. I found some utilities in llvm codebase (thread locals, thread pool etc.) but they seem to not be actually used in the whole codebase much.

For other implementations of a similar profiler  in other codebases that I've done, I approached the threading situation by having all profiler data be thread-local; that way profiler regions don't need to do any mutex locks; the only additional cost is a TLS lookup. And then "at the end" when profiling data is written into a file, which is required to happen when all thread work is done already, it just merges all the per-thread data and emits that. That is easy to implement and does not have high performance impact.

 
 - the chrome tracing format is pretty awkward to work with: bulky, odd semantics, doesn't survive truncation. It'd be nice to have something better.

Similar to question before -- do you have suggestions for which format has readily available viewing tools (e.g. Speedoscope supports it, or similar -- https://github.com/jlfwong/speedscope), and is better?
 


--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
In reply to this post by David Blaikie via cfe-dev
This is pretty cool!

I think the question is, "what next"? I don't really want to say "merge it right away!", because then we will have two systems for generating time reports, but I also don't want to send you off into the hills to refactor the existing Timer to support both output formats, and then decide what to do with all the fine-grained timers geared towards compiler developers. Would you be interested in working in that that direction, or would you rather hand the code off to someone else to try to integrate it more tightly?

I feel like right now C++ users don't have a lot of visibility into build performance, so it ends up falling to specialist toolchain people to go and debug these problems. If we made it easy to see into the performance of compilation, it would democratize build time optimization, letting generalists do these kinds of codebase cleanups. At least, I hope it will. :)

On Sun, Jan 13, 2019 at 4:31 AM Aras Pranckevicius via cfe-dev <[hidden email]> wrote:
Hello!

TL;DR: I have made a Clang/LLVM code change that adds "-ftime-trace" option, that produces Chrome Tracing format output. Would like comments on whether that is a good idea or not, or perhaps someone else is already doing this. My current (WIP) patch in github PR format is here https://github.com/aras-p/llvm-project-20170507/pull/2 -- with images and trace output files attached.

Longer version:

Current implementation of "-ftime-report" has several issues, particularly when I'm just a "user" of the compiler:

- it outputs a lot of information (almost half of it is duplicated since clang 7.0),
- a lot of that information is things that only compiler developers would know about,
- has quite large overhead, I've seen it make compile times take 1.5x longer,
- has very little information about "frontend" part (preprocessing, parsing, instantiation, C++ modules),
- the things it reports are only "summaries", i.e. "how much time it took to do work X in total". e.g. it can tell that "inlining all functions took X seconds", but in case there was just one super slow function to inline among
thousands, it will not tell which one was the slow one.

I have written a blog post about this (as well as lack of "good" time reporting tools in Visual Studio and gcc) recently, http://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/

At work (Unity game engine), with codebase of similar size to whole of Clang/LLVM (2-3 million lines of code), we had really good experience in adding timeline/flamegraph visualizations to various parts of our "build system". This can tell us which .cpp files were slowest to compile in the whole build, but I also wanted similar tooling for things "inside" single .cpp file compilation.

Thus this attempt at adding a new time trace profiling mode.

I have current changes on github here, https://github.com/aras-p/llvm-project-20170507/pull/2 -- can do a proper "patch" thing via Phabricator if needed.

My current code change does not quite match Clang/LLVM code standards and probably needs some work, but the general approach seems to work. Already found one case of Clang being very slow at parsing some weird recursive macro thingamabob that we had; was causing about 5-8 seconds just to include one header file. I probably would have never found it without this type of visualization. Here it is very clear that among all the things, parsing just that one header file takes almost all the time: https://user-images.githubusercontent.com/348087/51038295-76efb780-15bb-11e9-926f-a6be1ffd03f1.png


Regards,

--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
Just to say that I am also interested by something like this, although more from
the point of view of someone working on the compiler itself. I think however that
it would make sense to find a way to have one system for generating time reports
which would be both useful to people working on clang and people using clang.

Another thing I would like to have is a way to trace memory allocations, both in the
BumpAllocators and in malloc, and both in time and where they are triggered from
logically.

Bruno

On 23/01/2019 01:00, Reid Kleckner via cfe-dev wrote:

> This is pretty cool!
>
> I think the question is, "what next"? I don't really want to say "merge it right away!", because then we will have two systems for generating time reports, but I also don't want to send you off into the hills to refactor the existing Timer to support both output formats, and then decide what to do with all the fine-grained timers geared towards compiler developers. Would you be interested in working in that that direction, or would you rather hand the code off to someone else to try to integrate it more tightly?
>
> I feel like right now C++ users don't have a lot of visibility into build performance, so it ends up falling to specialist toolchain people to go and debug these problems. If we made it easy to see into the performance of compilation, it would democratize build time optimization, letting generalists do these kinds of codebase cleanups. At least, I hope it will. :)
>
> On Sun, Jan 13, 2019 at 4:31 AM Aras Pranckevicius via cfe-dev <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Hello!
>
>     TL;DR: I have made a Clang/LLVM code change that adds "-ftime-trace" option, that produces Chrome Tracing format output. Would like comments on whether that is a good idea or not, or perhaps someone else is already doing this. My current (WIP) patch in github PR format is here https://github.com/aras-p/llvm-project-20170507/pull/2 -- with images and trace output files attached.
>
>     Longer version:
>
>     Current implementation of "-ftime-report" has several issues, particularly when I'm just a "user" of the compiler:
>
>     - it outputs a lot of information (almost half of it is duplicated since clang 7.0),
>     - a lot of that information is things that only compiler developers would know about,
>     - has quite large overhead, I've seen it make compile times take 1.5x longer,
>     - has very little information about "frontend" part (preprocessing, parsing, instantiation, C++ modules),
>     - the things it reports are only "summaries", i.e. "how much time it took to do work X in total". e.g. it can tell that "inlining all functions took X seconds", but in case there was just one super slow function to inline among
>     thousands, it will not tell which one was the slow one.
>
>     I have written a blog post about this (as well as lack of "good" time reporting tools in Visual Studio and gcc) recently, http://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/
>
>     At work (Unity game engine), with codebase of similar size to whole of Clang/LLVM (2-3 million lines of code), we had really good experience in adding timeline/flamegraph visualizations to various parts of our "build system". This can tell us which .cpp files were slowest to compile in the whole build, but I also wanted similar tooling for things "inside" single .cpp file compilation.
>
>     Thus this attempt at adding a new time trace profiling mode.
>
>     I have current changes on github here, https://github.com/aras-p/llvm-project-20170507/pull/2 -- can do a proper "patch" thing via Phabricator if needed.
>
>     My current code change does not quite match Clang/LLVM code standards and probably needs some work, but the general approach seems to work. Already found one case of Clang being very slow at parsing some weird recursive macro thingamabob that we had; was causing about 5-8 seconds just to include one header file. I probably would have never found it without this type of visualization. Here it is very clear that among all the things, parsing just that one header file takes almost all the time: https://user-images.githubusercontent.com/348087/51038295-76efb780-15bb-11e9-926f-a6be1ffd03f1.png
>
>
>     Regards,
>
>     --
>     Aras Pranckevičius
>     work: http://unity3d.com
>     home: http://aras-p.info
>     _______________________________________________
>     cfe-dev mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
In reply to this post by David Blaikie via cfe-dev


On Wed, Jan 23, 2019, 02:00 Reid Kleckner via cfe-dev <[hidden email] wrote:
This is pretty cool!

I think the question is, "what next"? I don't really want to say "merge it right away!", because then we will have two systems for generating time reports, but I also don't want to send you off into the hills to refactor the existing Timer to support both output formats, and then decide what to do with all the fine-grained timers geared towards compiler developers. Would you be interested in working in that that direction, or would you rather hand the code off to someone else to try to integrate it more tightly?

I feel like right now C++ users don't have a lot of visibility into build performance, so it ends up falling to specialist toolchain people to go and debug these problems. If we made it easy to see into the performance of compilation, it would democratize build time optimization, letting generalists do these kinds of codebase cleanups. At least, I hope it will. :)

This. I've spent so much time trying to visualize build times based on incomplete data. I couldn't agree more. 

- Kim 



_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
In reply to this post by David Blaikie via cfe-dev
I think the question is, "what next"? I don't really want to say "merge it right away!", because then we will have two systems for generating time reports, but I also don't want to send you off into the hills to refactor the existing Timer to support both output formats, and then decide what to do with all the fine-grained timers geared towards compiler developers. Would you be interested in working in that that direction, or would you rather hand the code off to someone else to try to integrate it more tightly?

Good question.

I could take a stab at trying to make some sort of "one implementation" that could both generate the old -ftime-report report, and a hierarchical timeline / flame chart style one. But I've never worked with llvm/clang codebase, so I'd have some questions! The major one is:

- is it fine to change some interface (of e.g. Timer.h) and update all users inside LLVM monorepo to it? Or are there a million other projects using LLVM, with e.g. Timer.h being effectively an API that can't be changed?


Alternatively, I could make the whole change be contained just within clang itself. Right now I have put the new timer under llvm Support library, but I only use it in one place within llvm; all other usages are from within Clang only. It could almost just as well be some timer thing that only Clang uses. Not sure if that's worth doing or is easier accepted, vs. "the rest of llvm would want a timeline timer utility too". Thoughts?

 
 
I feel like right now C++ users don't have a lot of visibility into build performance, so it ends up falling to specialist toolchain people to go and debug these problems. If we made it easy to see into the performance of compilation, it would democratize build time optimization, letting generalists do these kinds of codebase cleanups. At least, I hope it will. :)

Indeed. A lot of "optimize C++ codebase compile times" is viewed as black box, also with quite a lot of myths and cargo culting from the inernet.

FWIW, Visual C++ has frontend timing report functionality too, but it was not documented at all. So I wrote about it a few days ago: http://aras-p.info/blog/2019/01/21/Another-cool-MSVC-flag-d1reportTime/


--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
Aras Pranckevicius via cfe-dev <[hidden email]> writes:

>>
>>
>> *I think the question is, "what next"? I don't really want to say "merge
>> it right away!", because then we will have two systems for generating time
>> reports, but I also don't want to send you off into the hills to refactor
>> the existing Timer to support both output formats, and then decide what to
>> do with all the fine-grained timers geared towards compiler developers.
>> Would you be interested in working in that that direction, or would you
>> rather hand the code off to someone else to try to integrate it more
>> tightly?*
>>
>
> Good question.
>
> I could take a stab at trying to make some sort of "one implementation"
> that could both generate the old -ftime-report report, and a hierarchical
> timeline / flame chart style one. But I've never worked with llvm/clang
> codebase, so I'd have some questions! The major one is:
>
> - is it fine to change some interface (of e.g. Timer.h) and update all
> users inside LLVM monorepo to it? Or are there a million other projects
> using LLVM, with e.g. Timer.h being effectively an API that can't be
> changed?

Changing LLVM interfaces and updating the uses in the monorepo is
sufficient. LLVM does not make any guarantees regarding stability of the
C++ APIs.

Feel free to ask as many questions as you need :)

> Alternatively, I could make the whole change be contained just within clang
> itself. Right now I have put the new timer under llvm Support library, but
> I only use it in one place within llvm; all other usages are from within
> Clang only. It could almost just as well be some timer thing that only
> Clang uses. Not sure if that's worth doing or is easier accepted, vs. "the
> rest of llvm would want a timeline timer utility too". Thoughts?

IMO Doing this work in llvm/Support makes the most sense, since it's
likely that frontends other than clang will benefit from this
functionality as well.

>> *I feel like right now C++ users don't have a lot of visibility into build
>> performance, so it ends up falling to specialist toolchain people to go and
>> debug these problems. If we made it easy to see into the performance of
>> compilation, it would democratize build time optimization, letting
>> generalists do these kinds of codebase cleanups. At least, I hope it will.
>> :)*
>>
>
> Indeed. A lot of "optimize C++ codebase compile times" is viewed as black
> box, also with quite a lot of myths and cargo culting from the inernet.
>
> FWIW, Visual C++ has frontend timing report functionality too, but it was
> not documented at all. So I wrote about it a few days ago:
> http://aras-p.info/blog/2019/01/21/Another-cool-MSVC-flag-d1reportTime/
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
In reply to this post by David Blaikie via cfe-dev
Thank you for working on this!

I think this is a really good direction, and a sorely missing feature.
I very much like how user-focused it is: all of your time categories
correspond to steps that I'd expect to be meaningful and actionable
for users.

Have you considered putting begin/end markers in the lexer for when we
enter and leave a file, to make it easier to see which header files
are expensive?

On Sun, 13 Jan 2019 at 04:31, Aras Pranckevicius via cfe-dev
<[hidden email]> wrote:

> Hello!
>
> TL;DR: I have made a Clang/LLVM code change that adds "-ftime-trace" option, that produces Chrome Tracing format output. Would like comments on whether that is a good idea or not, or perhaps someone else is already doing this. My current (WIP) patch in github PR format is here https://github.com/aras-p/llvm-project-20170507/pull/2 -- with images and trace output files attached.
>
> Longer version:
>
> Current implementation of "-ftime-report" has several issues, particularly when I'm just a "user" of the compiler:
>
> - it outputs a lot of information (almost half of it is duplicated since clang 7.0),
> - a lot of that information is things that only compiler developers would know about,
> - has quite large overhead, I've seen it make compile times take 1.5x longer,
> - has very little information about "frontend" part (preprocessing, parsing, instantiation, C++ modules),
> - the things it reports are only "summaries", i.e. "how much time it took to do work X in total". e.g. it can tell that "inlining all functions took X seconds", but in case there was just one super slow function to inline among
> thousands, it will not tell which one was the slow one.
>
> I have written a blog post about this (as well as lack of "good" time reporting tools in Visual Studio and gcc) recently, http://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/
>
> At work (Unity game engine), with codebase of similar size to whole of Clang/LLVM (2-3 million lines of code), we had really good experience in adding timeline/flamegraph visualizations to various parts of our "build system". This can tell us which .cpp files were slowest to compile in the whole build, but I also wanted similar tooling for things "inside" single .cpp file compilation.
>
> Thus this attempt at adding a new time trace profiling mode.
>
> I have current changes on github here, https://github.com/aras-p/llvm-project-20170507/pull/2 -- can do a proper "patch" thing via Phabricator if needed.
>
> My current code change does not quite match Clang/LLVM code standards and probably needs some work, but the general approach seems to work. Already found one case of Clang being very slow at parsing some weird recursive macro thingamabob that we had; was causing about 5-8 seconds just to include one header file. I probably would have never found it without this type of visualization. Here it is very clear that among all the things, parsing just that one header file takes almost all the time: https://user-images.githubusercontent.com/348087/51038295-76efb780-15bb-11e9-926f-a6be1ffd03f1.png
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev


On Jan 24, 2019, at 6:36 PM, Richard Smith via cfe-dev <[hidden email]> wrote:

I think this is a really good direction, and a sorely missing feature.
I very much like how user-focused it is: all of your time categories
correspond to steps that I'd expect to be meaningful and actionable
for users.

Have you considered putting begin/end markers in the lexer for when we
enter and leave a file, to make it easier to see which header files
are expensive?

The lexer itself is so cheap that it is effectively free.  Are you trying to capture lexer time itself, or the time taken for AST building and sema of the declarations defined in the header?

-Chris

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
On Fri, 25 Jan 2019, 20:39 Chris Lattner via cfe-dev, <[hidden email]> wrote:


On Jan 24, 2019, at 6:36 PM, Richard Smith via cfe-dev <[hidden email]> wrote:

I think this is a really good direction, and a sorely missing feature.
I very much like how user-focused it is: all of your time categories
correspond to steps that I'd expect to be meaningful and actionable
for users.

Have you considered putting begin/end markers in the lexer for when we
enter and leave a file, to make it easier to see which header files
are expensive?

The lexer itself is so cheap that it is effectively free.  Are you trying to capture lexer time itself, or the time taken for AST building and sema of the declarations defined in the header?

The latter. The idea is to expose to users a measure of how much a particular #include costs.

-Chris
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

David Blaikie via cfe-dev
In reply to this post by David Blaikie via cfe-dev
Have you considered putting begin/end markers in the lexer for when we
enter and leave a file, to make it easier to see which header files
are expensive?

Not the lexer per se, but my current patch does have sections for "which files the frontend spends time on", that I implemented by adding begin/end sections in SemaPPCallbacks (code around here https://github.com/aras-p/llvm-project-20170507/pull/2/files#diff-2856c75f844d1133220fc68717852bf1R98).

The files are visible in any tracing viewer, e.g. SpeedScope (https://www.speedscope.app/) nicely displays their filenames: http://aras-p.info/img/blog/2019/clang-timetrace-speedscope.png



--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev