Your help needed: List of LLVM Open Projects 2017


Roman Popov via cfe-dev
Hi folks,

   Happy new year!

   At the last LLVM Developers' Meeting I held a BoF: 'Raising Next
Generation LLVM Developers'. It was suggested that we should update our
open projects page and possibly restructure it a little bit.

   I volunteered to do this work and I need your help.


   Chandler and I started working on a Google doc [1]. We pinged a few
code owners, asking them to list work items that we should get done in
2017 but do not have the manpower for. Now we would like to ask for your
input, too.

   I believe an up-to-date list can serve as a good entry point for
students, interns and new contributors.

   Feel free to propose a new item or comment under an existing one. I
expect to start gradually updating the page at the beginning of February.

-- Vassil

[1]
https://docs.google.com/document/d/1YLK_xINSg1Ei0w8w39uAMR1n0dlf6wrzfypiX0YDQBc/edit?usp=sharing 

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
Do we have any open projects on LLD?

I know we usually try to avoid any big "projects" and mainly add/fix things in response to user needs, but just wondering if somebody has any ideas.

Some really generic/simple stuff I can think of:
1. trying out LLD on a large program corpus and reporting/reducing/fixing bugs (e.g. contributing to the FreeBSD effort or trying to build a bunch of packages from a Linux distro like Debian or Gentoo)
2. performance analysis and optimization of LLD
3. getting LLD to link a bootable Linux kernel and/or GRUB
4. writing an input verifier such that LLD can survive intensive fuzzing with no crashes / fatal errors [1] when the verifier says the input is okay. This will allow us to measure what the overhead of doing this actually is.


[1] As of the latest LLD discussion (in the thread "[llvm-dev] LLD status update and performance chart") it sounds like people are okay with LLD treating fatal errors the same way that LLVM uses assertions; for inputs from the C++ API, we can document to not pass corrupted object files. For inputs read from files, there is still community interest in at least having the option to run a "verifier" to validate the inputs. I think the best way to approach the verifier is to essentially follow the approach suggested by Peter (in the context of "hardening") in https://llvm.org/bugs/show_bug.cgi?id=30540#c5 i.e. getting to the point where LLD can survive intensive fuzzing.
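
To make item 4 a bit more concrete, here is a minimal sketch of how a verifier could sit in front of a fuzz target. It is only an illustration under stated assumptions: `verifyElfHeader` and `IdentSize` are hypothetical names, the check covers nothing beyond the size and magic bytes of `e_ident`, and the call into the linker is left as a comment because no such entry point is described in this thread. `LLVMFuzzerTestOneInput` is the standard libFuzzer entry point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Size of the e_ident array at the start of every ELF file (EI_NIDENT).
constexpr size_t IdentSize = 16;

// Hypothetical verifier: accepts a buffer only if it can hold e_ident and
// starts with the ELF magic bytes. A real verifier would go on to check
// section headers, symbol tables, relocations and so on.
bool verifyElfHeader(const uint8_t *Data, size_t Size) {
  static const uint8_t Magic[4] = {0x7f, 'E', 'L', 'F'};
  if (Data == nullptr || Size < IdentSize)
    return false;
  return std::memcmp(Data, Magic, sizeof(Magic)) == 0;
}

// Standard libFuzzer entry point.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  if (!verifyElfHeader(Data, Size))
    return 0; // Rejected input: the linker is never invoked.
  // What the verifier guarantees to the code that follows.
  assert(Size >= IdentSize);
  // Placeholder: hand the accepted buffer to the linker here. Any crash or
  // fatal error past this point would count as a verifier bug.
  return 0;
}
```

With this shape, the cost of verification could be measured by timing links with and without the `verifyElfHeader` call, which is the overhead question raised in item 4.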

-- Sean Silva

On Mon, Jan 16, 2017 at 5:18 AM, Vassil Vassilev via llvm-dev <[hidden email]> wrote:
> Hi folks,
>
>   Happy new year!
>
>   Last LLVM Developers' Meeting I had a BoF: 'Raising Next Generation LLVM Developers'. It was suggested that we should update our open projects page and possibly restructure it a little bit.
>
>   I volunteered to do this work and I need your help.
>
>
>   Chandler and I started working on a google doc [1]. We pinged few code owners asking them to list of work items we should get done in 2017 but we do not have the manpower. Now we would like to ask for your input, too.
>
>   I believe an up to date list can serve as a good entry point for students, interns and new contributors.
>
>   Feel free to propose a new item or comment under an existing one. I expect to start gradually updating the page beginning of Feb.
>
> -- Vassil
>
> [1] https://docs.google.com/document/d/1YLK_xINSg1Ei0w8w39uAMR1n0dlf6wrzfypiX0YDQBc/edit?usp=sharing
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
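The list can't omit clang-tidy.
There are many ideas about new checks on the LLVM Bugzilla:
https://llvm.org/bugs/buglist.cgi?product=clang-tools-extra&component=clang-tidy&resolution=---&list_id=110936
Everything matching ".*Feature Request.*"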

Piotr


Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
Please submit patches to Open Projects pages! Winter^WSummer of Code is coming!

On Mon, Jan 16, 2017 at 1:13 PM, Piotr Padlewski via llvm-dev
<[hidden email]> wrote:

> The list can't ommit clang-tidy.
> There are many ideas about new checks on llvm bugzilla.
> https://llvm.org/bugs/buglist.cgi?product=clang-tools-extra&component=clang-tidy&resolution=---&list_id=110936
> Everything matching ".*Feature Request.*"
>
> Piotr



--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
In reply to this post by Roman Popov via cfe-dev
On 16 January 2017 at 15:31, Sean Silva <[hidden email]> wrote:
> Do we have any open projects on LLD?
>
> I know we usually try to avoid any big "projects" and mainly add/fix things
> in response to user needs, but just wondering if somebody has any ideas.
>
> Some really generic/simple stuff I can think of:
> 1. trying out LLD on a large program corpus and reporting/reducing/fixing
> bugs (e.g. contributing to the FreeBSD effort or trying to build a bunch of
> packages from a linux distro like Debian or Gentoo)

From Rafael's last Poudriere ports build I think about 98% of the
packages are building with LLD, and some of the missing ones are those
that were skipped (e.g. do not build on amd64, or the upstream
distfiles have gone away). I think some next steps here for FreeBSD
include:

* Ensure we're running the test suites in packages that have them
* Actually install and use the resulting packages for a smoke test
* Address the WIP patches / workarounds currently in use
* Triage the few hundred failures

From the FreeBSD perspective there's one key LLD task of interest:

* Bring other architecture support to parity with amd64/x86_64. For us
the next one in importance is AArch64/arm64, then i386 and 32-bit arm,
and 32- and 64-bit MIPS, PowerPC, and RISC-V.

Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
In reply to this post by Roman Popov via cfe-dev
On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
<[hidden email]> wrote:
> Do we have any open projects on LLD?
>
> I know we usually try to avoid any big "projects" and mainly add/fix things
> in response to user needs, but just wondering if somebody has any ideas.
>

I'm not particularly active in lld anymore, but the last big item I'd
like to see implemented is Pettis-Hansen layout.
http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
(mainly because it improves the performance of the final executable).
GCC/gold have an implementation of the algorithm that can be used as
a base. I'll expand if anybody is interested.
Side note: I'd like to propose a couple of llvm projects as well, I'll
sit down later today and write them.

--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare

Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
In reply to this post by Roman Popov via cfe-dev


On Mon, Jan 16, 2017 at 1:17 PM, Ed Maste <[hidden email]> wrote:
> On 16 January 2017 at 15:31, Sean Silva <[hidden email]> wrote:
>> Do we have any open projects on LLD?
>>
>> I know we usually try to avoid any big "projects" and mainly add/fix things
>> in response to user needs, but just wondering if somebody has any ideas.
>>
>> Some really generic/simple stuff I can think of:
>> 1. trying out LLD on a large program corpus and reporting/reducing/fixing
>> bugs (e.g. contributing to the FreeBSD effort or trying to build a bunch of
>> packages from a linux distro like Debian or Gentoo)
>
> From Rafael's last Poudriere ports build I think about 98% of the
> packages are building with LLD, and some of the missing ones are those
> that were skipped (e.g. do not build on amd64, or the upstream
> distfiles have gone away).

I thought most of the skipped stuff was due to dependencies on packages that failed? Or is that no longer the case?

> I think some next steps here for FreeBSD
> include:
>
> * Ensure we're running the test suites in packages that have them
> * Actually install and use the resulting packages for a smoke test
> * Address the WIP patches / workarounds currently in use

Are these collected somewhere / is there a status page to reference?

> * Triage the few hundred failures

Are these collected somewhere / is there a status page to reference?

> From the FreeBSD perspective there's one key LLD task of interest:
>
> * Bring other architecture support to parity with amd64/x86_64. For us
> the next one in importance is AArch64/arm64, then i386 and 32-bit arm,
> and 32- and 64-bit MIPS, PowerPC, and RISC-V.

Architecture porting might be challenging for new contributors, since it will usually require access to "unusual" hardware, right? Or are there emulator options available? If so, it would be good to document those options because it will greatly expand the number of people that can work on these tasks.

-- Sean Silva 



Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
In reply to this post by Roman Popov via cfe-dev


On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <[hidden email]> wrote:
> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
> <[hidden email]> wrote:
>> Do we have any open projects on LLD?
>>
>> I know we usually try to avoid any big "projects" and mainly add/fix things
>> in response to user needs, but just wondering if somebody has any ideas.
>>
>
> I'm not particularly active in lld anymore, but the last big item I'd
> like to see implemented is Pettis-Hansen layout.
> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
> (mainly because it improves performances of the final executable).
> GCC/gold have an implementation of the algorithm that can be used as
> base. I'll expand if anybody is interested.
> Side note: I'd like to propose a couple of llvm projects as well, I'll
> sit down later today and write them.


For FullLTO it is conceptually pretty easy to get profile data we need for this, but I'm not sure about the ThinLTO case.

Teresa, Mehdi,

Are there any plans (or things already working!) for getting profile data from ThinLTO in a format that the linker can use for code layout? I assume that profile data is being used already to guide importing, so it may just be a matter of siphoning that off.

Or maybe that layout code should be inside LLVM; maybe part of the general LTO interface? It looks like the current gcc plugin calls back into gcc for the actual layout algorithm itself (function call find_pettis_hansen_function_layout) rather than the reordering logic living in the linker: https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c

-- Sean Silva
 


Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
On Mon, Jan 16, 2017 at 1:47 PM, Sean Silva <[hidden email]> wrote:

> Or maybe that layout code should be inside LLVM; maybe part of the general
> LTO interface? It looks like the current gcc plugin calls back into gcc for
> the actual layout algorithm itself (function call
> find_pettis_hansen_function_layout) rather than the reordering logic living
> in the linker:
> https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c
>

My idea was exactly to have the reordering logic living in LLVM rather
than lld, FWIW.

Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
In reply to this post by Roman Popov via cfe-dev

On Jan 16, 2017, at 1:47 PM, Sean Silva <[hidden email]> wrote:
> On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <[hidden email]> wrote:
>> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
>> <[hidden email]> wrote:
>>> Do we have any open projects on LLD?
>>>
>>> I know we usually try to avoid any big "projects" and mainly add/fix things
>>> in response to user needs, but just wondering if somebody has any ideas.
>>>
>>
>> I'm not particularly active in lld anymore, but the last big item I'd
>> like to see implemented is Pettis-Hansen layout.
>> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
>> (mainly because it improves performances of the final executable).
>> GCC/gold have an implementation of the algorithm that can be used as
>> base. I'll expand if anybody is interested.
>> Side note: I'd like to propose a couple of llvm projects as well, I'll
>> sit down later today and write them.

I’m not sure, can you confirm that such layout optimization on ELF requires -ffunction-sections?

Also, for clang on OSX the best layout we could get is to order functions in the order in which they get executed at runtime.

> For FullLTO it is conceptually pretty easy to get profile data we need for this, but I'm not sure about the ThinLTO case.
>
> Teresa, Mehdi,
>
> Are there any plans (or things already working!) for getting profile data from ThinLTO in a format that the linker can use for code layout? I assume that profile data is being used already to guide importing, so it may just be a matter of siphoning that off.

I’m not sure what kind of “profile information” is needed, and what makes it easier for MonolithicLTO compared to ThinLTO?

> Or maybe that layout code should be inside LLVM; maybe part of the general LTO interface? It looks like the current gcc plugin calls back into gcc for the actual layout algorithm itself (function call find_pettis_hansen_function_layout) rather than the reordering logic living in the linker: https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c

I was thinking about this: could this be done by reorganizing the module itself for LTO?

That wouldn’t help non-LTO and ThinLTO though.

— 
Mehdi




Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <[hidden email]> wrote:

> I’m not sure, can you confirm that such layout optimization on ELF requires
> -ffunction-sections?
>

For the non-LTO case, I think so.

> Also, for clang on OSX the best layout we could get is to order functions in
> the order in which they get executed at runtime.
>

That's what we already do for lld. We collect an order file (by running a
profiler) and pass it to the linker, which lays out functions
accordingly.
This is to improve startup time for a class of startup-time-sensitive
operations. The algorithm proposed by Pettis (allegedly) aims to
reduce TLB misses, as it tries to lay out hot functions (or
functions that are likely to be called together) near each other in the
final binary.
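
For readers who have not seen the paper, the core idea can be sketched as a greedy merge over a weighted call graph. The code below is a simplified illustration and not the gold/GCC implementation: `CallEdge` and `layoutFunctions` are hypothetical names, functions are plain integer indices, and the sketch always appends the callee's chain to the caller's chain, whereas the original algorithm also chooses which ends of the two chains to join.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct CallEdge {
  int Caller;
  int Callee;
  uint64_t Weight; // Profiled call count between the two functions.
};

// Visit call edges from hottest to coldest and concatenate the chains that
// contain the two endpoints, so that frequent caller/callee pairs end up
// close together in the returned order.
std::vector<int> layoutFunctions(int NumFunctions, std::vector<CallEdge> Edges) {
  std::vector<std::vector<int>> Chains(NumFunctions);
  std::vector<int> ChainOf(NumFunctions);
  for (int F = 0; F < NumFunctions; ++F) {
    Chains[F] = {F}; // Every function starts in its own chain.
    ChainOf[F] = F;
  }
  std::stable_sort(Edges.begin(), Edges.end(),
                   [](const CallEdge &A, const CallEdge &B) {
                     return A.Weight > B.Weight;
                   });
  for (const CallEdge &E : Edges) {
    assert(E.Caller >= 0 && E.Caller < NumFunctions);
    assert(E.Callee >= 0 && E.Callee < NumFunctions);
    int A = ChainOf[E.Caller];
    int B = ChainOf[E.Callee];
    if (A == B)
      continue; // Already placed in the same chain.
    for (int F : Chains[B]) {
      ChainOf[F] = A;
      Chains[A].push_back(F);
    }
    Chains[B].clear();
  }
  // Emit the surviving chains one after another.
  std::vector<int> Order;
  for (const std::vector<int> &Chain : Chains)
    Order.insert(Order.end(), Chain.begin(), Chain.end());
  return Order;
}
```

The returned indices could then be mapped back to symbol names and written out as an order file of the kind mentioned above.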

>
> For FullLTO it is conceptually pretty easy to get profile data we need for
> this, but I'm not sure about the ThinLTO case.
>
> Teresa, Mehdi,
>
> Are there any plans (or things already working!) for getting profile data
> from ThinLTO in a format that the linker can use for code layout? I assume
> that profile data is being used already to guide importing, so it may just
> be a matter of siphoning that off.
>
>
> I’m not sure what kind of “profile information” is needed, and what makes it
> easier for MonolithicLTO compared to ThinLTO?
>
> Or maybe that layout code should be inside LLVM; maybe part of the general
> LTO interface? It looks like the current gcc plugin calls back into gcc for
> the actual layout algorithm itself (function call
> find_pettis_hansen_function_layout) rather than the reordering logic living
> in the linker:
> https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c
>
>
> I was thinking about this: could this be done by reorganizing the module
> itself for LTO?
>
> That wouldn’t help non-LTO and ThinLTO though.

This is a dimension that I think can be explored. The fact that it
wouldn't help with other modes of operation is completely orthogonal,
in particular until it's proven that this kind of optimization makes
sense with ThinLTO (and if it doesn't, it can be an optimization run
only during full LTO).

--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare

Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
In reply to this post by Roman Popov via cfe-dev


On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <[hidden email]> wrote:
> On Jan 16, 2017, at 1:47 PM, Sean Silva <[hidden email]> wrote:
>> On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <[hidden email]> wrote:
>>> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
>>> <[hidden email]> wrote:
>>>> Do we have any open projects on LLD?
>>>>
>>>> I know we usually try to avoid any big "projects" and mainly add/fix things
>>>> in response to user needs, but just wondering if somebody has any ideas.
>>>>
>>>
>>> I'm not particularly active in lld anymore, but the last big item I'd
>>> like to see implemented is Pettis-Hansen layout.
>>> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
>>> (mainly because it improves performances of the final executable).
>>> GCC/gold have an implementation of the algorithm that can be used as
>>> base. I'll expand if anybody is interested.
>>> Side note: I'd like to propose a couple of llvm projects as well, I'll
>>> sit down later today and write them.
>
> I’m not sure, can you confirm that such layout optimization on ELF requires -ffunction-sections?

In order for a standard ELF linker to safely be able to reorder sections at function granularity, -ffunction-sections would be required. This isn't a problem during LTO since the code generation is set up by the linker :)
 

> Also, for clang on OSX the best layout we could get is to order functions in the order in which they get executed at runtime.

What the optimal layout may be for given apps is a bit of a separate question. Right now we're mostly talking about how to plumb everything together so that we can do the reordering of the final executable.

In fact, standard ELF linking semantics generally require input sections to be concatenated in command line order (this is e.g. how .init_array/.ctors build up their arrays of pointers to initializers; a crt*.o file at the beginning/end has a sentinel value and so the order matters). So the linker will generally need blessing from the compiler to do most sorts of reorderings as far as I'm aware.

Other signals besides profile info, such as a startup trace, might be useful too, and we should make sure we can plug that into the design.
My understanding of the clang on OSX case is based on a comparison of the `form_by_*` functions in clang/utils/perf-training/perf-helper.py, which offer a relatively simple set of algorithms, so I think the jury is still out on the best approach (that script also uses a data collection method that is not part of LLVM's usual instrumentation or sampling workflows for PGO, so we may not be able to provide the same signals out of the box as part of our standard offering in the compiler).
I think that once we have this ordering capability integrated more deeply into the compiler, we'll be able to evaluate more complicated algorithms like Pettis-Hansen, have access to signals like global profile info, do interesting call graph analyses, etc. to find interesting approaches.

 

>> For FullLTO it is conceptually pretty easy to get profile data we need for this, but I'm not sure about the ThinLTO case.
>>
>> Teresa, Mehdi,
>>
>> Are there any plans (or things already working!) for getting profile data from ThinLTO in a format that the linker can use for code layout? I assume that profile data is being used already to guide importing, so it may just be a matter of siphoning that off.
>
> I’m not sure what kind of “profile information” is needed, and what makes it easier for MonolithicLTO compared to ThinLTO?

For MonolithicLTO I had in mind that a simple implementation would be:
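```
// Filled in by the pass with function names in the desired layout order.
std::vector<std::string> Ordering;
// LayoutModulePass and addPassToLTOPipeline are placeholder names for illustration.
auto Pass = make_unique<LayoutModulePass>(&Ordering);
addPassToLTOPipeline(std::move(Pass));
```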

The module pass would just query the profile data directly on IR data structures and get the order out. This would require very little "plumbing".
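
As a rough, self-contained illustration of "getting the order out", the sketch below stands in for what such a pass might compute. Everything in it is an assumption made for the example: `FunctionProfile` and `computeOrdering` are hypothetical names, the profile is a plain struct instead of IR metadata, and "hottest first" is just one simple policy rather than a proposal for the actual layout algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for the per-function profile data a real pass would read from the IR.
struct FunctionProfile {
  std::string Name;
  uint64_t EntryCount;
};

// Fill Ordering with function names, hottest first. stable_sort keeps the
// original module order among functions with equal counts.
void computeOrdering(std::vector<FunctionProfile> Functions,
                     std::vector<std::string> &Ordering) {
  std::stable_sort(Functions.begin(), Functions.end(),
                   [](const FunctionProfile &A, const FunctionProfile &B) {
                     return A.EntryCount > B.EntryCount;
                   });
  Ordering.clear();
  for (const FunctionProfile &F : Functions)
    Ordering.push_back(F.Name);
  // Every function appears exactly once in the result.
  assert(Ordering.size() == Functions.size());
}
```

The resulting `Ordering` vector plays the same role as the one passed to the pass in the snippet above: the linker would read it after the LTO pipeline has run.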
 

>> Or maybe that layout code should be inside LLVM; maybe part of the general LTO interface? It looks like the current gcc plugin calls back into gcc for the actual layout algorithm itself (function call find_pettis_hansen_function_layout) rather than the reordering logic living in the linker: https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c
>
> I was thinking about this: could this be done by reorganizing the module itself for LTO?

For MonolithicLTO that's another simple approach.
 

> That wouldn’t help non-LTO and ThinLTO though.

I think we should ideally aim for something that works uniformly for Monolithic and Thin. For example, GCC emits special sections containing the profile data and the linker just reads those sections; something analogous in LLVM would just happen in the backend and be common to Monolithic and Thin. If ThinLTO already has profile summaries in some nice form though, it may be possible to bypass this.

Another advantage of using special sections in the output like GCC does is that you don't actually need LTO at all to get the function reordering. The profile data passed to the compiler during per-TU compilation can be lowered into the same kind of annotations. (though LTO and function ordering are likely to go hand-in-hand most often for peak-performance builds).

-- Sean Silva
 

— 
Mehdi



-- Sean Silva
 

--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare




Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev


On Mon, Jan 16, 2017 at 2:31 PM, Davide Italiano <[hidden email]> wrote:
On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <[hidden email]> wrote:
>
> On Jan 16, 2017, at 1:47 PM, Sean Silva <[hidden email]> wrote:
>
>
>
> On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <[hidden email]> wrote:
>>
>> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
>> <[hidden email]> wrote:
>> > Do we have any open projects on LLD?
>> >
>> > I know we usually try to avoid any big "projects" and mainly add/fix
>> > things
>> > in response to user needs, but just wondering if somebody has any ideas.
>> >
>>
>> I'm not particularly active in lld anymore, but the last big item I'd
>> like to see implemented is Pettis-Hansen layout.
>> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
>> (mainly because it improves performances of the final executable).
>> GCC/gold have an implementation of the algorithm that can be used as
>> base. I'll expand if anybody is interested.
>> Side note: I'd like to propose a couple of llvm projects as well, I'll
>> sit down later today and write them.
>
>
>
> I’m not sure, can you confirm that such layout optimization on ELF requires
> -ffunction-sections?
>

For the non-LTO case, I think so.

> Also, for clang on OSX the best layout we could get is to order functions in
> the order in which they get executed at runtime.
>

That's what we already do for lld. We collect an order file (by running a
profiler) and pass that to the linker, which lays out functions
accordingly.
This is to improve startup time for a class of startup-time-sensitive
operations. The algorithm proposed by Pettis (allegedly) aims to
reduce TLB misses by laying out hot functions (or
functions that are likely to be called together) near each other in the final
binary.
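
To make the order-file approach concrete, here is a minimal sketch of how a linker might apply one. It assumes one section per function (as with -ffunction-sections) and that functions missing from the order file simply follow the listed ones in their original input order. `applyOrderFile` is a hypothetical helper, not lld's actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Lays out function sections: those named in the order file come first, in
// order-file order; everything else follows in its original input order.
std::vector<std::string>
applyOrderFile(const std::vector<std::string> &InputOrder,
               const std::vector<std::string> &OrderFile) {
  // Map each symbol to its priority (its position in the order file).
  std::unordered_map<std::string, std::size_t> Priority;
  for (std::size_t I = 0; I < OrderFile.size(); ++I)
    Priority.emplace(OrderFile[I], I); // first mention wins on duplicates

  // Unlisted symbols share one rank that sorts after every listed symbol.
  const std::size_t Unlisted = OrderFile.size();
  auto Rank = [&](const std::string &Name) {
    auto It = Priority.find(Name);
    return It == Priority.end() ? Unlisted : It->second;
  };

  // A stable sort preserves input order among symbols with equal rank.
  std::vector<std::string> Layout = InputOrder;
  std::stable_sort(Layout.begin(), Layout.end(),
                   [&](const std::string &A, const std::string &B) {
                     return Rank(A) < Rank(B);
                   });
  return Layout;
}
```

Names in the order file that match no input section are ignored here; a real linker would probably want to warn about them.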

IIRC from when I looked at the paper a while ago, it is mostly just a "Huffman tree construction" type of algorithm (agglomerating based on highest probability) and assumes that if two functions are hot then they are likely to be needed together. This is not always the case.

E.g. consider a server that accepts RPC requests and based on those requests either does Foo or Bar which are largely disjoint. It's entirely possible for the top two functions of the profile to be one in Foo and one in Bar, but laying them out near each other doesn't make sense since there is never locality (for a given RPC, either Foo or Bar gets run). A static call graph analysis can provide the needed signals to handle this case better. 

-- Sean Silva
 

>
> For FullLTO it is conceptually pretty easy to get profile data we need for
> this, but I'm not sure about the ThinLTO case.
>
> Teresa, Mehdi,
>
> Are there any plans (or things already working!) for getting profile data
> from ThinLTO in a format that the linker can use for code layout? I assume
> that profile data is being used already to guide importing, so it may just
> be a matter of siphoning that off.
>
>
> I’m not sure what kind of “profile information” is needed, and what makes it
> easier for MonolithicLTO compared to ThinLTO?
>
> Or maybe that layout code should be inside LLVM; maybe part of the general
> LTO interface? It looks like the current gcc plugin calls back into gcc for
> the actual layout algorithm itself (function call
> find_pettis_hansen_function_layout) rather than the reordering logic living
> in the linker:
> https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c
>
>
> I was thinking about this: could this be done by reorganizing the module
> itself for LTO?
>
> That wouldn’t help non-LTO and ThinLTO though.

This is a dimension that I think can be explored. The fact that it
wouldn't help with other modes of operation is completely orthogonal,
in particular until it's proven that this kind of optimization makes
sense with ThinLTO (and if it doesn't, it can be an optimization run
only during full LTO).

--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare



Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev


On Mon, Jan 16, 2017 at 3:32 PM, Sean Silva <[hidden email]> wrote:



IIRC from when I looked at the paper a while ago, it is mostly just a "Huffman tree construction" type of algorithm (agglomerating based on highest probability) and assumes that if two functions are hot then they are likely to be needed together. This is not always the case.

E.g. consider a server that accepts RPC requests and based on those requests either does Foo or Bar which are largely disjoint. It's entirely possible for the top two functions of the profile to be one in Foo and one in Bar, but laying them out near each other doesn't make sense since there is never locality (for a given RPC, either Foo or Bar gets run). A static call graph analysis can provide the needed signals to handle this case better. 


Hence you said "allegedly" :) I know we've talked about this before. Just wanted to put the backstory of the "allegedly" on the list.

-- Sean Silva
 




Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev

On Jan 16, 2017, at 3:24 PM, Sean Silva <[hidden email]> wrote:



On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <[hidden email]> wrote:

On Jan 16, 2017, at 1:47 PM, Sean Silva <[hidden email]> wrote:



On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <[hidden email]> wrote:
On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
<[hidden email]> wrote:
> Do we have any open projects on LLD?
>
> I know we usually try to avoid any big "projects" and mainly add/fix things
> in response to user needs, but just wondering if somebody has any ideas.
>

I'm not particularly active in lld anymore, but the last big item I'd
like to see implemented is Pettis-Hansen layout.
http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
(mainly because it improves performance of the final executable).
GCC/gold have an implementation of the algorithm that can be used as
a base. I'll expand if anybody is interested.
Side note: I'd like to propose a couple of llvm projects as well, I'll
sit down later today and write them.


I’m not sure, can you confirm that such layout optimization on ELF requires -ffunction-sections?

In order for a standard ELF linker to safely be able to reorder sections at function granularity, -ffunction-sections would be required. This isn't a problem during LTO since the code generation is set up by the linker :)
 

Also, for clang on OSX the best layout we could get is to order functions in the order in which they get executed at runtime.

What the optimal layout may be for given apps is a bit of a separate question. Right now we're mostly talking about how to plumb everything together so that we can do the reordering of the final executable.

Yes, I was raising this exactly with the idea of “we may want to try different algorithms based on different kinds of data”.


In fact, standard ELF linking semantics generally require input sections to be concatenated in command line order (this is e.g. how .init_array/.ctors build up their arrays of pointers to initializers; a crt*.o file at the beginning/end has a sentinel value and so the order matters). So the linker will generally need blessing from the compiler to do most sorts of reorderings as far as I'm aware.

Other signals besides profile info, such as a startup trace, might be useful too, and we should make sure we can plug that into the design.
My understanding of the clang on OSX case is based on a comparison of the `form_by_*` functions in clang/utils/perf-training/perf-helper.py which offer a relatively simple set of algorithms, so I think the jury is still out on the best approach (that script also uses a data collection method that is not part of LLVM's usual instrumentation or sampling workflows for PGO, so we may not be able to provide the same signals out of the box as part of our standard offering in the compiler)

Yes, I was thinking that some Xray-based instrumentation could be used to provide the same data.


I think we should ideally aim for something that works uniformly for Monolithic and Thin. For example, GCC emits special sections containing the profile data and the linker just reads those sections; something analogous in LLVM would just happen in the backend and be common to Monolithic and Thin. If ThinLTO already has profile summaries in some nice form though, it may be possible to bypass this.

Another advantage of using special sections in the output like GCC does is that you don't actually need LTO at all to get the function reordering. The profile data passed to the compiler during per-TU compilation can be lowered into the same kind of annotations. (though LTO and function ordering are likely to go hand-in-hand most often for peak-performance builds).

Yes I agree with all of this :)
That makes for an interesting design trade-off!
 
— 
Mehdi



Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev


On Mon, Jan 16, 2017 at 3:34 PM, Sean Silva <[hidden email]> wrote:


On Mon, Jan 16, 2017 at 3:32 PM, Sean Silva <[hidden email]> wrote:



IIRC from when I looked at the paper a while ago, it is mostly just a "Huffman tree construction" type of algorithm (agglomerating based on highest probability) and assumes that if two functions are hot then they are likely to be needed together. This is not always the case.

E.g. consider a server that accepts RPC requests and based on those requests either does Foo or Bar which are largely disjoint. It's entirely possible for the top two functions of the profile to be one in Foo and one in Bar, but laying them out near each other doesn't make sense since there is never locality (for a given RPC, either Foo or Bar gets run). A static call graph analysis can provide the needed signals to handle this case better. 


Hence you said "allegedly" :) I know we've talked about this before. Just wanted to put the backstory of the "allegedly" on the list.

Looks like I remembered this wrong. The algorithm in section 3.2 of the paper is call-graph aware. It does do greedy coalescing like a Huffman tree construction algorithm, but constrains the available coalescing operations at each step by call graph adjacency (in fact, what it is "greedy" about is the hotness of the edges between call graph nodes, not of the nodes themselves).
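
A rough sketch of that edge-greedy idea, under simplifying assumptions: every function starts in its own chain, edges are visited hottest first, and the callee's chain is always appended to the caller's. The paper's procedure also chooses how to orient the two chains when joining them, which is omitted here. `CallEdge` and `layoutByHotEdges` are hypothetical names, not code from GCC, gold or lld.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One weighted call-graph edge; Weight is how often the call was observed.
// Caller and Callee are indices into the Names vector passed below and are
// assumed to be in range.
struct CallEdge {
  std::size_t Caller;
  std::size_t Callee;
  uint64_t Weight;
};

// Greedily merges chains of functions along the hottest call edges and
// returns the resulting layout as a list of function names.
std::vector<std::string> layoutByHotEdges(const std::vector<std::string> &Names,
                                          std::vector<CallEdge> Edges) {
  // Every function starts in its own chain; ChainOf maps function -> chain id.
  std::vector<std::vector<std::size_t>> Chains(Names.size());
  std::vector<std::size_t> ChainOf(Names.size());
  for (std::size_t I = 0; I < Names.size(); ++I) {
    Chains[I] = {I};
    ChainOf[I] = I;
  }

  // Visit edges hottest first; the stable sort keeps ties deterministic.
  std::stable_sort(Edges.begin(), Edges.end(),
                   [](const CallEdge &A, const CallEdge &B) {
                     return A.Weight > B.Weight;
                   });

  for (const CallEdge &E : Edges) {
    std::size_t From = ChainOf[E.Caller], To = ChainOf[E.Callee];
    if (From == To)
      continue; // caller and callee are already laid out together
    // Append the callee's chain to the caller's chain.
    for (std::size_t F : Chains[To]) {
      ChainOf[F] = From;
      Chains[From].push_back(F);
    }
    Chains[To].clear();
  }

  // Emit the surviving (non-empty) chains in order of chain id.
  std::vector<std::string> Layout;
  for (const auto &Chain : Chains)
    for (std::size_t F : Chain)
      Layout.push_back(Names[F]);
  return Layout;
}
```

In the RPC example, with a hot edge inside Foo and another inside Bar and only lukewarm edges from the dispatcher into each, this keeps each handler next to its own callee instead of placing the two hottest functions side by side.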

-- Sean Silva
 






Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev


On Mon, Jan 16, 2017 at 3:35 PM, Mehdi Amini <[hidden email]> wrote:


Other signals besides profile info, such as a startup trace, might be useful too, and we should make sure we can plug that into the design.
My understanding of the clang on OSX case is based on a comparison of the `form_by_*` functions in clang/utils/perf-training/perf-helper.py which offer a relatively simple set of algorithms, so I think the jury is still out on the best approach (that script also uses a data collection method that is not part of LLVM's usual instrumentation or sampling workflows for PGO, so we may not be able to provide the same signals out of the box as part of our standard offering in the compiler)

Yes, I was thinking that some Xray-based instrumentation could be used to provide the same data.

I hadn't thought of using Xray for this! Good idea! (I haven't been following Xray very closely; I should look at it more...)

-- Sean Silva
 





Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
Google GCC records profile data (the dynamic call graph) in a specially named section of the ELF object file, to be consumed by the plugin. Those sections are discarded later by the linker.

There are pros and cons to using xray for layout purposes. The call trace from xray is certainly more powerful for layout, but it adds additional complexity to the optimized build process: you would need to collect an xray trace profile on the optimized binary (presumably already built with PGO) and then rebuild without the xray nop insertion and with the new function layout.

David

On Mon, Jan 16, 2017 at 3:40 PM, Sean Silva via llvm-dev <[hidden email]> wrote:


On Mon, Jan 16, 2017 at 3:34 PM, Sean Silva <[hidden email]> wrote:


On Mon, Jan 16, 2017 at 3:32 PM, Sean Silva <[hidden email]> wrote:


On Mon, Jan 16, 2017 at 2:31 PM, Davide Italiano <[hidden email]> wrote:
On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <[hidden email]> wrote:
>
> On Jan 16, 2017, at 1:47 PM, Sean Silva <[hidden email]> wrote:
>
>
>
> On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <[hidden email]> wrote:
>>
>> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
>> <[hidden email]> wrote:
>> > Do we have any open projects on LLD?
>> >
>> > I know we usually try to avoid any big "projects" and mainly add/fix
>> > things
>> > in response to user needs, but just wondering if somebody has any ideas.
>> >
>>
>> I'm not particularly active in lld anymore, but the last big item I'd
>> like to see implemented is Pettis-Hansen layout.
>> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
>> (mainly because it improves performances of the final executable).
>> GCC/gold have an implementation of the algorithm that can be used as
>> base. I'll expand if anybody is interested.
>> Side note: I'd like to propose a couple of llvm projects as well, I'll
>> sit down later today and write them.
>
>
>
> I’m not sure, can you confirm that such layout optimization on ELF requires
> -ffunction-sections?
>

For the non-LTO case, I think so.

> Also, for clang on OSX the best layout we could get is to order functions in
> the order in which they get executed at runtime.
>

That's what we already do for lld. We collect and order file (run a
profiler) and pass that to the linker that lays out functions
accordingly.
This is to improve startup time for a class of startup-time-sensitive
operations. The algorithm proposed by Pettis (allegedly) aims to
reduce the TLB misses as it tries to lay out hot functions (or
functions that are likely to  be called together near in the final
binary).

IIRC from when I looked at the paper a while ago, it is mostly just a "huffman tree construction" type algorithm (agglomerating based on highest probability) and assumes that if two functions are hot then they are likely to be needed together. This is not always the case.

E.g. consider a server that accepts RPC requests and based on those requests either does Foo or Bar which are largely disjoint. It's entirely possible for the top two functions of the profile to be one in Foo and one in Bar, but laying them out near each other doesn't make sense since there is never locality (for a given RPC, either Foo or Bar gets run). A static call graph analysis can provide the needed signals to handle this case better. 


Hence you said "allegedly" :) I know we've talked about this before. Just wanted to put the backstory of the "allegedly" on the list.

Looks like I remembered this wrong. The algorithm in section 3.2 of the paper is call-graph aware. It does do greedy coalescing like a Huffman tree construction algorithms, but constrains the available coalescing operations at each step by call graph adjacency (in fact, what it is "greedy" about is the hotness of the edges between call graph nodes and not the nodes themselves).

-- Sean Silva
 

-- Sean Silva
 
-- Sean Silva
 

>
> For FullLTO it is conceptually pretty easy to get profile data we need for
> this, but I'm not sure about the ThinLTO case.
>
> Teresa, Mehdi,
>
> Are there any plans (or things already working!) for getting profile data
> from ThinLTO in a format that the linker can use for code layout? I assume
> that profile data is being used already to guide importing, so it may just
> be a matter of siphoning that off.
>
>
> I’m not sure what kind of “profile information” is needed, and what makes it
> easier for MonolithicLTO compared to ThinLTO?
>
> Or maybe that layout code should be inside LLVM; maybe part of the general
> LTO interface? It looks like the current gcc plugin calls back into gcc for
> the actual layout algorithm itself (function call
> find_pettis_hansen_function_layout) rather than the reordering logic living
> in the linker:
> https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c
>
>
> I was thinking about this: could this be done by reorganizing the module
> itself for LTO?
>
> That wouldn’t help non-LTO and ThinLTO though.

This is a dimension that I think can be explored. The fact that it
wouldn't help with other modes of operation is completely orthogonal,
in particular until it's proven that this kind of optimization makes
sense with ThinLTO (and if it doesn't, it can be an optimization run
only during full LTO).

--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare




_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
Would it make sense for XRay instrumentation to be part of -fprofile-generate? PGO will affect inlining decisions etc. for the optimized binary, but the traces collected during the instrumented build would still contain quite a bit of useful information.

-- Sean Silva

On Mon, Jan 16, 2017 at 4:33 PM, Xinliang David Li <[hidden email]> wrote:
Google GCC records profile data (the dynamic call graph) in a specially named section in the ELF object file, to be consumed by the plugin. Those sections are discarded later by the linker.

There are pros and cons to using XRay for layout purposes. The call trace from XRay is certainly more powerful for layout, but it adds additional complexity to the optimized build process. You would need to collect an XRay trace profile on the optimized binary (presumably already built with PGO) and then rebuild without XRay nop insertion and with the function layout applied.

David

On Mon, Jan 16, 2017 at 3:40 PM, Sean Silva via llvm-dev <[hidden email]> wrote:


On Mon, Jan 16, 2017 at 3:34 PM, Sean Silva <[hidden email]> wrote:


On Mon, Jan 16, 2017 at 3:32 PM, Sean Silva <[hidden email]> wrote:


On Mon, Jan 16, 2017 at 2:31 PM, Davide Italiano <[hidden email]> wrote:
On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <[hidden email]> wrote:
>
> On Jan 16, 2017, at 1:47 PM, Sean Silva <[hidden email]> wrote:
>
>
>
> On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <[hidden email]> wrote:
>>
>> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
>> <[hidden email]> wrote:
>> > Do we have any open projects on LLD?
>> >
>> > I know we usually try to avoid any big "projects" and mainly add/fix
>> > things
>> > in response to user needs, but just wondering if somebody has any ideas.
>> >
>>
>> I'm not particularly active in lld anymore, but the last big item I'd
>> like to see implemented is Pettis-Hansen layout.
>> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
>> (mainly because it improves the performance of the final executable).
>> GCC/gold have an implementation of the algorithm that can be used as
>> base. I'll expand if anybody is interested.
>> Side note: I'd like to propose a couple of llvm projects as well, I'll
>> sit down later today and write them.
>
>
>
> I’m not sure, can you confirm that such layout optimization on ELF requires
> -ffunction-sections?
>

For the non-LTO case, I think so.

> Also, for clang on OSX the best layout we could get is to order functions in
> the order in which they get executed at runtime.
>

That's what we already do for lld. We collect an order file (by running a
profiler) and pass it to the linker, which lays out functions
accordingly.
This is to improve startup time for a class of startup-time-sensitive
operations. The algorithm proposed by Pettis (allegedly) aims to
reduce TLB misses, as it tries to lay out hot functions (or
functions that are likely to be called together) near each other in
the final binary.
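
As a rough illustration of the order-file approach, the sketch below turns a call trace into a list of symbols sorted by first execution. The trace format (pairs of timestamp and function name), the function name `order_file_lines`, and the one-symbol-per-line output are all assumptions made for the example; check what your profiler emits and what your linker expects.

```python
def order_file_lines(trace):
    """Order functions by the time they were first executed.

    trace: iterable of (timestamp, function_name) call events
           (a hypothetical profiler output format).
    Returns the function names, earliest first execution first.
    """
    first_seen = {}
    for timestamp, fn in trace:
        # Keep only the earliest timestamp seen for each function.
        if fn not in first_seen or timestamp < first_seen[fn]:
            first_seen[fn] = timestamp
    # Break timestamp ties by name so the output is deterministic.
    return sorted(first_seen, key=lambda fn: (first_seen[fn], fn))


if __name__ == "__main__":
    sample_trace = [
        (5, "main"),
        (7, "parse_args"),
        (9, "init"),
        (12, "parse_args"),
        (3, "_start"),
    ]
    # One symbol per line, assumed here to be what the linker consumes.
    print("\n".join(order_file_lines(sample_trace)))
```

This only captures startup order; it says nothing about which functions are hot or called together later, which is the gap the Pettis-Hansen style layout is meant to fill.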

IIRC from when I looked at the paper a while ago, it is mostly just a "Huffman tree construction" type of algorithm (agglomerating based on highest probability), and it assumes that if two functions are hot then they are likely to be needed together. This is not always the case.

E.g. consider a server that accepts RPC requests and based on those requests either does Foo or Bar which are largely disjoint. It's entirely possible for the top two functions of the profile to be one in Foo and one in Bar, but laying them out near each other doesn't make sense since there is never locality (for a given RPC, either Foo or Bar gets run). A static call graph analysis can provide the needed signals to handle this case better. 


Hence you said "allegedly" :) I know we've talked about this before. Just wanted to put the backstory of the "allegedly" on the list.

Looks like I remembered this wrong. The algorithm in section 3.2 of the paper is call-graph aware. It does do greedy coalescing like a Huffman tree construction algorithm, but it constrains the available coalescing operations at each step by call graph adjacency (in fact, what it is "greedy" about is the hotness of the edges between call graph nodes, not of the nodes themselves).

-- Sean Silva
 

-- Sean Silva
 
-- Sean Silva
 

>
> For FullLTO it is conceptually pretty easy to get profile data we need for
> this, but I'm not sure about the ThinLTO case.
>
> Teresa, Mehdi,
>
> Are there any plans (or things already working!) for getting profile data
> from ThinLTO in a format that the linker can use for code layout? I assume
> that profile data is being used already to guide importing, so it may just
> be a matter of siphoning that off.
>
>
> I’m not sure what kind of “profile information” is needed, and what makes it
> easier for MonolithicLTO compared to ThinLTO?
>
> Or maybe that layout code should be inside LLVM; maybe part of the general
> LTO interface? It looks like the current gcc plugin calls back into gcc for
> the actual layout algorithm itself (function call
> find_pettis_hansen_function_layout) rather than the reordering logic living
> in the linker:
> https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c
>
>
> I was thinking about this: could this be done by reorganizing the module
> itself for LTO?
>
> That wouldn’t help non-LTO and ThinLTO though.

This is a dimension that I think can be explored. The fact that it
wouldn't help with other modes of operation is completely orthogonal,
in particular until it's proven that this kind of optimization makes
sense with ThinLTO (and if it doesn't, it can be an optimization run
only during full LTO).

--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare





Re: [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Roman Popov via cfe-dev
CC: D.M.Berris for the specifics of XRay.


On Jan 16, 2017, at 7:41 PM, Sean Silva via llvm-dev <[hidden email]> wrote:

Would it make sense for XRay instrumentation to be part of -fprofile-generate? PGO will affect inlining decisions etc. for the optimized binary, but the traces collected during the instrumented build would still contain quite a bit of useful information.

As I remember, XRay only adds nops at compile time, and the instrumentation is only added/enabled at runtime, so I’m not sure how it would play with -fprofile-generate?

— 
Mehdi


