Re: [llvm-dev] [RFC] Rearchitect Gnu toolchain driver to simplify multilib support

Re: [llvm-dev] [RFC] Rearchitect Gnu toolchain driver to simplify multilib support

via cfe-dev
I think this is more of a cfe-dev discussion, so sending my reply there.

I agree, the current situation is a mess. We've basically attempted to codify GNU multilib rules in giant piles of C++ code again and again, but we can never keep up with the changes that GCC and then distros make. The only way to win is to quit the game and pass the buck to the vendor or user. That way, whenever someone complains about clang's inability to find a header or lib, we can say with a sigh, "sorry we couldn't find it; as a workaround, patch the config file next to clang," and not, "sorry we missed it; hack in some more C++ workarounds and build your own compiler."

This seems like a two-part project:
1. Define a config file format morally equivalent to spec files that we can tolerate
2. Write some scripts that interrogate a GCC installation to generate those config files

I think we would want to document explicitly that the config file format is not intended to be forwards or backwards compatible. Its purpose is to allow vendors to customize header and library search logic without hacking clang's C++ logic. The idea is that clang will attempt to make one for you, get it right 90% of the time, and let you pick up the pieces when it fails.
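
To make step 2 a bit more concrete, here is a minimal sketch of what an interrogation script might look like. It is only an illustration under stated assumptions: the JSON layout, the key names and the helper names (run_gcc, build_config, dump_config) are invented for this example and are not a proposed format. The gcc options it calls (-dumpmachine, -dumpversion, -print-multiarch, -print-multi-lib) are existing GCC options. The raw output is stored unparsed so that a user can read and patch the file by hand.

```python
import json
import subprocess
from typing import Callable, Dict, List

# Existing GCC query options; the key names on the left are hypothetical.
QUERIES: Dict[str, List[str]] = {
    "target": ["-dumpmachine"],
    "version": ["-dumpversion"],
    "multiarch": ["-print-multiarch"],
    "multilibs": ["-print-multi-lib"],
}


def run_gcc(gcc: str, args: List[str]) -> str:
    """Run one gcc query and return its stdout, or "" if the query fails."""
    try:
        proc = subprocess.run(
            [gcc, *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return proc.stdout.strip()


def build_config(
    gcc: str, runner: Callable[[str, List[str]], str] = run_gcc
) -> dict:
    """Collect the raw answers to each query as lists of output lines."""
    config: dict = {"gcc": gcc}
    for key, args in QUERIES.items():
        config[key] = runner(gcc, args).splitlines()
    return config


def dump_config(config: dict) -> str:
    """Serialize the config as editable JSON (layout is an assumption)."""
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(dump_config(build_config("gcc")))
```

The runner argument exists only so the collection logic can be exercised without a real gcc on the machine; a failed or unsupported query simply yields an empty list, which is one of the pieces the user would then fill in.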

On Wed, Oct 3, 2018 at 10:14 AM Frank Schaefer via llvm-dev <[hidden email]> wrote:
Hi all,

I've been poking around with llvm+clang+compiler-rt, trying to get it
working on Linux ARM soft-float (yes, ARM soft-float support is pretty
broken).  Along the way I tried writing a multilib toolchain driver
for ARM soft/hard float, with only partial success.  For reference see
https://reviews.llvm.org/D52705#inline-464117.

One thing I noticed while doing this (and a few other people seem to
agree on) is that the entire Gnu toolchain driver set could be greatly
simplified.  So far, it seems like every time someone has encountered
a new multilib case (either a new arch or a new distro arrangement),
the response has been to pile on another custom multilib driver, or
add a bunch of corner-case codepaths to an existing driver.  That's
been done so many times that the existing driver set is honestly
starting to collapse under its own weight. :-(

I'm now contemplating what it would take to reduce the entire driver
set to something that simply figures out all the multilib/multiarch
distinctions by querying the existing gcc installation.  This could
theoretically cover all Gnu multilib cases in a single codepath.

Some background:

Current GNU toolchains (gcc+glibc+binutils) tend to encapsulate all
multilib knowledge in gcc, including:

* What flags trigger a specific multilib selection
* What directories are associated with a particular multilib selection
(what we know as osSuffix()/gccSuffix())
* What run-time linker (/lib/ld-<arch>.so.<ver>) to use for a
particular multilib selection

This is highly customizable at gcc build time via a bunch of
arch+OS+ABI configuration fragments in the "gcc/config" directory of
the gcc source tree, and a lot of Linux distros have taken their own
liberties with this configuration.  That's part of why clang's Gnu
toolchain driver is in the state it's in.
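
As a small illustration of the flag-to-directory mapping described above: "gcc -print-multi-lib" prints one line per multilib, with the directory name separated from its switches by ';' and each switch written with '@' in place of the leading '-'. A sketch of reading that output could look like the following. The function name and the sample text in the usage note are made up for the example and were not captured from a real installation.

```python
from typing import Dict, List


def parse_print_multi_lib(text: str) -> Dict[str, List[str]]:
    """Map each multilib directory to the switches that select it."""
    table: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # "dir;@flag1@flag2" -> directory "dir", switches "@flag1@flag2"
        directory, _, switches = line.partition(";")
        table[directory] = ["-" + s for s in switches.split("@") if s]
    return table
```

For example, feeding it the illustrative text ".;" followed by "hf;@mfloat-abi=hard" gives an empty switch list for "." (the default multilib) and ["-mfloat-abi=hard"] for "hf".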

The rough outline of what I would propose:
1. clang's CMakeLists can scan the spec tokens for a selected gcc
installation (available via "gcc -dumpspecs") and pick out the
important tokens (so far I know this includes "*multilib",
"*multilib_matches", "*multilib_defaults", "*multilib_options", and
"*link").
2. clang's Gnu driver can be re-coded to parse the relevant spec tokens.
3. clang's Gnu driver can build up a complete unified MultilibSet
based on these tokens.
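
A rough sketch of steps 1 and 2, written in Python for brevity rather than as driver code. It assumes the "gcc -dumpspecs" layout of "*name:" followed by the spec text and a blank line, and it assumes "*multilib" entries of the form "dir[:osdir] opt !opt;" where '!' marks an option that must be absent. That second assumption is my reading of current GCC output and is exactly the kind of thing that would need checking against older installations. All function names and the sample text in the usage note are hypothetical.

```python
from typing import Dict, List, Optional


def parse_specs(text: str) -> Dict[str, str]:
    """Split -dumpspecs style text into {spec name: spec body}."""
    specs: Dict[str, str] = {}
    name: Optional[str] = None
    body: List[str] = []
    for line in text.splitlines():
        if name is None:
            # A section starts with "*name:" on its own line.
            if line.startswith("*") and line.rstrip().endswith(":"):
                name = line.rstrip()[1:-1]
                body = []
        elif line.strip() == "":
            # A blank line terminates the current section.
            specs[name] = "\n".join(body)
            name = None
        else:
            body.append(line)
    if name is not None:
        specs[name] = "\n".join(body)
    return specs


def parse_multilib_spec(value: str) -> List[dict]:
    """Parse a "*multilib" body into one dict per multilib entry."""
    entries: List[dict] = []
    for chunk in value.split(";"):
        words = chunk.split()
        if not words:
            continue
        gcc_dir, _, os_dir = words[0].partition(":")
        entries.append({
            "dir": gcc_dir,
            "os_dir": os_dir or None,
            "required": [w for w in words[1:] if not w.startswith("!")],
            "forbidden": [w[1:] for w in words[1:] if w.startswith("!")],
        })
    return entries
```

With an illustrative body such as ". !m32;32:../lib32 m32;", this yields two entries: the default directory "." which forbids m32, and "32" with OS directory "../lib32" which requires m32. Building a MultilibSet from such entries (step 3) is not attempted here.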

Some potential complications I anticipate:
1. I don't know how consistently gcc has used these spec tokens, or
how the formatting has evolved over time.  Mimicking the current (gcc
8.2.0) format seems sensible, but what we pull from older gcc
installations may not comport with what we expect.
2. I don't see anything in the spec tokens that describes system
header arrangement.  Vanilla multilib-enabled gcc seems to honor
/usr/include/<os-suffix> (where <os-suffix> seems to conform to the
output of "gcc <flags> -print-multiarch").  Note that this doesn't
necessarily match the osSuffix; I've produced functional GNU
toolchains that honor a standard-triple osSuffix, but don't honor
_anything_ like it under /usr/include.
3. g++, OTOH, expects all C++ headers to be under
/usr/include/c++/<version>.  Vanilla g++ keeps some headers further
subbed under <os-suffix>, with some of those further subbed again
under <gcc-suffix> for non-default multilib cases.  Just to complicate
things, Debian/Ubuntu g++ has apparently been adapted to employ the
/usr/include/<os-suffix> for multilib-specific C++ headers.  If other
distros do their own thing with this, then I see no straightforward
way to autodetect anything but a few obvious cases.
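
To illustrate complication 3, here is a sketch that merely enumerates the C++ header directories described above so that a script could probe which ones exist. The function name is hypothetical. The exact Debian/Ubuntu path below (<os-suffix>/c++/<version> under /usr/include) is an assumption on my part about how that layout is arranged, and other distros may differ.

```python
import posixpath
from typing import List


def cxx_include_candidates(
    version: str,
    os_suffix: str = "",
    gcc_suffix: str = "",
    root: str = "/usr/include",
) -> List[str]:
    """List candidate C++ header directories, most generic first."""
    base = posixpath.join(root, "c++", version)
    dirs = [base]
    if os_suffix:
        # Vanilla layout: target-specific headers below the version dir.
        target_dir = posixpath.join(base, os_suffix)
        dirs.append(target_dir)
        if gcc_suffix:
            # Non-default multilibs add one more level.
            dirs.append(posixpath.join(target_dir, gcc_suffix.strip("/")))
        # Assumed Debian/Ubuntu-style layout under /usr/include/<os-suffix>.
        dirs.append(posixpath.join(root, os_suffix, "c++", version))
    return dirs
```

A generator script could filter this list with os.path.isdir and write only the survivors into the config file, leaving anything it cannot detect for the user to add by hand.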

To address the above complications, I would suggest adding CMake
options for users to supply their own multilib descriptor tokens, in
case whatever's in gcc specs doesn't work for them.  We might even
allow for an extra token or two to better describe C/C++ header
layout.

This would all require a LOT of planning and testing, especially
across the multiple targets/distros the Gnu toolchain driver currently
supports.  I'm not sure how to access suitable testbeds for a lot of
it (I count myself lucky just to have a reasonably-powerful ARM
build-box).  At least initially, I think we would have to keep the old
hodgepodge driver code around alongside the new unified driver code.

--
Frank
"If a server dies in a server farm and no one pings it, does it still
cost four figures to fix?"
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: [llvm-dev] [RFC] Rearchitect Gnu toolchain driver to simplify multilib support

via cfe-dev
I agree with Reid that the best place for the information is some kind
of configuration file that we can generate from a GCC installation. An
old Arm proprietary compiler had a similar feature to inspect a gcc
installation to extract the library, include directory and executable
locations. The configuration part worked well, and when it didn't work
it was usually fairly simple to fix by editing the configuration file
that the parser generated (in this particular case an XML file). I think
that for clang it would be better to keep the scripts outside the
compiler so that they could be edited by the user.

For the CMake options, I'm guessing you are thinking about configuring
against a particular gcc installation at clang build time? I think
that this could work well for a toolchain that bundles a clang and gcc
together in lock-step, although I'd expect people to also want the
flexibility of run-time configuration. Given the amount of CMake work
that this would entail and the potential for lots of frustrating build
failures, I suggest that we tackle this part of the problem after the
run-time configuration.

I'm interested in helping out where I can. I've unfortunately only got
experience with Arm and AArch64 and I'm mostly interested in
cross-compilation particularly for embedded systems as that is where
the largest amount of pain is. Although even if we do start with Arm
I'm sure that other architectures will be able to adapt.

Peter
On Wed, 3 Oct 2018 at 23:02, Reid Kleckner via cfe-dev
<[hidden email]> wrote:


Re: [llvm-dev] [RFC] Rearchitect Gnu toolchain driver to simplify multilib support

via cfe-dev
I'm all for the standalone script support at this point.  Most of what
I'm interested in is native builds with custom distributions, but GNU
cross-compilers don't really introduce much complication for the
scripting aside from sysroot considerations (which, in my world, is
very run-time dynamic anyways).

I'm close to having a native Linux+SPARC system spun up enough to
build clang, and I could potentially borrow and spin up a native
Octeon system as well.  PPC is still out of my reach though.  I can
spin up common x86 Linux distros in VM guests all day long.

On Thu, Oct 4, 2018 at 4:59 AM Peter Smith <[hidden email]> wrote:


--
Frank
"If a server dies in a server farm and no one pings it, does it still
cost four figures to fix?"