[RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM


Hubert Tong via cfe-dev
*TL;DR*

We propose some non-trivial refactoring in Clang and LLVM to enable
further work on the Flang driver.

*SUMMARY*
We would like to start extracting the driver/frontend code from Clang
(alongside the code that the driver/frontend depends on, e.g.
Diagnostics) and move the components that could be re-used by
non-C-based languages to LLVM. From our initial investigation we see
that these changes will impact many projects (upstream and downstream)
and will require big mechanical patches (our first attempt is
implemented in [8]). This is not ideal, but seems unavoidable in the
long-term. We would like to do this refactoring _before_ we start
implementing the Flang driver upstream (OPTION 1 below). This way we avoid:

* contaminating Clang with Fortran-specific code (and vice versa)
* introducing dependency on Clang in Flang

The downside is that the refactoring is likely to be disruptive for many
projects that use Clang. We will try our best to minimise this.

Does this approach make sense? Are there any preferred alternatives? At
this stage we'd like to discuss the overall direction. If folks are in
favour, we'll send a separate RFC with a finer breakdown and more
technical details for the refactoring.

Below you will find more context for our use-case (the Flang driver) and
possible alternatives. We hope that this will help the discussion. We
would really appreciate your feedback!


*BACKGROUND*
Flang (formerly known as F18) has recently been merged into LLVM [1].
Our ambition, as a community, is to make it as flexible, robust and nice
to work with as Clang. One of the major items to address is the
implementation of a driver that would provide the flexibility and user
experience similar to that available in Clang. The F18/Flang driver was
already discussed on cfe-dev last year [2], but back then F18 (now
llvm-project/flang) was a separate project. In the original proposal it
was assumed that initially Flang would depend on (and extend where
necessary) Clang's driver/frontend code. Since F18/Flang was an independent
project, the refactoring of Clang/LLVM wasn't really considered. That
design has been challenged since ([3], [10]), and also not much progress
has been made. We would like to revisit that RFC from a slightly
different angle. Since Flang is now part of LLVM's monorepo, we feel
that refactoring Clang/LLVM _before_ we upstream the driver makes a lot
of sense and is the natural first step.

*ASSUMPTIONS & DESIGN GOALS*
1. We will re-use as much of Clang's driver/frontend code as
possible (this was previously proposed in [2]).

2. We want to avoid dependencies from Flang to Clang, both long-term
(strong requirement) and short-term (might be difficult to achieve).
This has recently come up in a discussion on one of our early patches
[3] (tl;dr Steve Scalpone, the code owner of Flang, would prefer us to
avoid this dependency), and was also suggested before by Eric
Christopher [10].

3. We will move the code that can be shared between Flang and Clang (and
other projects) to LLVM. This idea has already come up on llvm-dev
before [7] (in a slightly different context, and to a slightly different
extent). The methods that are not language specific would be shared in
an LLVM library.

4. The classes/types/methods that need specific changes for Fortran will
be "copied" to Flang and adapted as needed. We should minimize (or even
eliminate) any Fortran-specific code in Clang and make sure that such
code lives in llvm-project/flang.

*FLANG'S DEPENDENCIES ON CLANG*
These are the dependencies on Clang that we have identified so far while
prototyping the Flang driver.

1. All the machinery related to Diagnostics & SourceLocation.

This is currently part of libclangBasic [4] and is used in _many_ places
in Clang. The official documentation [5] suggests that this could be
re-used for non-C-based languages. In particular, we feel that it would
make a lot of sense for Flang to use it. Also, separating Clang's
driver/frontend code and the diagnostics would require a lot of
refactoring for no real benefit (and we feel that Flang should re-use
Clang's driver/frontend code, see below). This dependency is used in
many places, so moving it to LLVM will require a lot of (mostly)
mechanical changes. We can't see an obvious way to split it into smaller
chunks (see also below where we discuss the impact).

2. libclangFrontend & libclangDriver

The Flang driver will use many methods from libclangDriver,
libclangFrontend and libclangFrontendTool. Driver.h and Compilation.h
from libclangDriver are responsible for setting up the driver with the
correct arguments and executing it. TextDiagnosticPrinter.h takes care
of printing the driver diagnostics in case of errors.

The Flang frontend will use CompilerInstance, CompilerInvocation,
FrontendOptions, FrontendActions and Utils from libclangFrontend and
libclangFrontendTool. These classes are responsible for translating the
command-line arguments into frontend options and, later, into actions to
be executed by ExecuteCompilerInvocation. The translation from arguments
to actions happens via FrontendOptions and FrontendActions, but it is the
CompilerInvocation that holds the pointers to the sequence of actions
required by a CompilerInstance. These classes are needed to implement the
Flang driver/frontend and contain actions/methods/functions that seem to
be language agnostic.
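To make the division of labour in item 2 more concrete, here is a minimal, self-contained sketch of the "arguments -> options -> action -> execution" flow. It deliberately does not use Clang's API: `Invocation`, `FrontendOpts`, `Action`, `createInvocation`, `createAction` and `execute` are hypothetical stand-ins that only mirror the roles of CompilerInvocation, FrontendOptions, FrontendAction and CompilerInstance/ExecuteCompilerInvocation, and the two flags are just examples.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the frontend roles described above; not Clang's API.

// Plays the role of FrontendOptions: which action to run, and on which inputs.
struct FrontendOpts {
  std::string action = "emit-obj";  // default when no action flag is given
  std::vector<std::string> inputs;
};

// Plays the role of CompilerInvocation: owns the parsed options.
struct Invocation {
  FrontendOpts frontendOpts;
};

// Plays the role of FrontendAction: one unit of work over an input file.
struct Action {
  virtual ~Action() = default;
  virtual std::string run(const std::string &input) = 0;
};

struct SyntaxOnlyAction : Action {
  std::string run(const std::string &input) override { return "parsed " + input; }
};

struct EmitObjAction : Action {
  std::string run(const std::string &input) override { return "compiled " + input; }
};

// Translate frontend arguments into options (cf. CompilerInvocation).
Invocation createInvocation(const std::vector<std::string> &args) {
  Invocation inv;
  for (const std::string &arg : args) {
    if (arg == "-fsyntax-only")
      inv.frontendOpts.action = "syntax-only";
    else if (arg == "-emit-obj")
      inv.frontendOpts.action = "emit-obj";
    else if (!arg.empty() && arg[0] == '-')
      throw std::invalid_argument("unknown option: " + arg);
    else
      inv.frontendOpts.inputs.push_back(arg);
  }
  return inv;
}

// Map the selected option onto a concrete action (cf. ExecuteCompilerInvocation).
std::unique_ptr<Action> createAction(const FrontendOpts &opts) {
  if (opts.action == "syntax-only")
    return std::make_unique<SyntaxOnlyAction>();
  return std::make_unique<EmitObjAction>();
}

// Plays the role of CompilerInstance: runs the chosen action over every input.
std::vector<std::string> execute(const Invocation &inv) {
  std::unique_ptr<Action> action = createAction(inv.frontendOpts);
  std::vector<std::string> results;
  for (const std::string &input : inv.frontendOpts.inputs)
    results.push_back(action->run(input));
  return results;
}
```

For example, `execute(createInvocation({"-fsyntax-only", "a.f90"}))` yields `{"parsed a.f90"}`. Nothing in this flow depends on the input language; in the real libraries the C-family specifics are mixed into these classes, which is what makes the split non-trivial.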

*ALTERNATIVES*
This is a summary of the alternative ways of implementing the Flang
driver. We propose OPTION 1. If there are no major objections, we will
draft a separate RFC with more technical details (we will also break it
down into smaller pieces). Otherwise, what would be your preferred
alternative and why?

OPTION 1
We avoid dependency on Clang from day 1.

This is the ideal scenario that would guarantee that Clang and Flang are
completely separate and that the common bits stay in LLVM instead. It
would mean slower progress for us initially, but then other projects
could benefit from the refactoring sooner rather than later.

OPTION 2
We avoid dependency on clangBasic from day 1, but initially allow
dependency on libclangFrontend & libclangDriver (or other libs specific
to the driver/frontend).

The dependency on libclang{Driver|Frontend} would gradually be
removed/refactored out as the driver for Flang gains momentum. As
mentioned earlier, there is plenty of code in libclangFrontend and
libclangDriver that we'd like to re-use, but the separation between code
that's specific to C-based languages and generic driver/frontend code is
not always obvious. We think that refactoring the common bits in
libclangFrontend and libclangDriver might simply be easier once:

  * we have a Flang driver that leverages these libraries, and, as a result,
  * we understand better what we could re-use and what's not that
relevant to non-C-based languages.

OPTION 3
We initially keep the dependency on Clang and re-visit this RFC later.

This would be the least disruptive approach (at least for the time
being) and would allow us to make the most rapid progress (i.e. we
would be focusing on implementing the features rather than refactoring).
It would also inform the future refactoring better. But it was already
pointed out that we should avoid dependencies on Clang [3] and this
would be a step in the opposite direction. Also, the build requirements
for Flang would increase, and we feel that we should strive to reduce
them instead [6].

If we missed any alternatives, please bring them up.

*IMPACT ON OTHER PROJECTS*
The refactoring will have non-trivial impact on other projects:

* OPTION 1 and OPTION 2 - huge impact initially.
* OPTION 3 - no impact initially, but most likely similar impact as
OPTION 1 and OPTION 2 in the long term.

From our initial investigation, extracting Diagnostics/SourceLocation
from clangBasic and moving it to LLVM will be the most impactful change.
Within llvm-project it is used in clang, clang-tools-extra, lldb and
polly. Most of the changes will be mechanical, but will require touching
many files. In order to get to a state where we could build libclang
using the newly defined LLVM library, we had to touch ~850 files and
make ~30k insertions/deletions. The result of this exercise is available
in our development fork of llvm-project [8].
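One way to keep such a mechanical move less disruptive for downstream users is a transitional alias: move the class into the new LLVM library, then re-export it under its old name. This is an assumption about how the transition could be staged, not something taken from the experimental patches in [8], and the tiny `DiagnosticsEngine` below is a placeholder rather than the real class.

```cpp
#include <cassert>
#include <string>

// Hypothetical placeholder: the class now lives in the LLVM-level library...
namespace llvm {
class DiagnosticsEngine {
public:
  void report(const std::string &msg) {
    ++numReports;
    last = msg;
  }
  unsigned getNumReports() const { return numReports; }
  const std::string &getLastMessage() const { return last; }

private:
  unsigned numReports = 0;
  std::string last;
};
} // namespace llvm

// ...and a compatibility shim keeps the old spelling working for existing users.
namespace clang {
using llvm::DiagnosticsEngine;
} // namespace clang

// Unmodified "downstream" code that still spells the old name.
unsigned reportTwice(clang::DiagnosticsEngine &diags) {
  diags.report("first");
  diags.report("second");
  return diags.getNumReports();
}
```

One caveat: an existing forward declaration such as `namespace clang { class DiagnosticsEngine; }` conflicts with the using-declaration, so such declarations would still have to be updated. An alias reduces the churn rather than removing it.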

Please note: our patches on GitHub [8] are just experiments to
illustrate the idea. It's work-in-progress that requires a lot of
polishing. If/when we upstream this, we would need to do some
low-impact refactoring first. For example, currently ASTReader &
ASTWriter are `friends` with DiagnosticsEngine [9]. That won't be
possible when DiagnosticsEngine is moved to LLVM.
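The `friend` problem can be sketched in isolation. A class in an LLVM-level library cannot name Clang classes, so whatever ASTReader/ASTWriter currently reach through friendship has to become a narrow public interface. Below, `DiagEngine`, `saveState`/`restoreState`, `writeState`/`readState` and the toy text format are all hypothetical; the state Clang actually serializes is considerably richer than one severity per diagnostic name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

enum class Severity { Ignored, Warning, Error };

// Hypothetical engine living in an LLVM-level library. It can no longer
// befriend clang::ASTReader/ASTWriter, so it exposes a narrow public
// snapshot/restore interface for its severity state instead.
class DiagEngine {
public:
  using StateMap = std::map<std::string, Severity>;

  void setSeverity(const std::string &diag, Severity s) { overrides[diag] = s; }

  Severity getSeverity(const std::string &diag) const {
    auto it = overrides.find(diag);
    return it == overrides.end() ? Severity::Warning : it->second;
  }

  // Public replacement for the private access the friends used to have.
  StateMap saveState() const { return overrides; }
  void restoreState(StateMap state) { overrides = std::move(state); }

private:
  StateMap overrides;
};

// Clang-side writer (cf. ASTWriter): uses only the public API.
std::string writeState(const DiagEngine &engine) {
  std::string out;
  for (const auto &[name, sev] : engine.saveState())
    out += name + "=" + std::to_string(static_cast<int>(sev)) + ";";
  return out;
}

// Clang-side reader (cf. ASTReader): assumes well-formed input from writeState.
void readState(DiagEngine &engine, const std::string &blob) {
  DiagEngine::StateMap state;
  std::size_t pos = 0;
  while (pos < blob.size()) {
    std::size_t eq = blob.find('=', pos);
    std::size_t semi = blob.find(';', eq);
    state[blob.substr(pos, eq - pos)] =
        static_cast<Severity>(std::stoi(blob.substr(eq + 1, semi - eq - 1)));
    pos = semi + 1;
  }
  engine.restoreState(std::move(state));
}
```

The design choice here is to make the engine's serializable state an explicit value type, so that any client (Clang's AST files or something else entirely) can persist it without special access.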


On behalf of the Arm Fortran Team,
Andrzej Warzynski

REFERENCES

[1] https://github.com/llvm/llvm-project/commit/b98ad941a40c96c841bceb171725c925500fce6c
[2] http://lists.llvm.org/pipermail/cfe-dev/2019-June/062669.html
[3] https://reviews.llvm.org/D79092
[4] https://github.com/llvm/llvm-project/blob/ad5d319ee85d31ee2b1ca5c29b3a10b340513fec/clang/lib/Basic/CMakeLists.txt#L45-L47
[5] https://clang.llvm.org/docs/InternalsManual.html#the-clang-basic-library
[6] http://lists.llvm.org/pipermail/flang-dev/2019-November/000061.html
[7] http://lists.llvm.org/pipermail/llvm-dev/2019-November/136743.html
[8] https://github.com/banach-space/llvm-project/commits/andrzej/refactor_clangBasic
[9] https://github.com/llvm/llvm-project/blob/b11ecd196540d87cb7db190d405056984740d2ce/clang/include/clang/Basic/Diagnostic.h#L985-L986
[10] https://reviews.llvm.org/D63607
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
> 1. All the machinery related to Diagnostics & SourceLocation.
>
> This is currently part of libclangBasic [4] and is used in _many_ places
> in Clang. The official documentation [5] suggests that this could be
> re-used for non-C-based languages. In particular, we feel that It would
> make a lot of sense for Flang to use it. Also, separating Clang's
> driver/frontend code and the diagnostics would require a lot of
> refactoring for no real benefit (and we feel that Flang should re-use
> Clang's driver/frontend code, see below). This dependency is used in
> many places, so moving it to LLVM will require a lot of (mostly)
> mechanical changes. We can't see an obvious way to split it into smaller
> chunks (see also below where we discuss the impact).

This totally makes sense to me.  It has been a couple of decades but I
remember working with the OpenVMS GEM compiler system, and it had all
the source-management stuff in the equivalent of an LLVM library for
use by all front-ends.  A consistent view of source coordinates across
the front-end and back-end was nice to have; I don't think moving the
Clang notion into an LLVM library gets us there, but it's a step in a
direction that I like.

I can imagine that some of the driver stuff would also be reusable,
but I don't have any direct experience to draw on there.

Flang would obviously be the in-tree beneficiary; are there downstream
projects that would like this change?  I don't mean speculatively, have
you tried asking around (Julia and Rust pop to mind)?  Interested
downstream consumers would provide a bit more motivation for such a
widespread refactoring.
--paulr


Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev


On 02/06/2020 14:43, Robinson, Paul wrote:

>
> Flang would obviously be the in-tree beneficiary; are there downstream
> projects that would like this change?  I don't mean speculatively, have
> you tried asking around (Julia and Rust pop to mind)?  Interested
> downstream consumers would provide a bit more motivation for such a
> widespread refactoring.

That's a good suggestion, thank you! I've just posted the following:

* Julia:
https://discourse.julialang.org/t/refactoring-clang-could-this-benefit-julia/40621
* Rust:
https://internals.rust-lang.org/t/refactoring-clang-could-this-benefit-rust/12479
* Swift:
https://forums.swift.org/t/refactoring-clang-could-this-benefit-swift/37151

Hopefully somebody from those communities will comment here. If not, I
will summarise any discussion happening outside llvm-dev in a separate
message. Please let me know if you think that the above list is incomplete!

-Andrzej

Re: [llvm-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
On Tue, 2 Jun 2020 at 18:05, Andrzej Warzynski via llvm-dev
<[hidden email]> wrote:

>
>
>
> On 02/06/2020 14:43, Robinson, Paul wrote:
>
> >
> > Flang would obviously be the in-tree beneficiary; are there downstream
> > projects that would like this change?  I don't mean speculatively, have
> > you tried asking around (Julia and Rust pop to mind)?  Interested
> > downstream consumers would provide a bit more motivation for such a
> > widespread refactoring.
>
> That's a good suggestion, thank you! I've just posted the following:
>
> * Julia:
> https://discourse.julialang.org/t/refactoring-clang-could-this-benefit-julia/40621
> * Rust:
> https://internals.rust-lang.org/t/refactoring-clang-could-this-benefit-rust/12479

I can't speak for other Rust compiler developers, but personally, I
doubt that we'll find a use for such a library. Using *any* C++
library for such a significant purpose in rustc is a hard sell, for
social as well as technical reasons. We use LLVM for optimizations and
codegen because it's worth the trouble, but for things where we
already have a working solution of our own, the downsides of switching
to something different are probably overwhelming. Some examples:

* Most contributors to Rust know Rust better than C++, if they are
proficient in it at all. Even people who don't directly contribute to
the component in question may need to read the code from time to time.

* Using C++ libraries from Rust is generally possible but relatively
involved (bindings must be developed and maintained, subtle bugs in
them are likely). If templates are involved or virtual methods need to
be overridden, lots of actual logic (and not just glue code for
bindings) may have to be written in C++, too, and this might in turn
require even more Rust/C++ bindings.

* As LLVM libraries have different development processes, tools, and
communities, making a change to LLVM libraries is harder for those
accustomed to the ways things are done in the Rust project.

In addition, there may be some serious differences in functional
requirements. For example, Rust has hygienic macros and this is
implemented by associating extra hygiene-related data to every single
source code span. The specific representation of source locations also
interacts with the way incremental compilation is done in rustc, and
likewise for diagnostics (e.g., need to record and replay diagnostics
even if the passes that emitted them are skipped because the function
wasn't modified since the last build). I haven't thought deeply about
what porting these and other aspects to Clang-derived libraries would
involve, but at first glance it seems quite difficult.
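To make the record-and-replay requirement concrete, here is a small C++ sketch of the idea. It is not rustc's implementation (which is written in Rust and built around its own incremental infrastructure): diagnostics produced for a function are cached together with a fingerprint of its source, and an unchanged function replays them without rerunning the pass. `IncrementalChecker` and `Diagnostic` are hypothetical names, and `std::hash` stands in for a proper stable fingerprint.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Diagnostic {
  std::string message;
  unsigned line;
};

// Hypothetical incremental checker: diagnostics emitted while checking a
// function are recorded with a fingerprint of its body, so an unchanged
// function can skip the pass and still replay its diagnostics.
class IncrementalChecker {
public:
  using Pass = std::function<std::vector<Diagnostic>(const std::string &)>;

  explicit IncrementalChecker(Pass p) : pass(std::move(p)) {}

  std::vector<Diagnostic> check(const std::string &fnName, const std::string &body) {
    // std::hash is only a stand-in; a real cache needs a stable, collision-resistant fingerprint.
    std::size_t hash = std::hash<std::string>{}(body);
    auto it = cache.find(fnName);
    if (it != cache.end() && it->second.hash == hash) {
      ++replays;
      return it->second.diags;  // replay: the pass is skipped entirely
    }
    std::vector<Diagnostic> diags = pass(body);  // run the pass for real
    cache[fnName] = {hash, diags};               // record for the next build
    return diags;
  }

  unsigned numReplays() const { return replays; }

private:
  struct Entry {
    std::size_t hash;
    std::vector<Diagnostic> diags;
  };
  Pass pass;
  std::map<std::string, Entry> cache;
  unsigned replays = 0;
};
```

A shared diagnostics library would presumably need a hook like this, because the engine must accept diagnostics that were not produced by a live pass in the current build.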

All of this sounds very pessimistic, but to be clear, I think it's
great that this question was raised! More code reuse across frontends
would be great, it just seems like it wasn't meant to be in this
specific case.

Thanks,
Hanna


> * Swift:
> https://forums.swift.org/t/refactoring-clang-could-this-benefit-swift/37151
>
> Hopefully somebody from those communities will comment here. If not, I
> will summarise any discussion happening outside llvm-dev in a separate
> message. Please let me know if you think that the above list is incomplete!
>
> -Andrzej

Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev

Nice, thanks!

FWIW, we started to move OpenMP frontend code into
`llvm/lib/Frontend/OpenMP` for very similar reasons. Already then I was
expecting the driver and other parts to follow.

Another reason for driver code in "llvm-core", which I think is not
mentioned in the RFC, is LTO, or linking in general. Right now OpenMP does
weird linking stuff by invoking the driver instead of using the linker
directly. One reason is that we need to know about toolchains, e.g.,
CUDA, and that knowledge is no longer available in the linker.

Cheers,
   Johannes



Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
> -----Original Message-----
> From: cfe-dev <[hidden email]> On Behalf Of Robinson, Paul
> via cfe-dev
> Sent: Tuesday, June 2, 2020 6:43 AM
> To: Andrzej Warzynski <[hidden email]>; llvm-
> [hidden email]; [hidden email]; '[hidden email]' <cfe-
> [hidden email]>
> Subject: [EXT] Re: [cfe-dev] [RFC] Refactor Clang: move
> frontend/driver/diagnostics code to LLVM
>
> > 1. All the machinery related to Diagnostics & SourceLocation.
> >
> > This is currently part of libclangBasic [4] and is used in _many_ places
> > in Clang. The official documentation [5] suggests that this could be
> > re-used for non-C-based languages. In particular, we feel that It would
> > make a lot of sense for Flang to use it. Also, separating Clang's
> > driver/frontend code and the diagnostics would require a lot of
> > refactoring for no real benefit (and we feel that Flang should re-use
> > Clang's driver/frontend code, see below). This dependency is used in
> > many places, so moving it to LLVM will require a lot of (mostly)
> > mechanical changes. We can't see an obvious way to split it into smaller
> > chunks (see also below where we discuss the impact).
>
> This totally makes sense to me.  It has been a couple of decades but I
> remember working with the OpenVMS GEM compiler system, and it had all
> the source-management stuff in the equivalent of an LLVM library for
> use by all front-ends.  A consistent view of source coordinates across
> the front-end and back-end was nice to have; I don't think moving the
> Clang notion into an LLVM library gets us there, but it's a step in a
> direction that I like.

Separate from clang, LLVM itself actually has its own infrastructure for source locations and diagnostics, which is used by various tools that parse text.  See llvm/Support/SourceMgr.h.  The reason this was implemented separately is that clang's infrastructure is overly complicated for most uses.  One, it's designed to support rich source locations out of C macro expansions.  Two, there's a bunch of infrastructure for managing the diagnostics themselves: warning levels, warning groups, etc.  If you don't need either of those, the clang infrastructure is overkill, and harder to use.
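For comparison, the core of a "simple" source manager of the kind described here fits in a few lines: a location is a byte offset into a buffer, and line/column are computed on demand. This is a hypothetical illustration of the idea, not `llvm::SourceMgr`'s actual interface (which works with `SMLoc` and buffer IDs); what it leaves out is exactly the macro-expansion tracking and diagnostic-management machinery mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "flat" source manager: a location is just a byte offset into
// one buffer. No macro-expansion history, no warning levels or groups.
class SimpleSourceMgr {
public:
  explicit SimpleSourceMgr(std::string buffer) : buf(std::move(buffer)) {
    lineStarts.push_back(0);
    for (std::size_t i = 0; i < buf.size(); ++i)
      if (buf[i] == '\n')
        lineStarts.push_back(i + 1);
  }

  // 1-based line and column for a byte offset (binary search over line starts).
  std::pair<unsigned, unsigned> getLineAndColumn(std::size_t offset) const {
    auto it = std::upper_bound(lineStarts.begin(), lineStarts.end(), offset);
    std::size_t lineIdx = static_cast<std::size_t>(it - lineStarts.begin()) - 1;
    return {static_cast<unsigned>(lineIdx + 1),
            static_cast<unsigned>(offset - lineStarts[lineIdx] + 1)};
  }

  // Render "line:col: error: msg" in the familiar compiler style.
  std::string formatError(std::size_t offset, const std::string &msg) const {
    auto [line, col] = getLineAndColumn(offset);
    return std::to_string(line) + ":" + std::to_string(col) + ": error: " + msg;
  }

private:
  std::string buf;
  std::vector<std::size_t> lineStarts;  // offset at which each line begins
};
```

With the buffer `"abc\ndef\n"`, offset 5 (the `e`) maps to line 2, column 2.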

It's possible that flang's needs are similar enough to clang that it actually wants all the clang infrastructure.  But I can't imagine anyone else would want to use it over simpler alternatives.

Mapping source locations from LLVM IR back to the frontend is a separate issue, which I don't think is related to this.  We probably don't want to tie LLVM IR to any specific frontend representation of source code, and there isn't really any need.  See, for example, http://llvm.org/docs/LangRef.html#inline-asm-metadata .

-Eli

Re: [llvm-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
On Jun 2, 2020, at 12:51 PM, Eli Friedman via llvm-dev <[hidden email]> wrote:

> Separate from clang, LLVM itself actually has its own infrastructure for source locations and diagnostics, which is used by various tools that parse text.  See llvm/Support/SourceMgr.h. The reason this was implemented separately is that clang's infrastructure is overly complicated for most uses.  One, it's designed to support rich source locations out of C macro expansions.  Two, there's a bunch of infrastructure for managing the diagnostics themselves: warning levels, warning groups, etc.  If you don't need either of those, the clang infrastructure is overkill, and harder to use.
>
> It's possible that flang's needs are similar enough to clang that it actually wants all the clang infrastructure.  But I can't imagine anyone else would want to use it over simpler alternatives.
>
> Mapping source locations from LLVM IR back to the frontend is a separate issue, which I don't think is related to this.  We probably don't want to tie LLVM IR to any specific frontend representation of source code, and there isn't really any need.  See, for example, http://llvm.org/docs/LangRef.html#inline-asm-metadata .

I think there is room for both of these in the LLVM world: I think of SourceMgr as the right infra to use for small tools (e.g. tblgen, the .ll/mlir parsers, etc.) and something like the clang diagnostic infra as the right thing for full compilers like Swift (or flang, and obviously clang).  The warning groups and other features are important for anything with real users, and for concerns like multiple dialects and source compatibility over time.
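For contrast with a SourceMgr-style "print an error at this location" facility, the following is a minimal sketch of the extra policy layer a full compiler needs. Warnings belong to named groups that users can enable or disable, and a `-Werror`-style switch promotes them to errors. The class and method names are hypothetical; the flag spellings appear only in comments, as analogies.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

enum class Severity { Ignored, Warning, Error };

// Hypothetical table entry: every warning belongs to a named group.
struct WarningInfo {
  std::string group;  // e.g. "unused-variable"
  bool defaultOn;     // enabled unless the user says otherwise
};

class WarningEngine {
public:
  void addWarning(int id, WarningInfo info) { table_[id] = std::move(info); }
  // Analogous to -W<group> / -Wno-<group>.
  void setGroupEnabled(const std::string &group, bool on) { groupState_[group] = on; }
  // Analogous to -Werror.
  void setWarningsAsErrors(bool on) { werror_ = on; }

  // Decide how a given warning should be reported under the current settings.
  Severity classify(int id) const {
    auto it = table_.find(id);
    assert(it != table_.end() && "unknown diagnostic id");
    bool on = it->second.defaultOn;
    auto g = groupState_.find(it->second.group);
    if (g != groupState_.end()) on = g->second;  // explicit user choice wins
    if (!on) return Severity::Ignored;
    return werror_ ? Severity::Error : Severity::Warning;
  }

private:
  std::map<int, WarningInfo> table_;
  std::map<std::string, bool> groupState_;
  bool werror_ = false;
};
```

None of this policy exists in a plain "report an error at this location" facility, which is the trade-off being discussed here.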

I think it would be great to generalize Clang’s infra a bit if it can be reasonably done without hurting clang.  Flang adopting it would be a great test case.

-Chris


Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
In reply to this post by Hubert Tong via cfe-dev
While this is a different area of the codebase, another thing that
would benefit greatly from being moved out of Clang is function call
ABI handling.  Currently, that handling is split awkwardly between
Clang and LLVM proper, forcing frontends that implement C FFI to
either recreate the Clang parts themselves (like Rust does), depend on
Clang (like Swift does), or live with FFI just not working with some
function signatures.  I'm not sure what Flang currently does, but my
understanding is that Flang does support C FFI, so it would probably
benefit from this as well.  Just something to consider. :)
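The following heavily simplified sketch illustrates the kind of logic that frontends with C FFI currently end up duplicating. It classifies a struct argument roughly as the x86-64 System V ABI does. Aggregates larger than 16 bytes are passed in memory. Each 8-byte chunk ("eightbyte") goes in an SSE register only if it holds nothing but floating-point data. The sketch ignores many real ABI rules (long double, __int128, vectors, unions, unaligned fields), and `Field` and `classifyStruct` are hypothetical names, not Clang's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ArgClass { Integer, SSE, Memory };

// A scalar member of a C struct, as a frontend might describe it
// (hypothetical model; nested aggregates are assumed to be flattened).
struct Field {
  std::size_t offset;  // byte offset within the struct
  std::size_t size;    // byte size of the scalar
  bool isFloat;        // float/double vs. integer/pointer
};

// Heavily simplified x86-64 System V classification of a struct argument.
// Ignores long double, __int128, vectors, unions and unaligned fields.
std::vector<ArgClass> classifyStruct(const std::vector<Field> &fields,
                                     std::size_t structSize) {
  if (structSize > 16) return {ArgClass::Memory};  // passed in memory
  std::size_t words = (structSize + 7) / 8;         // number of eightbytes
  std::vector<bool> hasField(words, false), hasInt(words, false);
  for (const Field &f : fields) {
    std::size_t w = f.offset / 8;
    assert(w < words && "field outside the struct");
    assert((f.offset + f.size - 1) / 8 == w && "field straddles eightbytes");
    hasField[w] = true;
    if (!f.isFloat) hasInt[w] = true;  // INTEGER wins over SSE when mixed
  }
  std::vector<ArgClass> result;
  for (std::size_t i = 0; i < words; ++i) {
    if (!hasField[i]) continue;  // padding-only eightbyte: nothing to pass
    result.push_back(hasInt[i] ? ArgClass::Integer : ArgClass::SSE);
  }
  return result;
}
```

For example, `struct { double x, y; }` yields two SSE eightbytes, while `struct { int i; float f; }` fits in one eightbyte that is classified as INTEGER.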

Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
In reply to this post by Hubert Tong via cfe-dev
On Tue, 2 Jun 2020 at 05:08, Andrzej Warzynski via cfe-dev <[hidden email]> wrote:
> *TL;DR*
>
> We propose some non-trivial refactoring in Clang and LLVM to enable
> further work on Flang driver.
>
> *SUMMARY*
> We would like to start extracting the driver/frontend code from Clang
> (alongside the code that the driver/frontend depends on, e.g.
> Diagnostics) and move the components that could be re-used by
> non-C-based languages to LLVM. From our initial investigation we see
> that these changes will impact many projects (upstream and downstream)
> and will require big mechanical patches (our first attempt is
> implemented in [8]). This is not ideal, but seems unavoidable in the
> long-term. We would like to do this refactoring _before_ we start
> implementing the Flang driver upstream (OPTION 1 below). This way we avoid:
>
> * contaminating Clang with Fortran specific code (and vice versa)
> * introducing dependency on Clang in Flang
>
> The downside is that the refactoring is likely to be disruptive for many
> projects that use Clang. We will try our best to minimise this.
>
> Does this approach make sense? Are there any preferred alternatives? At
> this stage we'd like to discuss the overall direction. If folks are in
> favour, we'll send a separate RFC with a finer breakdown and more
> technical details for the refactoring.
>
> Below you will find more context for our use-case (the Flang driver) and
> possible alternatives. We hope that this will help the discussion. We
> would really appreciate your feedback!

Generally, I think this is a good idea, and a healthy direction for LLVM overall. We need to be careful to do this in a way that doesn't introduce complexity or overheads in Clang, though, so we should proceed very cautiously.

I also think you're skewing somewhat too far in favor of code reuse. Some of the Clang code you're identifying below is very carefully tuned and tailored to Clang's use, and the amount of it that could reasonably be shared with another project (carefully tuned and tailored to that project's use) is probably small, unless we heavily generalize it. Such generalization is likely not going to be worth its development and maintenance costs.
 
> *BACKGROUND*
> Flang (formerly known as F18) has recently been merged into LLVM [1].
> Our ambition, as a community, is to make it as flexible, robust and nice
> to work with as Clang. One of the major items to address is the
> implementation of a driver that would provide the flexibility and user
> experience similar to that available in Clang. The F18/Flang driver was
> already discussed on cfe-dev last year [2], but back then F18 (now
> llvm-project/flang) was a separate project. In the original proposal it was
> assumed that initially Flang would depend on (and extend where necessary)
> Clang's driver/frontend code. Since F18/Flang was an independent
> project, the refactoring of Clang/LLVM wasn't really considered. That
> design has been challenged since ([3], [10]), and also not much progress
> has been made. We would like to revisit that RFC from a slightly
> different angle. Since Flang is now part of LLVM's monorepo, we feel
> that refactoring Clang/LLVM _before_ we upstream the driver makes a lot
> of sense and is the natural first step.
>
> *ASSUMPTIONS & DESIGN GOALS*
> 1. We will re-use as much of Clang's driver/frontend code as
> possible (this was previously proposed in [2]).
>
> 2. We want to avoid dependencies from Flang to Clang, both long-term
> (strong requirement) and short-term (might be difficult to achieve).
> This has recently come up in a discussion on one of our early patches
> [3] (tl;dr Steve Scalpone, the code owner of Flang, would prefer us to
> avoid this dependency), and was also suggested before by Eric
> Christopher [10].
>
> 3. We will move the code that can be shared between Flang and Clang (and
> other projects) to LLVM. This idea has already come up on llvm-dev
> before [7] (in a slightly different context, and to a slightly different
> extent). The methods that are not language specific would be shared in
> an LLVM library.
>
> 4. The classes/types/methods that need specific changes for Fortran will
> be "copied" to Flang and adapted as needed. We should minimize (or even
> eliminate) Fortran-specific code in Clang and make sure that such code
> lives in llvm-project/flang.
>
> *FLANG'S DEPENDENCIES ON CLANG*
> These are the dependencies on Clang that we have identified so far while
> prototyping the Flang driver.
>
> 1. All the machinery related to Diagnostics & SourceLocation.
>
> This is currently part of libclangBasic [4] and is used in _many_ places
> in Clang. The official documentation [5] suggests that this could be
> re-used for non-C-based languages. In particular, we feel that it would
> make a lot of sense for Flang to use it. Also, separating Clang's
> driver/frontend code and the diagnostics would require a lot of
> refactoring for no real benefit (and we feel that Flang should re-use
> Clang's driver/frontend code, see below). This dependency is used in
> many places, so moving it to LLVM will require a lot of (mostly)
> mechanical changes. We can't see an obvious way to split it into smaller
> chunks (see also below where we discuss the impact).

I do not think it is necessarily going to be reasonable to move all machinery related to SourceLocation (in particular, all of clang's SourceManager) into LLVM. The ideas and data structure underpinning SourceLocation and SourceManager are quite general (a concatenated hierarchical slab of contiguous blocks, with linear indexing within those blocks), but the details are much more specific to Clang and the C-family languages it represents. Things like the support for object-like and function-like macros, macro arguments, #include, splitting >> tokens for C++, and so on, all make sense for Clang, but probably make less sense for Fortran, where a different set of kinds of block would probably be desired instead. This isn't something that can be trivially generalized and extended, either; we carefully bit-pack various things into our block representations, and as a result, we're quite tightly fitted to the needs of Clang, and would probably not want to move away from that position.
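The following minimal sketch illustrates the general idea described above, assuming a single 32-bit offset space. Every block (a file, a macro expansion, ...) reserves a contiguous range of offsets, a location is just one integer, and decoding it is a binary search over block start offsets. The reserved offset 0 and the extra end-of-block offset are modelled on Clang's approach but should be treated as assumptions here. `Block` and `OffsetSpace` are hypothetical, and none of Clang's bit-packing is modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// One contiguous block of the global offset space (a file, a macro
// expansion, ...). A hypothetical, much simplified stand-in for SLocEntry.
struct Block {
  unsigned start;    // first offset owned by this block
  unsigned size;     // number of offsets it owns
  std::string name;  // e.g. the file name
};

class OffsetSpace {
public:
  // Reserve a new block directly after the previous one; return its start.
  unsigned addBlock(std::string name, unsigned size) {
    unsigned start = next_;
    blocks_.push_back({start, size, std::move(name)});
    next_ += size + 1;  // +1 so the one-past-the-end position is addressable
    return start;
  }

  // Decode a global offset into (block, offset within that block).
  std::pair<const Block *, unsigned> decode(unsigned loc) const {
    // First block starting after `loc`; the one before it contains `loc`.
    auto it = std::upper_bound(
        blocks_.begin(), blocks_.end(), loc,
        [](unsigned l, const Block &b) { return l < b.start; });
    assert(it != blocks_.begin() && "offset before the first block");
    const Block &b = *std::prev(it);
    assert(loc <= b.start + b.size && "offset not inside any block");
    return {&b, loc - b.start};
  }

private:
  std::vector<Block> blocks_;
  unsigned next_ = 1;  // offset 0 is reserved to mean "invalid location"
};
```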

However, I do think there is common infrastructure that can be extracted, with some significant work done to generalize the SourceManager infrastructure and make it tailorable to the needs of Clang and Flang (and any other consumers of it that might come along). I could imagine moving all of the complexity to do with what kinds of SLocEntry are supported into a traits type, and having a reusable template that can generate a data structure that the Clang and Flang SourceManagers can be implemented in terms of.
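The following is a sketch of one possible shape for the traits-based design suggested above. The offset bookkeeping lives in a reusable template, and each language supplies a traits type that describes its own entry kinds. `SourceManagerBase`, `CFamilyTraits` and `FortranTraits` are hypothetical. The Fortran entry kinds (files and INCLUDE lines) are only an assumed example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Reusable core: offset bookkeeping shared by every language. The kinds of
// entry (and how to describe them) come from a language-specific Traits type.
template <typename Traits>
class SourceManagerBase {
public:
  using Entry = typename Traits::Entry;

  // Append an entry and return the first offset it owns.
  unsigned add(Entry e) {
    unsigned start = next_;
    next_ += Traits::size(e) + 1;
    starts_.push_back(start);
    entries_.push_back(std::move(e));
    return start;
  }

  // Find the entry owning `loc` by binary search over start offsets.
  const Entry &lookup(unsigned loc) const {
    auto it = std::upper_bound(starts_.begin(), starts_.end(), loc);
    assert(it != starts_.begin() && "invalid location");
    return entries_[static_cast<std::size_t>(it - starts_.begin()) - 1];
  }

  std::string describe(unsigned loc) const { return Traits::describe(lookup(loc)); }

private:
  std::vector<unsigned> starts_;
  std::vector<Entry> entries_;
  unsigned next_ = 1;  // 0 stays reserved as "invalid"
};

// C-family flavour: files and macro expansions.
struct CFamilyTraits {
  struct Entry {
    enum Kind { File, MacroExpansion } kind;
    std::string name;
    unsigned size;
  };
  static unsigned size(const Entry &e) { return e.size; }
  static std::string describe(const Entry &e) {
    return (e.kind == Entry::File ? "file " : "macro expansion of ") + e.name;
  }
};

// Hypothetical Fortran flavour: files and INCLUDE-line inclusions.
struct FortranTraits {
  struct Entry {
    enum Kind { File, IncludeLine } kind;
    std::string name;
    unsigned size;
  };
  static unsigned size(const Entry &e) { return e.size; }
  static std::string describe(const Entry &e) {
    return (e.kind == Entry::File ? "file " : "INCLUDE of ") + e.name;
  }
};
```

The template never needs to know which kinds exist, so Clang could keep its tightly bit-packed entries while Flang defines a different set.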

Clang's SourceLocation is probably almost directly usable as-is -- it has hardcoded assumptions about a particular bit being reserved to indicate a location within a C preprocessor macro, but we can move that to a static method on Clang SourceManager, and then I think SourceLocation can be directly shared between the two projects.
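Here is a sketch of that split, assuming a 32-bit encoding. The shared `SourceLocation` is only an opaque integer, while the C-family manager owns the interpretation of the top bit as "this is a macro location". Clang's own class reserves a similar top bit. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Language-neutral location: just an opaque 32-bit encoding (hypothetical).
class SourceLocation {
public:
  static SourceLocation fromRaw(std::uint32_t raw) {
    SourceLocation l;
    l.raw_ = raw;
    return l;
  }
  std::uint32_t getRaw() const { return raw_; }
  bool isValid() const { return raw_ != 0; }

private:
  std::uint32_t raw_ = 0;  // 0 means "invalid"
};

// C-family interpretation, moved out of SourceLocation itself:
// the top bit marks a location inside a macro expansion.
struct CSourceManager {
  static constexpr std::uint32_t MacroBit = 1u << 31;

  static bool isMacroID(SourceLocation loc) { return (loc.getRaw() & MacroBit) != 0; }

  static SourceLocation makeMacroLoc(std::uint32_t offset) {
    assert((offset & MacroBit) == 0 && "offset too large for the encoding");
    return SourceLocation::fromRaw(offset | MacroBit);
  }

  static std::uint32_t getOffset(SourceLocation loc) { return loc.getRaw() & ~MacroBit; }
};
```

A Fortran manager could then give the same bit (or no bit at all) a different meaning without touching the shared type.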

(One big asterisk on the above: will Flang want an integrated C preprocessor? If so, then we're now talking about a much larger chunk of Clang, including the lexer, preprocessor, identifier tables, the Token type, and it may be best to simply acknowledge that Flang has a dependency on Clang to supply all that, rather than moving it into LLVM.)

The layers below SourceManager -- FileManager, the VFS, and so on -- all seem like they should be reasonable to share between projects.

Some of the diagnostics engine seems reasonable to share: specifically, the tablegen-driven diagnostic table generation, most of the diagnostics engine (including support for diagnostics pragmas that change the set of warnings enabled at different source locations), and the formatting code for non-clang-specific types are all relatively reusable. If you want to reuse the TextDiagnosticPrinter, I think that will need some refactoring; it's currently tied into the specific needs of Clang's SourceManager (for handling textual inclusion and macro expansion in the way that C-family languages deal with those things). I expect it would be possible to factor out an interface that Clang could implement to provide the necessary customizations.
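To illustrate what "warnings enabled at different source locations" means for the engine, here is a minimal sketch. It records pragma-style transitions per warning group, keyed by source offset, and answers a query by finding the last transition at or before the location. It assumes a single file and a single default state, and `PragmaWarningState` is a hypothetical name.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

// Hypothetical: tracks pragma-driven enable/disable transitions for one
// file, per warning group, keyed by source offset.
class PragmaWarningState {
public:
  explicit PragmaWarningState(bool enabledByDefault) : default_(enabledByDefault) {}

  // Record that, from `offset` onwards, `group` is enabled or disabled.
  void setAt(unsigned offset, const std::string &group, bool enabled) {
    assert(!group.empty() && "empty group name");
    transitions_[group][offset] = enabled;
  }

  // Is `group` enabled at `offset`? Uses the last transition at or before it.
  bool isEnabledAt(unsigned offset, const std::string &group) const {
    auto g = transitions_.find(group);
    if (g == transitions_.end()) return default_;
    const auto &byOffset = g->second;
    auto it = byOffset.upper_bound(offset);
    if (it == byOffset.begin()) return default_;  // before any pragma
    return std::prev(it)->second;
  }

private:
  bool default_;
  std::map<std::string, std::map<unsigned, bool>> transitions_;
};
```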

Before we factor out the diagnostics engine, we should fix the longstanding issue that it requires a global monolithic table covering all diagnostics, and is consequently unable to properly respect layering. I think this is very much fixable, but it requires someone to do the work to fix it :)
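The following sketch shows one way the monolithic table could be avoided, assuming diagnostic IDs only need to be unique at run time. Each library registers its own table and receives a base ID, so no component has to see every other component's diagnostics. `DiagRegistry` is hypothetical, and Clang's actual tablegen-generated tables work differently.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry: each library registers its own diagnostic table
// instead of contributing to one global, monolithic enumeration.
class DiagRegistry {
public:
  // Returns the base ID assigned to this component's diagnostics; the
  // component's local diagnostic N then has global ID base + N.
  unsigned registerComponent(const std::string &component,
                             std::vector<std::string> messages) {
    unsigned base = static_cast<unsigned>(all_.size());
    for (auto &m : messages) all_.push_back({component, std::move(m)});
    return base;
  }

  const std::string &message(unsigned id) const {
    assert(id < all_.size() && "unregistered diagnostic");
    return all_[id].message;
  }

  const std::string &component(unsigned id) const {
    assert(id < all_.size() && "unregistered diagnostic");
    return all_[id].component;
  }

private:
  struct Entry {
    std::string component;
    std::string message;
  };
  std::vector<Entry> all_;
};
```

The trade-off is that IDs are no longer compile-time constants shared across libraries, which is part of why fixing this properly takes real work.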

Looking at your branch, I immediately see a few things there that are unacceptable changes: moving clang's TokenKinds.def, Specifiers.h, and OpenCLImageTypes.def into LLVM is not OK. But I assume you're aware of that already. =)
 
> 2. libclangFrontend & libclangDriver
>
> The Flang driver will use many methods from libclangDriver,
> libclangFrontend and libclangFrontendTool. Driver.h and Compilation.h
> from libclangDriver are responsible for invoking the driver, passing it
> the correct arguments and executing it. TextDiagnosticPrinter.h takes
> care of printing the driver diagnostics in case of errors.
>
> The Flang frontend will use CompilerInstance, CompilerInvocation,
> FrontendOptions, FrontendActions and Utils from libclangFrontend and
> libclangFrontendTool. These classes are responsible for translating the
> command line arguments to frontend Options and later to Actions to be
> executed by ExecuteCompilerInvocation. The translation from arguments to
> Actions happens with FrontendOptions and FrontendActions. But it is the
> CompilerInvocation that has the pointers for the sequence of Actions
> that are required in a CompilerInstance. These methods are needed to
> implement the Flang driver/frontend and contain actions/methods/functions
> that seem to be language agnostic.

I think this is going too far in attempting to reuse Clang code. CompilerInvocation, for example, is almost exclusively dealing in parsing Clang's -cc1 flags, which I would expect to have very little overlap with Flang's flags, and CompilerInstance exists (in part) to manage and own all the Clang-specific global objects (the parser, sema, the module loader, the AST consumer). Flang should not be going anywhere near this stuff, and should be implementing its own frontend.

There may be some clang-independent parts that can be factored out, but I would expect them to be small enough that we can address them on a case-by-case basis. The interesting thing to factor out is the parsing of command-line options, but that's already been done. I think your approach here should be to assume as a baseline that you reuse none of clang's Frontend library, but if you find general pieces that can meaningfully be extracted, we can talk about those pieces in isolation.


For the driver, I think the picture is very different. It seems to me that we should only have one LLVM driver, that can build C-family languages, Fortran code, or both at the same time (or invoke lld etc). To that end, I think it would be reasonable to move clang's driver out to a separate LLVM project (maybe that's llvm/, maybe it's somewhere new such as driver/), and extend it to be able to invoke flang actions in addition to clang actions. Then the only difference between the clang and flang drivers would be which frontend is directly linked into the driver binary and which one is invoked by exec'ing a different binary. That would imply that all the parts of Clang that are depended on by the driver are also moved out (I think the main parts here are flags and diagnostics, and via the diagnostics layer, source locations).
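Here is a rough sketch of the single-driver model described above. The driver classifies each input, runs the frontend that is linked into its own binary in-process, execs the other frontend, and finishes with a link step. The extension list, the tool names and `planJobs` are illustrative assumptions, not the Clang driver's API.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Lang { C, Cxx, Fortran, Object, Unknown };
enum class JobKind { InProcess, Exec, Link };

// Hypothetical input classification by file extension.
Lang classifyInput(const std::string &path) {
  auto dot = path.rfind('.');
  if (dot == std::string::npos) return Lang::Unknown;
  std::string ext = path.substr(dot + 1);
  if (ext == "c") return Lang::C;
  if (ext == "cpp" || ext == "cc") return Lang::Cxx;
  if (ext == "f" || ext == "f90") return Lang::Fortran;
  if (ext == "o") return Lang::Object;
  return Lang::Unknown;
}

struct Job {
  std::string input;  // empty for the link step, which consumes all objects
  JobKind kind;
  std::string tool;
};

// Plan the jobs for a possibly mixed-language build. `linkedFrontend` names
// the frontend linked into this driver binary ("clang" or "flang").
std::vector<Job> planJobs(const std::vector<std::string> &inputs,
                          const std::string &linkedFrontend) {
  std::vector<Job> jobs;
  for (const std::string &in : inputs) {
    Lang lang = classifyInput(in);
    assert(lang != Lang::Unknown && "a real driver would emit a diagnostic");
    if (lang == Lang::Object) continue;  // objects go straight to the linker
    std::string tool = lang == Lang::Fortran ? "flang" : "clang";
    JobKind kind = tool == linkedFrontend ? JobKind::InProcess : JobKind::Exec;
    jobs.push_back({in, kind, tool});
  }
  jobs.push_back({"", JobKind::Link, "lld"});
  return jobs;
}
```

The same planning code would serve both the clang and flang binaries; only `linkedFrontend` differs.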

This will require some decoupling between the Clang driver and frontend (currently Clang's Options.td contains various driver options that are marked as also being options for Clang's -cc1 mode; duplicating those in CC1Options.td is probably acceptable, if we're going to split the driver and frontend into two different projects), and some shared support code (e.g., clang's sanitizers list) will presumably end up in the driver, because we don't want a driver -> *lang dependency.
 
> *ALTERNATIVES*
> This is a summary of the alternative ways of implementing the Flang
> driver. We propose OPTION 1. If there are no major objections, we will
> draft a separate RFC with more technical details (we will also break it
> down into smaller pieces). Otherwise, what would be your preferred
> alternative and why?
>
> OPTION 1
> We avoid dependency on Clang from Day 1.
>
> This is the ideal scenario that would guarantee that Clang and Flang are
> completely separate and that the common bits stay in LLVM instead. It
> would mean slower progress for us initially, but then other projects
> could benefit from the refactoring sooner rather than later.
>
> OPTION 2
> We avoid dependency on libclangBasic from Day 1, but initially allow
> dependency on libclangFrontend & libclangDriver (or other libs specific
> to the driver/frontend).
>
> The dependency on libclang{Driver|Frontend} would gradually be
> removed/refactored out as the driver for Flang gains momentum. As
> mentioned earlier, there is plenty of code in libclangFrontend and
> libclangDriver that we'd like to re-use, but the separation between code
> that's specific to C-based languages and generic driver/frontend code is
> not always obvious. We think that refactoring the common bits in
> libclangFrontend and libclangDriver might simply be easier once:
>
>   * we have a Flang driver that leverages these libraries, and, as a result,
>   * we understand better what we could re-use and what's not that
> relevant to non-C-based languages.
>
> OPTION 3
> We initially keep the dependency on Clang and re-visit this RFC later.
>
> This would be the least disruptive approach (at least for the time
> being) and would allow us to make the most rapid progress (i.e. we
> would be focusing on implementing the features rather than refactoring).
> It would also inform the future refactoring better. But it was already
> pointed out that we should avoid dependencies on clang [3] and this
> would be a step in the opposite direction. Also, the build requirements
> for Flang would increase, and we feel that we should strive to reduce
> them instead [6].
>
> If we missed any alternatives, please bring them up.

I don't think I can express an opinion without knowing whether you intend for Flang to ever support an integrated C preprocessor. If not, then option 1 seems appropriate. But if so, then I think we have a choice between factoring out all of clang below the parser or just acknowledging that Flang depends on Clang for its lexical layer and deciding to keep a flang -> clang dependency forever.
 
> *IMPACT ON OTHER PROJECTS*
> The refactoring will have non-trivial impact on other projects:
>
> * OPTION 1 and OPTION 2 - huge impact initially.
> * OPTION 3 - no impact initially, but most likely similar impact as
> OPTION 1 and OPTION 2 in the long term.
>
> From our initial investigation, extracting Diagnostics/SourceLocation
> from clangBasic and moving it to LLVM will be the most impactful change.
> Within llvm-project it is used in clang, clang-tools-extra, lldb and
> polly. Most of the changes will be mechanical, but will require touching
> many files. In order to get to a state where we could build libclang
> using the newly defined LLVM library, we had to touch ~850 files and
> make ~30k insertions/deletions. The result of this exercise is available
> in our development fork of llvm-project [8].
>
> Please note: our patches on GitHub [8] are just experiments to
> illustrate the idea. It's work-in-progress that requires a lot of
> polishing. When/if up-streaming this, we would need to do some
> low-impact refactoring first. For example, currently ASTReader &
> ASTWriter are `friends` with DiagnosticsEngine [9]. That won't be
> possible when DiagnosticsEngine is moved to LLVM.
>
>
> On behalf of the Arm Fortran Team,
> Andrzej Warzynski
>
> REFERENCES
>
> [1] https://github.com/llvm/llvm-project/commit/b98ad941a40c96c841bceb171725c925500fce6c
> [2] http://lists.llvm.org/pipermail/cfe-dev/2019-June/062669.html
> [3] https://reviews.llvm.org/D79092
> [4] https://github.com/llvm/llvm-project/blob/ad5d319ee85d31ee2b1ca5c29b3a10b340513fec/clang/lib/Basic/CMakeLists.txt#L45-L47
> [5] https://clang.llvm.org/docs/InternalsManual.html#the-clang-basic-library
> [6] http://lists.llvm.org/pipermail/flang-dev/2019-November/000061.html
> [7] http://lists.llvm.org/pipermail/llvm-dev/2019-November/136743.html
> [8] https://github.com/banach-space/llvm-project/commits/andrzej/refactor_clangBasic
> [9] https://github.com/llvm/llvm-project/blob/b11ecd196540d87cb7db190d405056984740d2ce/clang/include/clang/Basic/Diagnostic.h#L985-L986
> [10] https://reviews.llvm.org/D63607

Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
In reply to this post by Hubert Tong via cfe-dev
comex via cfe-dev <[hidden email]> writes:

> While this is a different area of the codebase, another thing that
> would benefit greatly from being moved out of Clang is function call
> ABI handling.  Currently, that handling is split awkwardly between
> Clang and LLVM proper, forcing frontends that implement C FFI to
> either recreate the Clang parts themselves (like Rust does), depend on
> Clang (like Swift does), or live with FFI just not working with some
> function signatures.  I'm not sure what Flang currently does, but my
> understanding is that Flang does support C FFI, so it would probably
> benefit from this as well.  Just something to consider. :)

Yep, C interop is part of the Fortran standard so flang will have to
support it.

This has been talked about before but as yet nothing has happened with
it.  Maybe this is our opportunity.

                   -David

Re: [llvm-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
In reply to this post by Hubert Tong via cfe-dev

> One big asterisk on the above: will Flang want an integrated C preprocessor?

 

Fortran compilers generally come with a preprocessor.  Typically, the Fortran cpp language looks very similar to what one might find in a C or C++ compiler; however, because the lexing rules for Fortran are quite a bit different from C and C++, the actual implementation is quite different too.
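As a small illustration of how Fortran's lexical rules differ from C's, here is a simplified C++ sketch of classic fixed-form source handling. A `C`, `c` or `*` in column 1 marks a comment line (this sketch also treats `!` in column 1 and all-blank lines as comments). A character other than blank or `0` in column 6 marks a continuation line, and statement text occupies columns 7 to 72. It ignores tabs, inline `!` comments and directives, and the function names are hypothetical. It is not how Flang's own prescanner is implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class FixedFormLine { Comment, Initial, Continuation };

// Classify one line of fixed-form Fortran source (simplified).
FixedFormLine classifyFixedForm(const std::string &line) {
  if (line.find_first_not_of(' ') == std::string::npos) return FixedFormLine::Comment;
  char c1 = line[0];
  if (c1 == 'C' || c1 == 'c' || c1 == '*' || c1 == '!') return FixedFormLine::Comment;
  // Column 6 (index 5): anything other than blank or '0' continues a statement.
  if (line.size() >= 6 && line[5] != ' ' && line[5] != '0') return FixedFormLine::Continuation;
  return FixedFormLine::Initial;
}

// Join continuation lines into whole statements before tokenizing.
// Columns 7-72 hold statement text; columns 73 and beyond are ignored.
std::vector<std::string> joinStatements(const std::vector<std::string> &lines) {
  std::vector<std::string> stmts;
  for (const std::string &line : lines) {
    FixedFormLine kind = classifyFixedForm(line);
    if (kind == FixedFormLine::Comment) continue;
    std::string text = line.size() > 6 ? line.substr(6, 66) : "";
    if (kind == FixedFormLine::Continuation) {
      assert(!stmts.empty() && "continuation line without an initial line");
      stmts.back() += text;
    } else {
      stmts.push_back(text);
    }
  }
  return stmts;
}
```

Column-based rules like these have no counterpart in C's lexer, which is one reason a Fortran preprocessor cannot simply reuse Clang's lexer and preprocessor.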

 

Flang has an integrated pre-processor [1].

 

[1] https://github.com/flang-compiler/f18-llvm-project/blob/master/flang/documentation/Preprocessing.md

 

From: llvm-dev <[hidden email]> on behalf of Richard Smith via llvm-dev <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Tuesday, June 2, 2020 at 5:38 PM
To: Andrzej Warzynski <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, flang-dev <[hidden email]>, Clang Dev <[hidden email]>
Subject: Re: [llvm-dev] [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

 

External email: Use caution opening links or attachments

 

On Tue, 2 Jun 2020 at 05:08, Andrzej Warzynski via cfe-dev <[hidden email]> wrote:

*TL;DR*

We propose some non-trivial refactoring in Clang and LLVM to enable
further work on Flang driver.

*SUMMARY*
We would like to start extracting the driver/frontend code from Clang
(alongside the code that the driver/frontend depends on, e.g.
Diagnostics) and move the components that could be re-used by
non-C-based languages to LLVM. From our initial investigation we see
that these changes will impact many projects (upstream and downstream)
and will require big mechanical patches (our first attempt is
implemented in [8]). This is not ideal, but seems unavoidable in the
long-term. We would like to do this refactoring _before_ we start
implementing the Flang driver upstream (OPTION 1 below). This way we avoid:

* contaminating Clang with Fortran specific code (and vice versa)
* introducing dependency on Clang in Flang

The downside is that the refactoring is likely to be disruptive for many
projects that use Clang. We will try our best to minimise this.

Does this approach make sense? Are there any preferred alternatives? At
this stage we'd like to discuss the overall direction. If folks are in
favour, we'll send a separate RFC with a finer breakdown and more
technical details for the refactoring.

Below you will find more context for our use-case (the Flang driver) and
possible alternatives. We hope that this will help the discussion. We
would really appreciate your feedback!

 

Generally, I think this is a good idea, and a healthy direction for LLVM overall. We need to be careful to do this in a way that doesn't introduce complexity or overheads in Clang, though, so we should proceed very cautiously.

 

I also think you're skewing somewhat too far in favor of code reuse. Some of the Clang code you're identifying below is very carefully tuned and tailored to Clang's use, and the amount of it that could reasonably be shared with another project (carefully tuned and tailored to that project's use) is probably small, unless we heavily generalize it. Such generalization is likely not going to be worth its development and maintenance costs.

 

*BACKGROUND*
Flang (formerly known as F18) has recently been merged into LLVM [1].
Our ambition, as a community, is to make it as flexible, robust and nice
to work with as Clang. One of the major items to address is the
implementation of a driver that would provide the flexibility and user
experience similar to that available in Clang. The F18/Flang driver was
already discussed on cfe-dev last year [2], but back then F18 (now llvm
project/flang) was a separate project. In the original proposal it was
assumed that initially Flang would depend (and extend where necessary)
Clang's driver/frontend code. Since F18/Flang was an independent
project, the refactoring of Clang/LLVM wasn't really considered. That
design has been challenged since ([3], [10]), and also not much progress
has been made. We would like to revisit that RFC from a slightly
different angle. Since Flang is now part of LLVM's monorepo, we feel
that refactoring Clang/LLVM _before_ we upstream the driver makes a lot
of sense and is the natural first step.

*ASSUMPTIONS & DESIGN GOALS*
1. We will re-use as much of the Clang's driver/frontend code as
possible (this was previously proposed in [2]).

2. We want to avoid dependencies from Flang to Clang, both long-term
(strong requirement) and short-term (might be difficult to achieve).
This has recently come up in a discussion on one of our early patches
[3] (tl;dr Steve Scalpone, the code owner of Flang, would prefer us to
avoid this dependency), and was also suggested before by Eric
Christopher [10].

3. We will move the code that can be shared between Flang and Clang (and
other projects) to LLVM. This idea has already come up on llvm-dev
before [7] (in a slightly different context, and to a slightly different
extent). The methods that are not language specific would be shared in
an LLVM library.

4. The classes/types/methods that need specific changes for Fortran will
be "copied" to Flang and adapted as needed. We should minimize (or even
eliminate) any Fortran specific code from Clang and make sure that that
lives in llvm-project/flang.

*FLANG'S DEPENDENCIES ON CLANG*
These are the dependencies on Clang that we have identified so far while
prototyping the Flang driver.

1. All the machinery related to Diagnostics & SourceLocation.

This is currently part of libclangBasic [4] and is used in _many_ places
in Clang. The official documentation [5] suggests that this could be
re-used for non-C-based languages. In particular, we feel that It would
make a lot of sense for Flang to use it. Also, separating Clang's
driver/frontend code and the diagnostics would require a lot of
refactoring for no real benefit (and we feel that Flang should re-use
Clang's driver/frontend code, see below). This dependency is used in
many places, so moving it to LLVM will require a lot of (mostly)
mechanical changes. We can't see an obvious way to split it into smaller
chunks (see also below where we discuss the impact).

 

I do not think it is necessarily going to be reasonable to move all machinery related to SourceLocation (in particular, all of clang's SourceManager) into LLVM. The ideas and data structure underpinning SourceLocation and SourceManager are quite general (a concatenated hierarchical slab of contiguous blocks, with linear indexing within those blocks), but the details are much more specific to Clang and the C-family languages it represents. Things like the support for object-like and function-like macros, macro arguments, #include, splitting >> tokens for C++, and so on, all make sense for Clang, but probably make less sense for Fortran, where a different set of kinds of block would probably be desired instead. This isn't something that can be trivially generalized and extended, either; we carefully bit-pack various things into our block representations, and as a result, we're quite tightly fitted to the needs of Clang, and would probably not want to move away from that position.

 

However, I do think there is common infrastructure that can be extracted, with some significant work done to generalize the SourceManager infrastructure and make it tailorable to the needs of Clang and Flang (and any other consumers of it that might come along). I could imagine moving all of the complexity to do with what kinds of SLocEntry are supported into a traits type, and having a reusable template that can generate a data structure that the Clang and Flang SourceManagers can be implemented in terms of.

 

Clang's SourceLocation is probably almost directly useable as-is -- it has hardcoded assumptions about a particular bit being reserved to indicate a location within a C preprocessor macro, but we can move that to a static method on Clang SourceManager, and then I think SourceLocation can be directly shared between the two projects.

 

(One big asterisk on the above: will Flang want an integrated C preprocessor? If so, then we're now talking about a much larger chunk of Clang, including the lexer, preprocessor, identifier tables, the Token type, and it may be best to simply acknowledge that Flang has a dependency on Clang to supply all that, rather than moving it into LLVM.)

 

The layers below SourceManager -- FileManager, the VFS, and so on -- all seem like they should be reasonable to share between projects.

 

Some of the diagnostics engine seems reasonable to share: specifically, the tablegen-driven diagnostic table generation, most of the diagnostics engine (including support for diagnostics pragmas that change the set of warnings enabled at different source locations), and the formatting code for non-clang-specific types are all relatively reusable. If you want to reuse the TextDiagnosticPrinter, I think that will need some refactoring; it's currently tied into the specific needs of Clang's SourceManager (for handling textual inclusion and macro expansion in the way that C-family languages deal with those things). I expect it would be possible to factor out an interface that Clang could implement to provide the necessary customizations.

 

Before we factor out the diagnostics engine, we should fix the longstanding issue that it requires a global monolithic table covering all diagnostics, and is consequently unable to properly respect layering. I think this is very much fixable, but it requires someone to do the work to fix it :)

 

Looking at your branch, I immediately see a few things there that are unacceptable changes: moving clang's TokenKinds.def, Specifiers.h, and OpenCLImageTypes.def into LLVM is not OK. But I assume you're aware of that already. =)

 

2. libclangFrontend & libclangDriver

The Flang driver will use many methods from libClangDriver,
libClangFrontend and libClangFrontendTool. Driver.h and Compilation.h
from libClangDriver are responsible to call, pass the correct arguments
and execute the driver. TextDiagnosticPrinter.h takes care of printing
the driver diagnostics in case of errors.

The Flang frontend will use CompilerInstance, CompilerInvocation,
FrontendOptions, FrontendActions and Utils from libClangFrontend and
libClangFrontendTool. These methods are responsible for translating the
command line arguments to frontend Options and later to Actions to be
executed by ExecuteCompilerInvocation. The translation from arguments to
Actions happens with FrontendOption and FrontendActions. But it is the
CompilerInvocation that has the pointers for the sequence of Actions
that are required in a Compiler Instance. These methods are needed to
implement Flang driver/frontend and contain actions/method/functions
that seem to be language agnostic.

 

I think this is going too far in attempting to reuse Clang code. CompilerInvocation, for example, is almost exclusively dealing in parsing Clang's -cc1 flags, which I would expect to have very little overlap with Flang's flags, and CompilerInstance exists (in part) to manage and own all the Clang-specific global objects (the parser, sema, the module loader, the AST consumer). Flang should not be going anywhere near this stuff, and should be implementing its own frontend.

 

There may be some clang-independent parts that can be factored out, but I would expect them to be small enough that we can address them on a case-by-case basis. The interesting thing to factor out is the parsing of command-line options, but that's already been done. I think your approach here should be to assume as a baseline that you reuse none of clang's Frontend library, but if you find general pieces that can meaningfully be extracted, we can talk about those pieces in isolation.

 

 

For the driver, I think the picture is very different. It seems to me that we should only have one LLVM driver, that can build C-family languages, Fortran code, or both at the same time (or invoke lld etc). To that end, I think it would be reasonable to move clang's driver out to a separate LLVM project (maybe that's llvm/, maybe it's somewhere new such as driver/), and extend it to be able to invoke flang actions in addition to clang actions. Then the only difference between the clang and flang drivers would be which frontend is directly linked into the driver binary and which one is invoked by exec'ing a different binary. That would imply that all the parts of Clang that are depended on by the driver are also moved out (I think the main parts here are flags and diagnostics, and via the diagnostics layer, source locations).

 

This will require some decoupling between the Clang driver and frontend (currently Clang's Options.td contains various driver options that are marked as also being options for Clang's -cc1 mode; duplicating those in CC1Options.td is probably acceptable, if we're going to split the driver and frontend into two different projects), and some shared support code (e.g., clang's sanitizers list) will presumably end up in the driver, because we don't want a driver -> *lang dependency.

 

*ALTERNATIVES*
This is a summary of the alternative ways of implementing the Flang
driver. We propose OPTION 1. If there are no major objections, we will
draft a separate RFC with more technical details (we will also break it
down into smaller pieces). Otherwise, what would be your preferred
alternative and why?

OPTION 1
We avoid dependency on Clang from Day 1.

This is the ideal scenario that would guarantee that Clang and Flang are
completely separate and that the common bits stay in LLVM instead. It
would mean slower progress for us initially, but then other projects
could benefit from the refactoring sooner rather than later.

OPTION 2
We avoid dependency on clangBasic from day 1, but initially allow
dependency on libClangFrontend & libClangDriver (or other libs specific
to the driver/frontend).

The dependency on libclang{Driver|Frontend} would gradually be
removed/refactored out as the driver for Flang gains momentum. As
mentioned earlier, there is plenty of code in libClangFrontend and
libClangDriver that we'd like to re-use, but the separation between code
that's specific to C-based languages and generic driver/frontend code is
not always obvious. We think that refactoring the common bits in
libClangFrontend and libClangDriver might simply be easier once:

  * we have a Flang driver that leverages these libraries, and, as a result,
  * we understand better what we could re-use and what's not that
relevant to non-C-based languages.

OPTION 3
We initially keep the dependency on Clang and re-visit this RFC later.

This would be the least disruptive approach (at least for the time
being) and would allow us to make the most rapid progress (i.e. we
would be focusing on implementing the features rather than refactoring).
It would also inform the future refactoring better. But it was already
pointed out that we should avoid dependencies on clang [3] and this
would be a step in the opposite direction. Also, the build requirements
for Flang would increase, and we feel that we should strive to reduce
them instead [6].

If we missed any alternatives, please bring them up.

 

I don't think I can express an opinion without knowing whether you intend for Flang to ever support an integrated C preprocessor. If not, then option 1 seems appropriate. But if so, then I think we have a choice between factoring out all of clang below the parser or just acknowledging that Flang depends on Clang for its lexical layer and deciding to keep a flang -> clang dependency forever.

 

*IMPACT ON OTHER PROJECTS*
The refactoring will have non-trivial impact on other projects:

* OPTION 1 and OPTION 2 - huge impact initially.
* OPTION 3 - no impact initially, but most likely similar impact as
OPTION 1 and OPTION 2 in the long term.

From our initial investigation, extracting Diagnostics/SourceLocation
from clangBasic and moving it to LLVM will be the most impactful change.
Within llvm-project it is used in clang, clang-tools-extra, lldb and
polly. Most of the changes will be mechanical, but will require touching
many files. In order to get to a state where we could build libclang
using the newly defined LLVM library, we had to touch ~850 files and
make ~30k insertions/deletions. The result of this exercise is available
in our development fork of llvm-project [8].

Please note: our patches on GitHub [8] are just experiments to
illustrate the idea. It's work-in-progress that requires a lot of
polishing. When/if up-streaming this, we would need to do some
low-impact refactoring first. For example, ASTReader and ASTWriter are
currently `friends` of DiagnosticsEngine [9]. That won't be possible once
DiagnosticsEngine is moved to LLVM, since LLVM code cannot refer to Clang
classes.
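As a rough illustration of that kind of preparatory refactoring, one option is to replace the friend declarations with a narrow public API that any client can call. The sketch below assumes ASTReader/ASTWriter only need to save and restore severity mappings; the namespaces `llvm_sketch`/`clang_sketch` and every member name are hypothetical, and the real DiagnosticsEngine state (per-location diagnostic states, pragma push/pop) is considerably richer.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DiagnosticsEngine living in an LLVM library.
namespace llvm_sketch {
enum class Severity : std::uint8_t { Ignored, Warning, Error };

class DiagnosticsEngine {
public:
  using Mapping = std::pair<unsigned, Severity>; // (diagnostic ID, severity)

  void setSeverity(unsigned DiagID, Severity S) {
    for (Mapping &M : Mappings)
      if (M.first == DiagID) {
        M.second = S;
        return;
      }
    Mappings.emplace_back(DiagID, S);
  }

  Severity getSeverity(unsigned DiagID) const {
    for (const Mapping &M : Mappings)
      if (M.first == DiagID)
        return M.second;
    return Severity::Ignored; // default for unmapped IDs in this sketch only
  }

  // Narrow public hooks replacing `friend class clang::ASTReader;` and
  // `friend class clang::ASTWriter;`: an LLVM library cannot name Clang types.
  std::vector<Mapping> snapshotMappings() const { return Mappings; }
  void restoreMappings(std::vector<Mapping> Saved) {
    assert(Mappings.empty() && "this sketch only restores into a fresh engine");
    Mappings = std::move(Saved);
  }

private:
  std::vector<Mapping> Mappings;
};
} // namespace llvm_sketch

// Clang-side (de)serializers that no longer need friendship.
namespace clang_sketch {
struct MiniASTWriter {
  std::vector<llvm_sketch::DiagnosticsEngine::Mapping> Blob;
  void writeDiagState(const llvm_sketch::DiagnosticsEngine &D) {
    Blob = D.snapshotMappings();
  }
};

struct MiniASTReader {
  void readDiagState(const MiniASTWriter &W, llvm_sketch::DiagnosticsEngine &D) {
    D.restoreMappings(W.Blob);
  }
};
} // namespace clang_sketch
```

The trade-off is that the hooks become visible to every client, so in practice they would need to be kept minimal and documented as serialization-only.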


On behalf of the Arm Fortran Team,
Andrzej Warzynski

REFERENCES

[1] https://github.com/llvm/llvm-project/commit/b98ad941a40c96c841bceb171725c925500fce6c
[2] http://lists.llvm.org/pipermail/cfe-dev/2019-June/062669.html
[3] https://reviews.llvm.org/D79092
[4] https://github.com/llvm/llvm-project/blob/ad5d319ee85d31ee2b1ca5c29b3a10b340513fec/clang/lib/Basic/CMakeLists.txt#L45-L47
[5] https://clang.llvm.org/docs/InternalsManual.html#the-clang-basic-library
[6] http://lists.llvm.org/pipermail/flang-dev/2019-November/000061.html
[7] http://lists.llvm.org/pipermail/llvm-dev/2019-November/136743.html
[8] https://github.com/banach-space/llvm-project/commits/andrzej/refactor_clangBasic
[9] https://github.com/llvm/llvm-project/blob/b11ecd196540d87cb7db190d405056984740d2ce/clang/include/clang/Basic/Diagnostic.h#L985-L986
[10] https://reviews.llvm.org/D63607
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



Re: [llvm-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
+1

Yes, we do have to deal with C interoperability in flang (in order to support the Fortran standard). It would be great to have the ABI handling moved into LLVM rather than spread "here and there".

--
Eric

-----Original Message-----
From: llvm-dev <[hidden email]> On Behalf Of David Greene via llvm-dev
Sent: Wednesday, June 3, 2020 8:41 AM
To: comex <[hidden email]>; Andrzej Warzynski <[hidden email]>
Cc: [hidden email]; cfe-dev <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

comex via cfe-dev <[hidden email]> writes:

> While this is a different area of the codebase, another thing that
> would benefit greatly from being moved out of Clang is function call
> ABI handling.  Currently, that handling is split awkwardly between
> Clang and LLVM proper, forcing frontends that implement C FFI to
> either recreate the Clang parts themselves (like Rust does), depend on
> Clang (like Swift does), or live with FFI just not working with some
> function signatures.  I'm not sure what Flang currently does, but my
> understanding is that Flang does support C FFI, so it would probably
> benefit from this as well.  Just something to consider. :)
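To give a feel for the logic that FFI-capable frontends currently have to reproduce, below is a heavily simplified C++ sketch of the x86-64 System V classification of a struct passed by value: each eightbyte is classified as INTEGER or SSE, and aggregates larger than 16 bytes go in memory. It assumes a flat struct of naturally aligned scalar fields only (no nested aggregates, unions, bit-fields, `long double`, vectors or `__int128`), and all names are hypothetical; it is not Clang's `ABIInfo` code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace abi_sketch {
// Hypothetical names; this is not Clang's ABIInfo / X86_64ABIInfo code.
enum class Field { I8, I16, I32, I64, Ptr, F32, F64 };
// Ordered so that std::max implements the SysV merge rule for these classes:
// NoClass < SSE < Integer (INTEGER wins when mixed) < Memory.
enum class ArgClass { NoClass, SSE, Integer, Memory };

inline std::size_t sizeOf(Field F) {
  switch (F) {
  case Field::I8:  return 1;
  case Field::I16: return 2;
  case Field::I32:
  case Field::F32: return 4;
  case Field::I64:
  case Field::Ptr:
  case Field::F64: return 8;
  }
  return 0; // not reached for valid enumerators
}

inline bool isFloating(Field F) { return F == Field::F32 || F == Field::F64; }

// Classifies a by-value struct made of the given fields, in declaration order.
// Returns one class per eightbyte, or {Memory} if the struct exceeds 16 bytes.
inline std::vector<ArgClass> classifySysV(const std::vector<Field> &Fields) {
  std::size_t Offset = 0, MaxAlign = 1;
  std::vector<ArgClass> Eightbytes;
  for (Field F : Fields) {
    std::size_t Size = sizeOf(F); // these scalars are aligned to their size
    assert(Size != 0 && "unknown field kind");
    Offset = (Offset + Size - 1) / Size * Size;
    MaxAlign = std::max(MaxAlign, Size);
    std::size_t Idx = Offset / 8; // an aligned scalar never straddles eightbytes
    if (Idx >= Eightbytes.size())
      Eightbytes.resize(Idx + 1, ArgClass::NoClass);
    ArgClass C = isFloating(F) ? ArgClass::SSE : ArgClass::Integer;
    Eightbytes[Idx] = std::max(Eightbytes[Idx], C);
    Offset += Size;
  }
  std::size_t StructSize = (Offset + MaxAlign - 1) / MaxAlign * MaxAlign;
  if (StructSize > 16)
    return {ArgClass::Memory};
  return Eightbytes;
}
} // namespace abi_sketch
```

Under these assumptions, `struct { float x, y, z; }` classifies as {SSE, SSE} (two XMM registers), while `struct { int i; float f; }` merges into a single INTEGER eightbyte; the real rules add many more cases, which is why sharing one implementation is attractive.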

Yep, C interop is part of the Fortran standard so flang will have to
support it.

This has been talked about before but as yet nothing has happened with
it.  Maybe this is our opportunity.

                   -David
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


Re: [llvm-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev

On Tue, Jun 2, 2020 at 6:38 PM Richard Smith via llvm-dev <[hidden email]> wrote:
I think this is going too far in attempting to reuse Clang code. CompilerInvocation, for example, is almost exclusively dealing in parsing Clang's -cc1 flags, which I would expect to have very little overlap with Flang's flags, and CompilerInstance exists (in part) to manage and own all the Clang-specific global objects (the parser, sema, the module loader, the AST consumer). Flang should not be going anywhere near this stuff, and should be implementing its own frontend.

There may be some clang-independent parts that can be factored out, but I would expect them to be small enough that we can address them on a case-by-case basis. The interesting thing to factor out is the parsing of command-line options, but that's already been done. I think your approach here should be to assume as a baseline that you reuse none of clang's Frontend library, but if you find general pieces that can meaningfully be extracted, we can talk about those pieces in isolation.

I strongly agree here. It doesn't make sense for `CompilerInvocation` or `CompilerInstance` to know anything about Fortran as they are entirely about driving the Clang frontend (-cc1).
 
For the driver, I think the picture is very different. It seems to me that we should only have one LLVM driver, that can build C-family languages, Fortran code, or both at the same time (or invoke lld etc). To that end, I think it would be reasonable to move clang's driver out to a separate LLVM project (maybe that's llvm/, maybe it's somewhere new such as driver/), and extend it to be able to invoke flang actions in addition to clang actions. Then the only difference between the clang and flang drivers would be which frontend is directly linked into the driver binary and which one is invoked by exec'ing a different binary. That would imply that all the parts of Clang that are depended on by the driver are also moved out (I think the main parts here are flags and diagnostics, and via the diagnostics layer, source locations).

This will require some decoupling between the Clang driver and frontend (currently Clang's Options.td contains various driver options that are marked as also being options for Clang's -cc1 mode; duplicating those in CC1Options.td is probably acceptable, if we're going to split the driver and frontend into two different projects), and some shared support code (eg, clang's sanitizers list) will presumably end up in the driver, because we don't want a driver -> *lang dependency.

Also strongly agree here. Large chunks of the driver will end up being the same between Clang and Flang, but they should still be separate actions having separate `FrontendOptions` and `FrontendActions`.
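A minimal sketch of what separate frontend options could look like on the Flang side: the frontend owns its option struct and parser, and nothing in it refers to clang::FrontendOptions. The namespace `flang_sketch`, the `ActionKind` values and the accepted flags (`-o`, `-ffixed-form`, `-fsyntax-only`) are assumptions chosen for illustration; a real frontend would more likely build its parser on LLVM's Option library than hand-match strings.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical Flang-side frontend types, intentionally unrelated to
// clang::FrontendOptions and clang::FrontendAction.
namespace flang_sketch {
enum class ActionKind { EmitObject, ParseSyntaxOnly };

struct FrontendOptions {
  std::vector<std::string> Inputs;
  std::string Output;
  bool FixedForm = false; // Fortran-specific: nothing Clang needs to know
  ActionKind Action = ActionKind::EmitObject;
};

// Parses the frontend's own flags. Returns std::nullopt on an unknown flag or
// a missing value; a real frontend would emit a diagnostic instead.
std::optional<FrontendOptions> parseArgs(const std::vector<std::string> &Args) {
  FrontendOptions Opts;
  for (std::size_t I = 0; I < Args.size(); ++I) {
    const std::string &A = Args[I];
    if (A == "-o") {
      if (++I == Args.size())
        return std::nullopt; // "-o" needs a value
      Opts.Output = Args[I];
    } else if (A == "-ffixed-form") {
      Opts.FixedForm = true;
    } else if (A == "-fsyntax-only") {
      Opts.Action = ActionKind::ParseSyntaxOnly;
    } else if (!A.empty() && A[0] == '-') {
      return std::nullopt; // unknown flag
    } else {
      Opts.Inputs.push_back(A);
    }
  }
  return Opts;
}
} // namespace flang_sketch
```

The shared driver would then only need to know how to spell these flags on the Flang job's command line, not what they mean.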

I feel this is a much larger refactoring than the current changeset and description implies. I'm OK with the direction of Option 1, but am concerned with the specific implementation details that have been described so far. I'll feel much better when the concerns Richard expressed have been addressed.

- Michael Spencer
 
 

Re: [llvm-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
Extracting common code, especially the Driver code, from clang so that it can also be used for flang seems entirely reasonable as a high-level goal.

But I'd just like to enter another vote for it to live somewhere other than under the "llvm" top-level directory, e.g. "clang-common", "frontend-support", or any other name people would like to paint that bikeshed. :)

On Wed, Jun 3, 2020 at 5:50 PM Michael Spencer via llvm-dev <[hidden email]> wrote:

On Tue, Jun 2, 2020 at 6:38 PM Richard Smith via llvm-dev <[hidden email]> wrote:
On Tue, 2 Jun 2020 at 05:08, Andrzej Warzynski via cfe-dev <[hidden email]> wrote:
*TL;DR*

We propose some non-trivial refactoring in Clang and LLVM to enable
further work on Flang driver.

*SUMMARY*
We would like to start extracting the driver/frontend code from Clang
(alongside the code that the driver/frontend depends on, e.g.
Diagnostics) and move the components that could be re-used by
non-C-based languages to LLVM. From our initial investigation we see
that these changes will impact many projects (upstream and downstream)
and will require big mechanical patches (our first attempt is
implemented in [8]). This is not ideal, but seems unavoidable in the
long-term. We would like to do this refactoring _before_ we start
implementing the Flang driver upstream (OPTION 1 below). This way we avoid:

* contaminating Clang with Fortran specific code (and vice versa)
* introducing dependency on Clang in Flang

The downside is that the refactoring is likely to be disruptive for many
projects that use Clang. We will try our best to minimise this.

Does this approach make sense? Are there any preferred alternatives? At
this stage we'd like to discuss the overall direction. If folks are in
favour, we'll send a separate RFC with a finer breakdown and more
technical details for the refactoring.

Below you will find more context for our use-case (the Flang driver) and
possible alternatives. We hope that this will help the discussion. We
would really appreciate your feedback!

Generally, I think this is a good idea, and a healthy direction for LLVM overall. We need to be careful to do this in a way that doesn't introduce complexity or overheads in Clang, though, so we should proceed very cautiously.

I also think you're skewing somewhat too far in favor of code reuse. Some of the Clang code you're identifying below is very carefully tuned and tailored to Clang's use, and the amount of it that could reasonably be shared with another project (carefully tuned and tailored to that project's use) is probably small, unless we heavily generalize it. Such generalization is likely not going to be worth its development and maintenance costs.
 
*BACKGROUND*
Flang (formerly known as F18) has recently been merged into LLVM [1].
Our ambition, as a community, is to make it as flexible, robust and nice
to work with as Clang. One of the major items to address is the
implementation of a driver that would provide the flexibility and user
experience similar to that available in Clang. The F18/Flang driver was
already discussed on cfe-dev last year [2], but back then F18 (now llvm
project/flang) was a separate project. In the original proposal it was
assumed that initially Flang would depend (and extend where necessary)
Clang's driver/frontend code. Since F18/Flang was an independent
project, the refactoring of Clang/LLVM wasn't really considered. That
design has been challenged since ([3], [10]), and also not much progress
has been made. We would like to revisit that RFC from a slightly
different angle. Since Flang is now part of LLVM's monorepo, we feel
that refactoring Clang/LLVM _before_ we upstream the driver makes a lot
of sense and is the natural first step.

*ASSUMPTIONS & DESIGN GOALS*
1. We will re-use as much of the Clang's driver/frontend code as
possible (this was previously proposed in [2]).

2. We want to avoid dependencies from Flang to Clang, both long-term
(strong requirement) and short-term (might be difficult to achieve).
This has recently come up in a discussion on one of our early patches
[3] (tl;dr Steve Scalpone, the code owner of Flang, would prefer us to
avoid this dependency), and was also suggested before by Eric
Christopher [10].

3. We will move the code that can be shared between Flang and Clang (and
other projects) to LLVM. This idea has already come up on llvm-dev
before [7] (in a slightly different context, and to a slightly different
extent). The methods that are not language specific would be shared in
an LLVM library.

4. The classes/types/methods that need specific changes for Fortran will
be "copied" to Flang and adapted as needed. We should minimize (or even
eliminate) any Fortran specific code from Clang and make sure that that
lives in llvm-project/flang.

*FLANG'S DEPENDENCIES ON CLANG*
These are the dependencies on Clang that we have identified so far while
prototyping the Flang driver.

1. All the machinery related to Diagnostics & SourceLocation.

This is currently part of libclangBasic [4] and is used in _many_ places
in Clang. The official documentation [5] suggests that this could be
re-used for non-C-based languages. In particular, we feel that It would
make a lot of sense for Flang to use it. Also, separating Clang's
driver/frontend code and the diagnostics would require a lot of
refactoring for no real benefit (and we feel that Flang should re-use
Clang's driver/frontend code, see below). This dependency is used in
many places, so moving it to LLVM will require a lot of (mostly)
mechanical changes. We can't see an obvious way to split it into smaller
chunks (see also below where we discuss the impact).

I do not think it is necessarily going to be reasonable to move all machinery related to SourceLocation (in particular, all of clang's SourceManager) into LLVM. The ideas and data structure underpinning SourceLocation and SourceManager are quite general (a concatenated hierarchical slab of contiguous blocks, with linear indexing within those blocks), but the details are much more specific to Clang and the C-family languages it represents. Things like the support for object-like and function-like macros, macro arguments, #include, splitting >> tokens for C++, and so on, all make sense for Clang, but probably make less sense for Fortran, where a different set of kinds of block would probably be desired instead. This isn't something that can be trivially generalized and extended, either; we carefully bit-pack various things into our block representations, and as a result, we're quite tightly fitted to the needs of Clang, and would probably not want to move away from that position.

However, I do think there is common infrastructure that can be extracted, with some significant work done to generalize the SourceManager infrastructure and make it tailorable to the needs of Clang and Flang (and any other consumers of it that might come along). I could imagine moving all of the complexity to do with what kinds of SLocEntry are supported into a traits type, and having a reusable template that can generate a data structure that the Clang and Flang SourceManagers can be implemented in terms of.
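A minimal sketch of that traits-based split, assuming C++17. GenericSourceManager, CFamilyTraits and FortranTraits are hypothetical names: the shared template only deals with offsets and sizes, while each language's traits type decides which kinds of entry exist.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical reusable core: it only knows about offsets and sizes. Which kinds
// of entry exist (files, macro expansions, INCLUDE lines, ...) is up to Traits.
template <typename Traits>
class GenericSourceManager {
public:
  using Entry = typename Traits::Entry;

  // Reserve Size + 1 offsets for a new entry and return its start offset.
  uint32_t createEntry(Entry E, uint32_t Size) {
    uint32_t Start = NextOffset;
    Slots.push_back({Start, std::move(E)});
    NextOffset += Size + 1;
    return Start;
  }

  // Find the entry containing Offset (linear scan for brevity; a real
  // implementation would binary-search the sorted start offsets).
  const Entry &entryAt(uint32_t Offset) const {
    for (auto It = Slots.rbegin(); It != Slots.rend(); ++It)
      if (It->Start <= Offset)
        return It->Payload;
    throw std::out_of_range("offset precedes every entry");
  }

private:
  struct Slot {
    uint32_t Start;
    Entry Payload;
  };
  std::vector<Slot> Slots;
  uint32_t NextOffset = 1; // 0 means "invalid location"
};

// Hypothetical C-family traits: source files plus macro expansions.
struct CFamilyTraits {
  struct File { const char *Name; };
  struct MacroExpansion {
    uint32_t SpellingLoc, ExpansionLoc;
    bool IsFunctionLike;
  };
  using Entry = std::variant<File, MacroExpansion>;
};

// Hypothetical Fortran traits: only source files, each remembering the
// location of the INCLUDE line that pulled it in (0 for the main file).
struct FortranTraits {
  struct File { const char *Name; uint32_t IncludedFrom; };
  using Entry = File;
};
```

The design choice here is that the generic part never inspects the payload, so the C-family side can keep its tightly packed representation without imposing it on other languages.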

Clang's SourceLocation is probably almost directly usable as-is -- it has hardcoded assumptions about a particular bit being reserved to indicate a location within a C preprocessor macro, but we can move that to a static method on Clang SourceManager, and then I think SourceLocation can be directly shared between the two projects.
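A hypothetical sketch of that change: the shared location type becomes an opaque 32-bit value, and only the Clang-side source manager knows that the top bit marks a macro location. SharedSourceLocation and ClangSourceManagerSketch are illustrative names; Clang's real SourceLocation does reserve a high bit for macro IDs, but its exact interface is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical shared location: an opaque encoding with no knowledge of macros.
class SharedSourceLocation {
  uint32_t ID = 0; // 0 still means "invalid"

public:
  static SharedSourceLocation getFromRawEncoding(uint32_t Raw) {
    SharedSourceLocation L;
    L.ID = Raw;
    return L;
  }
  uint32_t getRawEncoding() const { return ID; }
  bool isValid() const { return ID != 0; }
};

// Hypothetical Clang-side source manager that owns the meaning of the top bit.
struct ClangSourceManagerSketch {
  static constexpr uint32_t MacroIDBit = 1u << 31;

  static bool isMacroID(SharedSourceLocation L) {
    return (L.getRawEncoding() & MacroIDBit) != 0;
  }
  static SharedSourceLocation makeMacroLoc(uint32_t Offset) {
    assert((Offset & MacroIDBit) == 0 && "offset too large to encode");
    return SharedSourceLocation::getFromRawEncoding(Offset | MacroIDBit);
  }
  static uint32_t getOffset(SharedSourceLocation L) {
    return L.getRawEncoding() & ~MacroIDBit;
  }
};
```

A Fortran source manager could then ignore the top bit entirely, or assign it a different meaning, without changing the shared type.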

(One big asterisk on the above: will Flang want an integrated C preprocessor? If so, then we're now talking about a much larger chunk of Clang, including the lexer, preprocessor, identifier tables, the Token type, and it may be best to simply acknowledge that Flang has a dependency on Clang to supply all that, rather than moving it into LLVM.)

The layers below SourceManager -- FileManager, the VFS, and so on -- all seem like they should be reasonable to share between projects.

Some of the diagnostics engine seems reasonable to share: specifically, the tablegen-driven diagnostic table generation, most of the diagnostics engine (including support for diagnostics pragmas that change the set of warnings enabled at different source locations), and the formatting code for non-clang-specific types are all relatively reusable. If you want to reuse the TextDiagnosticPrinter, I think that will need some refactoring; it's currently tied into the specific needs of Clang's SourceManager (for handling textual inclusion and macro expansion in the way that C-family languages deal with those things). I expect it would be possible to factor out an interface that Clang could implement to provide the necessary customizations.
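One possible shape for such an interface, sketched under the assumption that the printer only needs a presumed-location string and a list of context notes (include stack, macro backtrace, and so on) for each location. DiagnosticContextProvider, GenericTextDiagnosticPrinter and FlatFileProvider are hypothetical; the real TextDiagnosticPrinter has a considerably richer interface.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical hook interface implemented by each language's source manager.
class DiagnosticContextProvider {
public:
  virtual ~DiagnosticContextProvider() = default;
  // e.g. "In file included from a.h:3:" for C, or an INCLUDE chain for Fortran.
  virtual std::vector<std::string> contextNotes(unsigned Loc) const = 0;
  virtual std::string presumedLocation(unsigned Loc) const = 0;
};

// Hypothetical language-agnostic printer: it only formats what it is given.
class GenericTextDiagnosticPrinter {
  const DiagnosticContextProvider &Provider;

public:
  explicit GenericTextDiagnosticPrinter(const DiagnosticContextProvider &P)
      : Provider(P) {}

  std::string format(unsigned Loc, const std::string &Level,
                     const std::string &Msg) const {
    std::ostringstream OS;
    for (const std::string &Note : Provider.contextNotes(Loc))
      OS << Note << '\n';
    OS << Provider.presumedLocation(Loc) << ": " << Level << ": " << Msg << '\n';
    return OS.str();
  }
};

// Hypothetical provider for a language with no textual inclusion or macros.
class FlatFileProvider : public DiagnosticContextProvider {
public:
  std::vector<std::string> contextNotes(unsigned) const override { return {}; }
  std::string presumedLocation(unsigned Loc) const override {
    return "prog.f90:" + std::to_string(Loc);
  }
};
```

Clang would supply a provider that walks its include stack and macro expansions; the printer itself would no longer need to know about either.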

Before we factor out the diagnostics engine, we should fix the longstanding issue that it requires a global monolithic table covering all diagnostics, and is consequently unable to properly respect layering. I think this is very much fixable, but it requires someone to do the work to fix it :)
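To illustrate what a non-monolithic table could look like, here is a hypothetical registry in which each library registers its own (for example, TableGen-generated) table and receives a disjoint ID range at startup, so no single header has to enumerate every diagnostic. This is a sketch of one possible design, not how clang's DiagnosticIDs works today.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-library table, indexed by the library-local diagnostic number.
struct DiagTable {
  std::vector<std::string> Formats;
};

// Hypothetical registry handing out disjoint ID ranges. Registered tables must
// outlive the registry, since only pointers to them are stored.
class DiagRegistry {
  std::map<unsigned, const DiagTable *> TablesByBase; // base ID -> table
  unsigned NextBase = 1;

public:
  unsigned registerTable(const DiagTable &T) {
    unsigned Base = NextBase;
    TablesByBase[Base] = &T;
    NextBase += static_cast<unsigned>(T.Formats.size());
    return Base;
  }

  const std::string &format(unsigned ID) const {
    auto It = TablesByBase.upper_bound(ID); // first table starting after ID
    assert(It != TablesByBase.begin() && "unknown diagnostic ID");
    --It;                                   // the table whose range contains ID
    return It->second->Formats.at(ID - It->first);
  }
};
```

A real design would probably also need IDs that stay stable across runs (for example, for serialized diagnostic state), which this sketch ignores.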

Looking at your branch, I immediately see a few things there that are unacceptable changes: moving clang's TokenKinds.def, Specifiers.h, and OpenCLImageTypes.def into LLVM is not OK. But I assume you're aware of that already. =)
 
2. libclangFrontend & libclangDriver

The Flang driver will use many components from libClangDriver,
libClangFrontend and libClangFrontendTool. Driver.h and Compilation.h
from libClangDriver are responsible for invoking the driver with the
correct arguments and executing it. TextDiagnosticPrinter.h takes care
of printing the driver diagnostics in case of errors.

The Flang frontend will use CompilerInstance, CompilerInvocation,
FrontendOptions, FrontendActions and Utils from libClangFrontend and
libClangFrontendTool. These components are responsible for translating
the command-line arguments into frontend options and, later, into
actions to be executed by ExecuteCompilerInvocation. The translation
from arguments to actions happens via FrontendOptions and
FrontendActions, but it is the CompilerInvocation that holds the
pointers to the sequence of actions required by a CompilerInstance.
These components are needed to implement the Flang driver/frontend and
contain actions, methods and functions that seem to be language
agnostic.

I think this is going too far in attempting to reuse Clang code. CompilerInvocation, for example, is almost exclusively dealing in parsing Clang's -cc1 flags, which I would expect to have very little overlap with Flang's flags, and CompilerInstance exists (in part) to manage and own all the Clang-specific global objects (the parser, sema, the module loader, the AST consumer). Flang should not be going anywhere near this stuff, and should be implementing its own frontend.

There may be some clang-independent parts that can be factored out, but I would expect them to be small enough that we can address them on a case-by-case basis. The interesting thing to factor out is the parsing of command-line options, but that's already been done. I think your approach here should be to assume as a baseline that you reuse none of clang's Frontend library, but if you find general pieces that can meaningfully be extracted, we can talk about those pieces in isolation.

I strongly agree here. It doesn't make sense for `CompilerInvocation` or `CompilerInstance` to know anything about Fortran as they are entirely about driving the Clang frontend (-cc1).
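If Flang implements its own frontend, the arguments -> options -> action -> execution flow that the RFC describes is small enough to start from scratch. The sketch below is hypothetical and depends on nothing from Clang; only the -fsyntax-only and -emit-obj flag spellings are borrowed from clang -cc1, and every unrecognised argument is treated as an input file for brevity.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

enum class ActionKind { ParseSyntaxOnly, EmitObj };

// Hypothetical analogue of frontend options: what to do, and to which inputs.
struct FrontendOptionsSketch {
  ActionKind Action = ActionKind::EmitObj;
  std::vector<std::string> Inputs;
};

// Hypothetical action hierarchy; a real one would drive parsing, sema, codegen.
struct FrontendActionSketch {
  virtual ~FrontendActionSketch() = default;
  virtual std::string execute(const std::string &Input) = 0;
};
struct SyntaxOnlyAction : FrontendActionSketch {
  std::string execute(const std::string &Input) override { return "parsed " + Input; }
};
struct EmitObjAction : FrontendActionSketch {
  std::string execute(const std::string &Input) override { return "compiled " + Input; }
};

// "Invocation" step: turn flags into options.
FrontendOptionsSketch parseArgs(const std::vector<std::string> &Args) {
  FrontendOptionsSketch Opts;
  for (const std::string &A : Args) {
    if (A == "-fsyntax-only")
      Opts.Action = ActionKind::ParseSyntaxOnly;
    else if (A == "-emit-obj")
      Opts.Action = ActionKind::EmitObj;
    else
      Opts.Inputs.push_back(A); // simplification: anything else is an input
  }
  return Opts;
}

// "Execute" step: create the requested action and run it on every input.
std::vector<std::string> executeInvocation(const FrontendOptionsSketch &Opts) {
  std::unique_ptr<FrontendActionSketch> Act;
  if (Opts.Action == ActionKind::ParseSyntaxOnly)
    Act = std::make_unique<SyntaxOnlyAction>();
  else
    Act = std::make_unique<EmitObjAction>();
  std::vector<std::string> Results;
  for (const std::string &In : Opts.Inputs)
    Results.push_back(Act->execute(In));
  return Results;
}
```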
 
For the driver, I think the picture is very different. It seems to me that we should only have one LLVM driver, that can build C-family languages, Fortran code, or both at the same time (or invoke lld etc). To that end, I think it would be reasonable to move clang's driver out to a separate LLVM project (maybe that's llvm/, maybe it's somewhere new such as driver/), and extend it to be able to invoke flang actions in addition to clang actions. Then the only difference between the clang and flang drivers would be which frontend is directly linked into the driver binary and which one is invoked by exec'ing a different binary. That would imply that all the parts of Clang that are depended on by the driver are also moved out (I think the main parts here are flags and diagnostics, and via the diagnostics layer, source locations).

This will require some decoupling between the Clang driver and frontend (currently Clang's Options.td contains various driver options that are marked as also being options for Clang's -cc1 mode; duplicating those in CC1Options.td is probably acceptable, if we're going to split the driver and frontend into two different projects), and some shared support code (e.g., clang's sanitizers list) will presumably end up in the driver, because we don't want a driver -> *lang dependency.

Also strongly agree here. Large chunks of the driver will end up being the same between Clang and Flang, but they should still be separate actions having separate `FrontendOptions` and `FrontendActions`.
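A toy sketch of the single-driver idea, assuming the driver decides per input file which frontend handles it and whether that frontend is linked into the current binary or must be exec'd. The extension table, enum names and job strings are all hypothetical; real drivers use a far larger file-type table.

```cpp
#include <cassert>
#include <string>

enum class InputLang { C, CXX, Fortran, Object, Unknown };

// Hypothetical classification by file extension.
InputLang classify(const std::string &Path) {
  auto Dot = Path.rfind('.');
  if (Dot == std::string::npos)
    return InputLang::Unknown;
  std::string Ext = Path.substr(Dot + 1);
  if (Ext == "c")
    return InputLang::C;
  if (Ext == "cpp" || Ext == "cc" || Ext == "cxx")
    return InputLang::CXX;
  if (Ext == "f" || Ext == "f90")
    return InputLang::Fortran;
  if (Ext == "o")
    return InputLang::Object;
  return InputLang::Unknown;
}

// Which frontend is linked into this driver binary (fixed at build time).
enum class LinkedFrontend { Clang, Flang };

// Hypothetical job planning: run in-process, or exec a separate tool.
std::string planJob(const std::string &Path, LinkedFrontend Self) {
  switch (classify(Path)) {
  case InputLang::C:
  case InputLang::CXX:
    return Self == LinkedFrontend::Clang ? "in-process clang -cc1" : "exec clang -cc1";
  case InputLang::Fortran:
    return Self == LinkedFrontend::Flang ? "in-process flang frontend" : "exec flang frontend";
  case InputLang::Object:
    return "exec linker";
  case InputLang::Unknown:
    break;
  }
  return "error: unknown input";
}
```

Under this model the clang and flang binaries share all of the planning logic and differ only in the LinkedFrontend value they were built with.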

I feel this is a much larger refactoring than the current changeset and description implies. I'm OK with the direction of Option 1, but am concerned with the specific implementation details that have been described so far. I'll feel much better when the concerns Richard expressed have been addressed.

- Michael Spencer
 
 
*ALTERNATIVES*
This is a summary of the alternative ways of implementing the Flang
driver. We propose OPTION 1. If there are no major objections, we will
draft a separate RFC with more technical details (we will also break it
down into smaller pieces). Otherwise, what would be your preferred
alternative and why?

OPTION 1
We avoid dependency on Clang from Day 1.

This is the ideal scenario that would guarantee that Clang and Flang are
completely separate and that the common bits stay in LLVM instead. It
would mean slower progress for us initially, but then other projects
could benefit from the refactoring sooner rather than later.

OPTION 2
We avoid dependency on clangBasic from day 1, but initially allow
dependency on libClangFrontend & libClangDriver (or other libs specific
to the driver/frontend).

The dependency on libclang{Driver|Frontend} would gradually be
removed/refactored out as the driver for Flang gains momentum. As
mentioned earlier, there is plenty of code in libClangFrontend and
libClangDriver that we'd like to re-use, but the separation between code
that's specific to C-based languages and generic driver/frontend code is
not always obvious. We think that refactoring the common bits in
libClangFrontend and libClangDriver might simply be easier once:

  * we have a Flang driver that leverages these libraries, and, as a result,
  * we understand better what we could re-use and what's not that
relevant to non-C-based languages.

OPTION 3
We initially keep the dependency on Clang and re-visit this RFC later.

This would be the least disruptive approach (at least for the time
being) and would allow us to make the most rapid progress (i.e. we
would be focusing on implementing the features rather than refactoring).
It would also inform the future refactoring better. But it was already
pointed out that we should avoid dependencies on clang [3] and this
would be a step in the opposite direction. Also, the build requirements
for Flang would increase, and we feel that we should strive to reduce
them instead [6].

If we missed any alternatives, please bring them up.

I don't think I can express an opinion without knowing whether you intend for Flang to ever support an integrated C preprocessor. If not, then option 1 seems appropriate. But if so, then I think we have a choice between factoring out all of clang below the parser or just acknowledging that Flang depends on Clang for its lexical layer and deciding to keep a flang -> clang dependency forever.
 
*IMPACT ON OTHER PROJECTS*
The refactoring will have non-trivial impact on other projects:

* OPTION 1 and OPTION 2 - huge impact initially.
* OPTION 3 - no impact initially, but most likely similar impact as
OPTION 1 and OPTION 2 in the long term.

From our initial investigation, extracting Diagnostics/SourceLocation
from clangBasic and moving it to LLVM will be the most impactful change.
Within llvm-project it is used in clang, clang-tools-extra, lldb and
polly. Most of the changes will be mechanical, but will require touching
many files. In order to get to a state where we could build libclang
using the newly defined LLVM library, we had to touch ~850 files and
make ~30k insertions/deletions. The result of this exercise is available
in our development fork of llvm-project [8].

Please note: our patches on GitHub [8] are just experiments to
illustrate the idea. It's work-in-progress that requires a lot of
polishing. When/if up-streaming this, we would need to do some
low-impact refactoring first. For example, currently ASTReader &
ASTWriter are `friends` with DiagnosticsEngine [9]. That won't be
possible when DiagnosticsEngine is moved to LLVM.
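One way to remove that friendship, sketched under the assumption that the reader and writer only need to save and restore some per-diagnostic state: the LLVM-side engine exposes explicit serialization hooks, and Clang's reader/writer use those instead. The llvm_sketch/clang_sketch namespaces and all member names are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Today (simplified), a Clang header can befriend Clang's own serializers:
//   class DiagnosticsEngine { friend class ASTReader; friend class ASTWriter; ... };
// Once the engine lives in LLVM, its header must not name clang::ASTReader.

namespace llvm_sketch {
// Hypothetical LLVM-side engine exposing explicit hooks instead of friendship.
class DiagnosticsEngine {
  std::vector<unsigned> Mappings; // stand-in for private per-diagnostic state

public:
  const std::vector<unsigned> &getMappingsForSerialization() const { return Mappings; }
  void restoreMappings(std::vector<unsigned> M) { Mappings = std::move(M); }
  void setMapping(unsigned Index, unsigned Value) {
    if (Index >= Mappings.size())
      Mappings.resize(Index + 1);
    Mappings[Index] = Value;
  }
};
} // namespace llvm_sketch

namespace clang_sketch {
// Hypothetical Clang-side writer/reader that only use the public hooks.
std::vector<unsigned> writeDiagState(const llvm_sketch::DiagnosticsEngine &D) {
  return D.getMappingsForSerialization();
}
void readDiagState(llvm_sketch::DiagnosticsEngine &D, std::vector<unsigned> Blob) {
  D.restoreMappings(std::move(Blob));
}
} // namespace clang_sketch
```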


On behalf of the Arm Fortran Team,
Andrzej Warzynski

REFERENCES

[1] https://github.com/llvm/llvm-project/commit/b98ad941a40c96c841bceb171725c925500fce6c
[2] http://lists.llvm.org/pipermail/cfe-dev/2019-June/062669.html
[3] https://reviews.llvm.org/D79092
[4] https://github.com/llvm/llvm-project/blob/ad5d319ee85d31ee2b1ca5c29b3a10b340513fec/clang/lib/Basic/CMakeLists.txt#L45-L47
[5] https://clang.llvm.org/docs/InternalsManual.html#the-clang-basic-library
[6] http://lists.llvm.org/pipermail/flang-dev/2019-November/000061.html
[7] http://lists.llvm.org/pipermail/llvm-dev/2019-November/136743.html
[8] https://github.com/banach-space/llvm-project/commits/andrzej/refactor_clangBasic
[9] https://github.com/llvm/llvm-project/blob/b11ecd196540d87cb7db190d405056984740d2ce/clang/include/clang/Basic/Diagnostic.h#L985-L986
[10] https://reviews.llvm.org/D63607
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
In reply to this post by Hubert Tong via cfe-dev


On Jun 2, 2020, at 4:21 PM, comex via cfe-dev <[hidden email]> wrote:

While this is a different area of the codebase, another thing that
would benefit greatly from being moved out of Clang is function call
ABI handling.  Currently, that handling is split awkwardly between
Clang and LLVM proper, forcing frontends that implement C FFI to
either recreate the Clang parts themselves (like Rust does), depend on
Clang (like Swift does), or live with FFI just not working with some
function signatures.  I'm not sure what Flang currently does, but my
understanding is that Flang does support C FFI, so it would probably
benefit from this as well.  Just something to consider. :)

For what it's worth, I think there is a pretty clear path on this, but it hinges on Clang moving to MLIR as its code generation backend (an intermediary to generating LLVM IR).

The approach is to factor the ABI lowering part of clang out of Clang itself into a specific dialect lowering pass, that works on a generic C type system (plus callout to extended type systems).  MLIR has all the infra to support this, it is just a massive job to refactor all the things to change clang’s architecture.

I also don’t think there is broad consensus on the direction for Clang here, but given that Flang is already using MLIR for this, maybe it would make sense to start work there.

If you’re curious, I co-delivered a talk about this recently, the slides are available here.

-Chris

Re: Clang/LLVM function ABI lowering (was: Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM)

Hubert Tong via cfe-dev
While MLIR may be one part of the solution, I think it's also the case that the function-ABI interface between Clang and LLVM is just wrong and should be fixed -- independently of whether Clang might use MLIR in the future.

I've mentioned this idea before, I think, but never got around to writing up a real proposal. And I still haven't. Maybe this email could inspire someone else to work on that.

Essentially, I'd like to see the code in Clang responsible for function parameter-type mangling as part of its ABI lowering deleted. Currently, there is a secret "LLVM IR" ABI used between Clang and LLVM, which involves expanding some arguments into multiple arguments, adding a smattering of "inreg" or "byval" attributes, and converting some types into other types. All in a completely target-dependent, complex, and undocumented manner.

So, while the IR function syntax appears at first glance to be generic and target-independent, that's not at all true. Sadly, in some cases, clang must even know how many registers different calling conventions use, and count numbers of available registers left, in order to choose the right set of those "generic" attributes to put on a parameter.

So: not only does a frontend need to understand the C ABI rules, they also need to understand that complex dance for how to convert that into LLVM IR -- and that's both completely undocumented, and a huge mess.

Instead, I believe clang should always pass function parameters in a "naive" fashion. E.g. if a parameter type is "struct X", the llvm function should be lowered to LLVM IR with a function parameter of type %struct.X. The decision on whether to then pass that in a register (or multiple registers), on the stack, padded and then passed on the stack, etc, should be the responsibility of LLVM. Only in the case of C++ types which must be passed indirectly for correctness, independent of calling convention ABI, should clang be explicitly making the decision to pass indirectly.
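As a rough illustration of the C++ case: the Itanium C++ ABI passes types that are "non-trivial for the purposes of calls" (for example, a type with a user-provided destructor or copy constructor) indirectly, whatever the C calling convention says. The helper mustPassIndirectly below is a hypothetical approximation built from standard type traits (C++17), not Clang's logic; it misses, for instance, the case where every copy and move constructor is deleted.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Approximation: a non-trivial destructor, copy constructor or move
// constructor forces the object to be passed by hidden reference.
template <typename T>
constexpr bool mustPassIndirectly() {
  return !std::is_trivially_destructible_v<T> ||
         (std::is_copy_constructible_v<T> && !std::is_trivially_copy_constructible_v<T>) ||
         (std::is_move_constructible_v<T> && !std::is_trivially_move_constructible_v<T>);
}

struct PlainPair { long a, b; };                 // trivial: left to the C ABI rules
struct WithDtor { long a; ~WithDtor() {} };      // user-provided destructor
struct WithCopy {                                // user-provided copy constructor
  long a;
  WithCopy(const WithCopy &O) : a(O.a) {}
};

static_assert(!mustPassIndirectly<PlainPair>(), "trivial struct: C ABI decides");
static_assert(mustPassIndirectly<std::string>(), "non-trivial type: passed indirectly");
```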

Of course, the tricky part is that LLVM doesn't -- and shouldn't -- have the full C type system available to it, and the full C type system typically is required to evaluate the ABI rules (e.g., distinguishing a "_Complex float" from a struct containing two floats).

Therefore, in order to communicate the correct ABI information to LLVM, I'd like clang to also emit explicitly-ABI-specific data (metadata?), reflecting the extra information that the ABI rules require the backend to know about the type. E.g., for X86_64, clang needs to inform LLVM of the classification for each parameter's type into MEMORY, INTEGER, SSE, SSEUP, X87, X87UP, COMPLEX_X87. Or, for PPC64 elfv2, Clang needs to inform LLVM when a structure should be treated as a "homogeneous aggregate" of floating-point or vector type. (In both cases, that information cannot correctly be extracted from the LLVM IR struct type, only from the C type system.)
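To show the kind of information involved, here is a deliberately simplified sketch of SysV x86-64 classification for a struct made only of naturally aligned scalar fields. The assumptions (no unions, arrays, bit-fields, long double, __int128 or vectors, and no post-merger cleanup step) are stated in the code; classifyStruct, Field and ArgClass are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Only a subset of the ABI's classes; SSEUP, X87, X87UP and COMPLEX_X87 are omitted.
enum class ArgClass { NoClass, Integer, SSE, Memory };

struct Field {
  std::size_t Offset; // byte offset within the struct (naturally aligned)
  std::size_t Size;   // 1, 2, 4 or 8
  bool IsFloat;       // float/double -> SSE, everything else -> INTEGER
};

// Returns one class per eightbyte, or a single MEMORY entry.
std::vector<ArgClass> classifyStruct(const std::vector<Field> &Fields,
                                     std::size_t StructSize) {
  if (StructSize > 16) // more than two eightbytes: passed/returned in memory
    return {ArgClass::Memory};
  std::vector<ArgClass> Eightbytes((StructSize + 7) / 8, ArgClass::NoClass);
  for (const Field &F : Fields) {
    assert(F.Offset / 8 < Eightbytes.size() && "field outside the struct");
    ArgClass FieldClass = F.IsFloat ? ArgClass::SSE : ArgClass::Integer;
    ArgClass &Slot = Eightbytes[F.Offset / 8];
    // Merge rule: NO_CLASS takes the other class; INTEGER wins over SSE.
    if (Slot == ArgClass::NoClass || FieldClass == ArgClass::Integer)
      Slot = FieldClass;
  }
  return Eightbytes;
}
```

Note that the sketch needs to know which fields are floating point and where they sit; for cases such as _Complex types that is C-level knowledge, not something recoverable from the LLVM struct type alone.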

We should document what data is needed, for each architecture/abi. This required data should be as straightforward an application of the ABI document's rules as possible -- and be only the minimum data necessary.

If this is done, frontends (either a new one, or Clang itself) who want to use the C ABI have a significantly simpler task. It remains non-trivial -- you do still need to understand ABI-specific rules, and write ABI-specific code to generate ABI-specific metadata. But, at least the interface boundary has become something which is readily-understandable and implementable based on the ABI documents.
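As a hypothetical example of such ABI-specific code producing a minimal record, the sketch below covers only the PPC64 ELFv2 homogeneous floating-point aggregate case, assuming float/double members that have already been flattened (no vectors, no 128-bit long double). PPC64ELFv2ParamInfo and classifyELFv2 are invented names.

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class FPKind { Float, Double };

// Hypothetical minimal per-parameter record for this one ABI rule.
struct PPC64ELFv2ParamInfo {
  bool IsHomogeneousAggregate = false;
  FPKind BaseKind = FPKind::Double; // meaningful only if IsHomogeneousAggregate
  unsigned NumMembers = 0;
};

// Simplified rule: all members share one floating-point type and there are at
// most eight of them. std::nullopt marks a non-floating-point member.
PPC64ELFv2ParamInfo classifyELFv2(const std::vector<std::optional<FPKind>> &Members) {
  PPC64ELFv2ParamInfo Info;
  if (Members.empty() || Members.size() > 8)
    return Info;
  for (const auto &M : Members)
    if (!M || *M != *Members.front())
      return Info;
  Info.IsHomogeneousAggregate = true;
  Info.BaseKind = *Members.front();
  Info.NumMembers = static_cast<unsigned>(Members.size());
  return Info;
}
```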

All that said, an MLIR encoding of the C type system can still be useful -- it could contain the code which distills the C types into the ABI-specific metadata. But, I see that as less important than getting the fundamentals in LLVM-IR into a better shape. Even frontends without a C type system representation should still be able to generate LLVM IR which conforms in their own manner to the documented ABIs -- without it being super painful. Also, the code in Clang now is really confusing, and nearly unmaintainable; it would be a clear improvement to be able to eliminate the majority of it, not just move it into an MLIR dialect.

On Wed, Jun 3, 2020 at 7:26 PM Chris Lattner via cfe-dev <[hidden email]> wrote:
On Jun 2, 2020, at 4:21 PM, comex via cfe-dev <[hidden email]> wrote:

While this is a different area of the codebase, another thing that
would benefit greatly from being moved out of Clang is function call
ABI handling.  Currently, that handling is split awkwardly between
Clang and LLVM proper, forcing frontends that implement C FFI to
either recreate the Clang parts themselves (like Rust does), depend on
Clang (like Swift does), or live with FFI just not working with some
function signatures.  I'm not sure what Flang currently does, but my
understanding is that Flang does support C FFI, so it would probably
benefit from this as well.  Just something to consider. :)

For what it's worth, I think there is a pretty clear path on this, but it hinges on Clang moving to MLIR as its code generation backend (an intermediary to generating LLVM IR).

The approach is to factor the ABI lowering part of clang out of Clang itself into a specific dialect lowering pass, that works on a generic C type system (plus callout to extended type systems).  MLIR has all the infra to support this, it is just a massive job to refactor all the things to change clang’s architecture.

I also don’t think there is broad consensus on the direction for Clang here, but given that Flang is already using MLIR for this, maybe it would make sense to start work there.

If you’re curious, I co-delivered a talk about this recently, the slides are available here.

-Chris

Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
In reply to this post by Hubert Tong via cfe-dev
On Wed, Jun 3, 2020 at 4:26 PM Chris Lattner <[hidden email]> wrote:

> On Jun 2, 2020, at 4:21 PM, comex via cfe-dev <[hidden email]> wrote:
>
> While this is a different area of the codebase, another thing that
> would benefit greatly from being moved out of Clang is function call
> ABI handling.  Currently, that handling is split awkwardly between
> Clang and LLVM proper, forcing frontends that implement C FFI to
> either recreate the Clang parts themselves (like Rust does), depend on
> Clang (like Swift does), or live with FFI just not working with some
> function signatures.  I'm not sure what Flang currently does, but my
> understanding is that Flang does support C FFI, so it would probably
> benefit from this as well.  Just something to consider. :)
>
>
> For what it's worth, I think there is a pretty clear path on this, but it hinges on Clang moving to MLIR as its code generation backend (an intermediary to generating LLVM IR).

I'd be interested in seeing a higher-level Clang IR for many different
reasons. :) On the other hand, when it comes to calling conventions,
at least some of the things currently handled by Clang seem like they
would fit well into the existing LLVM IR.

For example, this C code, compiled for x86-64 Unix:

#include <stdint.h>

struct foo { uint64_t a, b; };
struct foo get_foo() { return (struct foo){0, 1}; }

is translated straightforwardly to LLVM IR (trimmed for readability):

define { i64, i64 } @get_foo() {
 ret { i64, i64 } { i64 0, i64 1 }
}

and the generated assembly returns the values in RAX and RDX,
corresponding to the C ABI.

If you add a third field to the struct, the ABI demands the struct be
returned in memory with a hidden parameter.  Rather than leave this to
LLVM, Clang implements this itself, generating IR like:

define void @get_foo(%struct.foo* noalias nocapture sret align 8 %0) {
 ; ...
}

If, however, you instead modify the IR from the two-field case to add
a third field:

define { i64, i64, i64 } @get_foo() {
 ret { i64, i64, i64 } { i64 0, i64 1, i64 2 }
}

...LLVM accepts it, but the generated assembly returns the values in
RAX, RDX, and *RCX*, which is not part of the ABI at all!

If you proceed to add a fourth field, LLVM suddenly decides to handle
the out-parameter transformation itself, so all is well again.  Except
that the transformation seemingly happens too late in the pipeline, so
the generated code isn't vectorized.  (I'm not sure exactly how this
works.)

In these examples, I'd say LLVM IR is capable of expressing the
desired semantics ('follow the C ABI for returning a struct with these
fields'), and LLVM tries to implement those semantics, but it's
slightly off.  And then Clang papers over that by reimplementing parts
of those semantics itself, and only generating IR that LLVM does
handle correctly.  This seems inelegant to me; it would be better if
LLVM just 'did the right thing' here.
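For reference, a simplified sketch of where the SysV x86-64 ABI puts a returned struct, given the class of each eightbyte (only INTEGER, SSE and MEMORY are modelled; returnLocations is a hypothetical helper). Only two eightbytes can come back in registers, RAX/RDX for INTEGER and XMM0/XMM1 for SSE, so the three-field struct of uint64_t above is MEMORY class and RCX never appears.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class RetClass { Integer, SSE, Memory };

std::vector<std::string> returnLocations(const std::vector<RetClass> &Eightbytes) {
  // More than two eightbytes, or anything classified MEMORY: the caller passes
  // a buffer address as a hidden first argument, and it is returned in rax.
  if (Eightbytes.size() > 2 ||
      std::find(Eightbytes.begin(), Eightbytes.end(), RetClass::Memory) != Eightbytes.end())
    return {"memory (hidden pointer in rdi, address returned in rax)"};

  const char *IntRegs[] = {"rax", "rdx"};   // INTEGER eightbytes, in order
  const char *SSERegs[] = {"xmm0", "xmm1"}; // SSE eightbytes, in order
  unsigned NextInt = 0, NextSSE = 0;
  std::vector<std::string> Locs;
  for (RetClass C : Eightbytes)
    Locs.push_back(C == RetClass::Integer ? IntRegs[NextInt++] : SSERegs[NextSSE++]);
  return Locs;
}
```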

On the other hand, LLVM IR struct returns aren't currently expressive
enough to handle *all* C/C++ struct returns; you run into problems
with things like C++ guaranteed copy elision (which effectively
exposes the out pointer directly to user code), and call ABIs
depending on arcane C++ concepts like 'trivial for the purposes of
calls'.  I suppose a higher-level IR might help here... but I think an
ideal design might put parts of this in LLVM IR as well.  I'm not
sure.

Re: [llvm-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Hubert Tong via cfe-dev
In reply to this post by Hubert Tong via cfe-dev
+1 for clang emitting MLIR.

Best regards,
Alexey Bataev

On June 3, 2020, at 19:26, Chris Lattner via llvm-dev <[hidden email]> wrote:



On Jun 2, 2020, at 4:21 PM, comex via cfe-dev <[hidden email]> wrote:

While this is a different area of the codebase, another thing that
would benefit greatly from being moved out of Clang is function call
ABI handling.  Currently, that handling is split awkwardly between
Clang and LLVM proper, forcing frontends that implement C FFI to
either recreate the Clang parts themselves (like Rust does), depend on
Clang (like Swift does), or live with FFI just not working with some
function signatures.  I'm not sure what Flang currently does, but my
understanding is that Flang does support C FFI, so it would probably
benefit from this as well.  Just something to consider. :)

For what it's worth, I think there is a pretty clear path on this, but it hinges on Clang moving to MLIR as its code generation backend (an intermediary to generating LLVM IR).

The approach is to factor the ABI lowering part of clang out of Clang itself into a specific dialect lowering pass, that works on a generic C type system (plus callout to extended type systems).  MLIR has all the infra to support this, it is just a massive job to refactor all the things to change clang’s architecture.

I also don’t think there is broad consensus on the direction for Clang here, but given that Flang is already using MLIR for this, maybe it would make sense to start work there.

If you’re curious, I co-delivered a talk about this recently, the slides are available here.

-Chris

Re: [llvm-dev] Clang/LLVM function ABI lowering (was: Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM)

Hubert Tong via cfe-dev
In reply to this post by Hubert Tong via cfe-dev

In LLVM, ABI information currently comes from three sources:

  1. The function type
  2. The calling convention
  3. Attributes

I’m getting the following from your description of what you think needs to change:

  1. ABI attributes shouldn’t be mixed with other attributes; we should have some data structure dedicated to ABI information.
  2. ABI information should be explicitly target-specific: instead of using attributes like “inreg” that have target-specific meanings, each target-specific ABI attribute should have its own target-specific name.
  3. We should depend more on explicit ABI information, as opposed to depending on each target’s default rules.
  4. We should document each ABI supported by clang.
  5. LLVM function types should be fixed to correspond more closely to C function types.

I think messing with LLVM function types is a giant sinkhole that would destroy any comprehensive proposal, though.  The LLVM type system is not the C type system; LLVM structs are not C structs, and LLVM functions are not C functions.  And any changes are very high impact: messing with struct or function types would impact basically every file in LLVM.

From the standpoint of LLVM IR optimizations and lowering, the place we’re currently at with function types is actually pretty convenient, mostly, even if generating LLVM IR is inconvenient. Making the LLVM IR representation closer to the machine, as opposed to the frontend, is good for optimization: it’s hard to model the cost of code implicitly generated during isel.  And first-class structs/arrays are pretty awful to work with in LLVM IR; optimizations strongly prefer working with simple values.  Really, I think we want to break up arguments more, not less.
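To illustrate what "breaking up arguments more" means at the IR-shape level, here is a hypothetical sketch that flattens a nested first-class aggregate argument type into the scalar leaves an optimizer would prefer to see. IRType, flatten and the helper constructors are invented names, and the sketch ignores padding and register assignment entirely.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature IR type: a scalar ("i64", "double", ...) or an
// aggregate (struct/array) of other types.
struct IRType {
  std::string Scalar;                         // non-empty for scalars
  std::vector<std::shared_ptr<IRType>> Elems; // used for aggregates
};
using TypeRef = std::shared_ptr<IRType>;

TypeRef scalar(std::string Name) {
  return std::make_shared<IRType>(IRType{std::move(Name), {}});
}
TypeRef aggregate(std::vector<TypeRef> Elems) {
  return std::make_shared<IRType>(IRType{"", std::move(Elems)});
}

// Append the scalar leaves of T to Out, in declaration order.
void flatten(const TypeRef &T, std::vector<std::string> &Out) {
  if (!T->Scalar.empty()) {
    Out.push_back(T->Scalar);
    return;
  }
  for (const TypeRef &E : T->Elems)
    flatten(E, Out);
}
```

For example, an argument of type { i64, { double, i32 } } would become three scalar arguments: i64, double, i32.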

I agree the way ABI markings are represented in IR is lacking, though, and we need ABI-specific documentation for the way the lowering works. Wrapping up the current clang code in a friendlier interface only goes so far.

-Eli

From: llvm-dev <[hidden email]> On Behalf Of James Y Knight via llvm-dev
Sent: Wednesday, June 3, 2020 9:54 PM
To: Chris Lattner <[hidden email]>
Cc: [hidden email]; cfe-dev <[hidden email]>; [hidden email]
Subject: [EXT] Re: [llvm-dev] [cfe-dev] Clang/LLVM function ABI lowering (was: Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM)

 

While MLIR may be one part of the solution, I think it's also the case that the function-ABI interface between Clang and LLVM is just wrong and should be fixed -- independently of whether Clang might use MLIR in the future.

 

I've mentioned this idea before, I think, but never got around to writing up a real proposal. And I still haven't. Maybe this email could inspire someone else to work on that.

 

Essentially, I'd like to see the code in Clang responsible for function parameter-type mangling as part of its ABI lowering deleted. Currently, there is a secret "LLVM IR" ABI used between Clang and LLVM, which involves expanding some arguments into multiple arguments, adding a smattering of "inreg" or "byval" attributes, and converting some types into other types. All in a completely target-dependent, complex, and undocumented manner.

 

So, while the IR function syntax appears at first glance to be generic and target-independent, that's not at all true. Sadly, in some cases, clang must even know how many registers different calling conventions use, and count numbers of available registers left, in order to choose the right set of those "generic" attributes to put on a parameter.

 

So: not only does a frontend need to understand the C ABI rules, they also need to understand that complex dance for how to convert that into LLVM IR -- and that's both completely undocumented, and a huge mess.

 

Instead, I believe clang should always pass function parameters in a "naive" fashion. E.g. if a parameter type is "struct X", the llvm function should be lowered to LLVM IR with a function parameter of type %struct.X. The decision on whether to then pass that in a register (or multiple registers), on the stack, padded and then passed on the stack, etc, should be the responsibility of LLVM. Only in the case of C++ types which must be passed indirectly for correctness, independent of calling convention ABI, should clang be explicitly making the decision to pass indirectly.

 

Of course, the tricky part is that LLVM doesn't -- and shouldn't -- have the full C type system available to it, and the full C type system typically is required to evaluate the ABI rules (e.g., distinguishing a "_Complex float" from a struct containing two floats).

 

Therefore, in order to communicate the correct ABI information to LLVM, I'd like clang to also emit explicitly-ABI-specific data (metadata?), reflecting the extra information that the ABI rules require the backend to know about the type. E.g., for X86_64, clang needs to inform LLVM of the classification for each parameter's type into MEMORY, INTEGER, SSE, SSEUP, X87, X87UP, COMPLEX_X87. Or, for PPC64 elfv2, Clang needs to inform LLVM when a structure should be treated as a "homogenous aggregate" of floating-point or vector type. (In both cases, that information cannot correctly be extracted from the LLVM IR struct type, only from the C type system.)

 

We should document what data is needed, for each architecture/abi. This required data should be as straightforward an application of the ABI document's rules as possible -- and be only the minimum data necessary.

 

If this is done, frontends (either a new one, or Clang itself) who want to use the C ABI have a significantly simpler task. It remains non-trivial -- you do still need to understand ABI-specific rules, and write ABI-specific code to generate ABI-specific metadata. But, at least the interface boundary has become something which is readily-understandable and implementable based on the ABI documents.

 

All that said, an MLIR encoding of the C type system can still be useful -- it could contain the code which distills the C types into the ABI-specific metadata. But, I  see that as less important than getting the fundamentals in LLVM-IR into a better shape. Even frontends without a C type system representation should still be able to generate LLVM IR which conforms in their own manner to the documented ABIs -- without it being super painful. Also, the code in Clang now is really confusing, and nearly unmaintainable; it would be a clear improvement to be able to eliminate the majority of it, not just move it into an MLIR dialect.

 

On Wed, Jun 3, 2020 at 7:26 PM Chris Lattner via cfe-dev <[hidden email]> wrote:

On Jun 2, 2020, at 4:21 PM, comex via cfe-dev <[hidden email]> wrote:

 

While this is a different area of the codebase, another thing that
would benefit greatly from being moved out of Clang is function call
ABI handling.  Currently, that handling is split awkwardly between
Clang and LLVM proper, forcing frontends that implement C FFI to
either recreate the Clang parts themselves (like Rust does), depend on
Clang (like Swift does), or live with FFI just not working with some
function signatures.  I'm not sure what Flang currently does, but my
understanding is that Flang does support C FFI, so it would probably
benefit from this as well.  Just something to consider. :)

 

For what it's worth, I think there is a pretty clear path on this, but it hinges on Clang moving to MLIR as its code generation backend (an intermediary to generating LLVM IR).

 

The approach is to factor the ABI lowering part of clang out of Clang itself into a specific dialect lowering pass that works on a generic C type system (plus callouts to extended type systems). MLIR has all the infra to support this; it is just a massive job to refactor all the things to change clang’s architecture.

 

I also don’t think there is broad consensus on the direction for Clang here, but given that Flang is already using MLIR for this, maybe it would make sense to start work there.

 

If you’re curious, I co-delivered a talk about this recently; the slides are available here.

 

-Chris

 

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



Re: [llvm-dev] Clang/LLVM function ABI lowering (was: Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM)

Hubert Tong via cfe-dev


On Thu, Jun 4, 2020 at 11:47 AM Eli Friedman <[hidden email]> wrote:

In LLVM, ABI information currently comes from three sources:

 

  1. The function type
  2. The calling convention
  3. Attributes

 

I’m getting the following from your description of what you think needs to change:
1. ABI attributes shouldn’t be mixed with other attributes; we should have some data structure dedicated to ABI information.
2. ABI information should be explicitly target-specific: instead of using attributes like “inreg” that have target-specific meanings, each target-specific ABI attribute should have its own target-specific name.
3. We should depend more on explicit ABI information, as opposed to depending on each target’s default rules.

I think reasonable default rules are likely to remain useful, especially for basic non-aggregate types. But, yes, all that.

4. We should document each ABI supported by clang.
5. LLVM function types should be fixed to correspond more closely to C function types.
 
I'd rephrase 5 as "LLVM function types should be more ABI-agnostic". Although your bullet fairly reflects what I said in my initial email, having LLVM function types match C function types is not what I meant to propose. Rather, what I'd like is for the C function type -> IR function-type mapping to be as independent from the calling convention as possible.

I think messing with LLVM function types is a giant sinkhole that would destroy any comprehensive proposal, though.  The LLVM type system is not the C type system; LLVM structs are not C structs, and LLVM functions are not C functions.  And any changes are very high impact: messing with struct or function types would impact basically every file in LLVM.


I think we do not need to extend the LLVM type system. The existing type system is sufficient to represent what is needed. C unions and structs are not, and will not be, convertible 1:1 into LLVM structs. I don't propose to change that.

From the standpoint of LLVM IR optimizations and lowering, the place we’re currently at with function types is actually pretty convenient, mostly, even if generating LLVM IR is inconvenient. Making the LLVM IR representation closer to the machine, as opposed to the frontend, is good for optimization: it’s hard to model the cost of code implicitly generated during isel.  And first-class structs/arrays are pretty awful to work with in LLVM IR; optimizations strongly prefer working with simple values.  Really, I think we want to break up arguments more, not less.


The IR representation of function calls isn't close to the machine today. And, indeed, this causes some trouble already. For example, we generate inefficient code for:
void bar(int*);
void foo(int a) { bar(&a); }

If "a" is passed on the stack (e.g. 32-bit x86), we load it from that parameter slot, allocate a new stack slot, store the value there, then pass the address of the new stack slot to bar. It's silly. We ought to simply pass the address where the variable is already being kept.

And, yes, my proposal is to take this even further -- implementing my proposal would make this current inefficiency show up much more often than it does now. Which, yes, means we will need to actually solve the issue! I don't have an immediate proposal, but I don't see any reason to think it's unsolvable.

I agree the way ABI markings are represented in IR is lacking, though, and we need ABI-specific documentation for the way the lowering works. Wrapping up the current clang code in a friendlier interface only goes so far.

 

-Eli

 

From: llvm-dev <[hidden email]> On Behalf Of James Y Knight via llvm-dev
Sent: Wednesday, June 3, 2020 9:54 PM
To: Chris Lattner <[hidden email]>
Cc: [hidden email]; cfe-dev <[hidden email]>; [hidden email]
Subject: [EXT] Re: [llvm-dev] [cfe-dev] Clang/LLVM function ABI lowering (was: Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM)

 

While MLIR may be one part of the solution, I think it's also the case that the function-ABI interface between Clang and LLVM is just wrong and should be fixed -- independently of whether Clang might use MLIR in the future.

 

I've mentioned this idea before, I think, but never got around to writing up a real proposal. And I still haven't. Maybe this email could inspire someone else to work on that.

 

Essentially, I'd like to see the code in Clang responsible for function parameter-type mangling as part of its ABI lowering deleted. Currently, there is a secret "LLVM IR" ABI used between Clang and LLVM, which involves expanding some arguments into multiple arguments, adding a smattering of "inreg" or "byval" attributes, and converting some types into other types. All in a completely target-dependent, complex, and undocumented manner.

 

So, while the IR function syntax appears at first glance to be generic and target-independent, that's not at all true. Sadly, in some cases, clang must even know how many registers different calling conventions use, and count the number of registers still available, in order to choose the right set of those "generic" attributes to put on a parameter.
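The register-counting problem can be sketched in C for the x86-64 SysV convention, which has six general-purpose argument registers (rdi, rsi, rdx, rcx, r8, r9) and eight SSE argument registers (xmm0-xmm7). Per that ABI, if any eightbyte of an argument cannot get a register, the whole argument goes on the stack, and the registers it would have used remain free for later arguments. RegState and assign_arg are hypothetical names; this only shows the kind of bookkeeping a frontend ends up doing before it can decide which IR attributes (such as "byval") to emit.

```c
#include <assert.h>
#include <stdbool.h>

/* x86-64 SysV argument registers: rdi, rsi, rdx, rcx, r8, r9 and xmm0-xmm7. */
enum { SYSV_INT_ARG_REGS = 6, SYSV_SSE_ARG_REGS = 8 };

typedef struct {
    int int_left; /* general-purpose argument registers still free */
    int sse_left; /* SSE argument registers still free */
} RegState;

/* Tries to assign one argument needing `need_int` INTEGER eightbytes and
   `need_sse` SSE eightbytes. Either the whole argument gets registers, or
   it goes on the stack and no registers are consumed. Returns true if the
   argument is passed in registers. */
static bool assign_arg(RegState *st, int need_int, int need_sse) {
    assert(need_int >= 0 && need_sse >= 0);
    if (need_int > st->int_left || need_sse > st->sse_left)
        return false; /* passed on the stack; registers stay available */
    st->int_left -= need_int;
    st->sse_left -= need_sse;
    return true;
}
```

For instance, after five int arguments, a struct that needs two INTEGER eightbytes no longer fits and is passed on the stack, yet a following int argument still gets r9. The decision for one parameter therefore depends on every parameter before it.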

 

So: not only does a frontend need to understand the C ABI rules, they also need to understand that complex dance for how to convert that into LLVM IR -- and that's both completely undocumented, and a huge mess.

 

Instead, I believe clang should always pass function parameters in a "naive" fashion. E.g. if a parameter type is "struct X", the llvm function should be lowered to LLVM IR with a function parameter of type %struct.X. The decision on whether to then pass that in a register (or multiple registers), on the stack, padded and then passed on the stack, etc, should be the responsibility of LLVM. Only in the case of C++ types which must be passed indirectly for correctness, independent of calling convention ABI, should clang be explicitly making the decision to pass indirectly.

 
