[RFC] AlwaysInline codegen

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

[RFC] AlwaysInline codegen

Bakhvalov, Denis via cfe-dev
Hi,

There is a problem with the handling of alwaysinline functions in
Clang: they are not always inlined. AFAIK, this may only happen when
the caller is in the dead code, but then we don't always successfully
remove all dead code.

Because of this, we may end up emitting an undefined reference for an
"inline __attribute__((always_inline))" function. Libc++ relies on the
compiler never doing that - it has lots of functions in the headers
marked this way and does _not_ export them from libc++.so.

Current implementation in clang emits alwaysinline+inline functions as
available_externally definitions. The inliner is an SCC pass, and as
such it does not process unreachable functions at all. This means that
AlwaysInliner may leave some alwaysinline functions not inlined. If
such function has an available_externally linkage, it is not emitted
into the binary, and all calls to it are emitted as undefined symbol
references.

Some time ago I've made an attempt to add a DCE pass before the
AlwaysInliner pass to fix this. Thst
 (a) caused a big churn in the existing tests
 (b) must be done at -O0 as well, which is probably undesirable and
could inflate compilation time
 (c) feels like patching over a bigger problem.

The following, better, idea was suggested by Chandler Carruth and Richard Smith.

Instead of emitting an available_externally definition for an
alwaysinline function, we emit a pair of
 1. internal alwaysinline definition (let's call it F.inlinefunction -
it demangles nicely)
 2a. A stub F() { musttail call F.inlinefunction }
  -- or --
 2b. A declaration of F.

The frontend ensures that F.inlinefunction is only used for direct
calls, and the stub is used for everything else (taking the address of
the function, really). Declaration (2b) is emitted in the case when
"inline" is meant for inlining only (like __gnu_inline__ and some
other cases).

This way has a number of useful properties that are easy to enforce.
1. alwaysinline functions are always internal
2. AlwaysInliner can be split from normal inliner; it would be a super
simple implementation that would ensure that there are no alwaysinline
functions remaining after it's done.
3. alwaysinline functions must never reach backend and can be rejected
before machine code generation (in SelectionDAG?).

As this changes the semantics of alwaysinline attribute in the IR, we
would need to reserve a new attribute ID, rename the current ID to
legacy_alwaysinline or something similar and only enforce the above
properties on the new attribute.

There is a proposed Clang patch here: http://reviews.llvm.org/D12087
The patch only implements the Clang codegen part of the proposal and
does not do any of the backend improvements.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [RFC] AlwaysInline codegen

Bakhvalov, Denis via cfe-dev


On Thu, Aug 20, 2015 at 5:19 PM, Evgenii Stepanov via llvm-dev <[hidden email]> wrote:
Hi,

There is a problem with the handling of alwaysinline functions in
Clang: they are not always inlined. AFAIK, this may only happen when
the caller is in the dead code, but then we don't always successfully
remove all dead code.

Because of this, we may end up emitting an undefined reference

Why would the failure to inline produced an undefined reference? Wouldn't we just keep the body around? The following example seems to demonstrate this

void f1();
inline __attribute__((always_inline)) void f2() {
  f1();
}
void (*f3())() {
  return f2;
}
 
for an
"inline __attribute__((always_inline))" function. Libc++ relies on the
compiler never doing that - it has lots of functions in the headers
marked this way and does _not_ export them from libc++.so.

Current implementation in clang emits alwaysinline+inline functions as
available_externally definitions.

I heard this mentioned in another thread - and I'm not seeing that behavior. Is there an extra step missing necessary to produce that behavior?
 
The inliner is an SCC pass, and as
such it does not process unreachable functions at all. This means that
AlwaysInliner may leave some alwaysinline functions not inlined. If
such function has an available_externally linkage, it is not emitted
into the binary, and all calls to it are emitted as undefined symbol
references.

Some time ago I've made an attempt to add a DCE pass before the
AlwaysInliner pass to fix this. Thst
 (a) caused a big churn in the existing tests
 (b) must be done at -O0 as well, which is probably undesirable and
could inflate compilation time
 (c) feels like patching over a bigger problem.

The following, better, idea was suggested by Chandler Carruth and Richard Smith.

Instead of emitting an available_externally definition for an
alwaysinline function, we emit a pair of
 1. internal alwaysinline definition (let's call it F.inlinefunction -
it demangles nicely)
 2a. A stub F() { musttail call F.inlinefunction }
  -- or --
 2b. A declaration of F.

The frontend ensures that F.inlinefunction is only used for direct
calls, and the stub is used for everything else (taking the address of
the function, really). Declaration (2b) is emitted in the case when
"inline" is meant for inlining only (like __gnu_inline__ and some
other cases).

This way has a number of useful properties that are easy to enforce.
1. alwaysinline functions are always internal
2. AlwaysInliner can be split from normal inliner; it would be a super
simple implementation that would ensure that there are no alwaysinline
functions remaining after it's done.
3. alwaysinline functions must never reach backend and can be rejected
before machine code generation (in SelectionDAG?).

As this changes the semantics of alwaysinline attribute in the IR, we
would need to reserve a new attribute ID, rename the current ID to
legacy_alwaysinline or something similar and only enforce the above
properties on the new attribute.

There is a proposed Clang patch here: http://reviews.llvm.org/D12087
The patch only implements the Clang codegen part of the proposal and
does not do any of the backend improvements.
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [RFC] AlwaysInline codegen

Bakhvalov, Denis via cfe-dev
On Thu, Aug 20, 2015 at 5:36 PM, David Blaikie <[hidden email]> wrote:


On Thu, Aug 20, 2015 at 5:19 PM, Evgenii Stepanov via llvm-dev <[hidden email]> wrote:
Hi,

There is a problem with the handling of alwaysinline functions in
Clang: they are not always inlined. AFAIK, this may only happen when
the caller is in the dead code, but then we don't always successfully
remove all dead code.

Because of this, we may end up emitting an undefined reference

Why would the failure to inline produced an undefined reference? Wouldn't we just keep the body around?

No, if it's available_externally we'll discard it before we emit code.
 
The following example seems to demonstrate this

void f1();
inline __attribute__((always_inline)) void f2() {
Try:

extern inline __attribute__((gnu_inline, always_inline)) void f2() { 
  f1();
}
void (*f3())() {
  return f2;
}
 
for an
"inline __attribute__((always_inline))" function. Libc++ relies on the
compiler never doing that - it has lots of functions in the headers
marked this way and does _not_ export them from libc++.so.

Current implementation in clang emits alwaysinline+inline functions as
available_externally definitions.

I heard this mentioned in another thread - and I'm not seeing that behavior. Is there an extra step missing necessary to produce that behavior?

You need to first arrange a situation where you'd get an available_externally definition without the always_inline.
 
The inliner is an SCC pass, and as
such it does not process unreachable functions at all. This means that
AlwaysInliner may leave some alwaysinline functions not inlined. If
such function has an available_externally linkage, it is not emitted
into the binary, and all calls to it are emitted as undefined symbol
references.

Some time ago I've made an attempt to add a DCE pass before the
AlwaysInliner pass to fix this. Thst
 (a) caused a big churn in the existing tests
 (b) must be done at -O0 as well, which is probably undesirable and
could inflate compilation time
 (c) feels like patching over a bigger problem.

The following, better, idea was suggested by Chandler Carruth and Richard Smith.

Instead of emitting an available_externally definition for an
alwaysinline function, we emit a pair of
 1. internal alwaysinline definition (let's call it F.inlinefunction -
it demangles nicely)
 2a. A stub F() { musttail call F.inlinefunction }
  -- or --
 2b. A declaration of F.

The frontend ensures that F.inlinefunction is only used for direct
calls, and the stub is used for everything else (taking the address of
the function, really). Declaration (2b) is emitted in the case when
"inline" is meant for inlining only (like __gnu_inline__ and some
other cases).

This way has a number of useful properties that are easy to enforce.
1. alwaysinline functions are always internal
2. AlwaysInliner can be split from normal inliner; it would be a super
simple implementation that would ensure that there are no alwaysinline
functions remaining after it's done.
3. alwaysinline functions must never reach backend and can be rejected
before machine code generation (in SelectionDAG?).

As this changes the semantics of alwaysinline attribute in the IR, we
would need to reserve a new attribute ID, rename the current ID to
legacy_alwaysinline or something similar and only enforce the above
properties on the new attribute.

There is a proposed Clang patch here: http://reviews.llvm.org/D12087
The patch only implements the Clang codegen part of the proposal and
does not do any of the backend improvements.
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [RFC] AlwaysInline codegen

Bakhvalov, Denis via cfe-dev


On Thu, Aug 20, 2015 at 5:53 PM, Richard Smith <[hidden email]> wrote:
On Thu, Aug 20, 2015 at 5:36 PM, David Blaikie <[hidden email]> wrote:


On Thu, Aug 20, 2015 at 5:19 PM, Evgenii Stepanov via llvm-dev <[hidden email]> wrote:
Hi,

There is a problem with the handling of alwaysinline functions in
Clang: they are not always inlined. AFAIK, this may only happen when
the caller is in the dead code, but then we don't always successfully
remove all dead code.

Because of this, we may end up emitting an undefined reference

Why would the failure to inline produced an undefined reference? Wouldn't we just keep the body around?

No, if it's available_externally we'll discard it before we emit code.
 
The following example seems to demonstrate this

void f1();
inline __attribute__((always_inline)) void f2() {
Try:

extern inline __attribute__((gnu_inline, always_inline)) void f2() { 

Ah, thanks - I didn't know that was possible in C++. I knew that the weird C inline semantics had such a thing.

Good(ish) to know. Thanks!
  f1();
}
void (*f3())() {
  return f2;
}
 
for an
"inline __attribute__((always_inline))" function. Libc++ relies on the
compiler never doing that - it has lots of functions in the headers
marked this way and does _not_ export them from libc++.so.

Current implementation in clang emits alwaysinline+inline functions as
available_externally definitions.

I heard this mentioned in another thread - and I'm not seeing that behavior. Is there an extra step missing necessary to produce that behavior?

You need to first arrange a situation where you'd get an available_externally definition without the always_inline. 
 
The inliner is an SCC pass, and as
such it does not process unreachable functions at all. This means that
AlwaysInliner may leave some alwaysinline functions not inlined. If
such function has an available_externally linkage, it is not emitted
into the binary, and all calls to it are emitted as undefined symbol
references.

Some time ago I've made an attempt to add a DCE pass before the
AlwaysInliner pass to fix this. Thst
 (a) caused a big churn in the existing tests
 (b) must be done at -O0 as well, which is probably undesirable and
could inflate compilation time
 (c) feels like patching over a bigger problem.

The following, better, idea was suggested by Chandler Carruth and Richard Smith.

Instead of emitting an available_externally definition for an
alwaysinline function, we emit a pair of
 1. internal alwaysinline definition (let's call it F.inlinefunction -
it demangles nicely)
 2a. A stub F() { musttail call F.inlinefunction }
  -- or --
 2b. A declaration of F.

The frontend ensures that F.inlinefunction is only used for direct
calls, and the stub is used for everything else (taking the address of
the function, really). Declaration (2b) is emitted in the case when
"inline" is meant for inlining only (like __gnu_inline__ and some
other cases).

This way has a number of useful properties that are easy to enforce.
1. alwaysinline functions are always internal
2. AlwaysInliner can be split from normal inliner; it would be a super
simple implementation that would ensure that there are no alwaysinline
functions remaining after it's done.
3. alwaysinline functions must never reach backend and can be rejected
before machine code generation (in SelectionDAG?).

As this changes the semantics of alwaysinline attribute in the IR, we
would need to reserve a new attribute ID, rename the current ID to
legacy_alwaysinline or something similar and only enforce the above
properties on the new attribute.

There is a proposed Clang patch here: http://reviews.llvm.org/D12087
The patch only implements the Clang codegen part of the proposal and
does not do any of the backend improvements.
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] AlwaysInline codegen

Bakhvalov, Denis via cfe-dev
In reply to this post by Bakhvalov, Denis via cfe-dev
> On Aug 20, 2015, at 5:19 PM, Evgenii Stepanov via cfe-dev <[hidden email]> wrote:
> Hi,
>
> There is a problem with the handling of alwaysinline functions in
> Clang: they are not always inlined. AFAIK, this may only happen when
> the caller is in the dead code, but then we don't always successfully
> remove all dead code.
>
> Because of this, we may end up emitting an undefined reference for an
> "inline __attribute__((always_inline))" function. Libc++ relies on the
> compiler never doing that - it has lots of functions in the headers
> marked this way and does _not_ export them from libc++.so.
>
> Current implementation in clang emits alwaysinline+inline functions as
> available_externally definitions. The inliner is an SCC pass, and as
> such it does not process unreachable functions at all. This means that
> AlwaysInliner may leave some alwaysinline functions not inlined. If
> such function has an available_externally linkage, it is not emitted
> into the binary, and all calls to it are emitted as undefined symbol
> references.
>
> Some time ago I've made an attempt to add a DCE pass before the
> AlwaysInliner pass to fix this. Thst
> (a) caused a big churn in the existing tests
> (b) must be done at -O0 as well, which is probably undesirable and
> could inflate compilation time
> (c) feels like patching over a bigger problem.
>
> The following, better, idea was suggested by Chandler Carruth and Richard Smith.
>
> Instead of emitting an available_externally definition for an
> alwaysinline function, we emit a pair of
> 1. internal alwaysinline definition (let's call it F.inlinefunction -
> it demangles nicely)
> 2a. A stub F() { musttail call F.inlinefunction }
>  -- or --
> 2b. A declaration of F.

I have no idea why always_inline function definitions are being marked as
available_externally.  There is zero reason to think that they’re actually
available externally, and there’s zero benefit to the optimizer from this
information because inlining is forced anyway, so this lie is both terrible
and pointless.

I don’t understand the goal of this stub idea.  We should just emit always_inline
function definitions normally, but capping their linkage to hidden+linkonce_odr
if not already internal.  If the AlwaysInliner fails to inline a dead call, fine, we’ll
just emit a dead function body, too.  (I mean, this seems like inexcusable
backend behavior to me, but whatever.)

John.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] AlwaysInline codegen

Bakhvalov, Denis via cfe-dev
On Thu, Aug 20, 2015 at 7:17 PM, John McCall <[hidden email]> wrote:
> On Aug 20, 2015, at 5:19 PM, Evgenii Stepanov via cfe-dev <[hidden email]> wrote:
> Hi,
>
> There is a problem with the handling of alwaysinline functions in
> Clang: they are not always inlined. AFAIK, this may only happen when
> the caller is in the dead code, but then we don't always successfully
> remove all dead code.
>
> Because of this, we may end up emitting an undefined reference for an
> "inline __attribute__((always_inline))" function. Libc++ relies on the
> compiler never doing that - it has lots of functions in the headers
> marked this way and does _not_ export them from libc++.so.
>
> Current implementation in clang emits alwaysinline+inline functions as
> available_externally definitions. The inliner is an SCC pass, and as
> such it does not process unreachable functions at all. This means that
> AlwaysInliner may leave some alwaysinline functions not inlined. If
> such function has an available_externally linkage, it is not emitted
> into the binary, and all calls to it are emitted as undefined symbol
> references.
>
> Some time ago I've made an attempt to add a DCE pass before the
> AlwaysInliner pass to fix this. Thst
> (a) caused a big churn in the existing tests
> (b) must be done at -O0 as well, which is probably undesirable and
> could inflate compilation time
> (c) feels like patching over a bigger problem.
>
> The following, better, idea was suggested by Chandler Carruth and Richard Smith.
>
> Instead of emitting an available_externally definition for an
> alwaysinline function, we emit a pair of
> 1. internal alwaysinline definition (let's call it F.inlinefunction -
> it demangles nicely)
> 2a. A stub F() { musttail call F.inlinefunction }
>  -- or --
> 2b. A declaration of F.

I have no idea why always_inline function definitions are being marked as
available_externally.  There is zero reason to think that they’re actually
available externally, and there’s zero benefit to the optimizer from this
information because inlining is forced anyway, so this lie is both terrible
and pointless.

The interesting case is a declaration like this:

  extern inline __attribute__((gnu_inline, always_inline)) void f() { /*...*/ }

These are common in the glibc headers, and they mean:

1) This is a body just for inlining, and there is a strong definition provided elsewhere (gnu_inline + extern inline), probably with an entirely different body.
2) If you see a direct call to this function, you must inline it.

The intention appears to be to create a function definition that is only used for inlining, and is never itself emitted. Taking the address of the function should give the address of the strong definition from elsewhere, not the address of this local definition of f.

So, if we only want a single IR-level function for this case, it must be available_externally (for (1)), and must be always_inline (for (2)). But that results in the problem that Evgeniy reported.

The other natural approach seems to be to say that there are really two functions involved: the always_inline function, and the external definition, which leads to the design being suggested, at least for this case.

I don’t understand the goal of this stub idea.  We should just emit always_inline
function definitions normally, but capping their linkage to hidden+linkonce_odr
if not already internal.  If the AlwaysInliner fails to inline a dead call, fine, we’ll
just emit a dead function body, too.  (I mean, this seems like inexcusable
backend behavior to me, but whatever.)

The goal is to make always_inline really mean "always inline" at the IR level. We would guarantee that by making it a verifier failure to have an always_inline function that is not private or internal, or to use an always_inline function anywhere other than as a direct callee of a call or invoke. We then recover the expected frontend-level semantics by emitting a non-always-inline stub for an address-taken or non-internal function declared with __attribute__((always_inline)).

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] AlwaysInline codegen

Bakhvalov, Denis via cfe-dev

On Aug 20, 2015, at 7:36 PM, Richard Smith <[hidden email]> wrote:

On Thu, Aug 20, 2015 at 7:17 PM, John McCall <[hidden email]> wrote:
> On Aug 20, 2015, at 5:19 PM, Evgenii Stepanov via cfe-dev <[hidden email]> wrote:
> Hi,
>
> There is a problem with the handling of alwaysinline functions in
> Clang: they are not always inlined. AFAIK, this may only happen when
> the caller is in the dead code, but then we don't always successfully
> remove all dead code.
>
> Because of this, we may end up emitting an undefined reference for an
> "inline __attribute__((always_inline))" function. Libc++ relies on the
> compiler never doing that - it has lots of functions in the headers
> marked this way and does _not_ export them from libc++.so.
>
> Current implementation in clang emits alwaysinline+inline functions as
> available_externally definitions. The inliner is an SCC pass, and as
> such it does not process unreachable functions at all. This means that
> AlwaysInliner may leave some alwaysinline functions not inlined. If
> such function has an available_externally linkage, it is not emitted
> into the binary, and all calls to it are emitted as undefined symbol
> references.
>
> Some time ago I've made an attempt to add a DCE pass before the
> AlwaysInliner pass to fix this. Thst
> (a) caused a big churn in the existing tests
> (b) must be done at -O0 as well, which is probably undesirable and
> could inflate compilation time
> (c) feels like patching over a bigger problem.
>
> The following, better, idea was suggested by Chandler Carruth and Richard Smith.
>
> Instead of emitting an available_externally definition for an
> alwaysinline function, we emit a pair of
> 1. internal alwaysinline definition (let's call it F.inlinefunction -
> it demangles nicely)
> 2a. A stub F() { musttail call F.inlinefunction }
>  -- or --
> 2b. A declaration of F.

I have no idea why always_inline function definitions are being marked as
available_externally.  There is zero reason to think that they’re actually
available externally, and there’s zero benefit to the optimizer from this
information because inlining is forced anyway, so this lie is both terrible
and pointless.

The interesting case is a declaration like this:

  extern inline __attribute__((gnu_inline, always_inline)) void f() { /*...*/ }

These are common in the glibc headers, and they mean:

1) This is a body just for inlining, and there is a strong definition provided elsewhere (gnu_inline + extern inline), probably with an entirely different body.
2) If you see a direct call to this function, you must inline it.

The intention appears to be to create a function definition that is only used for inlining, and is never itself emitted. Taking the address of the function should give the address of the strong definition from elsewhere, not the address of this local definition of f.

So, if we only want a single IR-level function for this case, it must be available_externally (for (1)), and must be always_inline (for (2)). But that results in the problem that Evgeniy reported.

How so?  You have a call that doesn’t get inlined for whatever reason, and it resolves against an external symbol that’s guaranteed to exist.  Sub-optimal, but easily fixable by DCE'ing the dead call.  I don’t see how this fails to link unless, what, the program is using the libc headers and not linking against libc?  Do we really need to fall over ourselves to support that?

Evgeniy seemed to be describing a situation where the external symbol *wasn’t* guaranteed to exist; but in that case, gnu_inline is a clear lie.

John.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] AlwaysInline codegen

Bakhvalov, Denis via cfe-dev
On Fri, Aug 21, 2015 at 1:23 PM, John McCall <[hidden email]> wrote:
On Aug 20, 2015, at 7:36 PM, Richard Smith <[hidden email]> wrote:

On Thu, Aug 20, 2015 at 7:17 PM, John McCall <[hidden email]> wrote:
> On Aug 20, 2015, at 5:19 PM, Evgenii Stepanov via cfe-dev <[hidden email]> wrote:
> Hi,
>
> There is a problem with the handling of alwaysinline functions in
> Clang: they are not always inlined. AFAIK, this may only happen when
> the caller is in the dead code, but then we don't always successfully
> remove all dead code.
>
> Because of this, we may end up emitting an undefined reference for an
> "inline __attribute__((always_inline))" function. Libc++ relies on the
> compiler never doing that - it has lots of functions in the headers
> marked this way and does _not_ export them from libc++.so.
>
> Current implementation in clang emits alwaysinline+inline functions as
> available_externally definitions. The inliner is an SCC pass, and as
> such it does not process unreachable functions at all. This means that
> AlwaysInliner may leave some alwaysinline functions not inlined. If
> such function has an available_externally linkage, it is not emitted
> into the binary, and all calls to it are emitted as undefined symbol
> references.
>
> Some time ago I've made an attempt to add a DCE pass before the
> AlwaysInliner pass to fix this. Thst
> (a) caused a big churn in the existing tests
> (b) must be done at -O0 as well, which is probably undesirable and
> could inflate compilation time
> (c) feels like patching over a bigger problem.
>
> The following, better, idea was suggested by Chandler Carruth and Richard Smith.
>
> Instead of emitting an available_externally definition for an
> alwaysinline function, we emit a pair of
> 1. internal alwaysinline definition (let's call it F.inlinefunction -
> it demangles nicely)
> 2a. A stub F() { musttail call F.inlinefunction }
>  -- or --
> 2b. A declaration of F.

I have no idea why always_inline function definitions are being marked as
available_externally.  There is zero reason to think that they’re actually
available externally, and there’s zero benefit to the optimizer from this
information because inlining is forced anyway, so this lie is both terrible
and pointless.

The interesting case is a declaration like this:

  extern inline __attribute__((gnu_inline, always_inline)) void f() { /*...*/ }

These are common in the glibc headers, and they mean:

1) This is a body just for inlining, and there is a strong definition provided elsewhere (gnu_inline + extern inline), probably with an entirely different body.
2) If you see a direct call to this function, you must inline it.

The intention appears to be to create a function definition that is only used for inlining, and is never itself emitted. Taking the address of the function should give the address of the strong definition from elsewhere, not the address of this local definition of f.

So, if we only want a single IR-level function for this case, it must be available_externally (for (1)), and must be always_inline (for (2)). But that results in the problem that Evgeniy reported.

How so?  You have a call that doesn’t get inlined for whatever reason, and it resolves against an external symbol that’s guaranteed to exist.  Sub-optimal, but easily fixable by DCE'ing the dead call.  I don’t see how this fails to link unless, what, the program is using the libc headers and not linking against libc?  Do we really need to fall over ourselves to support that?

Evgeniy seemed to be describing a situation where the external symbol *wasn’t* guaranteed to exist; but in that case, gnu_inline is a clear lie.

The combination of flags above implies that if all calls are direct calls, then no external definition is required. Some headers rely on that, such as clang's own <htmxlintrin.h>.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] AlwaysInline codegen

Bakhvalov, Denis via cfe-dev
Are there any concerns remaining?
I'd like to go ahead with http://reviews.llvm.org/D12087

On Fri, Aug 21, 2015 at 3:40 PM, Richard Smith <[hidden email]> wrote:

> On Fri, Aug 21, 2015 at 1:23 PM, John McCall <[hidden email]> wrote:
>>
>> On Aug 20, 2015, at 7:36 PM, Richard Smith <[hidden email]> wrote:
>>
>> On Thu, Aug 20, 2015 at 7:17 PM, John McCall <[hidden email]> wrote:
>>>
>>> > On Aug 20, 2015, at 5:19 PM, Evgenii Stepanov via cfe-dev
>>> > <[hidden email]> wrote:
>>> > Hi,
>>> >
>>> > There is a problem with the handling of alwaysinline functions in
>>> > Clang: they are not always inlined. AFAIK, this may only happen when
>>> > the caller is in the dead code, but then we don't always successfully
>>> > remove all dead code.
>>> >
>>> > Because of this, we may end up emitting an undefined reference for an
>>> > "inline __attribute__((always_inline))" function. Libc++ relies on the
>>> > compiler never doing that - it has lots of functions in the headers
>>> > marked this way and does _not_ export them from libc++.so.
>>> >
>>> > Current implementation in clang emits alwaysinline+inline functions as
>>> > available_externally definitions. The inliner is an SCC pass, and as
>>> > such it does not process unreachable functions at all. This means that
>>> > AlwaysInliner may leave some alwaysinline functions not inlined. If
>>> > such function has an available_externally linkage, it is not emitted
>>> > into the binary, and all calls to it are emitted as undefined symbol
>>> > references.
>>> >
>>> > Some time ago I've made an attempt to add a DCE pass before the
>>> > AlwaysInliner pass to fix this. Thst
>>> > (a) caused a big churn in the existing tests
>>> > (b) must be done at -O0 as well, which is probably undesirable and
>>> > could inflate compilation time
>>> > (c) feels like patching over a bigger problem.
>>> >
>>> > The following, better, idea was suggested by Chandler Carruth and
>>> > Richard Smith.
>>> >
>>> > Instead of emitting an available_externally definition for an
>>> > alwaysinline function, we emit a pair of
>>> > 1. internal alwaysinline definition (let's call it F.inlinefunction -
>>> > it demangles nicely)
>>> > 2a. A stub F() { musttail call F.inlinefunction }
>>> >  -- or --
>>> > 2b. A declaration of F.
>>>
>>> I have no idea why always_inline function definitions are being marked as
>>> available_externally.  There is zero reason to think that they’re
>>> actually
>>> available externally, and there’s zero benefit to the optimizer from this
>>> information because inlining is forced anyway, so this lie is both
>>> terrible
>>> and pointless.
>>
>>
>> The interesting case is a declaration like this:
>>
>>   extern inline __attribute__((gnu_inline, always_inline)) void f() {
>> /*...*/ }
>>
>> These are common in the glibc headers, and they mean:
>>
>> 1) This is a body just for inlining, and there is a strong definition
>> provided elsewhere (gnu_inline + extern inline), probably with an entirely
>> different body.
>> 2) If you see a direct call to this function, you must inline it.
>>
>> The intention appears to be to create a function definition that is only
>> used for inlining, and is never itself emitted. Taking the address of the
>> function should give the address of the strong definition from elsewhere,
>> not the address of this local definition of f.
>>
>> So, if we only want a single IR-level function for this case, it must be
>> available_externally (for (1)), and must be always_inline (for (2)). But
>> that results in the problem that Evgeniy reported.
>>
>>
>> How so?  You have a call that doesn’t get inlined for whatever reason, and
>> it resolves against an external symbol that’s guaranteed to exist.
>> Sub-optimal, but easily fixable by DCE'ing the dead call.  I don’t see how
>> this fails to link unless, what, the program is using the libc headers and
>> not linking against libc?  Do we really need to fall over ourselves to
>> support that?
>>
>> Evgeniy seemed to be describing a situation where the external symbol
>> *wasn’t* guaranteed to exist; but in that case, gnu_inline is a clear lie.
>
>
> The combination of flags above implies that if all calls are direct calls,
> then no external definition is required. Some headers rely on that, such as
> clang's own <htmxlintrin.h>.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev