Zero length function pointer equality

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Zero length function pointer equality

Dimitry Andric via cfe-dev
LLVM can produce zero length functions from cases like this (when
optimizations are enabled):

void f1() { __builtin_unreachable(); }
int f2() { /* missing return statement */ }

This code is valid, so long as the functions are never called.

I believe C++ requires that all functions have a distinct address (ie:
&f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
gets optimized into an unconditional assertion failure)

But these zero length functions can end up with identical addresses.

I'm unaware of anything in the C++ spec (or the LLVM langref) that
would indicate that would allow distinct functions to have identical
addresses - so should we do something about this in the LLVM backend?
add a little padding? a nop instruction? (if we're adding an
instruction anyway, perhaps we might as well make it an int3?)

(I came across this due to DWARF issues with zero length functions &
thinking about if/how this should be supported)
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Dimitry Andric via cfe-dev
On Thu, 23 Jul 2020 at 17:46, David Blaikie <[hidden email]> wrote:
LLVM can produce zero length functions from cases like this (when
optimizations are enabled):

void f1() { __builtin_unreachable(); }
int f2() { /* missing return statement */ }

This code is valid, so long as the functions are never called.

I believe C++ requires that all functions have a distinct address (ie:
&f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
gets optimized into an unconditional assertion failure)

But these zero length functions can end up with identical addresses.

I'm unaware of anything in the C++ spec (or the LLVM langref) that
would indicate that would allow distinct functions to have identical
addresses - so should we do something about this in the LLVM backend?
add a little padding? a nop instruction? (if we're adding an
instruction anyway, perhaps we might as well make it an int3?)

(I came across this due to DWARF issues with zero length functions &
thinking about if/how this should be supported)

Yes, I think at least if the optimizer turns a non-empty function into an empty function, that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr, so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Dimitry Andric via cfe-dev
On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <[hidden email]> wrote:

>
> On Thu, 23 Jul 2020 at 17:46, David Blaikie <[hidden email]> wrote:
>>
>> LLVM can produce zero length functions from cases like this (when
>> optimizations are enabled):
>>
>> void f1() { __builtin_unreachable(); }
>> int f2() { /* missing return statement */ }
>>
>> This code is valid, so long as the functions are never called.
>>
>> I believe C++ requires that all functions have a distinct address (ie:
>> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
>> gets optimized into an unconditional assertion failure)
>>
>> But these zero length functions can end up with identical addresses.
>>
>> I'm unaware of anything in the C++ spec (or the LLVM langref) that
>> would indicate that would allow distinct functions to have identical
>> addresses - so should we do something about this in the LLVM backend?
>> add a little padding? a nop instruction? (if we're adding an
>> instruction anyway, perhaps we might as well make it an int3?)
>>
>> (I came across this due to DWARF issues with zero length functions &
>> thinking about if/how this should be supported)
>
>
> Yes, I think at least if the optimizer turns a non-empty function into an empty function,

What about functions that are already empty? (well, I guess at the
LLVM IR level, no function can be empty, because every basic block
must end in some terminator instruction - is that the distinction
you're drawing?)

> that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,

It seems like other LLVM passes make this assumption too - which is
how "f1 == f2" can be folded to a constant false. I haven't checked to
see exactly where that constant folding happens. (hmm, looks like it
happens in some constant folding utility - happens in the inliner if
there's inlining, happens at IR generation if there's no function
indirection, etc)

> so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.

Guess that still leaves the possibility of the last function in an
object file as being zero-length? (or I guess not, because otherwise
when linked it could still end up with the same address as the
function that comes after it)
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Dimitry Andric via cfe-dev
On Thu, 23 Jul 2020 at 20:28, David Blaikie <[hidden email]> wrote:
On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <[hidden email]> wrote:
>
> On Thu, 23 Jul 2020 at 17:46, David Blaikie <[hidden email]> wrote:
>>
>> LLVM can produce zero length functions from cases like this (when
>> optimizations are enabled):
>>
>> void f1() { __builtin_unreachable(); }
>> int f2() { /* missing return statement */ }
>>
>> This code is valid, so long as the functions are never called.
>>
>> I believe C++ requires that all functions have a distinct address (ie:
>> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
>> gets optimized into an unconditional assertion failure)
>>
>> But these zero length functions can end up with identical addresses.
>>
>> I'm unaware of anything in the C++ spec (or the LLVM langref) that
>> would indicate that would allow distinct functions to have identical
>> addresses - so should we do something about this in the LLVM backend?
>> add a little padding? a nop instruction? (if we're adding an
>> instruction anyway, perhaps we might as well make it an int3?)
>>
>> (I came across this due to DWARF issues with zero length functions &
>> thinking about if/how this should be supported)
>
>
> Yes, I think at least if the optimizer turns a non-empty function into an empty function,

What about functions that are already empty? (well, I guess at the
LLVM IR level, no function can be empty, because every basic block
must end in some terminator instruction - is that the distinction
you're drawing?)

Here's what I was thinking: a case could be made that the frontend is responsible for making sure that functions don't start non-empty, in much the same way that if the frontend produces a global of zero size, it gets what it asked for.
But you're right, there really isn't such a thing as an empty function at the IR level, because there's always an entry block and it always has a terminator.
 
> that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,

It seems like other LLVM passes make this assumption too - which is
how "f1 == f2" can be folded to a constant false. I haven't checked to
see exactly where that constant folding happens. (hmm, looks like it
happens in some constant folding utility - happens in the inliner if
there's inlining, happens at IR generation if there's no function
indirection, etc)

> so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.

Guess that still leaves the possibility of the last function in an
object file as being zero-length? (or I guess not, because otherwise
when linked it could still end up with the same address as the
function that comes after it)

Yes, I think that's right. We should never put a non-unnamed_addr global at the end of a section because we don't know if it will share an address with another global. 

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Dimitry Andric via cfe-dev
In reply to this post by Dimitry Andric via cfe-dev
Maybe we can just expand this to always apply: https://reviews.llvm.org/D32330

On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
<[hidden email]> wrote:

>
> LLVM can produce zero length functions from cases like this (when
> optimizations are enabled):
>
> void f1() { __builtin_unreachable(); }
> int f2() { /* missing return statement */ }
>
> This code is valid, so long as the functions are never called.
>
> I believe C++ requires that all functions have a distinct address (ie:
> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> gets optimized into an unconditional assertion failure)
>
> But these zero length functions can end up with identical addresses.
>
> I'm unaware of anything in the C++ spec (or the LLVM langref) that
> would indicate that would allow distinct functions to have identical
> addresses - so should we do something about this in the LLVM backend?
> add a little padding? a nop instruction? (if we're adding an
> instruction anyway, perhaps we might as well make it an int3?)
>
> (I came across this due to DWARF issues with zero length functions &
> thinking about if/how this should be supported)
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Dimitry Andric via cfe-dev
In reply to this post by Dimitry Andric via cfe-dev
On Thu, Jul 23, 2020 at 9:10 PM Richard Smith <[hidden email]> wrote:

>
> On Thu, 23 Jul 2020 at 20:28, David Blaikie <[hidden email]> wrote:
>>
>> On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <[hidden email]> wrote:
>> >
>> > On Thu, 23 Jul 2020 at 17:46, David Blaikie <[hidden email]> wrote:
>> >>
>> >> LLVM can produce zero length functions from cases like this (when
>> >> optimizations are enabled):
>> >>
>> >> void f1() { __builtin_unreachable(); }
>> >> int f2() { /* missing return statement */ }
>> >>
>> >> This code is valid, so long as the functions are never called.
>> >>
>> >> I believe C++ requires that all functions have a distinct address (ie:
>> >> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
>> >> gets optimized into an unconditional assertion failure)
>> >>
>> >> But these zero length functions can end up with identical addresses.
>> >>
>> >> I'm unaware of anything in the C++ spec (or the LLVM langref) that
>> >> would indicate that would allow distinct functions to have identical
>> >> addresses - so should we do something about this in the LLVM backend?
>> >> add a little padding? a nop instruction? (if we're adding an
>> >> instruction anyway, perhaps we might as well make it an int3?)
>> >>
>> >> (I came across this due to DWARF issues with zero length functions &
>> >> thinking about if/how this should be supported)
>> >
>> >
>> > Yes, I think at least if the optimizer turns a non-empty function into an empty function,
>>
>> What about functions that are already empty? (well, I guess at the
>> LLVM IR level, no function can be empty, because every basic block
>> must end in some terminator instruction - is that the distinction
>> you're drawing?)
>
>
> Here's what I was thinking: a case could be made that the frontend is responsible for making sure that functions don't start non-empty, in much the same way that if the frontend produces a global of zero size, it gets what it asked for.
> But you're right, there really isn't such a thing as an empty function at the IR level, because there's always an entry block and it always has a terminator.
>
>>
>> > that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,
>>
>> It seems like other LLVM passes make this assumption too - which is
>> how "f1 == f2" can be folded to a constant false. I haven't checked to
>> see exactly where that constant folding happens. (hmm, looks like it
>> happens in some constant folding utility - happens in the inliner if
>> there's inlining, happens at IR generation if there's no function
>> indirection, etc)
>>
>> > so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.
>>
>> Guess that still leaves the possibility of the last function in an
>> object file as being zero-length? (or I guess not, because otherwise
>> when linked it could still end up with the same address as the
>> function that comes after it)
>
>
> Yes, I think that's right. We should never put a non-unnamed_addr global at the end of a section because we don't know if it will share an address with another global.

Fair point, we have unnamed_addr that helps distinguish the important
cases - though that does mean addressing this problem wouldn't
coincidentally address my DWARF problem (zero length functions are
weird/problematic in DWARF for a few reasons).

Doesn't mean it isn't worth fixing, though.
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Dimitry Andric via cfe-dev
In reply to this post by Dimitry Andric via cfe-dev
Looks perfect to me!

well, a couple of questions: Why a noop, rather than int3/ud2/etc?
Might be worth using the existing code that places such an instruction
when building at -O0?
& you mention that this causes problems on Windows - but ICF done by
the Windows linker does not cause such problems? (I'd have thought
they'd result in the same situation - two functions described as being
at the same address?) is there a quick summary of why those two cases
turn out differently?

On Fri, Jul 24, 2020 at 6:17 AM Hans Wennborg <[hidden email]> wrote:

>
> Maybe we can just expand this to always apply: https://reviews.llvm.org/D32330
>
> On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
> <[hidden email]> wrote:
> >
> > LLVM can produce zero length functions from cases like this (when
> > optimizations are enabled):
> >
> > void f1() { __builtin_unreachable(); }
> > int f2() { /* missing return statement */ }
> >
> > This code is valid, so long as the functions are never called.
> >
> > I believe C++ requires that all functions have a distinct address (ie:
> > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > gets optimized into an unconditional assertion failure)
> >
> > But these zero length functions can end up with identical addresses.
> >
> > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > would indicate that would allow distinct functions to have identical
> > addresses - so should we do something about this in the LLVM backend?
> > add a little padding? a nop instruction? (if we're adding an
> > instruction anyway, perhaps we might as well make it an int3?)
> >
> > (I came across this due to DWARF issues with zero length functions &
> > thinking about if/how this should be supported)
> > _______________________________________________
> > cfe-dev mailing list
> > [hidden email]
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Dimitry Andric via cfe-dev
On Sat, Jul 25, 2020 at 3:40 AM David Blaikie <[hidden email]> wrote:
>
> Looks perfect to me!
>
> well, a couple of questions: Why a noop, rather than int3/ud2/etc?

Probably no reason.

> Might be worth using the existing code that places such an instruction
> when building at -O0?

I wasn't aware of that. Does it happen for all functions (e.g. I think
I got pulled into this due to functions with the naked attribute)?

> & you mention that this causes problems on Windows - but ICF done by
> the Windows linker does not cause such problems? (I'd have thought
> they'd result in the same situation - two functions described as being
> at the same address?) is there a quick summary of why those two cases
> turn out differently?

The case that we hit was that the Control Flow Guard table of
addresses in the binary ended up listing the same address twice, which
the loader didn't expect. It may be that the linker took care to avoid
that for ICF (if two ICF'd functions got the same address, only list
it once in the CFG table) but still didn't handle the "empty function"
problem.

> On Fri, Jul 24, 2020 at 6:17 AM Hans Wennborg <[hidden email]> wrote:
> >
> > Maybe we can just expand this to always apply: https://reviews.llvm.org/D32330
> >
> > On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
> > <[hidden email]> wrote:
> > >
> > > LLVM can produce zero length functions from cases like this (when
> > > optimizations are enabled):
> > >
> > > void f1() { __builtin_unreachable(); }
> > > int f2() { /* missing return statement */ }
> > >
> > > This code is valid, so long as the functions are never called.
> > >
> > > I believe C++ requires that all functions have a distinct address (ie:
> > > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > > gets optimized into an unconditional assertion failure)
> > >
> > > But these zero length functions can end up with identical addresses.
> > >
> > > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > > would indicate that would allow distinct functions to have identical
> > > addresses - so should we do something about this in the LLVM backend?
> > > add a little padding? a nop instruction? (if we're adding an
> > > instruction anyway, perhaps we might as well make it an int3?)
> > >
> > > (I came across this due to DWARF issues with zero length functions &
> > > thinking about if/how this should be supported)
> > > _______________________________________________
> > > cfe-dev mailing list
> > > [hidden email]
> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Zero length function pointer equality

Dimitry Andric via cfe-dev


> -----Original Message-----
> From: llvm-dev <[hidden email]> On Behalf Of Hans
> Wennborg via llvm-dev
> Sent: Monday, July 27, 2020 9:11 AM
> To: David Blaikie <[hidden email]>
> Cc: llvm-dev <[hidden email]>; Clang Dev <[hidden email]>
> Subject: Re: [llvm-dev] [cfe-dev] Zero length function pointer equality
>
> On Sat, Jul 25, 2020 at 3:40 AM David Blaikie <[hidden email]> wrote:
> >
> > Looks perfect to me!
> >
> > well, a couple of questions: Why a noop, rather than int3/ud2/etc?
>
> Probably no reason.

FTR there is TargetOptions.TrapUnreachable, which some targets turn on
(for X86 it's on for MachO and PS4), this turns 'unreachable' into ud2.
Clearly it covers more than "empty" functions but is probably the kind
of thing you're looking for.
--paulr

>
> > Might be worth using the existing code that places such an instruction
> > when building at -O0?
>
> I wasn't aware of that. Does it happen for all functions (e.g. I think
> I got pulled into this due to functions with the naked attribute)?
>
> > & you mention that this causes problems on Windows - but ICF done by
> > the Windows linker does not cause such problems? (I'd have thought
> > they'd result in the same situation - two functions described as being
> > at the same address?) is there a quick summary of why those two cases
> > turn out differently?
>
> The case that we hit was that the Control Flow Guard table of
> addresses in the binary ended up listing the same address twice, which
> the loader didn't expect. It may be that the linker took care to avoid
> that for ICF (if two ICF'd functions got the same address, only list
> it once in the CFG table) but still didn't handle the "empty function"
> problem.
>
> > On Fri, Jul 24, 2020 at 6:17 AM Hans Wennborg <[hidden email]> wrote:
> > >
> > > Maybe we can just expand this to always apply:
> https://reviews.llvm.org/D32330
> > >
> > > On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
> > > <[hidden email]> wrote:
> > > >
> > > > LLVM can produce zero length functions from cases like this (when
> > > > optimizations are enabled):
> > > >
> > > > void f1() { __builtin_unreachable(); }
> > > > int f2() { /* missing return statement */ }
> > > >
> > > > This code is valid, so long as the functions are never called.
> > > >
> > > > I believe C++ requires that all functions have a distinct address
> (ie:
> > > > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > > > gets optimized into an unconditional assertion failure)
> > > >
> > > > But these zero length functions can end up with identical addresses.
> > > >
> > > > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > > > would indicate that would allow distinct functions to have identical
> > > > addresses - so should we do something about this in the LLVM
> backend?
> > > > add a little padding? a nop instruction? (if we're adding an
> > > > instruction anyway, perhaps we might as well make it an int3?)
> > > >
> > > > (I came across this due to DWARF issues with zero length functions &
> > > > thinking about if/how this should be supported)
> > > > _______________________________________________
> > > > cfe-dev mailing list
> > > > [hidden email]
> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Zero length function pointer equality

Dimitry Andric via cfe-dev
Just writing it down in this thread - this issue's been discussed a bit in this bug: https://bugs.llvm.org/show_bug.cgi?id=49599

And yeah, I'm considering adopting MachO's default (TrapUnreachable + NoTrapOnNoreturn) as the default for LLVM (will require some design discussion, no doubt) since it seems to capture most of the functionality desired. Maybe there are some cases where we have extra unreachables that could've otherwise been optimized away/elided, but hopefully nothing drastic.

(some platforms still need the trap-on-noreturn - Windows+AArch64 and maybe Sony, etc - happy for some folks to opt into that). I wonder whether TrapUnreachable shouldn't even be an option anymore though, if it becomes load bearing for correctness - or should it become a fallback option - "no trap unreachable" maybe means nop instead of trap, in case your target can't handle a trap sometimes (I came across an issue with AMDGPU not being able to add traps to functions that it isn't expecting - the function needs some special attribute to have a trap in it - but I guess it can be updated to add that attribute if the function has an unreachable in it (though then it has to recreate the no-trap-on-noreturn handling too when deciding whether to add the attribute... ))

On Mon, Jul 27, 2020 at 9:20 AM Robinson, Paul <[hidden email]> wrote:


> -----Original Message-----
> From: llvm-dev <[hidden email]> On Behalf Of Hans
> Wennborg via llvm-dev
> Sent: Monday, July 27, 2020 9:11 AM
> To: David Blaikie <[hidden email]>
> Cc: llvm-dev <[hidden email]>; Clang Dev <[hidden email]>
> Subject: Re: [llvm-dev] [cfe-dev] Zero length function pointer equality
>
> On Sat, Jul 25, 2020 at 3:40 AM David Blaikie <[hidden email]> wrote:
> >
> > Looks perfect to me!
> >
> > well, a couple of questions: Why a noop, rather than int3/ud2/etc?
>
> Probably no reason.

FTR there is TargetOptions.TrapUnreachable, which some targets turn on
(for X86 it's on for MachO and PS4), this turns 'unreachable' into ud2.
Clearly it covers more than "empty" functions but is probably the kind
of thing you're looking for.
--paulr

>
> > Might be worth using the existing code that places such an instruction
> > when building at -O0?
>
> I wasn't aware of that. Does it happen for all functions (e.g. I think
> I got pulled into this due to functions with the naked attribute)?
>
> > & you mention that this causes problems on Windows - but ICF done by
> > the Windows linker does not cause such problems? (I'd have thought
> > they'd result in the same situation - two functions described as being
> > at the same address?) is there a quick summary of why those two cases
> > turn out differently?
>
> The case that we hit was that the Control Flow Guard table of
> addresses in the binary ended up listing the same address twice, which
> the loader didn't expect. It may be that the linker took care to avoid
> that for ICF (if two ICF'd functions got the same address, only list
> it once in the CFG table) but still didn't handle the "empty function"
> problem.
>
> > On Fri, Jul 24, 2020 at 6:17 AM Hans Wennborg <[hidden email]> wrote:
> > >
> > > Maybe we can just expand this to always apply:
> https://reviews.llvm.org/D32330
> > >
> > > On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
> > > <[hidden email]> wrote:
> > > >
> > > > LLVM can produce zero length functions from cases like this (when
> > > > optimizations are enabled):
> > > >
> > > > void f1() { __builtin_unreachable(); }
> > > > int f2() { /* missing return statement */ }
> > > >
> > > > This code is valid, so long as the functions are never called.
> > > >
> > > > I believe C++ requires that all functions have a distinct address
> (ie:
> > > > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > > > gets optimized into an unconditional assertion failure)
> > > >
> > > > But these zero length functions can end up with identical addresses.
> > > >
> > > > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > > > would indicate that would allow distinct functions to have identical
> > > > addresses - so should we do something about this in the LLVM
> backend?
> > > > add a little padding? a nop instruction? (if we're adding an
> > > > instruction anyway, perhaps we might as well make it an int3?)
> > > >
> > > > (I came across this due to DWARF issues with zero length functions &
> > > > thinking about if/how this should be supported)
> > > > _______________________________________________
> > > > cfe-dev mailing list
> > > > [hidden email]
> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Zero length function pointer equality

Dimitry Andric via cfe-dev
I'm not sure whether we'd want every unreachable to emit a trap, but I do think we should try not to let code fall out of one function and into a completely unrelated one. That is: I'd propose that the last basic-block in every function should get a trap instruction added unless it already ends in a control transfer instruction (jmp, ret, or branch -- call doesn't count, since it may return).

On Fri, Mar 19, 2021 at 5:12 PM David Blaikie via cfe-dev <[hidden email]> wrote:
Just writing it down in this thread - this issue's been discussed a bit in this bug: https://bugs.llvm.org/show_bug.cgi?id=49599

And yeah, I'm considering adopting MachO's default (TrapUnreachable + NoTrapOnNoreturn) as the default for LLVM (will require some design discussion, no doubt) since it seems to capture most of the functionality desired. Maybe there are some cases where we have extra unreachables that could've otherwise been optimized away/elided, but hopefully nothing drastic.

(some platforms still need the trap-on-noreturn - Windows+AArch64 and maybe Sony, etc - happy for some folks to opt into that). I wonder whether TrapUnreachable shouldn't even be an option anymore though, if it becomes load bearing for correctness - or should it become a fallback option - "no trap unreachable" maybe means nop instead of trap, in case your target can't handle a trap sometimes (I came across an issue with AMDGPU not being able to add traps to functions that it isn't expecting - the function needs some special attribute to have a trap in it - but I guess it can be updated to add that attribute if the function has an unreachable in it (though then it has to recreate the no-trap-on-noreturn handling too when deciding whether to add the attribute... ))

On Mon, Jul 27, 2020 at 9:20 AM Robinson, Paul <[hidden email]> wrote:


> -----Original Message-----
> From: llvm-dev <[hidden email]> On Behalf Of Hans
> Wennborg via llvm-dev
> Sent: Monday, July 27, 2020 9:11 AM
> To: David Blaikie <[hidden email]>
> Cc: llvm-dev <[hidden email]>; Clang Dev <[hidden email]>
> Subject: Re: [llvm-dev] [cfe-dev] Zero length function pointer equality
>
> On Sat, Jul 25, 2020 at 3:40 AM David Blaikie <[hidden email]> wrote:
> >
> > Looks perfect to me!
> >
> > well, a couple of questions: Why a noop, rather than int3/ud2/etc?
>
> Probably no reason.

FTR there is TargetOptions.TrapUnreachable, which some targets turn on
(for X86 it's on for MachO and PS4), this turns 'unreachable' into ud2.
Clearly it covers more than "empty" functions but is probably the kind
of thing you're looking for.
--paulr

>
> > Might be worth using the existing code that places such an instruction
> > when building at -O0?
>
> I wasn't aware of that. Does it happen for all functions (e.g. I think
> I got pulled into this due to functions with the naked attribute)?
>
> > & you mention that this causes problems on Windows - but ICF done by
> > the Windows linker does not cause such problems? (I'd have thought
> > they'd result in the same situation - two functions described as being
> > at the same address?) is there a quick summary of why those two cases
> > turn out differently?
>
> The case that we hit was that the Control Flow Guard table of
> addresses in the binary ended up listing the same address twice, which
> the loader didn't expect. It may be that the linker took care to avoid
> that for ICF (if two ICF'd functions got the same address, only list
> it once in the CFG table) but still didn't handle the "empty function"
> problem.
>
> > On Fri, Jul 24, 2020 at 6:17 AM Hans Wennborg <[hidden email]> wrote:
> > >
> > > Maybe we can just expand this to always apply:
> https://reviews.llvm.org/D32330
> > >
> > > On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
> > > <[hidden email]> wrote:
> > > >
> > > > LLVM can produce zero length functions from cases like this (when
> > > > optimizations are enabled):
> > > >
> > > > void f1() { __builtin_unreachable(); }
> > > > int f2() { /* missing return statement */ }
> > > >
> > > > This code is valid, so long as the functions are never called.
> > > >
> > > > I believe C++ requires that all functions have a distinct address
> (ie:
> > > > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > > > gets optimized into an unconditional assertion failure)
> > > >
> > > > But these zero length functions can end up with identical addresses.
> > > >
> > > > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > > > would indicate that would allow distinct functions to have identical
> > > > addresses - so should we do something about this in the LLVM
> backend?
> > > > add a little padding? a nop instruction? (if we're adding an
> > > > instruction anyway, perhaps we might as well make it an int3?)
> > > >
> > > > (I came across this due to DWARF issues with zero length functions &
> > > > thinking about if/how this should be supported)
> > > > _______________________________________________
> > > > cfe-dev mailing list
> > > > [hidden email]
> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Zero length function pointer equality

Dimitry Andric via cfe-dev
On Fri, Mar 19, 2021 at 3:00 PM James Y Knight <[hidden email]> wrote:
I'm not sure whether we'd want every unreachable to emit a trap, but I do think we should try not to let code fall out of one function and into a completely unrelated one.

Yeah, that was my gut reaction too - though TrapUnreachable doesn't inhibit SimplifyCFG from eliminating unreachable blocks (can still leave unreachable in a block after a call - because the call may or may not return).
 
That is: I'd propose that the last basic-block in every function should get a trap instruction added unless it already ends in a control transfer instruction

That was my initial theory, but given MachO defaults to TrapUnreachable and it doesn't inhibit SimplifyCFG, so most unreachable blocks do get eliminated - I'm sort of leaning towards that being sufficiently correct.
 
(jmp, ret, or branch -- call doesn't count, since it may return).

I'm inclined to omit the trap after a call to a noreturn function for brevity - even though it does leave the possibility of violating the noreturn contract leading to that fallthrough UB.
 

On Fri, Mar 19, 2021 at 5:12 PM David Blaikie via cfe-dev <[hidden email]> wrote:
Just writing it down in this thread - this issue's been discussed a bit in this bug: https://bugs.llvm.org/show_bug.cgi?id=49599

And yeah, I'm considering adopting MachO's default (TrapUnreachable + NoTrapOnNoreturn) as the default for LLVM (will require some design discussion, no doubt) since it seems to capture most of the functionality desired. Maybe there are some cases where we have extra unreachables that could've otherwise been optimized away/elided, but hopefully nothing drastic.

(some platforms still need the trap-on-noreturn - Windows+AArch64 and maybe Sony, etc - happy for some folks to opt into that). I wonder whether TrapUnreachable shouldn't even be an option anymore though, if it becomes load bearing for correctness - or should it become a fallback option - "no trap unreachable" maybe means nop instead of trap, in case your target can't handle a trap sometimes (I came across an issue with AMDGPU not being able to add traps to functions that it isn't expecting - the function needs some special attribute to have a trap in it - but I guess it can be updated to add that attribute if the function has an unreachable in it (though then it has to recreate the no-trap-on-noreturn handling too when deciding whether to add the attribute... ))

On Mon, Jul 27, 2020 at 9:20 AM Robinson, Paul <[hidden email]> wrote:


> -----Original Message-----
> From: llvm-dev <[hidden email]> On Behalf Of Hans
> Wennborg via llvm-dev
> Sent: Monday, July 27, 2020 9:11 AM
> To: David Blaikie <[hidden email]>
> Cc: llvm-dev <[hidden email]>; Clang Dev <[hidden email]>
> Subject: Re: [llvm-dev] [cfe-dev] Zero length function pointer equality
>
> On Sat, Jul 25, 2020 at 3:40 AM David Blaikie <[hidden email]> wrote:
> >
> > Looks perfect to me!
> >
> > well, a couple of questions: Why a noop, rather than int3/ud2/etc?
>
> Probably no reason.

FTR there is TargetOptions.TrapUnreachable, which some targets turn on
(for X86 it's on for MachO and PS4), this turns 'unreachable' into ud2.
Clearly it covers more than "empty" functions but is probably the kind
of thing you're looking for.
--paulr

>
> > Might be worth using the existing code that places such an instruction
> > when building at -O0?
>
> I wasn't aware of that. Does it happen for all functions (e.g. I think
> I got pulled into this due to functions with the naked attribute)?
>
> > & you mention that this causes problems on Windows - but ICF done by
> > the Windows linker does not cause such problems? (I'd have thought
> > they'd result in the same situation - two functions described as being
> > at the same address?) is there a quick summary of why those two cases
> > turn out differently?
>
> The case that we hit was that the Control Flow Guard table of
> addresses in the binary ended up listing the same address twice, which
> the loader didn't expect. It may be that the linker took care to avoid
> that for ICF (if two ICF'd functions got the same address, only list
> it once in the CFG table) but still didn't handle the "empty function"
> problem.
>
> > On Fri, Jul 24, 2020 at 6:17 AM Hans Wennborg <[hidden email]> wrote:
> > >
> > > Maybe we can just expand this to always apply:
> https://reviews.llvm.org/D32330
> > >
> > > On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
> > > <[hidden email]> wrote:
> > > >
> > > > LLVM can produce zero length functions from cases like this (when
> > > > optimizations are enabled):
> > > >
> > > > void f1() { __builtin_unreachable(); }
> > > > int f2() { /* missing return statement */ }
> > > >
> > > > This code is valid, so long as the functions are never called.
> > > >
> > > > I believe C++ requires that all functions have a distinct address
> (ie:
> > > > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > > > gets optimized into an unconditional assertion failure)
> > > >
> > > > But these zero length functions can end up with identical addresses.
> > > >
> > > > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > > > would indicate that would allow distinct functions to have identical
> > > > addresses - so should we do something about this in the LLVM
> backend?
> > > > add a little padding? a nop instruction? (if we're adding an
> > > > instruction anyway, perhaps we might as well make it an int3?)
> > > >
> > > > (I came across this due to DWARF issues with zero length functions &
> > > > thinking about if/how this should be supported)
> > > > _______________________________________________
> > > > cfe-dev mailing list
> > > > [hidden email]
> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Zero length function pointer equality

Dimitry Andric via cfe-dev
> I'm inclined to omit the trap after a call to a noreturn function
> for brevity - even though it does leave the possibility of
> violating the noreturn contract leading to that fallthrough UB.

This is specifically the case where we needed it, at least when the
noreturn call is in the last block. Which is commonly the case, a
noreturn call is likely to have its block laid out at the end of a
function.
--paulr

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Zero length function pointer equality

Dimitry Andric via cfe-dev


On Mon, Mar 22, 2021 at 7:39 AM <[hidden email]> wrote:
> I'm inclined to omit the trap after a call to a noreturn function
> for brevity - even though it does leave the possibility of
> violating the noreturn contract leading to that fallthrough UB.

This is specifically the case where we needed it, at least when the
noreturn call is in the last block. Which is commonly the case, a
noreturn call is likely to have its block laid out at the end of a
function.
--paulr

Yeah, not suggesting removing that option or changing those who've opted into that behavior - only about the default when platforms haven't otherwise cared/have already been living with zero length functions.

- Dave
 

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev