Zero length function pointer equality

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Zero length function pointer equality

Keane, Erich via cfe-dev
LLVM can produce zero length functions from cases like this (when
optimizations are enabled):

void f1() { __builtin_unreachable(); }
int f2() { /* missing return statement */ }

This code is valid, so long as the functions are never called.

I believe C++ requires that all functions have a distinct address (ie:
&f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
gets optimized into an unconditional assertion failure)

But these zero length functions can end up with identical addresses.

I'm unaware of anything in the C++ spec (or the LLVM langref) that
would indicate that would allow distinct functions to have identical
addresses - so should we do something about this in the LLVM backend?
add a little padding? a nop instruction? (if we're adding an
instruction anyway, perhaps we might as well make it an int3?)

(I came across this due to DWARF issues with zero length functions &
thinking about if/how this should be supported)
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Keane, Erich via cfe-dev
On Thu, 23 Jul 2020 at 17:46, David Blaikie <[hidden email]> wrote:
LLVM can produce zero length functions from cases like this (when
optimizations are enabled):

void f1() { __builtin_unreachable(); }
int f2() { /* missing return statement */ }

This code is valid, so long as the functions are never called.

I believe C++ requires that all functions have a distinct address (ie:
&f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
gets optimized into an unconditional assertion failure)

But these zero length functions can end up with identical addresses.

I'm unaware of anything in the C++ spec (or the LLVM langref) that
would indicate that would allow distinct functions to have identical
addresses - so should we do something about this in the LLVM backend?
add a little padding? a nop instruction? (if we're adding an
instruction anyway, perhaps we might as well make it an int3?)

(I came across this due to DWARF issues with zero length functions &
thinking about if/how this should be supported)

Yes, I think at least if the optimizer turns a non-empty function into an empty function, that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr, so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Keane, Erich via cfe-dev
On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <[hidden email]> wrote:

>
> On Thu, 23 Jul 2020 at 17:46, David Blaikie <[hidden email]> wrote:
>>
>> LLVM can produce zero length functions from cases like this (when
>> optimizations are enabled):
>>
>> void f1() { __builtin_unreachable(); }
>> int f2() { /* missing return statement */ }
>>
>> This code is valid, so long as the functions are never called.
>>
>> I believe C++ requires that all functions have a distinct address (ie:
>> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
>> gets optimized into an unconditional assertion failure)
>>
>> But these zero length functions can end up with identical addresses.
>>
>> I'm unaware of anything in the C++ spec (or the LLVM langref) that
>> would indicate that would allow distinct functions to have identical
>> addresses - so should we do something about this in the LLVM backend?
>> add a little padding? a nop instruction? (if we're adding an
>> instruction anyway, perhaps we might as well make it an int3?)
>>
>> (I came across this due to DWARF issues with zero length functions &
>> thinking about if/how this should be supported)
>
>
> Yes, I think at least if the optimizer turns a non-empty function into an empty function,

What about functions that are already empty? (well, I guess at the
LLVM IR level, no function can be empty, because every basic block
must end in some terminator instruction - is that the distinction
you're drawing?)

> that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,

It seems like other LLVM passes make this assumption too - which is
how "f1 == f2" can be folded to a constant false. I haven't checked to
see exactly where that constant folding happens. (hmm, looks like it
happens in some constant folding utility - happens in the inliner if
there's inlining, happens at IR generation if there's no function
indirection, etc)

> so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.

Guess that still leaves the possibility of the last function in an
object file as being zero-length? (or I guess not, because otherwise
when linked it could still end up with the same address as the
function that comes after it)
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Keane, Erich via cfe-dev
On Thu, 23 Jul 2020 at 20:28, David Blaikie <[hidden email]> wrote:
On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <[hidden email]> wrote:
>
> On Thu, 23 Jul 2020 at 17:46, David Blaikie <[hidden email]> wrote:
>>
>> LLVM can produce zero length functions from cases like this (when
>> optimizations are enabled):
>>
>> void f1() { __builtin_unreachable(); }
>> int f2() { /* missing return statement */ }
>>
>> This code is valid, so long as the functions are never called.
>>
>> I believe C++ requires that all functions have a distinct address (ie:
>> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
>> gets optimized into an unconditional assertion failure)
>>
>> But these zero length functions can end up with identical addresses.
>>
>> I'm unaware of anything in the C++ spec (or the LLVM langref) that
>> would indicate that would allow distinct functions to have identical
>> addresses - so should we do something about this in the LLVM backend?
>> add a little padding? a nop instruction? (if we're adding an
>> instruction anyway, perhaps we might as well make it an int3?)
>>
>> (I came across this due to DWARF issues with zero length functions &
>> thinking about if/how this should be supported)
>
>
> Yes, I think at least if the optimizer turns a non-empty function into an empty function,

What about functions that are already empty? (well, I guess at the
LLVM IR level, no function can be empty, because every basic block
must end in some terminator instruction - is that the distinction
you're drawing?)

Here's what I was thinking: a case could be made that the frontend is responsible for making sure that functions don't start non-empty, in much the same way that if the frontend produces a global of zero size, it gets what it asked for.
But you're right, there really isn't such a thing as an empty function at the IR level, because there's always an entry block and it always has a terminator.
 
> that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,

It seems like other LLVM passes make this assumption too - which is
how "f1 == f2" can be folded to a constant false. I haven't checked to
see exactly where that constant folding happens. (hmm, looks like it
happens in some constant folding utility - happens in the inliner if
there's inlining, happens at IR generation if there's no function
indirection, etc)

> so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.

Guess that still leaves the possibility of the last function in an
object file as being zero-length? (or I guess not, because otherwise
when linked it could still end up with the same address as the
function that comes after it)

Yes, I think that's right. We should never put a non-unnamed_addr global at the end of a section because we don't know if it will share an address with another global. 

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Keane, Erich via cfe-dev
In reply to this post by Keane, Erich via cfe-dev
Maybe we can just expand this to always apply: https://reviews.llvm.org/D32330

On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
<[hidden email]> wrote:

>
> LLVM can produce zero length functions from cases like this (when
> optimizations are enabled):
>
> void f1() { __builtin_unreachable(); }
> int f2() { /* missing return statement */ }
>
> This code is valid, so long as the functions are never called.
>
> I believe C++ requires that all functions have a distinct address (ie:
> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> gets optimized into an unconditional assertion failure)
>
> But these zero length functions can end up with identical addresses.
>
> I'm unaware of anything in the C++ spec (or the LLVM langref) that
> would indicate that would allow distinct functions to have identical
> addresses - so should we do something about this in the LLVM backend?
> add a little padding? a nop instruction? (if we're adding an
> instruction anyway, perhaps we might as well make it an int3?)
>
> (I came across this due to DWARF issues with zero length functions &
> thinking about if/how this should be supported)
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Keane, Erich via cfe-dev
In reply to this post by Keane, Erich via cfe-dev
On Thu, Jul 23, 2020 at 9:10 PM Richard Smith <[hidden email]> wrote:

>
> On Thu, 23 Jul 2020 at 20:28, David Blaikie <[hidden email]> wrote:
>>
>> On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <[hidden email]> wrote:
>> >
>> > On Thu, 23 Jul 2020 at 17:46, David Blaikie <[hidden email]> wrote:
>> >>
>> >> LLVM can produce zero length functions from cases like this (when
>> >> optimizations are enabled):
>> >>
>> >> void f1() { __builtin_unreachable(); }
>> >> int f2() { /* missing return statement */ }
>> >>
>> >> This code is valid, so long as the functions are never called.
>> >>
>> >> I believe C++ requires that all functions have a distinct address (ie:
>> >> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
>> >> gets optimized into an unconditional assertion failure)
>> >>
>> >> But these zero length functions can end up with identical addresses.
>> >>
>> >> I'm unaware of anything in the C++ spec (or the LLVM langref) that
>> >> would indicate that would allow distinct functions to have identical
>> >> addresses - so should we do something about this in the LLVM backend?
>> >> add a little padding? a nop instruction? (if we're adding an
>> >> instruction anyway, perhaps we might as well make it an int3?)
>> >>
>> >> (I came across this due to DWARF issues with zero length functions &
>> >> thinking about if/how this should be supported)
>> >
>> >
>> > Yes, I think at least if the optimizer turns a non-empty function into an empty function,
>>
>> What about functions that are already empty? (well, I guess at the
>> LLVM IR level, no function can be empty, because every basic block
>> must end in some terminator instruction - is that the distinction
>> you're drawing?)
>
>
> Here's what I was thinking: a case could be made that the frontend is responsible for making sure that functions don't start non-empty, in much the same way that if the frontend produces a global of zero size, it gets what it asked for.
> But you're right, there really isn't such a thing as an empty function at the IR level, because there's always an entry block and it always has a terminator.
>
>>
>> > that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,
>>
>> It seems like other LLVM passes make this assumption too - which is
>> how "f1 == f2" can be folded to a constant false. I haven't checked to
>> see exactly where that constant folding happens. (hmm, looks like it
>> happens in some constant folding utility - happens in the inliner if
>> there's inlining, happens at IR generation if there's no function
>> indirection, etc)
>>
>> > so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.
>>
>> Guess that still leaves the possibility of the last function in an
>> object file as being zero-length? (or I guess not, because otherwise
>> when linked it could still end up with the same address as the
>> function that comes after it)
>
>
> Yes, I think that's right. We should never put a non-unnamed_addr global at the end of a section because we don't know if it will share an address with another global.

Fair point, we have unnamed_addr that helps distinguish the important
cases - though that does mean addressing this problem wouldn't
coincidentally address my DWARF problem (zero length functions are
weird/problematic in DWARF for a few reasons).

Doesn't mean it isn't worth fixing, though.
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Keane, Erich via cfe-dev
In reply to this post by Keane, Erich via cfe-dev
Looks perfect to me!

well, a couple of questions: Why a noop, rather than int3/ud2/etc?
Might be worth using the existing code that places such an instruction
when building at -O0?
& you mention that this causes problems on Windows - but ICF done by
the Windows linker does not cause such problems? (I'd have thought
they'd result in the same situation - two functions described as being
at the same address?) is there a quick summary of why those two cases
turn out differently?

On Fri, Jul 24, 2020 at 6:17 AM Hans Wennborg <[hidden email]> wrote:

>
> Maybe we can just expand this to always apply: https://reviews.llvm.org/D32330
>
> On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
> <[hidden email]> wrote:
> >
> > LLVM can produce zero length functions from cases like this (when
> > optimizations are enabled):
> >
> > void f1() { __builtin_unreachable(); }
> > int f2() { /* missing return statement */ }
> >
> > This code is valid, so long as the functions are never called.
> >
> > I believe C++ requires that all functions have a distinct address (ie:
> > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > gets optimized into an unconditional assertion failure)
> >
> > But these zero length functions can end up with identical addresses.
> >
> > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > would indicate that would allow distinct functions to have identical
> > addresses - so should we do something about this in the LLVM backend?
> > add a little padding? a nop instruction? (if we're adding an
> > instruction anyway, perhaps we might as well make it an int3?)
> >
> > (I came across this due to DWARF issues with zero length functions &
> > thinking about if/how this should be supported)
> > _______________________________________________
> > cfe-dev mailing list
> > [hidden email]
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Zero length function pointer equality

Keane, Erich via cfe-dev
On Sat, Jul 25, 2020 at 3:40 AM David Blaikie <[hidden email]> wrote:
>
> Looks perfect to me!
>
> well, a couple of questions: Why a noop, rather than int3/ud2/etc?

Probably no reason.

> Might be worth using the existing code that places such an instruction
> when building at -O0?

I wasn't aware of that. Does it happen for all functions (e.g. I think
I got pulled into this due to functions with the naked attribute)?

> & you mention that this causes problems on Windows - but ICF done by
> the Windows linker does not cause such problems? (I'd have thought
> they'd result in the same situation - two functions described as being
> at the same address?) is there a quick summary of why those two cases
> turn out differently?

The case that we hit was that the Control Flow Guard table of
addresses in the binary ended up listing the same address twice, which
the loader didn't expect. It may be that the linker took care to avoid
that for ICF (if two ICF'd functions got the same address, only list
it once in the CFG table) but still didn't handle the "empty function"
problem.

> On Fri, Jul 24, 2020 at 6:17 AM Hans Wennborg <[hidden email]> wrote:
> >
> > Maybe we can just expand this to always apply: https://reviews.llvm.org/D32330
> >
> > On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
> > <[hidden email]> wrote:
> > >
> > > LLVM can produce zero length functions from cases like this (when
> > > optimizations are enabled):
> > >
> > > void f1() { __builtin_unreachable(); }
> > > int f2() { /* missing return statement */ }
> > >
> > > This code is valid, so long as the functions are never called.
> > >
> > > I believe C++ requires that all functions have a distinct address (ie:
> > > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > > gets optimized into an unconditional assertion failure)
> > >
> > > But these zero length functions can end up with identical addresses.
> > >
> > > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > > would indicate that would allow distinct functions to have identical
> > > addresses - so should we do something about this in the LLVM backend?
> > > add a little padding? a nop instruction? (if we're adding an
> > > instruction anyway, perhaps we might as well make it an int3?)
> > >
> > > (I came across this due to DWARF issues with zero length functions &
> > > thinking about if/how this should be supported)
> > > _______________________________________________
> > > cfe-dev mailing list
> > > [hidden email]
> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Zero length function pointer equality

Keane, Erich via cfe-dev


> -----Original Message-----
> From: llvm-dev <[hidden email]> On Behalf Of Hans
> Wennborg via llvm-dev
> Sent: Monday, July 27, 2020 9:11 AM
> To: David Blaikie <[hidden email]>
> Cc: llvm-dev <[hidden email]>; Clang Dev <[hidden email]>
> Subject: Re: [llvm-dev] [cfe-dev] Zero length function pointer equality
>
> On Sat, Jul 25, 2020 at 3:40 AM David Blaikie <[hidden email]> wrote:
> >
> > Looks perfect to me!
> >
> > well, a couple of questions: Why a noop, rather than int3/ud2/etc?
>
> Probably no reason.

FTR there is TargetOptions.TrapUnreachable, which some targets turn on
(for X86 it's on for MachO and PS4), this turns 'unreachable' into ud2.
Clearly it covers more than "empty" functions but is probably the kind
of thing you're looking for.
--paulr

>
> > Might be worth using the existing code that places such an instruction
> > when building at -O0?
>
> I wasn't aware of that. Does it happen for all functions (e.g. I think
> I got pulled into this due to functions with the naked attribute)?
>
> > & you mention that this causes problems on Windows - but ICF done by
> > the Windows linker does not cause such problems? (I'd have thought
> > they'd result in the same situation - two functions described as being
> > at the same address?) is there a quick summary of why those two cases
> > turn out differently?
>
> The case that we hit was that the Control Flow Guard table of
> addresses in the binary ended up listing the same address twice, which
> the loader didn't expect. It may be that the linker took care to avoid
> that for ICF (if two ICF'd functions got the same address, only list
> it once in the CFG table) but still didn't handle the "empty function"
> problem.
>
> > On Fri, Jul 24, 2020 at 6:17 AM Hans Wennborg <[hidden email]> wrote:
> > >
> > > Maybe we can just expand this to always apply:
> https://reviews.llvm.org/D32330
> > >
> > > On Fri, Jul 24, 2020 at 2:46 AM David Blaikie via cfe-dev
> > > <[hidden email]> wrote:
> > > >
> > > > LLVM can produce zero length functions from cases like this (when
> > > > optimizations are enabled):
> > > >
> > > > void f1() { __builtin_unreachable(); }
> > > > int f2() { /* missing return statement */ }
> > > >
> > > > This code is valid, so long as the functions are never called.
> > > >
> > > > I believe C++ requires that all functions have a distinct address
> (ie:
> > > > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > > > gets optimized into an unconditional assertion failure)
> > > >
> > > > But these zero length functions can end up with identical addresses.
> > > >
> > > > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > > > would indicate that would allow distinct functions to have identical
> > > > addresses - so should we do something about this in the LLVM
> backend?
> > > > add a little padding? a nop instruction? (if we're adding an
> > > > instruction anyway, perhaps we might as well make it an int3?)
> > > >
> > > > (I came across this due to DWARF issues with zero length functions &
> > > > thinking about if/how this should be supported)
> > > > _______________________________________________
> > > > cfe-dev mailing list
> > > > [hidden email]
> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev