Understand assumptions towards uninitialized variables on stack

Understand assumptions towards uninitialized variables on stack

Oleg Smolsky via cfe-dev
Hello list,

I hope this is the right place for this question. I am trying to understand what assumptions Clang makes about the values of uninitialized variables on the stack.

My observation is that when Clang compiles the following piece of code without any optimization, the assembly code checks the path condition and assigns to t whatever value happens to be on the stack, which seems pretty reasonable to me.

int main() {

        char a[20];
        char* p = a;
        int t;
        if (p) {
                t = p[7];
        }

        return t;
}

On the other hand, when optimized with -O2, the whole if statement is gone: the return value is simply set to zero (i.e., xor eax, eax) and returned at the end of main.

Reading an uninitialized variable directly is considered "undefined behavior", and as far as I can see the compiler shouldn't make any assumption about its value, right? My test environment is 64-bit Ubuntu 18.04 with Clang version 5.0.

I am trying to understand whether clang -O2 relies on some analysis that guarantees stack variables are initially zero. So far, from the assembly code and the compiler options enabled by -O2, I haven't found any such trick. Am I missing anything here?

Thanks,
Irene

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: Understand assumptions towards uninitialized variables on stack

Oleg Smolsky via cfe-dev
Hi Irene,
I am not an expert, but here is my interpretation: "undefined behaviour" means that the behaviour observed by the person running the compiled program is not dictated by the C++ standard, and thus the compiler is free to do whatever it wants.

Setting the value of an uninitialised variable to zero, or to any arbitrary number, or to a random number, or not setting it at all, would be acceptable behaviours for "undefined behaviour".

Ciao,
.Andrea

Re: Understand assumptions towards uninitialized variables on stack

Oleg Smolsky via cfe-dev
Hello Andrea,

Thank you for the clarification. It makes a lot of sense to me. 

However, I am still trying to understand the "inconsistency" between optimization levels with respect to this undefined behaviour. This is how I compiled and ran the code on my machine:

➜  code clang-5.0 -g -O0 test.c
➜  code ./a.out
➜  code echo $?
192
➜  code clang-5.0 -g -O2 test.c
➜  code ./a.out
➜  code echo $?
0
➜  code

In the -O2 case, since t is zeroed, the return value is always zero. In contrast, for -O0 the return value seems un-predictable. IMHO the inconsistency makes a lot of additional effort and perhaps is not preferred, but I guess that's eventually the programmer's responsibility to solve that?

Overall, in the assembly code generated by clang -O2 (attached below), uninitialized variables on the stack is assumed to be zero due to some reason, and I am writing to ask about the motivation/analysis behind this.

0000000000400490 <main>:
  400490:       31 c0                   xor    %eax,%eax
  400492:       c3                      retq
  400493:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40049a:       00 00 00
  40049d:       0f 1f 00                nopl   (%rax)

Sincerely,
Irene

On Wed, Nov 21, 2018 at 9:27 AM Andrea Bocci <[hidden email]> wrote:
Hi Irene,
I am not an expert, but here is my interpretation: "undefined behaviour" means that the behaviour observed by the person running the compiled program is not dictated by the C++ standard, and thus the compiler is free to do whatever it wants.

Setting the value of an uninitialised variable to zero, or to any arbitrary number, or to a random number, or not setting it at all, would be acceptable behaviours for "undefined behaviour".

Ciao,
.Andrea

Re: Understand assumptions towards uninitialized variables on stack

Oleg Smolsky via cfe-dev
> IMHO the inconsistency makes a lot of additional effort and perhaps is
> not preferred, but I guess that's eventually the programmer's
> responsibility to solve that? 

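Correct. The program has undefined behavior, and it is the programmer's
responsibility to solve that.  A sanitizer can reveal this kind of problem:
MemorySanitizer (-fsanitize=memory) is the one designed to detect uses of
uninitialized memory; the Undefined Behavior Sanitizer does not track them.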
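
As a minimal sketch of what "solving that" could look like: give every object a value before it is read, so that -O0 and -O2 have to agree on the result. The function name read_seventh and the stored value 42 are made up for illustration; they are not part of the original test.c.

```c
#include <string.h>

/* Hypothetical, well-defined variant of the test program.
 * read_seventh and the value 42 are illustrative assumptions. */
int read_seventh(void) {
        char a[20];
        memset(a, 0, sizeof a);   /* every element now has a defined value */
        a[7] = 42;

        char *p = a;
        int t = 0;                /* t is initialized on every path */
        if (p) {
                t = p[7];
        }
        return t;
}
```

With the array and t initialized, the optimizer no longer has an unknown value to replace, so the function should return the same result at every optimization level.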

> uninitialized variables on the stack is assumed to be zero due to
> some reason,

That is not exactly what happened.  The assignment is from uninitialized
memory, which will have an unknown value.  Because the value is unknown,
the assignment can be optimized to avoid a read from memory, and
substitute any convenient value, without perturbing any defined property
of the program. The most convenient value to use here is zero.

This is a different sequence of reasoning than what you suggested, which
is more like this:  The stack values are assumed to be zero, therefore
we can use value propagation to assign the value zero instead of reading
memory with a known value.

I agree that the net effect here is the same, but the reasoning is
important for correct understanding of the program's semantics.
--paulr

Re: Understand assumptions towards uninitialized variables on stack

Oleg Smolsky via cfe-dev
One particular point is: "In contrast, for -O0 the return value seems un-predictable. " 

Not entirely true. If you were writing this code to intentionally get an unpredictable value (to seed a random number generator, etc.), that's a security problem. This has been a bug in crypto libraries, where similar techniques were used and eventually the compiler broke them, or people found ways to compromise the source of the data (by writing specific values to stack variables elsewhere in the program, making the values more predictable).
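
If the goal really is an unpredictable seed, a rough sketch of the alternative is to ask the operating system for it instead of reading stack garbage. The helper name seed_from_os is hypothetical, and the code assumes a Unix-like system that provides /dev/urandom:

```c
#include <stdio.h>

/* Hypothetical helper: fills *seed from the OS entropy source.
 * Assumes a Unix-like system with /dev/urandom.
 * Returns 0 on success, -1 on failure. */
int seed_from_os(unsigned int *seed) {
        FILE *f = fopen("/dev/urandom", "rb");
        if (f == NULL) {
                return -1;
        }
        /* Read exactly one unsigned int worth of random bytes. */
        size_t n = fread(seed, sizeof *seed, 1, f);
        fclose(f);
        return n == 1 ? 0 : -1;
}
```

Unlike an uninitialized read, this has defined behavior, so the compiler cannot legally replace the value with a constant.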

On Wed, Nov 21, 2018, 10:03 AM via cfe-dev <[hidden email]> wrote:
> IMHO the inconsistency makes a lot of additional effort and perhaps is
> not preferred, but I guess that's eventually the programmer's
> responsibility to solve that? 

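Correct. The program has undefined behavior, and it is the programmer's
responsibility to solve that.  A sanitizer can reveal this kind of problem:
MemorySanitizer (-fsanitize=memory) is the one designed to detect uses of
uninitialized memory; the Undefined Behavior Sanitizer does not track them.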

> uninitialized variables on the stack is assumed to be zero due to
> some reason,

That is not exactly what happened.  The assignment is from uninitialized
memory, which will have an unknown value.  Because the value is unknown,
the assignment can be optimized to avoid a read from memory, and
substitute any convenient value, without perturbing any defined property
of the program. The most convenient value to use here is zero.

This is a different sequence of reasoning than what you suggested, which
is more like this:  The stack values are assumed to be zero, therefore
we can use value propagation to assign the value zero instead of reading
memory with a known value.

I agree that the net effect here is the same, but the reasoning is
important for correct understanding of the program's semantics.
--paulr

Re: Understand assumptions towards uninitialized variables on stack

Oleg Smolsky via cfe-dev
On 21/11/2018 16:03, via cfe-dev wrote:
>> uninitialized variables on the stack is assumed to be zero due to
>> some reason,
>
> That is not exactly what happened.  The assignment is from uninitialized
> memory, which will have an unknown value.  Because the value is unknown,
> the assignment can be optimized to avoid a read from memory, and
> substitute any convenient value, without perturbing any defined property
> of the program. The most convenient value to use here is zero.

But why bother to come up with a specific value at all, why not drop the
"xorl %eax, %eax" completely and use whatever value is present in %eax?


Re: Understand assumptions towards uninitialized variables on stack

Oleg Smolsky via cfe-dev
On 11/21/2018 8:58 AM, Stephan Bergmann via cfe-dev wrote:

> On 21/11/2018 16:03, via cfe-dev wrote:
>>> uninitialized variables on the stack is assumed to be zero due to
>>> some reason,
>>
>> That is not exactly what happened.  The assignment is from uninitialized
>> memory, which will have an unknown value.  Because the value is unknown,
>> the assignment can be optimized to avoid a read from memory, and
>> substitute any convenient value, without perturbing any defined property
>> of the program. The most convenient value to use here is zero.
>
> But why bother to come up with a specific value at all, why not drop
> the "xorl %eax, %eax" completely and use whatever value is present in
> %eax?
>
The optimizer isn't specifically trying to catch this case. It just runs
a series of transforms which assume that the behavior is defined, and
some of those transforms constrain the behavior of the function.  This
eventually leads to generating an xor which wasn't necessary for the
original function.  If you're curious about what happens in this
particular case, you can use "-mllvm -print-after-all" to see how
various transforms change the IR.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
