libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
On 19 Jun 2015, at 19:40, Michael Zolotukhin <[hidden email]> wrote:
> Author: mzolotukhin
> Date: Fri Jun 19 12:40:15 2015
> New Revision: 240144
>
> URL: http://llvm.org/viewvc/llvm-project?rev=240144&view=rev
> Log:
> [SLP] Vectorize for all-constant entries.
>
> Differential Revision: http://reviews.llvm.org/D10558

Hi,

tl;dr: after some research, I found that this commit appears to change the behavior of llvm in such a way as to cause trouble for libc++ users on FreeBSD, due to alignment changes.

Recently, I upgraded llvm and clang to 3.7.0 in the FreeBSD base system.  Shortly afterwards, I received reports from users that some of their previously compiled C++ applications (dynamically linked to libc++.so.1) were crashing with SIGBUS errors.  The stack traces always looked like this:

    #0  0x0000000800c95cfd in std::__1::basic_ostream<char, std::__1::char_traits<char> >::basic_ostream (this=<optimized out>, __sb=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280
    #1  std::__1::ios_base::Init::Init (this=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53
    #2  0x0000000800c96029 in ?? () at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:44 from /usr/obj
    /usr/src/lib/libc++/libc++.so.1
    #3  0x0000000800cef352 in ?? () from /usr/obj/usr/src/lib/libc++/libc++.so.1
    #4  0x0000000800628c00 in ?? ()
    #5  0x00007fffffffdf18 in ?? ()
    #6  0x00007fffffffdea0 in ?? ()
    #7  0x0000000800c925c6 in _init () from /usr/obj/usr/src/lib/libc++/libc++.so.1
    #8  0x00007fffffffdea0 in ?? ()

E.g. it crashed in ios_base::Init::Init(), at this line:

    ostream* cout_ptr = ::new(cout) ostream(::new(__cout) __stdoutbuf<char>(stdout, &mb_cout));

and the relevant disassembly is:

    0x0000000800c95ccf <+255>:   callq  0x800c96210 <std::__1::__stdoutbuf<char>::__stdoutbuf(__sFILE*, __mbstate_t*)>
    0x0000000800c95cd4 <+260>:   mov    0x27d565(%rip),%rax        # 0x800f13240
    0x0000000800c95cdb <+267>:   lea    0x40(%rax),%rcx
    0x0000000800c95cdf <+271>:   movq   %rcx,%xmm0
    0x0000000800c95ce4 <+276>:   lea    0x18(%rax),%rcx
    0x0000000800c95ce8 <+280>:   movq   %rcx,%xmm1
    0x0000000800c95ced <+285>:   punpcklqdq %xmm0,%xmm1
    0x0000000800c95cf1 <+289>:   movdqa %xmm1,-0x40(%rbp)
    0x0000000800c95cf6 <+294>:   mov    0x27d86b(%rip),%rcx        # 0x800f13568
 => 0x0000000800c95cfd <+301>:   movdqa %xmm1,(%rcx)
    0x0000000800c95d01 <+305>:   mov    (%rax),%r14
    0x0000000800c95d04 <+308>:   lea    (%rcx,%r14,1),%rbx
    0x0000000800c95d08 <+312>:   mov    %rbx,%rdi
    0x0000000800c95d0b <+315>:   mov    %r15,%rsi
    0x0000000800c95d0e <+318>:   callq  0x800c92c4c <_ZNSt3__18ios_base4initEPv@plt>

In this case, %rcx was the address of std::cout, 0x609068 (or some other address that was aligned to 8 bytes, *not* 16 bytes), which causes movdqa to result in a SIGBUS.

These crashing executables were compiled by clang 3.6.1 (or earlier versions), and it turns out that all of them had a global symbol for std::cout, which was aligned to 8 bytes.  For example, in case of the original report, one of the executables showed the following in readelf output:

    Relocation section '.rela.dyn' at offset 0x2070 contains 6 entries:
        Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
    [...]
    0000000000609068  0000003c00000005 R_X86_64_COPY          0000000000609068 _ZNSt3__14coutE + 0

and further down:

    Symbol table '.dynsym' contains 87 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    [...]
        60: 0000000000609068   160 OBJECT  GLOBAL DEFAULT   25 _ZNSt3__14coutE

(Note: the executable gets a copy of std::cout, because the linker finds a global data object with copy relocations.)

The std::cout symbol is explicitly declared with an alignment in iostream.cpp, as follows:

    _ALIGNAS_TYPE (ostream)  _LIBCPP_FUNC_VIS char cout[sizeof(ostream)];

The alignment of ostream is 8 bytes, therefore the alignment of cout is also 8 bytes.

When libc++ was previously compiled by clang 3.6.1, the assembly generated from ios_base::Init::Init() looked different than above, and std::cout was explicitly aligned to 8 bytes, in the .bss section:

        #DEBUG_VALUE: Init:this <- RDI
        .loc    3 669 5                 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:5
        movq    $0, 136(%r15,%rbx)
        .loc    3 670 5                 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:5
        movl    $-1, 144(%r15,%rbx)
.Ltmp44:
        .loc    2 53 77                 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
        movq    __stdoutp@GOTPCREL(%rip), %r15
        movq    (%r15), %rsi
        .loc    2 53 45 is_stmt 0       # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:45
        leaq    _ZNSt3__1L6__coutE(%rip), %r12
        leaq    _ZNSt3__1L7mb_coutE(%rip), %rdx
        movq    %r12, %rdi
.Ltmp45:
        callq   _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
        .loc    20 280 1 is_stmt 1      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
.Ltmp46:
        movq    _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE@GOTPCREL(%rip), %rax
        leaq    24(%rax), %rcx
        movq    %rcx, -64(%rbp)         # 8-byte Spill
        movq    _ZNSt3__14coutE@GOTPCREL(%rip), %rbx
        movq    %rcx, (%rbx)
        leaq    64(%rax), %rcx
        movq    %rcx, -48(%rbp)         # 8-byte Spill
        movq    %rcx, 8(%rbx)
        .loc    20 281 5                # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
.Ltmp47:
        movq    (%rax), %r14
        leaq    (%rbx,%r14), %rdi
        .loc    3 668 15                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:15
.Ltmp48:
.Ltmp6:
        movq    %r12, %rsi
        callq   _ZNSt3__18ios_base4initEPv@PLT
[... skip to .bss ...]
        .type   _ZNSt3__14coutE,@object # @_ZNSt3__14coutE
        .globl  _ZNSt3__14coutE
        .align  8
_ZNSt3__14coutE:
        .zero   160
        .size   _ZNSt3__14coutE, 160

In contrast, when you compile the same file with clang 3.7.0, the assembly becomes similar to the crashing case:

        #DEBUG_VALUE: Init:this <- RDI
        .loc    3 669 12                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:12
        movq    $0, 136(%rbx)
        .loc    3 670 13                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:13
        movl    $-1, 144(%rbx)
.Ltmp44:
        .loc    2 53 77                 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
        movq    __stdoutp@GOTPCREL(%rip), %r12
        movq    (%r12), %rsi
        .loc    2 53 59 is_stmt 0       # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:59
        leaq    _ZNSt3__1L6__coutE(%rip), %r15
        leaq    _ZNSt3__1L7mb_coutE(%rip), %rdx
        movq    %r15, %rdi
.Ltmp45:
        callq   _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
        .loc    20 280 1 is_stmt 1      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
.Ltmp46:
        movq    _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE@GOTPCREL(%rip), %rax
        leaq    64(%rax), %rcx
        movd    %rcx, %xmm0
        leaq    24(%rax), %rcx
        movd    %rcx, %xmm1
        punpcklqdq      %xmm0, %xmm1    # xmm1 = xmm1[0],xmm0[0]
        movdqa  %xmm1, -64(%rbp)        # 16-byte Spill
        movq    _ZNSt3__14coutE@GOTPCREL(%rip), %rcx
        movdqa  %xmm1, (%rcx)
.Ltmp47:
        .loc    20 281 5                # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
        movq    (%rax), %r14
.Ltmp48:
        .loc    20 281 5 is_stmt 0      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
        leaq    (%rcx,%r14), %rbx
        .loc    3 668 5 is_stmt 1       # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:5
.Ltmp49:
.Ltmp6:
        movq    %rbx, %rdi
        movq    %r15, %rsi
        callq   _ZNSt3__18ios_base4initEPv@PLT

and the definition of of std::cout is now with 16 bytes alignment instead:

        .type   _ZNSt3__14coutE,@object # @_ZNSt3__14coutE
        .globl  _ZNSt3__14coutE
        .align  16
_ZNSt3__14coutE:
        .zero   160
        .size   _ZNSt3__14coutE, 160

Bisecting has shown that r240144 is the commit where this behavior changed, i.e. before this commit, std::cout is aligned at 8 bytes, and the instructions to access it take this into account.  After this commit, it becomes aligned at 16 bytes, and it is then accessed using SSE (movdqa, etc).  (Note that it is certainly possible that r240144 is just exposing a deeper issue, instead of being the root cause!)

So unfortunately, this new alignment makes a dynamic libc++ library, compiled by clang after this commit, incompatible with previously built applications.  This is a big problem for FreeBSD, since we rather value backwards compatibility. :-)

I have had several discussions with people who indicate that 16 byte alignment is an x86_64 ABI requirement, at least for "large enough" objects.  This may very well be true, but on the other hand, if the program authors explicitly specify that their objects must be aligned to X bytes, where X < 16, then the compiler should obey this, right?  And changing existing behavior is incompatible with previously compiled programs.

As a temporary workaround, I have now reverted r240144 in FreeBSD, which restores the previous behavior, and aligns std::cout to 8 bytes again.  But I would like to bring this to your attention, in the hope that we can find out what the real fix should be.

And to finish this way too long mail, I am attaching a minimized test case, which shows the behavior, and which should be appropriate to attach to a PR, if that is needed later on.

This test case should be compiled with -O2, to see the effects.  Clang 3.6.x will result in the following IR for std::cout:

    @cout = global [24 x i8] zeroinitializer, align 8

while clang 3.7.0 will result in:

    @cout = global [24 x i8] zeroinitializer, align 16

-Dimitry

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

cout-align.cpp (417 bytes) Download Attachment
signature.asc (201 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
Hi Dimitry,

Thanks for the report! I took a quick look at the test and found the following:

1) Before SLP we have this:

@cout = global [24 x i8] zeroinitializer, align 8
define void @_Z7do_initv() #0 {
entry:
  store i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (...)**), i32 (...)*** bitcast ([24 x i8]* @cout to i32 (...)***), align 8, !tbaa !2
  store i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (...)**), i32 (...)*** bitcast (i8* getelementptr inbounds ([24 x i8], [24 x i8]* @cout, i64 0, i64 8) to i32 (...)***), align 8, !tbaa !2
  ret void
}

2) After SLP we get this:

@cout = global [24 x i8] zeroinitializer, align 8
define void @_Z7do_initv() #2 {
entry:
  store <2 x i32 (...)**> <i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (...)**), i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (...)**)>, <2 x i32 (...)**>* bitcast ([24 x i8]* @cout to <2 x i32 (...)**>*), align 8, !tbaa !2
  ret void
}

3) And then, after instcobmine, we get this:

@cout = global [24 x i8] zeroinitializer, align 16
; Function Attrs: nounwind ssp uwtable
define void @_Z7do_initv() #2 {
entry:
  store <2 x i32 (...)**> <i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (...)**), i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (...)**)>, <2 x i32 (...)**>* bitcast ([24 x i8]* @cout to <2 x i32 (...)**>*), align 16, !tbaa !2
  ret void
}

I’ll take a look at instombine to understand why it replaces "align 8" with "align 16” in this case. Maybe it’s just a bug, or maybe we shouldn’t vectorize this case for some reason.

Thanks,
Michael


On Oct 10, 2015, at 11:53 AM, Dimitry Andric <[hidden email]> wrote:

On 19 Jun 2015, at 19:40, Michael Zolotukhin <[hidden email]> wrote:
Author: mzolotukhin
Date: Fri Jun 19 12:40:15 2015
New Revision: 240144

URL: http://llvm.org/viewvc/llvm-project?rev=240144&view=rev
Log:
[SLP] Vectorize for all-constant entries.

Differential Revision: http://reviews.llvm.org/D10558

Hi,

tl;dr: after some research, I found that this commit appears to change the behavior of llvm in such a way as to cause trouble for libc++ users on FreeBSD, due to alignment changes.

Recently, I upgraded llvm and clang to 3.7.0 in the FreeBSD base system.  Shortly afterwards, I received reports from users that some of their previously compiled C++ applications (dynamically linked to libc++.so.1) were crashing with SIGBUS errors.  The stack traces always looked like this:

   #0  0x0000000800c95cfd in std::__1::basic_ostream<char, std::__1::char_traits<char> >::basic_ostream (this=<optimized out>, __sb=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280
   #1  std::__1::ios_base::Init::Init (this=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53
   #2  0x0000000800c96029 in ?? () at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:44 from /usr/obj
   /usr/src/lib/libc++/libc++.so.1
   #3  0x0000000800cef352 in ?? () from /usr/obj/usr/src/lib/libc++/libc++.so.1
   #4  0x0000000800628c00 in ?? ()
   #5  0x00007fffffffdf18 in ?? ()
   #6  0x00007fffffffdea0 in ?? ()
   #7  0x0000000800c925c6 in _init () from /usr/obj/usr/src/lib/libc++/libc++.so.1
   #8  0x00007fffffffdea0 in ?? ()

E.g. it crashed in ios_base::Init::Init(), at this line:

   ostream* cout_ptr = ::new(cout) ostream(::new(__cout) __stdoutbuf<char>(stdout, &mb_cout));

and the relevant disassembly is:

   0x0000000800c95ccf <+255>:   callq  0x800c96210 <std::__1::__stdoutbuf<char>::__stdoutbuf(__sFILE*, __mbstate_t*)>
   0x0000000800c95cd4 <+260>:   mov    0x27d565(%rip),%rax        # 0x800f13240
   0x0000000800c95cdb <+267>:   lea    0x40(%rax),%rcx
   0x0000000800c95cdf <+271>:   movq   %rcx,%xmm0
   0x0000000800c95ce4 <+276>:   lea    0x18(%rax),%rcx
   0x0000000800c95ce8 <+280>:   movq   %rcx,%xmm1
   0x0000000800c95ced <+285>:   punpcklqdq %xmm0,%xmm1
   0x0000000800c95cf1 <+289>:   movdqa %xmm1,-0x40(%rbp)
   0x0000000800c95cf6 <+294>:   mov    0x27d86b(%rip),%rcx        # 0x800f13568
=> 0x0000000800c95cfd <+301>:   movdqa %xmm1,(%rcx)
   0x0000000800c95d01 <+305>:   mov    (%rax),%r14
   0x0000000800c95d04 <+308>:   lea    (%rcx,%r14,1),%rbx
   0x0000000800c95d08 <+312>:   mov    %rbx,%rdi
   0x0000000800c95d0b <+315>:   mov    %r15,%rsi
   0x0000000800c95d0e <+318>:   callq  0x800c92c4c <_ZNSt3__18ios_base4initEPv@plt>

In this case, %rcx was the address of std::cout, 0x609068 (or some other address that was aligned to 8 bytes, *not* 16 bytes), which causes movdqa to result in a SIGBUS.

These crashing executables were compiled by clang 3.6.1 (or earlier versions), and it turns out that all of them had a global symbol for std::cout, which was aligned to 8 bytes.  For example, in case of the original report, one of the executables showed the following in readelf output:

   Relocation section '.rela.dyn' at offset 0x2070 contains 6 entries:
       Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
   [...]
   0000000000609068  0000003c00000005 R_X86_64_COPY          0000000000609068 _ZNSt3__14coutE + 0

and further down:

   Symbol table '.dynsym' contains 87 entries:
      Num:    Value          Size Type    Bind   Vis      Ndx Name
   [...]
       60: 0000000000609068   160 OBJECT  GLOBAL DEFAULT   25 _ZNSt3__14coutE

(Note: the executable gets a copy of std::cout, because the linker finds a global data object with copy relocations.)

The std::cout symbol is explicitly declared with an alignment in iostream.cpp, as follows:

   _ALIGNAS_TYPE (ostream)  _LIBCPP_FUNC_VIS char cout[sizeof(ostream)];

The alignment of ostream is 8 bytes, therefore the alignment of cout is also 8 bytes.

When libc++ was previously compiled by clang 3.6.1, the assembly generated from ios_base::Init::Init() looked different than above, and std::cout was explicitly aligned to 8 bytes, in the .bss section:

       #DEBUG_VALUE: Init:this <- RDI
       .loc    3 669 5                 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:5
       movq    $0, 136(%r15,%rbx)
       .loc    3 670 5                 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:5
       movl    $-1, 144(%r15,%rbx)
.Ltmp44:
       .loc    2 53 77                 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
       movq    __stdoutp@GOTPCREL(%rip), %r15
       movq    (%r15), %rsi
       .loc    2 53 45 is_stmt 0       # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:45
       leaq    _ZNSt3__1L6__coutE(%rip), %r12
       leaq    _ZNSt3__1L7mb_coutE(%rip), %rdx
       movq    %r12, %rdi
.Ltmp45:
       callq   _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
       .loc    20 280 1 is_stmt 1      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
.Ltmp46:
       movq    _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE@GOTPCREL(%rip), %rax
       leaq    24(%rax), %rcx
       movq    %rcx, -64(%rbp)         # 8-byte Spill
       movq    _ZNSt3__14coutE@GOTPCREL(%rip), %rbx
       movq    %rcx, (%rbx)
       leaq    64(%rax), %rcx
       movq    %rcx, -48(%rbp)         # 8-byte Spill
       movq    %rcx, 8(%rbx)
       .loc    20 281 5                # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
.Ltmp47:
       movq    (%rax), %r14
       leaq    (%rbx,%r14), %rdi
       .loc    3 668 15                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:15
.Ltmp48:
.Ltmp6:
       movq    %r12, %rsi
       callq   _ZNSt3__18ios_base4initEPv@PLT
[... skip to .bss ...]
       .type   _ZNSt3__14coutE,@object # @_ZNSt3__14coutE
       .globl  _ZNSt3__14coutE
       .align  8
_ZNSt3__14coutE:
       .zero   160
       .size   _ZNSt3__14coutE, 160

In contrast, when you compile the same file with clang 3.7.0, the assembly becomes similar to the crashing case:

       #DEBUG_VALUE: Init:this <- RDI
       .loc    3 669 12                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:12
       movq    $0, 136(%rbx)
       .loc    3 670 13                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:13
       movl    $-1, 144(%rbx)
.Ltmp44:
       .loc    2 53 77                 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
       movq    __stdoutp@GOTPCREL(%rip), %r12
       movq    (%r12), %rsi
       .loc    2 53 59 is_stmt 0       # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:59
       leaq    _ZNSt3__1L6__coutE(%rip), %r15
       leaq    _ZNSt3__1L7mb_coutE(%rip), %rdx
       movq    %r15, %rdi
.Ltmp45:
       callq   _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
       .loc    20 280 1 is_stmt 1      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
.Ltmp46:
       movq    _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE@GOTPCREL(%rip), %rax
       leaq    64(%rax), %rcx
       movd    %rcx, %xmm0
       leaq    24(%rax), %rcx
       movd    %rcx, %xmm1
       punpcklqdq      %xmm0, %xmm1    # xmm1 = xmm1[0],xmm0[0]
       movdqa  %xmm1, -64(%rbp)        # 16-byte Spill
       movq    _ZNSt3__14coutE@GOTPCREL(%rip), %rcx
       movdqa  %xmm1, (%rcx)
.Ltmp47:
       .loc    20 281 5                # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
       movq    (%rax), %r14
.Ltmp48:
       .loc    20 281 5 is_stmt 0      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
       leaq    (%rcx,%r14), %rbx
       .loc    3 668 5 is_stmt 1       # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:5
.Ltmp49:
.Ltmp6:
       movq    %rbx, %rdi
       movq    %r15, %rsi
       callq   _ZNSt3__18ios_base4initEPv@PLT

and the definition of of std::cout is now with 16 bytes alignment instead:

       .type   _ZNSt3__14coutE,@object # @_ZNSt3__14coutE
       .globl  _ZNSt3__14coutE
       .align  16
_ZNSt3__14coutE:
       .zero   160
       .size   _ZNSt3__14coutE, 160

Bisecting has shown that r240144 is the commit where this behavior changed, i.e. before this commit, std::cout is aligned at 8 bytes, and the instructions to access it take this into account.  After this commit, it becomes aligned at 16 bytes, and it is then accessed using SSE (movdqa, etc).  (Note that it is certainly possible that r240144 is just exposing a deeper issue, instead of being the root cause!)

So unfortunately, this new alignment makes a dynamic libc++ library, compiled by clang after this commit, incompatible with previously built applications.  This is a big problem for FreeBSD, since we rather value backwards compatibility. :-)

I have had several discussions with people who indicate that 16 byte alignment is an x86_64 ABI requirement, at least for "large enough" objects.  This may very well be true, but on the other hand, if the program authors explicitly specify that their objects must be aligned to X bytes, where X < 16, then the compiler should obey this, right?  And changing existing behavior is incompatible with previously compiled programs.

As a temporary workaround, I have now reverted r240144 in FreeBSD, which restores the previous behavior, and aligns std::cout to 8 bytes again.  But I would like to bring this to your attention, in the hope that we can find out what the real fix should be.

And to finish this way too long mail, I am attaching a minimized test case, which shows the behavior, and which should be appropriate to attach to a PR, if that is needed later on.

This test case should be compiled with -O2, to see the effects.  Clang 3.6.x will result in the following IR for std::cout:

   @cout = global [24 x i8] zeroinitializer, align 8

while clang 3.7.0 will result in:

   @cout = global [24 x i8] zeroinitializer, align 16

-Dimitry
<cout-align.cpp>


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
Hi,

This is a smaller reproducer of the issue:

$ cat test.ll
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"

@cout = global [24 x i8] zeroinitializer, align 8

define void @foo() {
entry:
  store i64 100, i64* bitcast ([24 x i8]* @cout to i64*), align 8
  store i64 200, i64* bitcast (i8* getelementptr inbounds ([24 x i8], [24 x i8]* @cout, i64 0, i64 8) to i64*), align 8
  ret void
}

$ opt -slp-vectorizer < test.ll -S
; ModuleID = '<stdin>'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"

@cout = global [24 x i8] zeroinitializer, align 8

define void @foo() {
entry:
  store <2 x i64> <i64 100, i64 200>, <2 x i64>* bitcast ([24 x i8]* @cout to <2 x i64>*), align 8
  ret void
}

$ opt -slp-vectorizer < test.ll  | opt -instcombine -S
; ModuleID = '<stdin>'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"

@cout = global [24 x i8] zeroinitializer, align 16

define void @foo() {
entry:
  store <2 x i64> <i64 100, i64 200>, <2 x i64>* bitcast ([24 x i8]* @cout to <2 x i64>*), align 16
  ret void
}

It looks like instcombine intentionally rewrites the alignment. It happens in getOrEnforceKnownAlignment called by InstCombiner::visitStoreInst. Then compiler performs several checks and decides that it can bump up the alignment, which is presumable illegal in this case.

Reid,
it looks like you relatively recently touched code in enforceKnownAlignment. Could you please comment on whether it’s correct behavior or not?

Thanks,
Michael

On Oct 10, 2015, at 12:24 PM, Michael Zolotukhin via cfe-dev <[hidden email]> wrote:

Hi Dimitry,

Thanks for the report! I took a quick look at the test and found the following:

1) Before SLP we have this:

@cout = global [24 x i8] zeroinitializer, align 8
define void @_Z7do_initv() #0 {
entry:
  store i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (...)**), i32 (...)*** bitcast ([24 x i8]* @cout to i32 (...)***), align 8, !tbaa !2
  store i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (...)**), i32 (...)*** bitcast (i8* getelementptr inbounds ([24 x i8], [24 x i8]* @cout, i64 0, i64 8) to i32 (...)***), align 8, !tbaa !2
  ret void
}

2) After SLP we get this:

@cout = global [24 x i8] zeroinitializer, align 8
define void @_Z7do_initv() #2 {
entry:
  store <2 x i32 (...)**> <i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (...)**), i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (...)**)>, <2 x i32 (...)**>* bitcast ([24 x i8]* @cout to <2 x i32 (...)**>*), align 8, !tbaa !2
  ret void
}

3) And then, after instcobmine, we get this:

@cout = global [24 x i8] zeroinitializer, align 16
; Function Attrs: nounwind ssp uwtable
define void @_Z7do_initv() #2 {
entry:
  store <2 x i32 (...)**> <i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (...)**), i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (...)**)>, <2 x i32 (...)**>* bitcast ([24 x i8]* @cout to <2 x i32 (...)**>*), align 16, !tbaa !2
  ret void
}

I’ll take a look at instombine to understand why it replaces "align 8" with "align 16” in this case. Maybe it’s just a bug, or maybe we shouldn’t vectorize this case for some reason.

Thanks,
Michael


On Oct 10, 2015, at 11:53 AM, Dimitry Andric <[hidden email]> wrote:

On 19 Jun 2015, at 19:40, Michael Zolotukhin <[hidden email]> wrote:
Author: mzolotukhin
Date: Fri Jun 19 12:40:15 2015
New Revision: 240144

URL: http://llvm.org/viewvc/llvm-project?rev=240144&view=rev
Log:
[SLP] Vectorize for all-constant entries.

Differential Revision: http://reviews.llvm.org/D10558

Hi,

tl;dr: after some research, I found that this commit appears to change the behavior of llvm in such a way as to cause trouble for libc++ users on FreeBSD, due to alignment changes.

Recently, I upgraded llvm and clang to 3.7.0 in the FreeBSD base system.  Shortly afterwards, I received reports from users that some of their previously compiled C++ applications (dynamically linked to libc++.so.1) were crashing with SIGBUS errors.  The stack traces always looked like this:

   #0  0x0000000800c95cfd in std::__1::basic_ostream<char, std::__1::char_traits<char> >::basic_ostream (this=<optimized out>, __sb=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280
   #1  std::__1::ios_base::Init::Init (this=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53
   #2  0x0000000800c96029 in ?? () at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:44 from /usr/obj
   /usr/src/lib/libc++/libc++.so.1
   #3  0x0000000800cef352 in ?? () from /usr/obj/usr/src/lib/libc++/libc++.so.1
   #4  0x0000000800628c00 in ?? ()
   #5  0x00007fffffffdf18 in ?? ()
   #6  0x00007fffffffdea0 in ?? ()
   #7  0x0000000800c925c6 in _init () from /usr/obj/usr/src/lib/libc++/libc++.so.1
   #8  0x00007fffffffdea0 in ?? ()

E.g. it crashed in ios_base::Init::Init(), at this line:

   ostream* cout_ptr = ::new(cout) ostream(::new(__cout) __stdoutbuf<char>(stdout, &mb_cout));

and the relevant disassembly is:

   0x0000000800c95ccf <+255>:   callq  0x800c96210 <std::__1::__stdoutbuf<char>::__stdoutbuf(__sFILE*, __mbstate_t*)>
   0x0000000800c95cd4 <+260>:   mov    0x27d565(%rip),%rax        # 0x800f13240
   0x0000000800c95cdb <+267>:   lea    0x40(%rax),%rcx
   0x0000000800c95cdf <+271>:   movq   %rcx,%xmm0
   0x0000000800c95ce4 <+276>:   lea    0x18(%rax),%rcx
   0x0000000800c95ce8 <+280>:   movq   %rcx,%xmm1
   0x0000000800c95ced <+285>:   punpcklqdq %xmm0,%xmm1
   0x0000000800c95cf1 <+289>:   movdqa %xmm1,-0x40(%rbp)
   0x0000000800c95cf6 <+294>:   mov    0x27d86b(%rip),%rcx        # 0x800f13568
=> 0x0000000800c95cfd <+301>:   movdqa %xmm1,(%rcx)
   0x0000000800c95d01 <+305>:   mov    (%rax),%r14
   0x0000000800c95d04 <+308>:   lea    (%rcx,%r14,1),%rbx
   0x0000000800c95d08 <+312>:   mov    %rbx,%rdi
   0x0000000800c95d0b <+315>:   mov    %r15,%rsi
   0x0000000800c95d0e <+318>:   callq  0x800c92c4c <_ZNSt3__18ios_base4initEPv@plt>

In this case, %rcx was the address of std::cout, 0x609068 (or some other address that was aligned to 8 bytes, *not* 16 bytes), which causes movdqa to result in a SIGBUS.

These crashing executables were compiled by clang 3.6.1 (or earlier versions), and it turns out that all of them had a global symbol for std::cout, which was aligned to 8 bytes.  For example, in case of the original report, one of the executables showed the following in readelf output:

   Relocation section '.rela.dyn' at offset 0x2070 contains 6 entries:
       Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
   [...]
   0000000000609068  0000003c00000005 R_X86_64_COPY          0000000000609068 _ZNSt3__14coutE + 0

and further down:

   Symbol table '.dynsym' contains 87 entries:
      Num:    Value          Size Type    Bind   Vis      Ndx Name
   [...]
       60: 0000000000609068   160 OBJECT  GLOBAL DEFAULT   25 _ZNSt3__14coutE

(Note: the executable gets a copy of std::cout, because the linker finds a global data object with copy relocations.)

The std::cout symbol is explicitly declared with an alignment in iostream.cpp, as follows:

   _ALIGNAS_TYPE (ostream)  _LIBCPP_FUNC_VIS char cout[sizeof(ostream)];

The alignment of ostream is 8 bytes, therefore the alignment of cout is also 8 bytes.

When libc++ was previously compiled by clang 3.6.1, the assembly generated from ios_base::Init::Init() looked different than above, and std::cout was explicitly aligned to 8 bytes, in the .bss section:

       #DEBUG_VALUE: Init:this <- RDI
       .loc    3 669 5                 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:5
       movq    $0, 136(%r15,%rbx)
       .loc    3 670 5                 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:5
       movl    $-1, 144(%r15,%rbx)
.Ltmp44:
       .loc    2 53 77                 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
       movq    __stdoutp@GOTPCREL(%rip), %r15
       movq    (%r15), %rsi
       .loc    2 53 45 is_stmt 0       # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:45
       leaq    _ZNSt3__1L6__coutE(%rip), %r12
       leaq    _ZNSt3__1L7mb_coutE(%rip), %rdx
       movq    %r12, %rdi
.Ltmp45:
       callq   _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
       .loc    20 280 1 is_stmt 1      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
.Ltmp46:
       movq    _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE@GOTPCREL(%rip), %rax
       leaq    24(%rax), %rcx
       movq    %rcx, -64(%rbp)         # 8-byte Spill
       movq    _ZNSt3__14coutE@GOTPCREL(%rip), %rbx
       movq    %rcx, (%rbx)
       leaq    64(%rax), %rcx
       movq    %rcx, -48(%rbp)         # 8-byte Spill
       movq    %rcx, 8(%rbx)
       .loc    20 281 5                # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
.Ltmp47:
       movq    (%rax), %r14
       leaq    (%rbx,%r14), %rdi
       .loc    3 668 15                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:15
.Ltmp48:
.Ltmp6:
       movq    %r12, %rsi
       callq   _ZNSt3__18ios_base4initEPv@PLT
[... skip to .bss ...]
       .type   _ZNSt3__14coutE,@object # @_ZNSt3__14coutE
       .globl  _ZNSt3__14coutE
       .align  8
_ZNSt3__14coutE:
       .zero   160
       .size   _ZNSt3__14coutE, 160

In contrast, when you compile the same file with clang 3.7.0, the assembly becomes similar to the crashing case:

       #DEBUG_VALUE: Init:this <- RDI
       .loc    3 669 12                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:12
       movq    $0, 136(%rbx)
       .loc    3 670 13                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:13
       movl    $-1, 144(%rbx)
.Ltmp44:
       .loc    2 53 77                 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
       movq    __stdoutp@GOTPCREL(%rip), %r12
       movq    (%r12), %rsi
       .loc    2 53 59 is_stmt 0       # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:59
       leaq    _ZNSt3__1L6__coutE(%rip), %r15
       leaq    _ZNSt3__1L7mb_coutE(%rip), %rdx
       movq    %r15, %rdi
.Ltmp45:
       callq   _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
       .loc    20 280 1 is_stmt 1      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
.Ltmp46:
       movq    _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE@GOTPCREL(%rip), %rax
       leaq    64(%rax), %rcx
       movd    %rcx, %xmm0
       leaq    24(%rax), %rcx
       movd    %rcx, %xmm1
       punpcklqdq      %xmm0, %xmm1    # xmm1 = xmm1[0],xmm0[0]
       movdqa  %xmm1, -64(%rbp)        # 16-byte Spill
       movq    _ZNSt3__14coutE@GOTPCREL(%rip), %rcx
       movdqa  %xmm1, (%rcx)
.Ltmp47:
       .loc    20 281 5                # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
       movq    (%rax), %r14
.Ltmp48:
       .loc    20 281 5 is_stmt 0      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
       leaq    (%rcx,%r14), %rbx
       .loc    3 668 5 is_stmt 1       # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:5
.Ltmp49:
.Ltmp6:
       movq    %rbx, %rdi
       movq    %r15, %rsi
       callq   _ZNSt3__18ios_base4initEPv@PLT

and the definition of of std::cout is now with 16 bytes alignment instead:

       .type   _ZNSt3__14coutE,@object # @_ZNSt3__14coutE
       .globl  _ZNSt3__14coutE
       .align  16
_ZNSt3__14coutE:
       .zero   160
       .size   _ZNSt3__14coutE, 160

Bisecting has shown that r240144 is the commit where this behavior changed, i.e. before this commit, std::cout is aligned at 8 bytes, and the instructions to access it take this into account.  After this commit, it becomes aligned at 16 bytes, and it is then accessed using SSE (movdqa, etc).  (Note that it is certainly possible that r240144 is just exposing a deeper issue, instead of being the root cause!)

So unfortunately, this new alignment makes a dynamic libc++ library, compiled by clang after this commit, incompatible with previously built applications.  This is a big problem for FreeBSD, since we rather value backwards compatibility. :-)

I have had several discussions with people who indicate that 16 byte alignment is an x86_64 ABI requirement, at least for "large enough" objects.  This may very well be true, but on the other hand, if the program authors explicitly specify that their objects must be aligned to X bytes, where X < 16, then the compiler should obey this, right?  And changing existing behavior is incompatible with previously compiled programs.

As a temporary workaround, I have now reverted r240144 in FreeBSD, which restores the previous behavior, and aligns std::cout to 8 bytes again.  But I would like to bring this to your attention, in the hope that we can find out what the real fix should be.

And to finish this way too long mail, I am attaching a minimized test case, which shows the behavior, and which should be appropriate to attach to a PR, if that is needed later on.

This test case should be compiled with -O2, to see the effects.  Clang 3.6.x will result in the following IR for std::cout:

   @cout = global [24 x i8] zeroinitializer, align 8

while clang 3.7.0 will result in:

   @cout = global [24 x i8] zeroinitializer, align 16

-Dimitry
<cout-align.cpp>

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
On Sat, Oct 10, 2015 at 02:17:38PM -0700, Michael Zolotukhin via cfe-dev wrote:
> it looks like you relatively recently touched code in
> enforceKnownAlignment. Could you please comment on whether it’s correct
> behavior or not?

This looks entirely correct to me as long as PrefAlign <= 16 for AMD64.

Joerg
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
On 10 October 2015 at 14:22, Joerg Sonnenberger via cfe-dev
<[hidden email]> wrote:
> On Sat, Oct 10, 2015 at 02:17:38PM -0700, Michael Zolotukhin via cfe-dev wrote:
>> it looks like you relatively recently touched code in
>> enforceKnownAlignment. Could you please comment on whether it’s correct
>> behavior or not?
>
> This looks entirely correct to me as long as PrefAlign <= 16 for AMD64.

I'm not so sure, especially when we get to other targets that don't
have the special AMD64 rule about objects > 16 bytes in size getting
that alignment.

It seems that the way ELF's R_..._COPY relocations work make an
externally visible object's alignment part of a .so's ABI whether we
like it or not. I just don't see a way around that which allows us to
increase it at the moment.

Tim.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
On Sat, Oct 10, 2015 at 04:52:57PM -0700, Tim Northover wrote:

> On 10 October 2015 at 14:22, Joerg Sonnenberger via cfe-dev
> <[hidden email]> wrote:
> > On Sat, Oct 10, 2015 at 02:17:38PM -0700, Michael Zolotukhin via cfe-dev wrote:
> >> it looks like you relatively recently touched code in
> >> enforceKnownAlignment. Could you please comment on whether it’s correct
> >> behavior or not?
> >
> > This looks entirely correct to me as long as PrefAlign <= 16 for AMD64.
>
> I'm not so sure, especially when we get to other targets that don't
> have the special AMD64 rule about objects > 16 bytes in size getting
> that alignment.

So there are two slightly different things here. I think we should be
able to raise the alignment of global objects, independently of what the
platform ABI says. Now on the specific case of AMD64, there is a
separate ABI rule which makes this a correct transform.

> It seems that the way ELF's R_..._COPY relocations work make an
> externally visible object's alignment part of a .so's ABI whether we
> like it or not. I just don't see a way around that which allows us to
> increase it at the moment.

The strange part is that noone seems to be observing this outside
FreeBSD, which has a pretty ancient ld. What I don't know is where ld
*is* picking up the 64bit alignment from.

Joerg
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
> So there are two slightly different things here. I think we should be
> able to raise the alignment of global objects, independently of what the
> platform ABI says.

I'd like that to be the case, but it clearly is visible at shared
object boundaries so I don't think it's something we can change at
will.

> Now on the specific case of AMD64, there is a
> separate ABI rule which makes this a correct transform.

I don't think it does quite that. If you're applying that rule it
makes it a *necessary* transformation (in this view LLVM enforces ABI
alignments whether you want them or not for your language, LLVM 3.6
had a bug by not doing so, and trunk still has a bug for not always
doing so).

> The strange part is that noone seems to be observing this outside
> FreeBSD, which has a pretty ancient ld. What I don't know is where ld
> *is* picking up the 64bit alignment from.

ELF files don't actually contain an alignment field for symbols.
Gold's algorithm appears to start with the section's alignment, keep
reducing that until the symbol's address is properly aligned and then
decide that was "obviously" the intended alignment. Not ideal (objects
can easily get over-aligned by accident), but about the only sensible
way to do it given the data available.

So the 64-bit alignment ultimately came from LLVM only aligning the
section containing std::cout to 64-bits.

Tim.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
On Sat, Oct 10, 2015 at 5:40 PM, Tim Northover via cfe-dev <[hidden email]> wrote:
> So there are two slightly different things here. I think we should be
> able to raise the alignment of global objects, independently of what the
> platform ABI says.

I'd like that to be the case, but it clearly is visible at shared
object boundaries so I don't think it's something we can change at
will.

> Now on the specific case of AMD64, there is a
> separate ABI rule which makes this a correct transform.

I don't think it does quite that. If you're applying that rule it
makes it a *necessary* transformation (in this view LLVM enforces ABI
alignments whether you want them or not for your language, LLVM 3.6
had a bug by not doing so, and trunk still has a bug for not always
doing so).

> The strange part is that noone seems to be observing this outside
> FreeBSD, which has a pretty ancient ld. What I don't know is where ld
> *is* picking up the 64bit alignment from.

ELF files don't actually contain an alignment field for symbols.
Gold's algorithm appears to start with the section's alignment, keep
reducing that until the symbol's address is properly aligned and then
decide that was "obviously" the intended alignment. Not ideal (objects
can easily get over-aligned by accident), but about the only sensible
way to do it given the data available.

So the 64-bit alignment ultimately came from LLVM only aligning the
section containing std::cout to 64-bits.

Any chance this is because libc++ gives these objects different types in the declaration versus the definition?

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev

> Any chance this is because libc++ gives these objects different types in the
> declaration versus the definition?

I think it's unlikely. The decision to increase alignment doesn't seem to be based on anything you'd expect to be affected by that (underlying global type, possibly TBAA info). It's linkages and sizes that actually get used.

Tim.


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
In reply to this post by Ilya Biryukov via cfe-dev
On 11 Oct 2015, at 01:52, Tim Northover via cfe-dev <[hidden email]> wrote:

>
> On 10 October 2015 at 14:22, Joerg Sonnenberger via cfe-dev
> <[hidden email]> wrote:
>> On Sat, Oct 10, 2015 at 02:17:38PM -0700, Michael Zolotukhin via cfe-dev wrote:
>>> it looks like you relatively recently touched code in
>>> enforceKnownAlignment. Could you please comment on whether it’s correct
>>> behavior or not?
>>
>> This looks entirely correct to me as long as PrefAlign <= 16 for AMD64.
>
> I'm not so sure, especially when we get to other targets that don't
> have the special AMD64 rule about objects > 16 bytes in size getting
> that alignment.
>
> It seems that the way ELF's R_..._COPY relocations work make an
> externally visible object's alignment part of a .so's ABI whether we
> like it or not. I just don't see a way around that which allows us to
> increase it at the moment.
Yes, this is the important part.  There will be existing executables out
there, compiled by previous versions of clang, or even gcc, that have
this 'bad' alignment.  Increasing the alignment now, with no way to undo
it, will break backwards compatibility.

Another interesting data point is that in the following one-liner:

__attribute__((__aligned__(8))) char cout[160];

cout will *not* be aligned to 16 bytes, whereas the sample I posted
earlier, and the code in iostream.cpp, *will* be aligned to 16 bytes,
even though the actual declarations look quite the same.

So there is something going on in llvm that bumps up the alignment, for
some reason, related to r240144.  I would like to either revert that
behavior, or try to find a way to work around it (except from reverting
r240144, that is).

Another reason to not bump the alignment is that all versions of gcc
that I tried (up to the latest 6.0.0 trunk) emit .align 8 for std::cout.
Of course we don't want to be bug-for-bug compatible with *everything*
gcc does, but this one seems a no-brainer to me. :-)

-Dimitry

P.S.: In hindsight, specifying an explicit alignment for std::cout etc
was maybe not so handy.  But it's too late to change now, without
breaking the libc++ ABI...


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

signature.asc (201 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
> cout will *not* be aligned to 16 bytes, whereas the sample I posted
> earlier, and the code in iostream.cpp, *will* be aligned to 16 bytes,
> even though the actual declarations look quite the same.

It's particular operations on the variable that trigger the bump,
rather than just the declaration itself.

> So there is something going on in llvm that bumps up the alignment, for
> some reason, related to r240144.  I would like to either revert that
> behavior, or try to find a way to work around it (except from reverting
> r240144, that is).

Reverting that revision would just be hiding the underlying issue;
completely the wrong thing to do on trunk.

> P.S.: In hindsight, specifying an explicit alignment for std::cout etc
> was maybe not so handy.  But it's too late to change now, without
> breaking the libc++ ABI...

It's impossible to do otherwise (and maintain our cunning lazy
initialization). Other places are using std::cout as an ostream, which
has an ABI-required alignment of (at least) 8.

Cheers.

Tim.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
On 14 October 2015 at 16:48, Tim Northover <[hidden email]> wrote:
>>
>> So there is something going on in llvm that bumps up the alignment, for
>> some reason, related to r240144.  I would like to either revert that
>> behavior, or try to find a way to work around it (except from reverting
>> r240144, that is).
>
> Reverting that revision would just be hiding the underlying issue;
> completely the wrong thing to do on trunk.

Right, we need to find the root cause. For now Dimitry has reverted
r240144 in the FreeBSD tree to work around the issue, but we need to
replace it with a real fix.

>> P.S.: In hindsight, specifying an explicit alignment for std::cout etc
>> was maybe not so handy.  But it's too late to change now, without
>> breaking the libc++ ABI...
>
> It's impossible to do otherwise (and maintain our cunning lazy
> initialization). Other places are using std::cout as an ostream, which
> has an ABI-required alignment of (at least) 8.

I'm a bit confused about where the bug actually is though. As I
understand it, the alignment attribute specifies the minimum
alignment, and the cunning char array trick actually imposes a 16-byte
ABI alignment. If so then I'd say the bug is libc++'s.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
>> It's impossible to do otherwise (and maintain our cunning lazy
>> initialization). Other places are using std::cout as an ostream, which
>> has an ABI-required alignment of (at least) 8.

First, Joerg pointed out on IRC that my statement above was in error
and the ABI-alignment of "char cout[LOTS]" is actually 16, which means
that on AMD64 only we could have skipped the aligned attribute. I
don't think that affects what we do here though.

> I'm a bit confused about where the bug actually is though. As I
> understand it, the alignment attribute specifies the minimum
> alignment, and the cunning char array trick actually imposes a 16-byte
> ABI alignment.

So the AMD64 ABI alignment should override the attribute? I think
Dmitri has a good point that that's not how GCC interprets it (even
the documentation seems to allow under-aligning non-struct variables).
But even if so the bug would be in Clang 3.6[*] for emitting that
declaration into libc++.so with only 8 bytes alignment.

We'd also still have the issue that we probably shouldn't overalign
externally visible globals on ELF, even if the AMD64 ABI came to the
rescue in this one case.

My summary position is:
1. We need to stop over-aligning externally visible globals on ELF.
Ideally only if they're going into a shared library, but I don't think
that info is available.
2. "align 8" is correct for a "__attribute__((aligned(8))) char
arr[LOTS]" declaration.

Cheers.

Tim.

[*] Or LLVM. Or the linker. Depending on who you ask.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
On 14 October 2015 at 22:31, Tim Northover <[hidden email]> wrote:
>
> So the AMD64 ABI alignment should override the attribute? I think
> Dmitri has a good point that that's not how GCC interprets it (even
> the documentation seems to allow under-aligning non-struct variables).
> But even if so the bug would be in Clang 3.6[*] for emitting that
> declaration into libc++.so with only 8 bytes alignment.

GCC says "This attribute specifies a minimum alignment (in bytes) for
variables of the specified type." which implies to me the 16-byte ABI
alignment should apply. However, that's not true for all GCC versions
we've examined and Clang 3.6 and earlier, and it seems to be true on
Clang 3.7 only by accident.

Perhaps this issue is _also_ a Clang bug. But if we assume the
alignment attribute is a minimum, and we want the natural 8-byte
alignment for ostream, libc++ is also in error.

> My summary position is:
> 1. We need to stop over-aligning externally visible globals on ELF.
> Ideally only if they're going into a shared library, but I don't think
> that info is available.

I'm not sure how we avoid overalignment though?

> 2. "align 8" is correct for a "__attribute__((aligned(8))) char
> arr[LOTS]" declaration.

I don't believe this is true according to GCC's documentation. I agree
that if we expect aligned(8) to specify exact instead of at least
8-byte alignment then this is a Clang bug.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
On 14 October 2015 at 16:06, Ed Maste <[hidden email]> wrote:
> On 14 October 2015 at 22:31, Tim Northover <[hidden email]> wrote:
> GCC says "This attribute specifies a minimum alignment (in bytes) for
> variables of the specified type." which implies to me the 16-byte ABI
> alignment should apply.

I think there might be two places discussing the attribute. One
applying to types, the other to variables (and typedefs, strangely).
From https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html,
I'm basing my thoughts on:

"This attribute specifies a minimum alignment for the variable or
structure field, measured in bytes. [...] The default alignment is
fixed for a particular target ABI. [...] When used on a struct, or
struct member, the aligned attribute can only increase the alignment."

To me those 3 imply that attribute((aligned(N))) takes precedence over
the ABI and can reduce the alignment below ABI as long as it's not
applied to a struct. I take the "minimum" to mean that the compiler
isn't going to deliberately misalign a variable for you to guarantee
that "addr % (2*N) != 0".

> Perhaps this issue is _also_ a Clang bug. But if we assume the
> alignment attribute is a minimum, and we want the natural 8-byte
> alignment for ostream, libc++ is also in error.

I followed your reasoning on the aligned attribute (though still don't
agree), but I'm afraid I don't see this one. Are you saying that
libc++ should have used some other means to make sure that cout was
aligned to precisely 8 bytes, rather than 16? Should have known all
such attempts were futile and not tried, or what?

And how does this reasoning apply to other platforms where without the
attribute the ABI alignment would only have been 1?

Cheers.

Tim.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
On 14 October 2015 at 23:22, Tim Northover <[hidden email]> wrote:

> I think there might be two places discussing the attribute. One
> applying to types, the other to variables (and typedefs, strangely).
> From https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html,
> I'm basing my thoughts on:
>
> "This attribute specifies a minimum alignment for the variable or
> structure field, measured in bytes. [...] The default alignment is
> fixed for a particular target ABI. [...] When used on a struct, or
> struct member, the aligned attribute can only increase the alignment."
>
> To me those 3 imply that attribute((aligned(N))) takes precedence over
> the ABI and can reduce the alignment below ABI as long as it's not
> applied to a struct. I take the "minimum" to mean that the compiler
> isn't going to deliberately misalign a variable for you to guarantee
> that "addr % (2*N) != 0".

Ok, now I see what you mean. An alignment is inherently a minimum,
since a symbol may always end up with a wider alignment than required.
So the attribute overrides the "minimum alignment," not sets a minimum
value for the alignment (that may end up higher due to ABI).
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
In reply to this post by Ilya Biryukov via cfe-dev
On 15 Oct 2015, at 00:31, Tim Northover <[hidden email]> wrote:

>
>>> It's impossible to do otherwise (and maintain our cunning lazy
>>> initialization). Other places are using std::cout as an ostream, which
>>> has an ABI-required alignment of (at least) 8.
>
> First, Joerg pointed out on IRC that my statement above was in error
> and the ABI-alignment of "char cout[LOTS]" is actually 16, which means
> that on AMD64 only we could have skipped the aligned attribute. I
> don't think that affects what we do here though.
>
>> I'm a bit confused about where the bug actually is though. As I
>> understand it, the alignment attribute specifies the minimum
>> alignment, and the cunning char array trick actually imposes a 16-byte
>> ABI alignment.
>
> So the AMD64 ABI alignment should override the attribute? I think
> Dmitri has a good point that that's not how GCC interprets it (even
> the documentation seems to allow under-aligning non-struct variables).
> But even if so the bug would be in Clang 3.6[*] for emitting that
> declaration into libc++.so with only 8 bytes alignment.
>
> We'd also still have the issue that we probably shouldn't overalign
> externally visible globals on ELF, even if the AMD64 ABI came to the
> rescue in this one case.
>
> My summary position is:
> 1. We need to stop over-aligning externally visible globals on ELF.
> Ideally only if they're going into a shared library, but I don't think
> that info is available.
> 2. "align 8" is correct for a "__attribute__((aligned(8))) char
> arr[LOTS]" declaration.
This thread unfortunately died out without any resolution, so we are now
nearing 3.8 branching, and the problem is still not fixed.  E.g., the
only custom patch we have now left in FreeBSD's version of llvm (which
we would really like to get rid of!) is the reversal of r240144: it is a
pretty gross workaround, but it works for now.

But is there any way that we can get this resolved before 3.8.0 ships,
please? :-)

-Dimitry


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

signature.asc (201 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
On Dec 31, 2015, at 5:03 AM, Dimitry Andric via cfe-dev <[hidden email]> wrote:
This thread unfortunately died out without any resolution, so we are now
nearing 3.8 branching, and the problem is still not fixed.  E.g., the
only custom patch we have now left in FreeBSD's version of llvm (which
we would really like to get rid of!) is the reversal of r240144: it is a
pretty gross workaround, but it works for now.

But is there any way that we can get this resolved before 3.8.0 ships,
please? :-)

I believe the summary of the thread is that LLVM has a fundamentally unsound transformation: automatically increasing the required alignment of an externally visible global variable.

In enforceKnownAlignment in lib/Transforms/Utils/Local.cpp, it checks isStrongDefinitionForLinker() purportedly to ensure that the "memory we set aside for the global" is "the memory used by the final program.". However, even if said predicate returns true, that condition is NOT guaranteed, at least on ELF platforms, because of copy relocations.

(There's also very similar-looking alignment-fixing code in CodeGenPrepare::optimizeCallInst, too, I'm not sure why it's duplicated there.)

Basically, we need to not modify the alignment if:
if (Subtarget->isTargetELF() &&
    TM.getRelocationModel() == Reloc::PIC_ &&
    GV->hasDefaultVisibility() && !GV->hasLocalLinkage())

The above code snippet appears in X86FastISel.cpp, and equivalent-but-not-identical code appears in several other places, to determine whether a PLT load for the variable is required.

I'm not sure if isStrongDefinitionForLinker itself ought to be taking that into account, or if there should be a different new predicate for that check. isReallyActuallyTrulyStrongDefinitionForLinker()? :)


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Ilya Biryukov via cfe-dev
I believe the pending change at http://reviews.llvm.org/D16145 will solve this issue.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev