How to debug if LTO generate wrong code?

Shi, Steven via cfe-dev

Hello,

I'm enabling clang LTO to reduce the code size of UEFI (http://www.uefi.org/) firmware (https://github.com/tianocore/edk2), which is mostly C code. My project is in the llvm branch of https://github.com/shijunjing/edk2 : https://github.com/shijunjing/edk2/tree/llvm. Most of my firmware modules work well after enabling LTO, but some X64 modules will not run (e.g. they hang with a CPU exception), and these X64 modules work well if built with LTO disabled (-fno-lto).

I don't know how to efficiently debug this wrong LTO code and investigate whether there is a compiler bug. I would appreciate any suggestions about debugging methods, commands, or BKMs for clang LTO issues.

Below are my clang LTO build tools and options. I use the clang 3.8 release with binutils 2.26 ld (I've pushed ld support for the LLVM gold plugin: https://sourceware.org/bugzilla/show_bug.cgi?id=20070). Any suggestion is welcome!

 

 

 

##################
# CLANGLTO38 X64 definitions
##################
*_CLANGLTO38_X64_OBJCOPY_PATH         = DEF(GCC53_X64_PREFIX)objcopy
*_CLANGLTO38_X64_CC_PATH              = DEF(CLANG38_X64_PREFIX)clang
*_CLANGLTO38_X64_SLINK_PATH           = DEF(CLANG38_X64_PREFIX)llvm-ar
*_CLANGLTO38_X64_DLINK_PATH           = DEF(CLANG38_X64_PREFIX)clang
*_CLANGLTO38_X64_ASM_PATH             = DEF(CLANG38_X64_PREFIX)clang
*_CLANGLTO38_X64_PP_PATH              = DEF(CLANG38_X64_PREFIX)clang
*_CLANGLTO38_X64_RC_PATH              = DEF(GCC53_X64_PREFIX)objcopy

*_CLANGLTO38_X64_CC_FLAGS = -c -fshort-wchar -fno-strict-aliasing -Wall -Werror -Wno-array-bounds -Wno-empty-body -ffunction-sections -fdata-sections -include AutoGen.h -DSTRING_ARRAY_NAME=$(BASE_NAME)Strings -fno-stack-protector -fno-builtin -mms-bitfields -Wno-address -Wno-shift-negative-value -Wno-parentheses-equality -Wno-unknown-pragmas -Wno-tautological-constant-out-of-range-compare -Wno-incompatible-library-redeclaration -target x86_64-pc-linux-gnu -fno-asynchronous-unwind-tables -m64 -Wno-enum-conversion "-DEFIAPI=__attribute__((ms_abi))" -mno-red-zone -mcmodel=large -g -Os -flto
*_CLANGLTO38_X64_DLINK_FLAGS  = -flto -nostdlib -Wl,-n -Wl,-q -Wl,--gc-sections -Wl,-z,common-page-size=0x40 -Wl,--entry,$(IMAGE_ENTRY_POINT) -Wl,-u,$(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map -Wl,-melf_x86_64 -Wl,--oformat=elf64-x86-64
*_CLANGLTO38_X64_ASM_FLAGS            = -c -x assembler -imacros $(DEST_DIR_DEBUG)/AutoGen.h -m64 -target x86_64-pc-linux-gnu
*_CLANGLTO38_X64_RC_FLAGS             = -I binary -O elf64-x86-64        -B i386    --rename-section .data=.hii
*_CLANGLTO38_X64_NASM_FLAGS           = -f elf64

 

 

 

Steven Shi

Intel\SSG\STO\UEFI Firmware

 

Tel: +86 021-61166522

iNet: 821-6522

 


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Re: [llvm-dev] How to debug if LTO generate wrong code?

Umesh Kalappa via cfe-dev
Steven,

The brute-force method is to get the disassembly of the function that hangs
and compare the code generated with and without LTO.

Alternatively, try attaching gdb and checking for the instruction that causes the exception.
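
The comparison suggested above can be sketched roughly as follows. This is only an illustration, not something posted in the thread: raw objdump output differs on almost every line once addresses shift, so the sketch strips addresses and opcode bytes before diffing. The function name normalize_disasm and the file names in the usage comment are hypothetical, and the awk script assumes the tab-separated layout that GNU objdump -d prints.

```shell
#!/bin/sh
# normalize_disasm: read `objdump -d` output on stdin and keep only symbol
# labels and instruction text, so that diff reports real code changes
# instead of every shifted address.
normalize_disasm() {
    awk -F'\t' '
        # Function label, e.g. "00000000004004f0 <main>:" becomes "<main>:"
        /^[0-9a-f]+ <.*>:$/ { sub(/^[0-9a-f]+ /, ""); print; next }
        # Instruction line: field 1 = address, field 2 = bytes, field 3 = text
        NF >= 3 {
            insn = $3
            # Drop absolute branch targets but keep the symbol name
            gsub(/[0-9a-f]+ </, "<", insn)
            print "    " insn
        }
    '
}

# Hypothetical usage (file names are placeholders):
#   objdump -d Module_nolto.dll | normalize_disasm > nolto.txt
#   objdump -d Module_lto.dll   | normalize_disasm > lto.txt
#   diff -u nolto.txt lto.txt
```

Because LTO inlines and reorders functions, the diff will still be noisy; it is most useful when restricted to the one function that hangs.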

Thank you
~Umesh


Re: [llvm-dev] How to debug if LTO generate wrong code?

Shi, Steven via cfe-dev

Hi Umesh,

Thank you for the suggestion. I can use the brute-force method to narrow down the wrong LTO instructions here and there, but I still don't know why these wrong instructions are generated, or how to stop Clang LTO from generating them.

I suspect the wrong code is caused by a faulty LTO optimization pass, so I would like to disable all LTO optimizations first and then re-enable them one by one to narrow down the root cause. But when I try to disable optimization by forcing -O0 in the LTO build, ld fails to recognize some clang bitcode libraries and the link fails.

For example, use the Clang_LTO_Fails_On_LD example attached to the bug below:

https://sourceware.org/bugzilla/show_bug.cgi?id=20070

If I force -O0 to disable optimization in LTO, ld fails to link:

~/clang38/bin/clang -o Hello.dll -flto -O0 -nostdlib -Wl,-n -Wl,-q -Wl,--gc-sections -Wl,-z,common-page-size=0x40 -Wl,--entry,_ModuleEntryPoint -Wl,-u,_ModuleEntryPoint -Wl,-Map,Hello.map -Wl,-melf_x86_64 -Wl,--oformat=elf64-x86-64 -Wl,--start-group,,@static_library_files.lst -Wl,--end-group

BaseLib.lib: error adding symbols: File format not recognized

clang-3.8: error: linker command failed with exit code 1 (use -v to see invocation)

 

But if I use -O1, -O2, or a higher -On, the ld link passes:

~/clang38/bin/clang -o Hello.dll -flto -O1 -nostdlib -Wl,-n -Wl,-q -Wl,--gc-sections -Wl,-z,common-page-size=0x40 -Wl,--entry,_ModuleEntryPoint -Wl,-u,_ModuleEntryPoint -Wl,-Map,Hello.map -Wl,-melf_x86_64 -Wl,--oformat=elf64-x86-64 -Wl,--start-group,,@static_library_files.lst -Wl,--end-group

 

 

How can I correctly disable all optimizations in clang LTO? How can I find out which specific optimizations clang LTO runs, and enable or disable them individually? Any suggestion is welcome!
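
Whatever knob ends up being available, the "enable them one by one" search can be done as a binary search instead of a linear one. The sketch below is a generic helper, not a clang feature: bisect_first_bad is a hypothetical name, and the predicate passed to it (here called build_and_boot in the usage comment) is a placeholder for a script that rebuilds with a limit N and reports whether the firmware still boots. N could be, for example, the number of libraries compiled with -flto, or a pass-count limit on LLVM releases that offer one; whether the clang 3.8 toolchain in this thread supports such a limit is an assumption to verify.

```shell
#!/bin/sh
# bisect_first_bad LO HI CMD...
# Assumes `CMD LO` succeeds, `CMD HI` fails, and that once a value fails all
# larger values fail too. Prints the smallest N in (LO, HI] for which
# `CMD N` fails, using about log2(HI - LO) runs of CMD.
bisect_first_bad() {
    lo=$1
    hi=$2
    shift 2
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$@" "$mid"; then
            lo=$mid     # still good at mid: first failure is above it
        else
            hi=$mid     # already bad at mid: first failure is at or below it
        fi
    done
    echo "$hi"
}

# Hypothetical usage, where ./build_and_boot N is a placeholder script that
# rebuilds with limit N and exits 0 only if the module runs correctly:
#   bisect_first_bad 0 200 ./build_and_boot
```

The value printed is the first limit at which the failure appears, which points at the single library or pass that was added in that step.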

 

 

 

Steven Shi

Intel\SSG\STO\UEFI Firmware

 

Tel: +86 021-61166522

iNet: 821-6522

 


Re: [llvm-dev] How to debug if LTO generate wrong code?

Shi, Steven via cfe-dev

Hello,

Let me ask a simple LTO question again. For the LLVM LTO example at http://llvm.org/docs/LinkTimeOptimization.html, I use the build commands below to generate binaries at three optimization levels: -O0, -O1, and -O2. Listing the foo1 to foo4 symbols with nm shows that the different optimization levels really do have an effect.

1.       Which optimizations does clang LTO apply at -O0, -O1 and -O2, and how can I find out?

2.       Are these differences produced by optimizations in the compiler (e.g. clang/llvm) or in the linker (e.g. ld)?

3.       How can I explicitly enable or disable specific optimizations, other than by using -O0, -O1, -O2?

 

 

$ clang -emit-llvm -c main.c -o main.bc
$ clang -emit-llvm -c a.c -o a.bc
$ llvm-ar cr main.lib main.bc
$ llvm-ar cr a.lib a.bc
$ clang -O0 -flto main.lib a.lib -o main0
$ clang -O1 -flto main.lib a.lib -o main1
$ clang -O2 -flto main.lib a.lib -o main2

$ nm main0
00000000004005a0 t foo1
0000000000400580 t foo2
00000000004005e0 t foo3
0000000000400530 t foo4
0000000000400500 t frame_dummy
$ nm main1
0000000000400550 t foo1
0000000000400580 t foo3
0000000000400530 t foo4
0000000000400500 t frame_dummy
$ nm main2
00000000004004d0 t frame_dummy

 

From the verbose output below, it seems that only the linker (e.g. ld) is involved in the optimization?

 

$ clang -O2 -flto main.lib a.lib -o main2 -v

clang version 3.8.0 (tags/RELEASE_380/final)

Target: x86_64-unknown-linux-gnu

Thread model: posix

InstalledDir: /usr/local/bin

Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9

Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9.3

Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.3.1

Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.0.0

Found candidate GCC installation: /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0

Selected GCC installation: /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0

Candidate multilib: .;@m64

Candidate multilib: 32;@m32

Selected multilib: .;@m64

"/usr/bin/ld" -z relro --hash-style=gnu --build-id --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o main2 /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/crtbegin.o -L/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0 -L/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/../../../../lib64 -L/usr/local/bin/../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/../../.. -L/usr/local/bin/../lib -L/lib -L/usr/lib -plugin /usr/local/bin/../lib/LLVMgold.so -plugin-opt=mcpu=x86-64 -plugin-opt=O2 main.lib a.lib -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/crtend.o /usr/lib/x86_64-linux-gnu/crtn.o

 

 

Steven Shi

Intel\SSG\STO\UEFI Firmware

 

Tel: +86 021-61166522

iNet: 821-6522

 



Re: [llvm-dev] How to debug if LTO generate wrong code?

Mehdi Amini via cfe-dev

On May 17, 2016, at 1:33 AM, Shi, Steven via llvm-dev <[hidden email]> wrote:

> Hello,
> Let me ask a simple LTO question again. For the LLVM LTO example at http://llvm.org/docs/LinkTimeOptimization.html, I use the build commands below to generate binaries at three optimization levels: -O0, -O1, and -O2. Listing the foo1 to foo4 symbols with nm shows that the different optimization levels really do have an effect.
> 1.       Which optimizations does clang LTO apply at -O0, -O1 and -O2, and how can I find out?

LTO is linker-specific; clang is only forwarding the option to the linker here.

> 2.       Are these differences produced by optimizations in the compiler (e.g. clang/llvm) or in the linker (e.g. ld)?

In your case, you invoke clang with "-emit-llvm" and without any optimization level, so you get O0.
What the linker does at these optimization levels is, again, linker-specific.

> 3.       How can I explicitly enable or disable specific optimizations, other than by using -O0, -O1, -O2?

If you're talking about LTO, this is linker-specific again (ld is not the same program on every platform). For instance, there is no such thing as O0/O1/O2 on OS X.


 
 
> From the verbose output below, it seems that only the linker (e.g. ld) is involved in the optimization?

Yes.
Usually the LTO pipeline is a bit different from what you're doing. I'm used to seeing:

$clang -flto -O3 -c main.c -o main.o
$clang -flto -O3 -c a.c -o a.o
$clang -flto -O3 main.o a.o -o main0
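
With this pipeline the .o files produced by clang -flto -c contain LLVM bitcode rather than native object code, which is why the linker needs the plugin to read them. A quick way to check what a given file actually contains is to look at its first four bytes. The helper below is only a sketch: is_llvm_bitcode is a hypothetical name, and it recognizes just the raw bitcode magic ('B', 'C', 0xC0, 0xDE), not the wrapper header some platforms put in front of it.

```shell
#!/bin/sh
# is_llvm_bitcode FILE
# Succeeds (exit status 0) if FILE starts with the raw LLVM bitcode magic
# bytes 0x42 0x43 0xC0 0xDE, and fails otherwise (for example for ELF).
is_llvm_bitcode() {
    # od prints the first four bytes as hex; tr removes spaces and newlines
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "4243c0de" ]
}

# Hypothetical usage on the objects from the commands above:
#   for f in main.o a.o; do
#       if is_llvm_bitcode "$f"; then echo "$f: bitcode"; else echo "$f: not bitcode"; fi
#   done
```

For static libraries the same check can be applied to the members after extracting them from the archive.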


-- 
Mehdi



 

Re: [llvm-dev] How to debug if LTO generate wrong code?

Umesh Kalappa via cfe-dev
Steven,

As Mehdi stated, the optimisation level is specific to the linker, and it
enables the inter-procedural optimisation passes. Please refer to the function

PassManagerBuilder::addLTOOptimizationPasses() at
http://llvm.org/docs/doxygen/html/PassManagerBuilder_8cpp_source.html

As for internal options to disable them individually, I don't think you can do that.

Thank you
~Umesh


Re: [llvm-dev] How to debug if LTO generate wrong code?

Mehdi Amini via cfe-dev

> On May 17, 2016, at 11:21 AM, Umesh Kalappa <[hidden email]> wrote:
>
> Steven,
>
> As mehdi stated , the optimisation level is specific to linker and it
> enables Inter-Pro  opts passes ,please  refer function

To be very clear: the -O option may trigger *linker* optimizations as well, independently of LTO.

--
Mehdi




Re: [llvm-dev] How to debug if LTO generate wrong code?

Shi, Steven via cfe-dev

Hi Mehdi,

After deeper debugging, I found that my firmware's LTO wrong-code issue is related to the X64 code model: the large code model (-mcmodel=large) is always overridden to small (-mcmodel=small) in an LTO build. I don't know how to correctly specify the large code model for my X64 firmware LTO build. I would appreciate it if you could let me know how.

As you know, parts of my UEFI firmware (BIOS) have to be loaded and run at high addresses (above 2 GB) from the very beginning, so I need code that makes absolutely no assumptions about the addresses of the code and data sections. But the current LLVM LTO seems to stick to the small code model and generates a lot of code with 32-bit RIP-relative addressing, which causes CPU exceptions when run at addresses above 2 GB.

Below, I simply reuse Eli's codemodel1.c example (http://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models) to show the LLVM LTO code model issue.

$ clang -g -O0 codemodel1.c -mcmodel=large -o codemodel1_large.bin
$ clang -g -O0 codemodel1.c -mcmodel=small -o codemodel1_small.bin
$ clang -g -O0 -flto codemodel1.c -mcmodel=large -o codemodel1_large_lto.bin
$ clang -g -O0 -flto codemodel1.c -mcmodel=small -o codemodel1_small_lto.bin

 

You will see that codemodel1_large_lto.bin and codemodel1_small_lto.bin are exactly the same!

And if you disassemble codemodel1_large_lto.bin, you will see that it uses small-code-model addressing (32-bit addresses and displacements), not the large model, as shown below.

 

$ objdump -dS codemodel1_large_lto.bin

int main(int argc, const char* argv[])
{
  4004f0:       55                      push   %rbp
  4004f1:       48 89 e5                mov    %rsp,%rbp
  4004f4:       48 83 ec 20             sub    $0x20,%rsp
  4004f8:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
  4004ff:       89 7d f8                mov    %edi,-0x8(%rbp)
  400502:       48 89 75 f0             mov    %rsi,-0x10(%rbp)
    int t = global_func(argc);
  400506:       8b 7d f8                mov    -0x8(%rbp),%edi
  400509:       e8 d2 ff ff ff          callq  4004e0 <global_func>
  40050e:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr[7];
  400511:       8b 04 25 4c 10 60 00    mov    0x60104c,%eax
  400518:       03 45 ec                add    -0x14(%rbp),%eax
  40051b:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += static_arr[7];
  40051e:       8b 04 25 dc 11 60 00    mov    0x6011dc,%eax
  400525:       03 45 ec                add    -0x14(%rbp),%eax
  400528:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr_big[7];
  40052b:       8b 04 25 6c 13 60 00    mov    0x60136c,%eax
  400532:       03 45 ec                add    -0x14(%rbp),%eax
  400535:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += static_arr_big[7];
  400538:       8b 04 25 ac 20 63 00    mov    0x6320ac,%eax
  40053f:       03 45 ec                add    -0x14(%rbp),%eax
  400542:       89 45 ec                mov    %eax,-0x14(%rbp)
    return t;
  400545:       8b 45 ec                mov    -0x14(%rbp),%eax
  400548:       48 83 c4 20             add    $0x20,%rsp
  40054c:       5d                      pop    %rbp
  40054d:       c3                      retq
  40054e:       66 90                   xchg   %ax,%ax
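
The encodings in this listing can be checked by hand. The following is a minimal sketch in C, with hypothetical helper names, that recomputes the `callq` target from its `rel32` field and classifies the memory operand of the `mov` loads from their ModRM and SIB bytes. It covers only the forms relevant here in 64-bit mode and ignores REX prefixes.

```c
#include <stdint.h>

/* Target of a 5-byte `call rel32` (opcode 0xE8) located at insn_addr.
 * The displacement is relative to the address of the next instruction. */
static uint64_t call_rel32_target(uint64_t insn_addr, const uint8_t insn[5])
{
    uint32_t raw = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                   (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
    uint64_t disp = raw;
    if (raw & 0x80000000u)
        disp |= 0xffffffff00000000ULL;   /* sign-extend to 64 bits */
    return insn_addr + 5 + disp;         /* wraps modulo 2^64 */
}

enum addr_form { ADDR_RIP_REL32, ADDR_ABS_DISP32, ADDR_OTHER };

/* Classify a memory operand from its ModRM byte and, if present, SIB byte. */
static enum addr_form classify(uint8_t modrm, uint8_t sib)
{
    uint8_t mod = modrm >> 6;
    uint8_t rm = modrm & 7;
    if (mod == 0 && rm == 5)
        return ADDR_RIP_REL32;           /* [rip + disp32] */
    if (mod == 0 && rm == 4 && (sib & 7) == 5 && ((sib >> 3) & 7) == 4)
        return ADDR_ABS_DISP32;          /* [disp32], no base, no index */
    return ADDR_OTHER;
}
```

With the bytes from the listing, `e8 d2 ff ff ff` at 0x400509 resolves to 0x4004e0, matching objdump, and `8b 04 25 ...` classifies as an absolute 32-bit displacement rather than a RIP-relative one. Either way, the instruction carries only 32 bits of address information.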

 

 

So, does LTO support the large code model? How do I correctly specify the code model for an LTO build?

 

 

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

 


 



Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev
Hi,


On May 29, 2016, at 7:36 AM, Shi, Steven <[hidden email]> wrote:

> So, does LTO support large code model? How to correctly specify the LTO code model option?

Same answer as before: LTO is set up by the linker, so the option for that, if it exists, will be linker-specific.

As far as I can tell, neither the libLTO-based linkers (ld64 on OS X, for example) nor the gold plugin supports such an option, and the code model is always "default".

I don't know about lld; CCing Rafael about that.

-- 
Mehdi




 
 



Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev
On Sun, May 29, 2016 at 1:27 PM, Mehdi Amini via llvm-dev
<[hidden email]> wrote:

> So, does LTO support large code model? How to correctly specify the LTO code
> model option?
>
>
> Same answer as before: LTO is setup by the linker, so the option for that,
> if it exists, will be linker specific.
>
> As far as I can tell, neither libLTO-based linker (ld64 on OS X for
> example), neither the gold plugin supports such an option and the code model
> is always "default".
>
> I don't know about lld, CC Rafael about that.
>

lld does not either (yet), to the best of my knowledge.

Cheers,

--
Davide

Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev

Hi Mehdi,

GCC LTO seems to support the large code model on my side, as shown below. If the code model is linker-specific, does GCC LTO use a special linker that is different from the one in GNU Binutils?

I’m a bit surprised that neither OS X ld64 nor the gold plugin supports the large code model in LTO. Since modern systems widely use 64-bit, needing code to run at high addresses (above 2 GB) is a reasonable requirement.

 

$ gcc -g -O0 -flto codemodel1.c -mcmodel=large -o codemodel1_large_lto_gcc.bin
$ objdump -dS codemodel1_large_lto_gcc.bin

int main(int argc, const char* argv[])
{
  40048b:       55                      push   %rbp
  40048c:       48 89 e5                mov    %rsp,%rbp
  40048f:       48 83 ec 20             sub    $0x20,%rsp
  400493:       89 7d ec                mov    %edi,-0x14(%rbp)
  400496:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
    int t = global_func(argc);
  40049a:       8b 45 ec                mov    -0x14(%rbp),%eax
  40049d:       89 c7                   mov    %eax,%edi
  40049f:       48 b8 76 04 40 00 00    movabs $0x400476,%rax
  4004a6:       00 00 00
  4004a9:       ff d0                   callq  *%rax
  4004ab:       89 45 fc                mov    %eax,-0x4(%rbp)
    t += global_arr[7];
  4004ae:       48 b8 20 09 60 00 00    movabs $0x600920,%rax
  4004b5:       00 00 00
  4004b8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004bb:       01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr[7];
  4004be:       48 b8 c0 0a 60 00 00    movabs $0x600ac0,%rax
  4004c5:       00 00 00
  4004c8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004cb:       01 45 fc                add    %eax,-0x4(%rbp)
    t += global_arr_big[7];
  4004ce:       48 b8 60 0c 60 00 00    movabs $0x600c60,%rax
  4004d5:       00 00 00
  4004d8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004db:       01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr_big[7];
  4004de:       48 b8 a0 19 63 00 00    movabs $0x6319a0,%rax
  4004e5:       00 00 00
  4004e8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004eb:       01 45 fc                add    %eax,-0x4(%rbp)
    return t;
  4004ee:       8b 45 fc                mov    -0x4(%rbp),%eax
}
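
For contrast with the clang output, the large-model sequences in this listing carry a full 64-bit address inside the instruction. The following is a minimal sketch in C, with a hypothetical helper name, that extracts the little-endian `imm64` from a `movabs $imm64,%rax` encoding (`48 b8` followed by eight immediate bytes).

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode `movabs $imm64,%rax` (REX.W prefix 0x48, opcode 0xB8, then imm64).
 * Returns false if the bytes are not that encoding. */
static bool movabs_rax_imm64(const uint8_t insn[10], uint64_t *imm_out)
{
    if (insn[0] != 0x48 || insn[1] != 0xb8)
        return false;
    uint64_t imm = 0;
    for (int i = 0; i < 8; i++)
        imm |= (uint64_t)insn[2 + i] << (8 * i);   /* little-endian */
    *imm_out = imm;
    return true;
}
```

For the ten bytes at 0x40049f this yields 0x400476, the address that is then called through `callq *%rax`. Because all eight address bytes are present, the same sequence could just as well name a location above 4 GB. The `0x1c` in the following loads is the offset of element 7 for a 4-byte `int`.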

 

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

 


 



Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev

On May 29, 2016, at 5:10 PM, Shi, Steven <[hidden email]> wrote:

> Hi Mehdi,
> GCC LTO seems support large code model in my side as below, if the code model is linker specific, does the GCC LTO use a special linker which is different from the one in GNU Binutils?

I don't know anything about GCC.
(And I doubt the GNU linker supports LTO with LLVM).

> I’m a bit surprised if both OS X ld64 and gold plugin do not support large code model in LTO. Since modern system widely use the 64bit, the code need to run in high address (larger than 2 GB) is a reasonable requirement.

The fact that we don't support it for now seems to indicate that it is not a widely requested feature, especially considering that it is really a trivial option to add.
What is the linker you're using? Are you building your own clang?

-- 
Mehdi



 



Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev
On Mon, May 30, 2016 at 12:10:09AM +0000, Shi, Steven via cfe-dev wrote:
> I'm a bit surprised if both OS X ld64 and gold plugin do not support
> large code model in LTO. Since modern system widely use the 64bit, the
> code need to run in high address (larger than 2 GB) is a reasonable requirement.

Actually, given that PIC is (almost) free in terms of codegen, there is
rarely a need for the large model on AMD64. Programs with more than 2GB
of static data are moderately rare and programs with more than 2GB of
text even more so.
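
The point about PIC can be made concrete: a RIP-relative operand encodes the distance from the next instruction to the target, so what has to fit in 32 bits is that distance, not the address the image is loaded at. The following is a minimal sketch in C, with a hypothetical helper name and made-up example addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `target` can be reached from the instruction that ends at
 * `next_insn` using a signed 32-bit RIP-relative displacement. */
static bool rip_rel32_reachable(uint64_t next_insn, uint64_t target)
{
    uint64_t delta = target - next_insn;   /* wraps modulo 2^64 */
    return delta <= 0x7fffffffULL || delta >= 0xffffffff80000000ULL;
}
```

In this sketch, code at 8 GB (0x200000000) can reach data 2 MB above or below it, while code at 0x400000 cannot reach a target at 8 GB. Only an image whose own code and data lie more than 2 GB apart falls outside this reach.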

Joerg

Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev

> (And I doubt the GNU linker supports LTO with LLVM).

[Steven]: I’ve pushed GNU Binutils ld to support the LLVM gold plugin; see the details in this bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20070. The new GNU ld linker works well with LLVM/Clang LTO when building IA32 code on my side. According to the ld owner's comments in the bug, the current X64 LLVM LTO issue is in the LLVM LTO plugin.

> The fact that we don't support it for now seems to indicate that it is not a widely requested feature, especially considering that it is really a trivial option to add.
> What is the linker you're using? Are you building your own clang?

[Steven]: I’m using the standard LLVM 3.8 with the new GNU ld linker mentioned above. I can build my own clang on my side if needed. I’m happy to hear that it is not difficult to enable the large code model in LLVM LTO and that “it is really a trivial option to add”. Could you let me know how to enable it? A lot of my work has been blocked by the large code model issue. Thank you!

 

 

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

 

From: [hidden email] [mailto:[hidden email]]
Sent: Monday, May 30, 2016 8:17 AM
To: Shi, Steven <[hidden email]>
Cc: Umesh Kalappa <[hidden email]>; [hidden email]; llvm-dev <[hidden email]>; [hidden email]; Rafael Espíndola <[hidden email]>
Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?

 

 

On May 29, 2016, at 5:10 PM, Shi, Steven <[hidden email]> wrote:

 

Hi Mehdi,

GCC LTO seems support large code model in my side as below, if the code model is linker specific, does the GCC LTO use a special linker which is different from the one in GNU Binutils?

 

I don't know anything about GCC.

(And I doubt the GNU linker supports LTO with LLVM).



I’m a bit surprised if both OS X ld64 and gold plugin do not support large code model in LTO. Since modern system widely use the 64bit, the code need to run in high address (larger than 2 GB) is a reasonable requirement.

 

The fact that we don't support it for now seems to indicate that it is not a widely requested feature, especially considering that it is really a trivial option to add.

What is the linker you're using? Are you building your own clang?

 

-- 

Mehdi

 

 



 

$ gcc -g -O0 -flto codemodel1.c -mcmodel=large -o codemodel1_large_lto_gcc.bin
$ objdump -dS codemodel1_large_lto_gcc.bin

int main(int argc, const char* argv[])
{
  40048b:       55                      push   %rbp
  40048c:       48 89 e5                mov    %rsp,%rbp
  40048f:       48 83 ec 20             sub    $0x20,%rsp
  400493:       89 7d ec                mov    %edi,-0x14(%rbp)
  400496:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
    int t = global_func(argc);
  40049a:       8b 45 ec                mov    -0x14(%rbp),%eax
  40049d:       89 c7                   mov    %eax,%edi
  40049f:       48 b8 76 04 40 00 00    movabs $0x400476,%rax
  4004a6:       00 00 00
  4004a9:       ff d0                   callq  *%rax
  4004ab:       89 45 fc                mov    %eax,-0x4(%rbp)
    t += global_arr[7];
  4004ae:       48 b8 20 09 60 00 00    movabs $0x600920,%rax
  4004b5:       00 00 00
  4004b8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004bb:       01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr[7];
  4004be:       48 b8 c0 0a 60 00 00    movabs $0x600ac0,%rax
  4004c5:       00 00 00
  4004c8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004cb:       01 45 fc                add    %eax,-0x4(%rbp)
    t += global_arr_big[7];
  4004ce:       48 b8 60 0c 60 00 00    movabs $0x600c60,%rax
  4004d5:       00 00 00
  4004d8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004db:       01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr_big[7];
  4004de:       48 b8 a0 19 63 00 00    movabs $0x6319a0,%rax
  4004e5:       00 00 00
  4004e8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004eb:       01 45 fc                add    %eax,-0x4(%rbp)
    return t;
  4004ee:       8b 45 fc                mov    -0x4(%rbp),%eax
}
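
A note on reading the GCC listing above: in the large code model each access first loads a full 64-bit absolute address with movabs and only then applies the element offset. The C sketch below decodes the immediate bytes of `movabs $0x600920,%rax` and computes the address that `mov 0x1c(%rax),%eax` reads for global_arr[7]. The helper names imm64_le and element_address are made up for this illustration, and a 4-byte int is assumed (as on x86-64); it only mirrors the arithmetic, it is not taken from any tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the 8 little-endian immediate bytes that follow the
 * `48 b8` opcode bytes of `movabs $imm64,%rax`.
 * (Hypothetical helper, for illustration only.) */
uint64_t imm64_le(const uint8_t bytes[8])
{
    uint64_t value = 0;
    for (size_t i = 0; i < 8; i++)
        value |= (uint64_t)bytes[i] << (8 * i);
    return value;
}

/* Address of element `index` of an int array starting at `base`,
 * assuming 4-byte ints (an assumption that holds on x86-64). */
uint64_t element_address(uint64_t base, uint64_t index)
{
    assert(index <= (UINT64_MAX - base) / 4); /* guard against wrap-around */
    return base + 4 * index;
}
```

With the bytes 20 09 60 00 00 00 00 00 from the listing, imm64_le yields 0x600920, and element 7 sits at 0x600920 + 0x1c. Because the whole 64-bit address travels in the instruction, this form keeps working wherever the image is placed, provided the immediate is relocated accordingly.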

 

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

From: [hidden email] [[hidden email]]
Sent: Monday, May 30, 2016 4:28 AM
To: Shi, Steven <[hidden email]>
Cc: Umesh Kalappa <[hidden email]>; [hidden email]; llvm-dev <[hidden email]>; [hidden email]; Rafael Espíndola <[hidden email]>
Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?

Hi,

On May 29, 2016, at 7:36 AM, Shi, Steven <[hidden email]> wrote:

Hi Mehdi,

After deeper debugging, I found that my firmware's LTO wrong-code issue comes from the X64 code model: -mcmodel=large is always overridden to small (-mcmodel=small) in an LTO build. I don't know how to correctly specify the large code model for my X64 firmware LTO build. I'd appreciate it if you could let me know how.

As you know, parts of my UEFI firmware (BIOS) have to be loaded and run at high addresses (above 2 GB) from the very beginning, and I need code that makes absolutely no assumptions about the addresses of the code and data sections. But the current LLVM LTO seems to stick to the small code model and generates a lot of code with 32-bit addressing, which causes CPU exceptions when running at addresses above 2 GB.
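
As a rough illustration of why small-model code breaks above 2 GB: a 32-bit displacement is sign-extended to 64 bits, so an absolute disp32 can only name the lowest 2 GB (or the topmost 2 GB) of the address space, and a rel32 call can only reach targets within about 2 GB of the next instruction. The C sketch below checks both limits. The function names are invented for this example, and it models only the encoding limits, not what any particular compiler, linker or loader does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `addr` can be encoded as a sign-extended 32-bit absolute
 * displacement, i.e. it lies in the lowest or the topmost 2 GB. */
bool fits_abs32(uint64_t addr)
{
    return addr < 0x80000000ULL || addr >= 0xFFFFFFFF80000000ULL;
}

/* True if a `call rel32` whose following instruction starts at
 * `next_ip` can reach `target`. */
bool fits_rel32(uint64_t next_ip, uint64_t target)
{
    uint64_t delta = target - next_ip; /* wraps modulo 2^64 by design */
    return delta < 0x80000000ULL || delta >= 0xFFFFFFFF80000000ULL;
}

/* Target address of a `call rel32`, given the address of the next
 * instruction and the signed 32-bit displacement. */
uint64_t rel32_target(uint64_t next_ip, int32_t rel)
{
    return next_ip + (uint64_t)(int64_t)rel;
}

/* Cross-check against values taken from the clang LTO listing in this
 * thread: the call at 400509 (next instruction at 40050e) carries the
 * bytes d2 ff ff ff, i.e. rel32 = -0x2e, and global_arr[7] is read
 * from the absolute address 0x60104c. */
void check_against_listing(void)
{
    assert(rel32_target(0x40050eULL, -0x2e) == 0x4004e0ULL);
    assert(fits_abs32(0x60104cULL));
    /* The same data moved 2 GB higher no longer fits in a disp32. */
    assert(!fits_abs32(0x80000000ULL + 0x60104cULL));
}
```

The last check is the firmware situation in miniature: once the image lives above 2 GB, an absolute 32-bit data address can no longer point at it, whereas a 64-bit movabs immediate can.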

 

Below, I simply reuse Eli's codemodel1.c example (link: http://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models) to show the LLVM LTO code model issue.

$ clang -g -O0 codemodel1.c -mcmodel=large -o codemodel1_large.bin
$ clang -g -O0 codemodel1.c -mcmodel=small -o codemodel1_small.bin
$ clang -g -O0 -flto codemodel1.c -mcmodel=large -o codemodel1_large_lto.bin
$ clang -g -O0 -flto codemodel1.c -mcmodel=small -o codemodel1_small_lto.bin

You will see that codemodel1_large_lto.bin and codemodel1_small_lto.bin are exactly the same!

And if you disassemble codemodel1_large_lto.bin, you will see that it uses small code model addressing (a rel32 call and 32-bit absolute data addresses), not large, as shown below.

 

$ objdump -dS codemodel1_large_lto.bin

int main(int argc, const char* argv[])
{
  4004f0:       55                      push   %rbp
  4004f1:       48 89 e5                mov    %rsp,%rbp
  4004f4:       48 83 ec 20             sub    $0x20,%rsp
  4004f8:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
  4004ff:       89 7d f8                mov    %edi,-0x8(%rbp)
  400502:       48 89 75 f0             mov    %rsi,-0x10(%rbp)
    int t = global_func(argc);
  400506:       8b 7d f8                mov    -0x8(%rbp),%edi
  400509:       e8 d2 ff ff ff          callq  4004e0 <global_func>
  40050e:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr[7];
  400511:       8b 04 25 4c 10 60 00    mov    0x60104c,%eax
  400518:       03 45 ec                add    -0x14(%rbp),%eax
  40051b:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += static_arr[7];
  40051e:       8b 04 25 dc 11 60 00    mov    0x6011dc,%eax
  400525:       03 45 ec                add    -0x14(%rbp),%eax
  400528:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr_big[7];
  40052b:       8b 04 25 6c 13 60 00    mov    0x60136c,%eax
  400532:       03 45 ec                add    -0x14(%rbp),%eax
  400535:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += static_arr_big[7];
  400538:       8b 04 25 ac 20 63 00    mov    0x6320ac,%eax
  40053f:       03 45 ec                add    -0x14(%rbp),%eax
  400542:       89 45 ec                mov    %eax,-0x14(%rbp)
    return t;
  400545:       8b 45 ec                mov    -0x14(%rbp),%eax
  400548:       48 83 c4 20             add    $0x20,%rsp
  40054c:       5d                      pop    %rbp
  40054d:       c3                      retq
  40054e:       66 90                   xchg   %ax,%ax

 

 

So, does LTO support the large code model? How do I correctly specify the LTO code model option?

Same answer as before: LTO is set up by the linker, so the option for that, if it exists, will be linker-specific.

As far as I can tell, neither the libLTO-based linkers (ld64 on OS X, for example) nor the gold plugin supports such an option, and the code model is always "default".

I don't know about lld; CC'ing Rafael about that.

-- 
Mehdi

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

> -----Original Message-----
> Sent: Wednesday, May 18, 2016 4:02 AM
> To: Umesh Kalappa <[hidden email]>
> Cc: Shi, Steven <[hidden email]>; llvm-dev <[hidden email]>;
> Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?
>
> > On May 17, 2016, at 11:21 AM, Umesh Kalappa
> <[hidden email]> wrote:
> >
> > Steven,
> >
> > As mehdi stated , the optimisation level is specific to linker and it
> > enables Inter-Pro  opts passes ,please  refer function
>
> To be very clear: the -O option may trigger *linker* optimizations as well,
> independently of LTO.
>
> --
> Mehdi


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev
Hi Joerg,
In my firmware case, I need to load the firmware and run it at a high address, which is a hardware requirement. It is not that my firmware really needs large static data or text. My firmware modules are shared libraries (like DLLs) built with "-fpic", and the firmware loader loads them at high addresses (above 2 GB). I think this need is quite common for system software such as firmware, drivers, and kernel-mode code.


Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522


> -----Original Message-----
> From: llvm-dev [mailto:[hidden email]] On Behalf Of Joerg
> Sonnenberger via llvm-dev
> Sent: Monday, May 30, 2016 8:27 AM
> To: [hidden email]; llvm-dev <[hidden email]>
> Subject: Re: [llvm-dev] [cfe-dev] How to debug if LTO generate wrong code?
>
> On Mon, May 30, 2016 at 12:10:09AM +0000, Shi, Steven via cfe-dev wrote:
> > I'm a bit surprised if both OS X ld64 and gold plugin do not support
> > large code model in LTO. Since modern system widely use the 64bit, the
> > code need to run in high address (larger than 2 GB) is a reasonable
> requirement.
>
> Actually, given that PIC is (almost) free in terms of codegen, there is
> rarely a need for the large model on AMD64. Programs with more than 2GB
> of static data are moderately rare and programs with more than 2GB of
> text even more so.
>
> Joerg
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev

On May 29, 2016, at 5:44 PM, Shi, Steven <[hidden email]> wrote:

(And I doubt the GNU linker supports LTO with LLVM).
[Steven]: I’ve pushed GNU Binutils ld to support the LLVM gold plugin; see the details in this bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20070. The new GNU ld linker works well with LLVM/Clang LTO when building IA32 code on my side. And according to the ld owner's comments in the bug, the current X64 LLVM LTO issue is in the LLVM LTO plugin.

The fact that we don't support it for now seems to indicate that it is not a widely requested feature, especially considering that it is really a trivial option to add.
What is the linker you're using? Are you building your own clang?
[Steven]: I’m using the standard LLVM 3.8 with the new GNU ld linker mentioned above. I can build my own clang on my side if needed. I’m happy to know that it is not difficult to enable the large code model in LLVM LTO and that “it is really a trivial option to add”. Could you let me know how to enable it? A lot of my work has been blocked by the large code model issue. Thank you!

I can't test it locally, but here is a starting point in the gold plugin, inspired by the code present in clang (see the attached code-model-gold.patch):



You need to use your linker-specific way of passing the option "-lto-use-large-codemodel=..." to the plugin.

Let me know if it works for you!

-- 
Mehdi


 
 

code-model-gold.patch (1K) Download Attachment

Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev

Hi Mehdi,

Should I first apply your attached patch to my LLVM 3.8 source, or should I use the latest LLVM SVN trunk instead?

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

 


Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev
Hi Steven,


On May 29, 2016, at 11:28 PM, Shi, Steven <[hidden email]> wrote:

Hi Mehdi,
Should I first apply your attached patch to my LLVM 3.8 source, or should I use the latest LLVM SVN trunk instead?

I wrote it on trunk, but I expect it to be fairly easy to port to 3.8. This is really just quickly plumbing an option through to the TargetMachine creation.

-- 
Mehdi



 
 
Steven Shi
Intel\SSG\STO\UEFI Firmware
 
Tel: +86 021-61166522
iNet: 821-6522
 
From: [hidden email] [[hidden email]] 
Sent: Monday, May 30, 2016 2:13 PM
To: Shi, Steven <[hidden email]>
Cc: Umesh Kalappa <[hidden email]>; [hidden email]; llvm-dev <[hidden email]>; [hidden email]; Rafael Espíndola <[hidden email]>
Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?
 
 
On May 29, 2016, at 5:44 PM, Shi, Steven <[hidden email]> wrote:
 
(And I doubt the GNU linker supports LTO with LLVM).
[Steven]: I’ve pushed GNU Binutils ld to support LLVM gold plugin, see detail in this bug https://sourceware.org/bugzilla/show_bug.cgi?id=20070. The new GNU ld linker works well with LLVM/Clang LTO when build IA32 code in my side. And from the ld owner input in the bug comments, the current X64 LLVM LTO issue is in llvm LTO plugin.
 
 
The fact that we don't support it for now seems to indicate that it is not a widely requested feature, especially considering that it is really a trivial option to add.
What is the linker you're using? Are you building your own clang?
[Steven]: I’m using the standard LLVM 3.8 with the above GNU new ld linker. I can build my own clang in my side if needed. I’m happy to know it is not difficult to enable the large code model in LLVM LTO and “it is really a trivial option to add”. Could you let me know how to enable it? My lots of work have been blocked by the large code model issue. Thank you!
 
 
I can't test it locally, but here is a starting point in the gold plugin, inspired by the code present in clang:
 
 
You need to use your linker-specific way of passing the option "-lto-use-large-codemodel=..." to the plugin.
 
Let me know if it works for you!
 
-- 
Mehdi
 


 
 
Steven Shi
Intel\SSG\STO\UEFI Firmware
 
Tel: +86 021-61166522
iNet: 821-6522
 
From: [hidden email] [[hidden email]] 
Sent: Monday, May 30, 2016 8:17 AM
To: Shi, Steven <[hidden email]>
Cc: Umesh Kalappa <[hidden email]>; [hidden email]; llvm-dev <[hidden email]>; [hidden email]; Rafael Espíndola <[hidden email]>
Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?
 
 
On May 29, 2016, at 5:10 PM, Shi, Steven <[hidden email]> wrote:
 
Hi Mehdi,
GCC LTO seems support large code model in my side as below, if the code model is linker specific, does the GCC LTO use a special linker which is different from the one in GNU Binutils?
 
I don't know anything about GCC.
(And I doubt the GNU linker supports LTO with LLVM).

 

I’m a bit surprised if both OS X ld64 and gold plugin do not support large code model in LTO. Since modern system widely use the 64bit, the code need to run in high address (larger than 2 GB) is a reasonable requirement.
 
The fact that we don't support it for now seems to indicate that it is not a widely requested feature, especially considering that it is really a trivial option to add.
What is the linker you're using? Are you building your own clang?
 
-- 
Mehdi
 
 

 

 
$ gcc -g -O0 -flto codemodel1.c -mcmodel=large -o codemodel1_large_lto_gcc.bin
$ objdump -dS codemodel1_large_lto_gcc.bin
 
int main(int argc, const char* argv[])
{
  40048b:       55                      push   %rbp
  40048c:       48 89 e5                mov    %rsp,%rbp
  40048f:       48 83 ec 20             sub    $0x20,%rsp
  400493:       89 7d ec                mov    %edi,-0x14(%rbp)
  400496:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
    int t = global_func(argc);
  40049a:       8b 45 ec                mov    -0x14(%rbp),%eax
  40049d:       89 c7                   mov    %eax,%edi
  40049f:       48 b8 76 04 40 00 00    movabs $0x400476,%rax
  4004a6:       00 00 00
  4004a9:       ff d0                   callq  *%rax
  4004ab:       89 45 fc                mov    %eax,-0x4(%rbp)
    t += global_arr[7];
  4004ae:       48 b8 20 09 60 00 00    movabs $0x600920,%rax
  4004b5:       00 00 00
  4004b8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004bb:       01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr[7];
  4004be:       48 b8 c0 0a 60 00 00    movabs $0x600ac0,%rax
  4004c5:       00 00 00
  4004c8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004cb:       01 45 fc                add    %eax,-0x4(%rbp)
    t += global_arr_big[7];
  4004ce:       48 b8 60 0c 60 00 00    movabs $0x600c60,%rax
  4004d5:       00 00 00
  4004d8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004db:       01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr_big[7];
  4004de:       48 b8 a0 19 63 00 00    movabs $0x6319a0,%rax
  4004e5:       00 00 00
  4004e8:       8b 40 1c                mov    0x1c(%rax),%eax
  4004eb:       01 45 fc                add    %eax,-0x4(%rbp)
    return t;
  4004ee:       8b 45 fc                mov    -0x4(%rbp),%eax
}
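
The `movabs` sequences in this listing are what the large code model looks like in machine code: a REX.W prefix (`48`), opcode `b8`, then a full 64-bit little-endian immediate. Below is a minimal C sketch, not taken from the thread, that decodes just that one form. It assumes only the `%rax` variant seen here, and `decode_movabs_rax` is a made-up helper name, not part of any tool mentioned in this discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* If buf starts with `movabs $imm64,%rax` (REX.W prefix 0x48, opcode 0xb8),
 * store the little-endian 64-bit immediate in *imm and return 1.
 * Return 0 for anything else. Hypothetical helper for illustration. */
int decode_movabs_rax(const unsigned char *buf, size_t len, uint64_t *imm)
{
    assert(imm != NULL);
    if (len < 10 || buf[0] != 0x48 || buf[1] != 0xb8)
        return 0;

    uint64_t value = 0;
    for (int i = 7; i >= 0; i--)          /* most significant byte is last */
        value = (value << 8) | buf[2 + i];

    *imm = value;
    return 1;
}
```

Feeding it the ten bytes at `40049f` (`48 b8 76 04 40 00 00 00 00 00`) yields `0x400476`, the address loaded into `%rax` before `callq *%rax`. Because the immediate is a full 64 bits, the target can be anywhere in the address space.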
 
Steven Shi
Intel\SSG\STO\UEFI Firmware
 
Tel: +86 021-61166522
iNet: 821-6522
 
From: [hidden email] [[hidden email]] 
Sent: Monday, May 30, 2016 4:28 AM
To: Shi, Steven <[hidden email]>
Cc: Umesh Kalappa <[hidden email]>; [hidden email]; llvm-dev <[hidden email]>; [hidden email]; Rafael Espíndola <[hidden email]>
Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?
 
Hi,
 
 
On May 29, 2016, at 7:36 AM, Shi, Steven <[hidden email]> wrote:
 
Hi Mehdi,
After deeper debugging, I found that my firmware LTO wrong-code issue occurs because the X64 code model (-mcmodel=large) is always overridden to small (-mcmodel=small) in an LTO build. I don't know how to correctly specify the large code model for my X64 firmware LTO build. I would appreciate it if you could tell me how.
 
You know, parts of my UEFI firmware (BIOS) have to be loaded and run at high addresses (above 2 GB) from the very beginning, so I need code that makes absolutely no assumptions about the addresses of its code and data sections. But the current LLVM LTO seems to stick to the small code model and generates a lot of code with 32-bit RIP-relative addressing, which causes CPU exceptions when run at addresses above 2 GB.
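
To make the 2 GB limit concrete, here is a minimal C sketch, not taken from the thread, of the range check behind the small code model. It assumes a data access encoded with a sign-extended 32-bit absolute address (the non-PIC form); the helper name `fits_simm32` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr can be written as a 32-bit value that the CPU
 * sign-extends to 64 bits, i.e. it lies in the lowest 2 GB or the
 * highest 2 GB of the address space. Hypothetical helper. */
bool fits_simm32(uint64_t addr)
{
    return addr <= 0x7fffffffULL || addr >= 0xffffffff80000000ULL;
}
```

An address such as `0x60104c` passes the check, while anything from 2 GB up to the top 2 GB (for example `0x100000000`) does not, so an instruction limited to a 32-bit absolute address cannot reference data placed there.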
 
Below, I simply reuse Eli's codemodel1.c example (link: http://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models) to show the LLVM LTO code model issue.
$ clang -g -O0 codemodel1.c -mcmodel=large -o codemodel1_large.bin
$ clang -g -O0 codemodel1.c -mcmodel=small -o codemodel1_small.bin
$ clang -g -O0 -flto codemodel1.c -mcmodel=large -o codemodel1_large_lto.bin
$ clang -g -O0 -flto codemodel1.c -mcmodel=small -o codemodel1_small_lto.bin
 
You will see that codemodel1_large_lto.bin and codemodel1_small_lto.bin are exactly the same!
And if you disassemble codemodel1_large_lto.bin, you will see that it uses small-code-model addressing (32-bit displacements: a rel32 call and 32-bit absolute data addresses), not large, as below.
 
$ objdump -dS codemodel1_large_lto.bin
 
int main(int argc, const char* argv[])
{
  4004f0:       55                      push   %rbp
  4004f1:       48 89 e5                mov    %rsp,%rbp
  4004f4:       48 83 ec 20             sub    $0x20,%rsp
  4004f8:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
  4004ff:       89 7d f8                mov    %edi,-0x8(%rbp)
  400502:       48 89 75 f0             mov    %rsi,-0x10(%rbp)
    int t = global_func(argc);
  400506:       8b 7d f8                mov    -0x8(%rbp),%edi
  400509:       e8 d2 ff ff ff          callq  4004e0 <global_func>
  40050e:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr[7];
  400511:       8b 04 25 4c 10 60 00    mov    0x60104c,%eax
  400518:       03 45 ec                add    -0x14(%rbp),%eax
  40051b:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += static_arr[7];
  40051e:       8b 04 25 dc 11 60 00    mov    0x6011dc,%eax
  400525:       03 45 ec                add    -0x14(%rbp),%eax
  400528:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr_big[7];
  40052b:       8b 04 25 6c 13 60 00    mov    0x60136c,%eax
  400532:       03 45 ec                add    -0x14(%rbp),%eax
  400535:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += static_arr_big[7];
  400538:       8b 04 25 ac 20 63 00    mov    0x6320ac,%eax
  40053f:       03 45 ec                add    -0x14(%rbp),%eax
  400542:       89 45 ec                mov    %eax,-0x14(%rbp)
    return t;
  400545:       8b 45 ec                mov    -0x14(%rbp),%eax
  400548:       48 83 c4 20             add    $0x20,%rsp
  40054c:       5d                      pop    %rbp
  40054d:       c3                      retq
  40054e:       66 90                   xchg   %ax,%ax
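
For comparison with the `movabs` form, the data loads in this listing use the byte pattern `8b 04 25 <disp32>`. Below is a minimal C sketch, not taken from the thread, that recognises that pattern and recovers the address. It assumes only the `%eax` destination seen here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recognise `mov disp32,%eax` encoded as 8b 04 25 <disp32>.
 * ModRM 0x04 selects a SIB byte, and SIB 0x25 means "no base, no index",
 * so the 32-bit displacement is an absolute address that the CPU
 * sign-extends to 64 bits. Returns 1 and stores the address on a match,
 * 0 otherwise. Hypothetical helper for illustration. */
int decode_abs32_load_eax(const unsigned char *buf, size_t len, uint64_t *addr)
{
    assert(addr != NULL);
    if (len < 7 || buf[0] != 0x8b || buf[1] != 0x04 || buf[2] != 0x25)
        return 0;

    uint32_t disp = (uint32_t)buf[3] | (uint32_t)buf[4] << 8 |
                    (uint32_t)buf[5] << 16 | (uint32_t)buf[6] << 24;

    /* Sign-extend the 32-bit displacement by hand. */
    *addr = (disp & 0x80000000u) ? (0xffffffff00000000ULL | disp)
                                 : (uint64_t)disp;
    return 1;
}
```

For the bytes at `400511` (`8b 04 25 4c 10 60 00`) this gives `0x60104c`. Only four address bytes are available in the instruction, which is why this form cannot reach data placed above 2 GB.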
 
 
So, does LTO support the large code model? How do I correctly specify the code model option for LTO?
 
Same answer as before: LTO is set up by the linker, so the option for that, if it exists, will be linker-specific.
 
As far as I can tell, neither the libLTO-based linkers (ld64 on OS X, for example) nor the gold plugin supports such an option, and the code model is always "default".
 
I don't know about lld; CC'ing Rafael about that.
 
-- 
Mehdi
 
 
 



 
 
Steven Shi
Intel\SSG\STO\UEFI Firmware
 
Tel: +86 021-61166522
iNet: 821-6522
 
> -----Original Message-----
> Sent: Wednesday, May 18, 2016 4:02 AM
> To: Umesh Kalappa <[hidden email]>
> Cc: Shi, Steven <[hidden email]>; llvm-dev <[hidden email]>;
> Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?
> 
> 
> > On May 17, 2016, at 11:21 AM, Umesh Kalappa
> <[hidden email]> wrote:
> >
> > Steven,
> >
> > As Mehdi stated, the optimisation level is specific to the linker, and it
> > enables inter-procedural optimisation passes; please refer to function
> 
> To be very clear: the -O option may trigger *linker* optimizations as well,
> independently of LTO.
> 
> --
> Mehdi
> 
> 
>


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev

Hi Mehdi,

The LLVM 3.8 gold-plugin.cpp is very different from the latest one on trunk. Your patch fails to compile on LLVM 3.8, as shown below. I will try it on the latest trunk later. Thank you for your help anyway!

 

Building CXX object tools/gold/CMakeFiles/LLVMgold.dir/gold-plugin.cpp.o
cd /home/jshi19/llvm38releasebuild/tools/gold && /home/jshi19/clang38/bin/clang++   -DGTEST_HAS_RTTI=0 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/jshi19/llvm38releasebuild/tools/gold -I/home/jshi19/llvm-3.8.0.src/tools/gold -I/home/jshi19/llvm38releasebuild/include -I/home/jshi19/llvm-3.8.0.src/include -I/home/jshi19/binutils-2.26/include  -fPIC -fvisibility-inlines-hidden -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -std=c++11 -ffunction-sections -fdata-sections -O3 -DNDEBUG -fPIC    -fno-exceptions -fno-rtti -o CMakeFiles/LLVMgold.dir/gold-plugin.cpp.o -c /home/jshi19/llvm-3.8.0.src/tools/gold/gold-plugin.cpp
/home/jshi19/llvm-3.8.0.src/tools/gold/gold-plugin.cpp:60:16: error: unknown type name 'string'; did you mean 'std::string'?
static cl::opt<string> LTOCodeModel("lto-use-large-codemodel", cl::Hidden,
               ^~~~~~
               std::string
/usr/lib/gcc/x86_64-linux-gnu/5.3.1/../../../../include/c++/5.3.1/bits/stringfwd.h:74:33: note: 'std::string' declared here
  typedef basic_string<char>    string;
                                ^
/home/jshi19/llvm-3.8.0.src/tools/gold/gold-plugin.cpp:800:9: error: no template named 'StringSwitch' in namespace 'llvm'; did you mean 'StringSet'?
  llvm::StringSwitch<unsigned>(LTOCodeModel)
  ~~~~~~^~~~~~~~~~~~
        StringSet
/home/jshi19/llvm-3.8.0.src/include/llvm/ADT/StringSet.h:23:9: note: 'StringSet' declared here
  class StringSet : public llvm::StringMap<char, AllocatorTy> {
        ^
In file included from /home/jshi19/llvm-3.8.0.src/tools/gold/gold-plugin.cpp:16:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/ADT/DenseSet.h:17:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/ADT/DenseMap.h:17:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/ADT/DenseMapInfo.h:17:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/ADT/ArrayRef.h:13:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/ADT/Hashing.h:49:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/Support/Host.h:17:
/home/jshi19/llvm-3.8.0.src/include/llvm/ADT/StringMap.h:228:12: error: multiple overloads of 'StringMap' instantiate to the same signature 'void (unsigned int)'
  explicit StringMap(AllocatorTy A)
           ^
/home/jshi19/llvm-3.8.0.src/include/llvm/ADT/StringSet.h:23:28: note: in instantiation of template class 'llvm::StringMap<char, unsigned int>' requested here
  class StringSet : public llvm::StringMap<char, AllocatorTy> {
                           ^
/home/jshi19/llvm-3.8.0.src/tools/gold/gold-plugin.cpp:800:3: note: in instantiation of template class 'llvm::StringSet<unsigned int>' requested here
  llvm::StringSwitch<unsigned>(LTOCodeModel)
  ^
/home/jshi19/llvm-3.8.0.src/include/llvm/ADT/StringMap.h:225:12: note: previous declaration is here
  explicit StringMap(unsigned InitialSize)
           ^
3 errors generated.
tools/gold/CMakeFiles/LLVMgold.dir/build.make:65: recipe for target 'tools/gold/CMakeFiles/LLVMgold.dir/gold-plugin.cpp.o' failed
make[3]: *** [tools/gold/CMakeFiles/LLVMgold.dir/gold-plugin.cpp.o] Error 1
make[3]: Leaving directory '/home/jshi19/llvm38releasebuild'
CMakeFiles/Makefile2:17855: recipe for target 'tools/gold/CMakeFiles/LLVMgold.dir/all' failed
make[2]: *** [tools/gold/CMakeFiles/LLVMgold.dir/all] Error 2
make[2]: Leaving directory '/home/jshi19/llvm38releasebuild'
CMakeFiles/Makefile2:17867: recipe for target 'tools/gold/CMakeFiles/LLVMgold.dir/rule' failed
make[1]: *** [tools/gold/CMakeFiles/LLVMgold.dir/rule] Error 2
make[1]: Leaving directory '/home/jshi19/llvm38releasebuild'
Makefile:3944: recipe for target 'LLVMgold' failed
make: *** [LLVMgold] Error 2

 

 

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

 

From: [hidden email] [mailto:[hidden email]]
Sent: Monday, May 30, 2016 2:32 PM
To: Shi, Steven <[hidden email]>
Cc: Umesh Kalappa <[hidden email]>; [hidden email]; llvm-dev <[hidden email]>; [hidden email]; Rafael Espíndola <[hidden email]>
Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?

 

Hi Steven,

 

 

On May 29, 2016, at 11:28 PM, Shi, Steven <[hidden email]> wrote:

 

Hi Mehdi,

Should I first apply your attached patch to my LLVM 3.8 source? Or should I use the latest LLVM SVN trunk instead?

 

I wrote it on trunk, but I expect it to be fairly easy to port to 3.8. It is really just quickly plumbing an option through to the TargetMachine creation.

 

-- 

Mehdi

 

 



 

 

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

 

From: [hidden email] [[hidden email]] 
Sent: Monday, May 30, 2016 2:13 PM
To: Shi, Steven <[hidden email]>
Cc: Umesh Kalappa <[hidden email]>; [hidden email]; llvm-dev <[hidden email]>; [hidden email]; Rafael Espíndola <[hidden email]>
Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?

 

 

On May 29, 2016, at 5:44 PM, Shi, Steven <[hidden email]> wrote:

 

(And I doubt the GNU linker supports LTO with LLVM).

[Steven]: I've pushed a change to GNU Binutils ld to support the LLVM gold plugin; see the details in this bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20070. The new GNU ld works well with LLVM/Clang LTO when building IA32 code on my side. And according to the ld owner's comments in the bug, the current X64 LLVM LTO issue is in the LLVM LTO plugin.

 

 

The fact that we don't support it for now seems to indicate that it is not a widely requested feature, especially considering that it is really a trivial option to add.

What is the linker you're using? Are you building your own clang?

[Steven]: I'm using the standard LLVM 3.8 with the new GNU ld mentioned above. I can build my own clang on my side if needed. I'm happy to hear that it is not difficult to enable the large code model in LLVM LTO and that “it is really a trivial option to add”. Could you let me know how to enable it? A lot of my work is blocked by the large code model issue. Thank you!

 

 

I can't test it locally, but here is a starting point in the gold plugin, inspired by the code present in clang:

 

 

You need to use your linker-specific way of passing the option "-lto-use-large-codemodel=..." to the plugin.
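
The patch itself is not reproduced in this archive. As a rough illustration only, mapping such an option's string value to a code model could look like the following C sketch. The accepted spellings (`default`, `small`, `kernel`, `medium`, `large`) are an assumption based on the usual `-mcmodel=` names, and the enum and function names are hypothetical; the real plugin is C++ and uses LLVM's own types.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum; the value names are assumptions, not LLVM's. */
enum code_model {
    CM_DEFAULT, CM_SMALL, CM_KERNEL, CM_MEDIUM, CM_LARGE, CM_INVALID
};

/* Map an option value such as "large" to a code model.
 * Unknown spellings map to CM_INVALID so the caller can report an error
 * instead of silently falling back to the default. */
enum code_model parse_code_model(const char *value)
{
    static const struct {
        const char *name;
        enum code_model model;
    } table[] = {
        {"default", CM_DEFAULT}, {"small", CM_SMALL}, {"kernel", CM_KERNEL},
        {"medium", CM_MEDIUM},   {"large", CM_LARGE},
    };

    assert(value != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(value, table[i].name) == 0)
            return table[i].model;
    return CM_INVALID;
}
```

Returning a distinct "invalid" value rather than the default matters in this situation: a silently ignored code model is exactly the symptom being debugged in this thread.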

 

Let me know if it works for you!

 

-- 

Mehdi

 




 

 

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

 


Re: [llvm-dev] How to debug if LTO generate wrong code?

David Chisnall via cfe-dev
Yes, the "normal" code model you mentioned is the small code model, which uses RIP-relative addressing, but my firmware needs the large code model, which allows code to reside anywhere in the full 64-bit address space. So, I need LLVM LTO to support the large code model.

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

> -----Original Message-----
> From: llvm-dev [mailto:[hidden email]] On Behalf Of Joerg
> Sonnenberger via llvm-dev
> Sent: Monday, May 30, 2016 6:18 PM
> To: [hidden email]
> Subject: Re: [llvm-dev] [cfe-dev] How to debug if LTO generate wrong code?
>
> On Mon, May 30, 2016 at 02:12:09AM +0000, Shi, Steven via llvm-dev wrote:
> > Hi Joerg,
> > In my firmware case, I need to load my firmware to run at a high address,
> > which the hardware requires. It is not that my firmware really needs
> > large static data and text. My firmware modules are shared libraries
> > (like a DLL) built with "-fpic", and the firmware loader will load them at
> > high addresses (above 2GB). I think this need is quite common in
> > system software, like firmware, drivers and kernel-mode code.
>
> It sounds more like there is a confusion about what symbols are internal
> and what not. The normal code model on AMD64 requires code and data to
> fit into 2GB, but non-local symbols are accessed indirectly. That
> doesn't happen in your case. For functions, that's normally partially
> the job of the linker (via stubs), but for data the compiler has to be
> aware of it. The load address is irrelevant for PIC.
>
> Joerg
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
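
Joerg's remark that the load address is irrelevant for PIC can be checked with a little arithmetic. A RIP-relative reference stores only the distance from the next instruction to its target, so the displacement stays valid when code and data are moved together. Below is a minimal C sketch, not taken from the thread, with a hypothetical function name.

```c
#include <stdint.h>

/* Target of a RIP-relative reference: the address of the *next*
 * instruction plus the signed 32-bit displacement stored in the
 * instruction. Hypothetical helper for illustration. */
uint64_t rip_relative_target(uint64_t next_ip, int32_t disp)
{
    /* Converting the negative displacement to uint64_t wraps modulo 2^64,
     * which is exactly the address arithmetic the CPU performs. */
    return next_ip + (uint64_t)(int64_t)disp;
}
```

Take the call from the LTO listing earlier in the thread: `e8 d2 ff ff ff` at `400509` has displacement `-0x2e`, and the next instruction is at `40050e`, so the target is `0x4004e0` (`global_func`). If the whole image is loaded 4 GB higher, both addresses shift by the same amount and the same displacement still lands on `global_func`. The limit is on the distance between the reference and its target, which must stay within 2 GB, not on where the image is loaded.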