Opt seems to report fewer stats when splitting the compilation to the several stages, rather than using clang

Kyriakos Georgiou via cfe-dev

Hi,


I am trying to reproduce a clang -O2 compilation by splitting it into separate stages with the individual LLVM tools (clang as the front end, then opt, llc, ...), but the optimiser seems to behave differently when called on its own than when called by clang. I am using the latest LLVM version. I call the front end with:


clang -mllvm -stats -mllvm -debug-pass=Structure -target arm-none-eabi -isystem /usr/local/lpcxpresso_8.2.2_650/lpcxpresso/tools/arm-none-eabi/include -DHAVE_CONFIG_H -I. -I../.. -Wall -I ../../include -I ../../config/arm/boards/lpcxpresso_1769_xpresso/ -I ../../config/arm/chips/lpc175x_6x/ "-DCALIB_SCALE=0" --static -mthumb -mcpu=cortex-m3 -D__USE_LPCOPEN -D__CODE_RED -D__REDLIB__ -DCHIP_LPC175X_6X -DCORE_M3 -ffunction-sections -fdata-sections -O2 -c -emit-llvm -o adpcm_optimized.bc adpcm.c


This also prints the optimiser statistics and the pass structure.

Then I generate unoptimised bitcode with:

clang -mllvm -stats -mllvm -debug-pass=Structure -target arm-none-eabi -isystem /usr/local/lpcxpresso_8.2.2_650/lpcxpresso/tools/arm-none-eabi/include -DHAVE_CONFIG_H -I. -I../.. -Wall -I ../../include -I ../../config/arm/boards/lpcxpresso_1769_xpresso/ -I ../../config/arm/chips/lpc175x_6x/ "-DCALIB_SCALE=0" --static -mthumb -mcpu=cortex-m3 -D__USE_LPCOPEN -D__CODE_RED -D__REDLIB__ -DCHIP_LPC175X_6X -DCORE_M3 -ffunction-sections -fdata-sections -O0 -c -emit-llvm -o adpcm_no_opt.bc adpcm.c

and then run opt on the result:

opt -stats -O2 -S -debug-pass=Structure adpcm_no_opt.bc -o adpcm.ll

I was expecting the statistics from the opt call to match those from clang with -O2. Instead, only a few are reported:

 18 basicaa          - Number of times a GEP is decomposed
  1 cgscc-passmgr    - Maximum CGSCCPassMgr iterations on one SCC
104 globalopt        - Number of globals marked unnamed_addr
  6 globalsmodref-aa - Number of global vars without address taken


Am I doing something wrong or missing something? The pass structure is still the same, and the code emitted by clang with -O2 and by opt with -O2 seems to be the same. So why does the optimiser, when called on its own, not report the full statistics?


Thank you,

Kyriakos



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: Opt seems to report fewer stats when splitting the compilation to the several stages, rather than using clang

Craig Topper via cfe-dev
You need to invoke clang with -O2 -Xclang -disable-llvm-optzns instead of -O0.

At -O0, clang marks every function as optnone in the generated IR, so opt skips most of its passes on those functions.
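
To see the difference described above, here is a minimal sketch that compiles a small file both ways and counts how many lines of the textual IR mention optnone. It assumes clang is on PATH; demo.c and the count_optnone helper are hypothetical names used only for illustration (demo.c stands in for adpcm.c, without the target and include flags from the original commands). Recent clang releases also spell the option -disable-llvm-passes.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: count the lines of a textual IR file that mention optnone.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_optnone() {
    grep -c 'optnone' "$1" || true
}

if command -v clang >/dev/null 2>&1; then
    workdir=$(mktemp -d)

    # Hypothetical stand-in for the real source file.
    cat > "$workdir/demo.c" <<'EOF'
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}
EOF

    # -O0: the front end attaches optnone to the functions it emits.
    clang -O0 -S -emit-llvm -o "$workdir/no_opt.ll" "$workdir/demo.c"

    # -O2 with the LLVM passes disabled: the IR is still unoptimised,
    # but it is generated as for -O2, without optnone.
    clang -O2 -Xclang -disable-llvm-optzns -S -emit-llvm \
        -o "$workdir/unopt.ll" "$workdir/demo.c"

    echo "-O0: $(count_optnone "$workdir/no_opt.ll") line(s) mention optnone"
    echo "-O2 -Xclang -disable-llvm-optzns: $(count_optnone "$workdir/unopt.ll") line(s) mention optnone"

    rm -rf "$workdir"
else
    echo "clang not found on PATH; nothing to compare"
fi
```

The file produced by the second clang command is the one to pass to the opt -stats -O2 invocation from the original message; the -O0 file is the one on which opt reports almost nothing.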

On Mon, Dec 4, 2017 at 4:31 AM Kyriakos Georgiou via cfe-dev <[hidden email]> wrote:

--
~Craig

Re: Opt seems to report fewer stats when splitting the compilation to the several stages, rather than using clang

Kyriakos Georgiou via cfe-dev

Thank you! This solved my issue!


From: Craig Topper <[hidden email]>
Sent: 04 December 2017 17:46:01
To: Kyriakos Georgiou
Cc: [hidden email]
Subject: Re: [cfe-dev] Opt seems to report fewer stats when splitting the compilation to the several stages, rather than using clang
 
You need to invoke clang with -O2 -Xclang -disable-llvm-optzns instead of -O0

-O0 implies to clang to mark all functions as optnone in the generated IR
