Unexpected x86/AVX cmpXY builtin codegen


Syoyo Fujita
I am investigating unexpected x86/AVX cmpXY builtin codegen in clang.
The underlying problem is how clang handles integer constant
expressions (I-C-E).

<<Problem definition>>
AVX's cmpXY instructions, for example the C intrinsic function
_mm256_cmp_ps(), accept only an immediate value (a constant integer
value) as the third argument.

(code example)
#define _CMP_GE_OS 0x0d
_mm256_cmp_ps(a, b, _CMP_GE_OS)

However, clang's avxintrin.h defines it as a static inline function:

static __inline __m256 __attribute__((__always_inline__, __nodebug__))
_mm256_cmp_ps(__m256 a, __m256 b, const int c)
{
  return (__m256)__builtin_ia32_cmpps256((__v8sf)a, (__v8sf)b, c);
}

clang's constant integer folder and ICE (Integer Constant Expression)
checker cannot detect a constant integer value across a function
boundary: inside the inline function, c is an ordinary parameter.
Thus in this case clang emits a scalar (non-constant) value for the
third argument instead of a constant (immediate) value.

  ...
  %tmp2.i = load i32* %c.addr.i, align 4
  %conv.i = trunc i32 %tmp2.i to i8
  %0 = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %tmp.i,
<8 x float> %tmp1.i, i8 %conv.i) nounwind

With this IR, llc fails to emit assembly, since the x86/AVX backend in
LLVM expects the third argument to be an immediate value.

The expected codegen is as follows:

  %0 = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %tmp,
<8 x float> %tmp1, i8 13)
  (the third argument is emitted as an immediate value)
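
The difference between the two forms can be reproduced without AVX. The following is a minimal C sketch, not code from clang: NEEDS_IMMEDIATE, cmp_macro and cmp_function are hypothetical names, and a bit-field width is assumed here as a stand-in for a builtin operand that must be an integer constant expression.

```c
/* Hypothetical stand-in for a builtin that demands an immediate operand.
   A bit-field width must be an integer constant expression (ICE), so this
   macro only compiles when `imm` is an ICE in the range 0..31. */
#define NEEDS_IMMEDIATE(imm) \
    ((void)sizeof(struct { unsigned ok : (((imm) >= 0 && (imm) <= 31) ? 1 : -1); }), (imm))

#define CMP_GE_OS 0x0d  /* same value as _CMP_GE_OS in the example above */

/* Macro form: the literal 0x0d is substituted textually, so it is still
   an ICE at the point where the stand-in checks it. */
#define cmp_macro(pred) NEEDS_IMMEDIATE(pred)

/* Function form: inside the body `pred` is an ordinary object, not an ICE,
   even when every caller passes a literal. */
static inline int cmp_function(const int pred)
{
    /* return NEEDS_IMMEDIATE(pred);   <- rejected: width is not constant */
    return pred;
}
```

Calling cmp_macro(CMP_GE_OS) compiles and yields the predicate value, while enabling the commented-out line in cmp_function makes the compiler reject the bit-field width, which mirrors the situation with the builtin's third argument.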

<<Solution proposed>>
There are two possible solutions to this problem.

1) Rewrite the cmpXY inline functions as macros

This is the easiest solution for clang, but might lose compatibility
with the avxintrin.h provided by other parties (e.g. gcc).

2) Extend constant integer folding

Extend CheckICE() and Evaluate() in lib/AST/ExprConstant.cpp so that
they can correctly handle constant integer expressions across function
boundaries.
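
As a rough illustration of solution 1), here is a sketch of the macro style using a GNU statement expression (supported by both gcc and clang). It is not the actual avxintrin.h patch: vec8, vec8_cmp, vec8_cmp_impl and REQUIRE_IMM5 are hypothetical stand-ins so that the sketch builds without AVX support, and the stand-in returns 1.0f/0.0f per lane rather than the all-ones bit mask the real instruction produces.

```c
/* Hypothetical stand-in for __m256, so the sketch builds without AVX. */
typedef struct { float v[8]; } vec8;

#define CMP_LT_OS 0x01  /* same values as the AVX predicates */
#define CMP_GE_OS 0x0d

/* Rejects a predicate that is not an integer constant expression in
   0..31 (a bit-field width must be an ICE). Stands in for the builtin's
   immediate-operand requirement. */
#define REQUIRE_IMM5(imm) \
    ((void)sizeof(struct { unsigned ok : (((imm) >= 0 && (imm) <= 31) ? 1 : -1); }))

/* Stand-in for the builtin; only two predicates are handled here. */
static inline vec8 vec8_cmp_impl(vec8 a, vec8 b, int pred)
{
    vec8 r;
    for (int i = 0; i < 8; ++i) {
        int hit = 0;
        switch (pred) {
        case CMP_LT_OS: hit = a.v[i] <  b.v[i]; break;
        case CMP_GE_OS: hit = a.v[i] >= b.v[i]; break;
        default: break;  /* other predicates omitted in this sketch */
        }
        r.v[i] = hit ? 1.0f : 0.0f;
    }
    return r;
}

/* Macro form: typed temporaries keep argument type checking and evaluate
   each vector argument once, while `pred` reaches the callee as written. */
#define vec8_cmp(a, b, pred) __extension__ ({                 \
    vec8 vec8_cmp_a_ = (a);                                   \
    vec8 vec8_cmp_b_ = (b);                                   \
    REQUIRE_IMM5(pred);                                       \
    vec8_cmp_impl(vec8_cmp_a_, vec8_cmp_b_, (pred)); })
```

A call such as vec8_cmp(a, b, CMP_GE_OS) then compiles, whereas passing a runtime int as the predicate is rejected at compile time. The trade-off is the one noted above: a macro is not a function, so it cannot have its address taken and may differ from other vendors' headers.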

<<Action>>
It is easy for me to provide a patch for solution 1). But I'm not sure
how much the clang developers want to maintain compatibility with the
avxintrin.h of gcc or other parties.
If they want to maintain compatibility as much as possible, the
solution would be 2), which might require large modifications to the
I-C-E engine.

--
Syoyo
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Re: Unexpected x86/AVX cmpXY builtin codegen

Chris Lattner

On Mar 25, 2011, at 11:39 PM, Syoyo Fujita wrote:

> I am investigating the unexpected x86/AVX cmpXY builtin codegen in clang.
> Actually, this is a problem of how clang handles I-C-E expression.

Great!

> 1) Rewrite cmpXY inline function as macro
>
> This is the easiest solution for clang, but might lose a compatibility
> with avxintrin.h provided by other parties(e.g. gcc).

This is the right answer.  It is somewhat annoying, but much simpler and better than the alternatives.  _mm_alignr_epi8 and other functions are already implemented this way.

-Chris

Re: Unexpected x86/AVX cmpXY builtin codegen

Syoyo Fujita
Hello Chris,

>> 1) Rewrite cmpXY inline function as macro
>>
>> This is the easiest solution for clang, but might lose a compatibility
>> with avxintrin.h provided by other parties(e.g. gcc).
>
> This is the right answer.  It is somewhat annoying, but much simpler and better than the alternatives.  _mm_alignr_epi8 and other functions are already implemented this way.

Okay, I'll provide a patch in this style within a few days.
