Vectorizer and buffer overlaps

Vectorizer and buffer overlaps

Nicola Gigante
Hello

I feel like this is a trivial question but I can’t find an answer.

I’m looking at the llvm vectorizer to learn how to better
take advantage of it.

I have this function:

void product(float *data1, float *data2, float *result, size_t size) {
    size_t i = 0;
    for(i = 0; i < size; i++) {
        result[i] = data1[i] * data2[i];
    }
}

It seems to get vectorized correctly with -O3, but I can see
the vector.memcheck block that checks for overlapping buffers.

How can I inform the compiler that I know the buffers won’t overlap?
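
For readers unfamiliar with that block: conceptually, vector.memcheck is a runtime guard that chooses between the vectorized loop and the original scalar loop. The following is only a rough C++ rendering of that idea, not the code LLVM emits. The names ranges_overlap and product_checked are hypothetical, and the "fast path" is written as a plain loop for readability.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if [a, a + n) and [b, b + n) share any byte.
// Addresses are compared as integers, because relational comparison of
// pointers into different arrays is unspecified in C++.
inline bool ranges_overlap(const float *a, const float *b, std::size_t n) {
    const std::uintptr_t a0 = reinterpret_cast<std::uintptr_t>(a);
    const std::uintptr_t b0 = reinterpret_cast<std::uintptr_t>(b);
    const std::uintptr_t bytes = n * sizeof(float);
    return a0 < b0 + bytes && b0 < a0 + bytes;
}

// Returns true when the no-overlap path was taken, false otherwise.
inline bool product_checked(const float *data1, const float *data2,
                            float *result, std::size_t size) {
    // Only the buffer being written matters: two loads never conflict.
    const bool safe = !ranges_overlap(result, data1, size) &&
                      !ranges_overlap(result, data2, size);
    if (safe) {
        // Stand-in for the vector body (several elements per iteration).
        for (std::size_t i = 0; i < size; i++) {
            result[i] = data1[i] * data2[i];
        }
        return true;
    }
    // Fallback: the original scalar loop, correct even with overlap.
    for (std::size_t i = 0; i < size; i++) {
        result[i] = data1[i] * data2[i];
    }
    return false;
}
```

The question in this thread amounts to how to tell the compiler that the check is always satisfied, so that only the first loop needs to exist.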

Simulating a real case (kind of), I’ve tried something like this:
struct mass_tag;
struct acceleration_tag;
struct force_tag;

template<typename T, typename>
class wrapper {
    T _v;
public:
    wrapper(T v) : _v(v) { }
    operator T() const { return _v; }
};

using mass = wrapper<float, mass_tag>;
using acceleration = wrapper<float, acceleration_tag>;
using force = wrapper<float, force_tag>;

void product(mass *data1, acceleration *data2,
             force *result, size_t size) {
    size_t i = 0;
    for(i = 0; i < size; i++) {
        result[i] = data1[i] * data2[i];
    }
}

The concept here is reminiscent of something like Boost.Units.
I thought that, since the pointers have different types,
type-based alias analysis would conclude that they don’t overlap,
but I must be missing something, since the memcheck block is still
there. I’m compiling with clang 3.4 with -O3 -fstrict-aliasing; isn’t that enough?

Even this version has the memcheck block:
void product(std::array<mass, 32> data1, std::array<acceleration, 32> data2, std::array<force, 32> &result) {
    size_t i = 0;
    for(i = 0; i < 32; i++) {
        result[i] = data1[i] * data2[i];
    }
}

The input arrays are copied (unrealistic code)... Why does it check for overlaps?
I feel like I’m missing something obvious.
So how do I inform the compiler that it doesn’t need the memcheck block?

Thank you very much,
Nicola




_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Re: Vectorizer and buffer overlaps

Philip Reames

On 05/02/2014 04:04 AM, Nicola Gigante wrote:

> Hello
>
> I feel like this is a trivial question but I can’t find an answer.
>
> I’m looking at the llvm vectorizer to learn how to better
> take advantage of it.
>
> I have this function:
>
> void product(float *data1, float *data2, float *result, size_t size) {
>      size_t i = 0;
>      for(i = 0; i < size; i++) {
>          result[i] = data1[i] * data2[i];
>      }
> }
>
> It seems to get vectorized correctly with -O3, but I can see
> the vector.memcheck block that checks for overlapping buffers.
>
> How can I inform the compiler that I know the buffers won’t overlap?
It sounds like you're looking for the "restrict" keyword in C/C++. I
haven't looked at how this gets translated into LLVM IR, but you can run
it through Clang if needed.
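
A minimal sketch of what that could look like. Standard C++ has no restrict keyword; this assumes the __restrict spelling, which Clang, GCC and MSVC accept as an extension. RESTRICT is a hypothetical macro name introduced here so the code still compiles where the extension is missing, and product_restrict is likewise just an illustrative name. Clang lowers a restrict-qualified pointer parameter to a noalias parameter attribute in the IR, which should let the vectorizer drop the runtime check.

```cpp
#include <cstddef>

// Hypothetical portability macro: expands to nothing on compilers
// that are not known to accept the __restrict extension.
#if defined(__clang__) || defined(__GNUC__) || defined(_MSC_VER)
#  define RESTRICT __restrict
#else
#  define RESTRICT
#endif

// The caller promises that the three buffers do not overlap.
// Breaking that promise is undefined behavior.
void product_restrict(const float *RESTRICT data1,
                      const float *RESTRICT data2,
                      float *RESTRICT result, std::size_t size) {
    for (std::size_t i = 0; i < size; i++) {
        result[i] = data1[i] * data2[i];
    }
}
```

The promise is unchecked, so this trades the runtime test for a precondition the caller has to uphold.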

>
> Simulating a real case (kind of), I’ve tried something like this:
> struct mass_tag;
> struct acceleration_tag;
> struct force_tag;
>
> template<typename T, typename>
> class wrapper {
>      T _v;
> public:
>      wrapper(T v) : _v(v) { }
>      operator T() const { return _v; }
> };
>
> using mass = wrapper<float, mass_tag>;
> using acceleration = wrapper<float, acceleration_tag>;
> using force = wrapper<float, force_tag>;
>
> void product(mass *data1, acceleration *data2,
>               force *result, size_t size) {
>      size_t i = 0;
>      for(i = 0; i < size; i++) {
>          result[i] = data1[i] * data2[i];
>      }
> }
>
> The concept here is reminiscent of something like Boost.Units.
> I thought that, since the pointers have different types,
> type-based alias analysis would conclude that they don’t overlap,
> but I must be missing something, since the memcheck block is still
> there. I’m compiling with clang 3.4 with -O3 -fstrict-aliasing; isn’t that enough?
I'm going to leave this part to someone else.  I'm not quite sure of the
C++ rules here.  I suspect you're suffering from the general "cast
through void*" problem though.

>
> Even this version has the memcheck block:
> void product(std::array<mass, 32> data1, std::array<acceleration, 32> data2, std::array<force, 32> &result) {
>      size_t i = 0;
>      for(i = 0; i < 32; i++) {
>          result[i] = data1[i] * data2[i];
>      }
> }
>
> The input arrays are copied (unrealistic code)... Why does it check for overlaps?
> I feel like I’m missing something obvious.
> So how do I inform the compiler that it doesn’t need the memcheck block?
This sounds like either a) a bug, or b) information lost due to lack of
inlining.  The new dynamic allocations should be marked noalias. As a
result, we shouldn't need the check.  Have you looked at the O3 IR to
see if the constructors of array get inlined?

Philip

Re: Vectorizer and buffer overlaps

Renato Golin Linaro
In reply to this post by Nicola Gigante
On 2 May 2014 12:04, Nicola Gigante <[hidden email]> wrote:
> How can I inform the compiler that I know the buffers won’t overlap?

As Philip said, restrict should do the trick.


> The input arrays are copied (unrealistic code)... Why does it check for overlaps?

The types are different but they're not obviously different
(templates, lists of things), so it might just be that TBAA is not
recognizing them as different types.

Have a look at the IR, search for TBAA metadata, you may find some clues.


> So how do I inform the compiler that it doesn’t need the memcheck block?

In this case, the compiler should pick it up. Until it does, you
could try restrict on them, too.

Have a look at the Bugzilla; if there isn't anything obviously
reported for this problem, feel free to file a new one.

Thanks,
--renato


Re: Vectorizer and buffer overlaps

Nicola Gigante

On 3 May 2014, at 12:55, Renato Golin <[hidden email]> wrote:

> On 2 May 2014 12:04, Nicola Gigante <[hidden email]> wrote:
>> How can I inform the compiler that I know the buffers won’t overlap?
>
> As Philip said, restrict should do the trick.
>

Yes, of course. I'm sorry I didn't mention it in my original email.
I know about restrict, but it's not part of standard C++. It may be
supported as an extension, but I'm interested in writing
standard-compliant code.
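
One option that keeps the source valid standard C++ is a loop pragma, since a pragma the implementation does not recognize is ignored. Clang releases newer than the 3.4 used in this thread document a vectorize(assume_safety) loop hint, which asks the vectorizer to assume the loop has no unsafe memory dependences. The sketch below assumes such a compiler; product_assume_safety is a hypothetical name, and the __clang__ guard is only there to avoid unknown-pragma warnings elsewhere.

```cpp
#include <cstddef>

void product_assume_safety(const float *data1, const float *data2,
                           float *result, std::size_t size) {
    // The hint must directly precede the loop it applies to.
    // Other compilers never see it because of the guard.
#if defined(__clang__)
#pragma clang loop vectorize(assume_safety)
#endif
    for (std::size_t i = 0; i < size; i++) {
        result[i] = data1[i] * data2[i];
    }
}
```

As with restrict, the assumption is not verified: if the buffers do overlap, the vectorized loop may produce different results from the scalar one.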

>
>> The input arrays are copied (unrealistic code)... Why does it check for overlaps?
>
> The types are different but they're not obviously different
> (templates, lists of things), so it might just be that TBAA is not
> recognizing them as different types.
>
> Have a look at the IR, search for TBAA metadata, you may find some clues.
>

The TBAA metadata seems to be present in the IR on the original scalar loop
(which is kept in the IR for the remainder iterations, so I think the metadata
was there before the vectorizer pass ran). It marks data1, data2 and result
as being of different types, but I don't know the internals, so I may be
misunderstanding. I've attached the IR file with the original source in comments.

The relevant piece is:

%22 = getelementptr inbounds %class.wrapper* %data1, i64 %i.01, i32 0
%23 = load float* %22, align 4, !tbaa !4
%24 = getelementptr inbounds %class.wrapper.0* %data2, i64 %i.01, i32 0
%25 = load float* %24, align 4, !tbaa !9
%26 = fmul float %23, %25
%27 = getelementptr inbounds %class.wrapper.1* %result, i64 %i.01, i32 0
store float %26, float* %27, align 4, !tbaa !11

These are the metadata:
!0 = metadata !{metadata !"Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)"}
!1 = metadata !{metadata !1, metadata !2, metadata !3}
!2 = metadata !{metadata !"llvm.vectorizer.width", i32 1}
!3 = metadata !{metadata !"llvm.vectorizer.unroll", i32 1}
!4 = metadata !{metadata !5, metadata !6, i64 0}
!5 = metadata !{metadata !"_ZTS7wrapperIf8mass_tagE", metadata !6, i64 0}
!6 = metadata !{metadata !"float", metadata !7, i64 0}
!7 = metadata !{metadata !"omnipotent char", metadata !8, i64 0}
!8 = metadata !{metadata !"Simple C/C++ TBAA"}
!9 = metadata !{metadata !10, metadata !6, i64 0}
!10 = metadata !{metadata !"_ZTS7wrapperIf16acceleration_tagE", metadata !6, i64 0}
!11 = metadata !{metadata !6, metadata !6, i64 0}
!12 = metadata !{metadata !12, metadata !2, metadata !3}

Metadata !4 refers (through !5) to wrapper<float, mass_tag>,
and metadata !9 refers (through !10) to wrapper<float, acceleration_tag>.
Metadata !11, however, seems to point only to "float", not to wrapper<float, force_tag>.
I'm not very familiar with the internals of the TBAA pass, though, and I'm not
sure how to interpret this data.

Any clue?
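
One way to read those tags is the rule given in the LLVM language reference for struct-path TBAA: two access tags (base type, access type, offset) may alias if the (base, offset) pair of one can be reached from the other by repeatedly stepping to the member or parent type at that offset. The sketch below is a simplified model of that rule, not LLVM's implementation. All type and function names are hypothetical, and it ignores cases such as multiple TBAA roots.

```cpp
#include <cstdint>
#include <vector>

// Simplified model of a TBAA type descriptor.
struct TypeNode {
    struct Member {
        const TypeNode *type;
        std::uint64_t offset;
    };
    const char *name;
    bool is_struct;
    // Scalar: at most one entry, the parent type. Struct: one per field.
    std::vector<Member> members;
};

// Access tag in the style of !4, !9 and !11. The access type is kept
// for documentation only; the walk below uses base and offset.
struct AccessTag {
    const TypeNode *base;
    const TypeNode *access;
    std::uint64_t offset;
};

// Walks the parent chain from (from, from_off) looking for (to, to_off).
inline bool reaches(const TypeNode *from, std::uint64_t from_off,
                    const TypeNode *to, std::uint64_t to_off) {
    const TypeNode *ty = from;
    std::uint64_t off = from_off;
    while (ty != nullptr) {
        if (ty == to && off == to_off) return true;
        // Pick the member containing `off`: largest member offset <= off.
        const TypeNode::Member *next = nullptr;
        for (const TypeNode::Member &m : ty->members) {
            if (m.offset <= off &&
                (next == nullptr || m.offset > next->offset)) {
                next = &m;
            }
        }
        if (next == nullptr) return false;  // reached the root
        off = ty->is_struct ? off - next->offset : 0;
        ty = next->type;
    }
    return false;
}

inline bool may_alias(const AccessTag &a, const AccessTag &b) {
    return reaches(a.base, a.offset, b.base, b.offset) ||
           reaches(b.base, b.offset, a.base, a.offset);
}

struct ThreadExampleResult {
    bool load1_vs_store;  // !4 against !11
    bool load2_vs_store;  // !9 against !11
    bool load1_vs_load2;  // !4 against !9
};

// Rebuilds the type graph from the metadata quoted above.
inline ThreadExampleResult check_thread_example() {
    TypeNode root{"Simple C/C++ TBAA", false, {}};
    TypeNode omni{"omnipotent char", false, {{&root, 0}}};
    TypeNode flt{"float", false, {{&omni, 0}}};
    TypeNode mass{"wrapper<float, mass_tag>", true, {{&flt, 0}}};
    TypeNode accel{"wrapper<float, acceleration_tag>", true, {{&flt, 0}}};
    AccessTag t4{&mass, &flt, 0};
    AccessTag t9{&accel, &flt, 0};
    AccessTag t11{&flt, &flt, 0};
    return {may_alias(t4, t11), may_alias(t9, t11), may_alias(t4, t9)};
}
```

Under this reading the two loads are distinguishable from each other, but the plain float tag on the store (!11) is compatible with both of them. That would explain why TBAA alone cannot prove the store to result independent of the loads from data1 and data2. It is one interpretation of the documented rule, not a statement about what the 3.4 vectorizer actually consulted.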

>
>> So how do I inform the compiler that it doesn’t need the memcheck block?
>
> In this case, the compiler should pick it up. While it doesn't, you
> may try restrict on them, too.
>
> Have a look at the Bugzilla; if there isn't anything obviously
> reported for this problem, feel free to file a new one.
>

Will do.

> Thanks,
> --renato


Thanks,
Nicola


Attachment: prova.ll (6K)

Re: Vectorizer and buffer overlaps

Renato Golin Linaro
On 3 May 2014 15:12, Nicola Gigante <[hidden email]> wrote:
> metadata !4 indirectly contains wrapper<mass_tag>
> and metadata !9 indirectly contains wrapper<acceleration_tag>
> It seems that metadata !11 points only to "float", not to wrapper<force_tag>,
> but I'm not very proficient in the internals of the TBAA pass, and I'm not
> sure how to interpret those data.

Yes, TBAA seems to be getting it right. It may be possible that the
vectorizer is bailing out because the class might not have a trivial
constructor/destructor (it may not even be checking).
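
The triviality question can at least be checked at the language level with standard type traits. The sketch below repeats the wrapper from the original post and asserts a few properties of it. It says nothing about what the vectorizer itself looks at, which remains an open question here.

```cpp
#include <type_traits>

struct mass_tag;

template<typename T, typename>
class wrapper {
    T _v;
public:
    wrapper(T v) : _v(v) { }
    operator T() const { return _v; }
};

using mass = wrapper<float, mass_tag>;

// Copying and destroying a wrapper are trivial operations...
static_assert(std::is_trivially_copyable<mass>::value,
              "copy operations are trivial");
static_assert(std::is_trivially_destructible<mass>::value,
              "destructor is trivial");
// ...but the user-provided wrapper(T) suppresses the default constructor.
static_assert(!std::is_default_constructible<mass>::value,
              "there is no default constructor");
static_assert(std::is_standard_layout<mass>::value,
              "single private member, so standard layout");
```

So the only non-trivial special member is the converting constructor, which the loop uses to build each force value from the float product.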

It would be good to have a reduced test case where this happens in the bug report.

thanks!
--renato

Re: Vectorizer and buffer overlaps

Nicola Gigante

On 3 May 2014, at 17:42, Renato Golin <[hidden email]> wrote:

> On 3 May 2014 15:12, Nicola Gigante <[hidden email]> wrote:
>> metadata !4 indirectly contains wrapper<mass_tag>
>> and metadata !9 indirectly contains wrapper<acceleration_tag>
>> It seems that metadata !11 points only to "float", not to wrapper<force_tag>,
>> but I'm not very proficient in the internals of the TBAA pass, and I'm not
>> sure how to interpret those data.
>
> Yes, TBAA seems to be getting it right. It may be possible that the
> vectorizer is bailing out because the class might not have a trivial
> constructor/destructor (it may not even be checking).
>
> It would be good to have a reduced test case where this happens in the bug report.
>

I've filed the report:
http://llvm.org/bugs/show_bug.cgi?id=19651

> thanks!
> --renato

Thanks,
Nicola

