Clang generates calls to llvm.memcpy with overlapping arguments, but LangRef requires the arguments to not overlap

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Clang generates calls to llvm.memcpy with overlapping arguments, but LangRef requires the arguments to not overlap

Hollman, Daisy Sophia via cfe-dev
Hi,

It appears that Clang generates calls to `llvm.memcpy` with potentially overlapping arguments in some cases.

For the snippet below

struct S
{
  char s[25];
};

struct S *p;

void test2() {
 ...
  foo (&b, 1);
  b = a;
  b = *p;
...
}

 
Clang uses `llvm.memcpy` to copy the struct:

  call void @foo(%struct.S* %2, i32 1)
  %7 = bitcast %struct.S* %2 to i8*
  %8 = bitcast %struct.S* %1 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %7, i8* align 1 %8, i64 25, i1 false)
  %9 = load %struct.S*, %struct.S** @p, align 8
  %10 = bitcast %struct.S* %2 to i8*
  %11 = bitcast %struct.S* %9 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %10, i8* align 1 %11, i64 25, i1 false)


In the C example, `foo` could set `p = &b` and then `b = *p` would just copy the contents from `b` into `b`. This means that the the arguments to the second llvm.memcpy call may overlap, which seems not allowed according to the current version of the LangRef (https://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic). This is problematic, because the fact is used in BasicAliasAnalysis for example (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L982).

The full, build-able example can be found here: https://godbolt.org/z/PY1vKq

I might be missing something, but it appears that Clang should not create call to `llvm.memcpy` unless it can guarantee the arguments cannot overlap. I am not sure what the best alternative to `llvm.memcpy` would be in case the arguments overlap.

Cheers,
Florian
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang generates calls to llvm.memcpy with overlapping arguments, but LangRef requires the arguments to not overlap

Hollman, Daisy Sophia via cfe-dev
On Tue, 25 Aug 2020, 10:53 Florian Hahn via cfe-dev, <[hidden email]> wrote:
Hi,

It appears that Clang generates calls to `llvm.memcpy` with potentially overlapping arguments in some cases.

For the snippet below

struct S
{
  char s[25];
};

struct S *p;

void test2() {
 ...
  foo (&b, 1);
  b = a;
  b = *p;
...
}


Clang uses `llvm.memcpy` to copy the struct:

  call void @foo(%struct.S* %2, i32 1)
  %7 = bitcast %struct.S* %2 to i8*
  %8 = bitcast %struct.S* %1 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %7, i8* align 1 %8, i64 25, i1 false)
  %9 = load %struct.S*, %struct.S** @p, align 8
  %10 = bitcast %struct.S* %2 to i8*
  %11 = bitcast %struct.S* %9 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %10, i8* align 1 %11, i64 25, i1 false)


In the C example, `foo` could set `p = &b` and then `b = *p` would just copy the contents from `b` into `b`. This means that the the arguments to the second llvm.memcpy call may overlap, which seems not allowed according to the current version of the LangRef (https://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic). This is problematic, because the fact is used in BasicAliasAnalysis for example (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L982).

The full, build-able example can be found here: https://godbolt.org/z/PY1vKq

I might be missing something, but it appears that Clang should not create call to `llvm.memcpy` unless it can guarantee the arguments cannot overlap. I am not sure what the best alternative to `llvm.memcpy` would be in case the arguments overlap.

To be clear: the concern here is that the source and destination could be equal, not that they might partially overlap (which I'm pretty sure the relevant language rules disallow)?

This seems reminiscent of the issue that memcpy is formally undefined if given a size of zero and a null pointer argument, for which we changed the rules for @llvm.memcpy to define the behaviour in that case. Can we do the same here (or at least add another LLVM intrinsic that permits exact overlap, but not partial overlap)? Note that both GCC and Clang have pretty much always generated a memcpy for cases such as:

struct A { char buff[999999]; };
void f(struct A *p, struct A *q) { *p = *q; }

in both C and C++ modes, and MSVC and ICC do roughly the same thing when optimizing (though in ICC's case it invokes an intel-specific function, and who knows if that explicitly defines the behavior for exact overlap), so one would imagine that if there are any memcpy implementations for which memcpy(p, p, n) doesn't work, we'd have seen program misbehavior by now. As a consequence, as with the memcpy(0, 0, 0) case, LLVM can probably lower a memcpy-with-exact-overlap-permitted to a memcpy libcall on ~all targets.

Cheers,
Florian
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang generates calls to llvm.memcpy with overlapping arguments, but LangRef requires the arguments to not overlap

Hollman, Daisy Sophia via cfe-dev


On Aug 25, 2020, at 4:16 PM, Richard Smith via cfe-dev <[hidden email]> wrote:

On Tue, 25 Aug 2020, 10:53 Florian Hahn via cfe-dev, <[hidden email]> wrote:
Hi,

It appears that Clang generates calls to `llvm.memcpy` with potentially overlapping arguments in some cases.

For the snippet below

struct S
{
  char s[25];
};

struct S *p;

void test2() {
 ...
  foo (&b, 1);
  b = a;
  b = *p;
...
}


Clang uses `llvm.memcpy` to copy the struct:

  call void @foo(%struct.S* %2, i32 1)
  %7 = bitcast %struct.S* %2 to i8*
  %8 = bitcast %struct.S* %1 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %7, i8* align 1 %8, i64 25, i1 false)
  %9 = load %struct.S*, %struct.S** @p, align 8
  %10 = bitcast %struct.S* %2 to i8*
  %11 = bitcast %struct.S* %9 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %10, i8* align 1 %11, i64 25, i1 false)


In the C example, `foo` could set `p = &b` and then `b = *p` would just copy the contents from `b` into `b`. This means that the the arguments to the second llvm.memcpy call may overlap, which seems not allowed according to the current version of the LangRef (https://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic). This is problematic, because the fact is used in BasicAliasAnalysis for example (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L982).

The full, build-able example can be found here: https://godbolt.org/z/PY1vKq

I might be missing something, but it appears that Clang should not create call to `llvm.memcpy` unless it can guarantee the arguments cannot overlap. I am not sure what the best alternative to `llvm.memcpy` would be in case the arguments overlap.

To be clear: the concern here is that the source and destination could be equal, not that they might partially overlap (which I'm pretty sure the relevant language rules disallow)?

This seems reminiscent of the issue that memcpy is formally undefined if given a size of zero and a null pointer argument, for which we changed the rules for @llvm.memcpy to define the behaviour in that case. Can we do the same here (or at least add another LLVM intrinsic that permits exact overlap, but not partial overlap)? Note that both GCC and Clang have pretty much always generated a memcpy for cases such as:

struct A { char buff[999999]; };
void f(struct A *p, struct A *q) { *p = *q; }

in both C and C++ modes, and MSVC and ICC do roughly the same thing when optimizing (though in ICC's case it invokes an intel-specific function, and who knows if that explicitly defines the behavior for exact overlap), so one would imagine that if there are any memcpy implementations for which memcpy(p, p, n) doesn't work, we'd have seen program misbehavior by now. As a consequence, as with the memcpy(0, 0, 0) case, LLVM can probably lower a memcpy-with-exact-overlap-permitted to a memcpy libcall on ~all targets.

This was also discussed in https://bugs.llvm.org/show_bug.cgi?id=11763, I’ll copy and paste James’ comment at the end, which seems reasonable and also confirms the partial-overlap case is not an issue:

***
James Y Knight 2018-07-19 11:19:11 PDT
In light of the last updates, it sounds like what's required here is just a documentation update.

1. LLVM requires that the platform provide a memcpy function which works when the two pointers are equal. That is not a requirement of the C standard for the user-visible memcpy function, but it is a requirement that LLVM has on platforms it compiles code for.

If there are any platform which provides a memcpy which does not satisfy that requirement, I believe that platform needs to be modified, not LLVM. I'd be surprised if there were any such platforms, since GCC has the same requirement.

2. Likewise, the same requirement exists for the llvm.memcpy IR intrinsic. The LangRef should be updated to state that.

3. Partially-overlapping calls from struct assignment are UB, so we're fine there, such issues correspond to actual bugs in user-code.
***

Unless anyone disagrees with that approach, or anything has changed since, Florian perhaps you could update the relevant documentation via the noalias patch I believe you were working on, since this is in that same family of issues?

Dave



Cheers,
Florian
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang generates calls to llvm.memcpy with overlapping arguments, but LangRef requires the arguments to not overlap

Hollman, Daisy Sophia via cfe-dev
In reply to this post by Hollman, Daisy Sophia via cfe-dev
  • llvm-dev

On 25 Aug 2020, at 13:53, Florian Hahn wrote:

Hi,

It appears that Clang generates calls to `llvm.memcpy` with potentially overlapping arguments in some cases.

For the snippet below

struct S
{
char s[25];
};

struct S *p;

void test2() {
...
foo (&b, 1);
b = a;
b = *p;
...
}


Clang uses `llvm.memcpy` to copy the struct:

call void @foo(%struct.S* %2, i32 1)
%7 = bitcast %struct.S* %2 to i8*
%8 = bitcast %struct.S* %1 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %7, i8* align 1 %8, i64 25, i1 false)
%9 = load %struct.S*, %struct.S** @p, align 8
%10 = bitcast %struct.S* %2 to i8*
%11 = bitcast %struct.S* %9 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %10, i8* align 1 %11, i64 25, i1 false)


In the C example, `foo` could set `p = &b` and then `b = *p` would just copy the contents from `b` into `b`. This means that the the arguments to the second llvm.memcpy call may overlap, which seems not allowed according to the current version of the LangRef (https://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic). This is problematic, because the fact is used in BasicAliasAnalysis for example (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L982).

The full, build-able example can be found here: https://godbolt.org/z/PY1vKq

I might be missing something, but it appears that Clang should not create call to `llvm.memcpy` unless it can guarantee the arguments cannot overlap. I am not sure what the best alternative to `llvm.memcpy` would be in case the arguments overlap.

C allows overlapping assignments only when the overlap is exact, i.e. the addresses are exactly the same. I agree that just emitting memcpy isn’t strictly legal, because neither C nor LLVM allows even exact overlap. On the other hand, Clang would really like to avoid emitting extra control flow here, because exact overlap is uncommon enough that emitting a no-op memcpy (if it were indeed a no-op) is definitely the right trade-off in practice; and LLVM probably won’t reliably remove branches around a memcpy call if we add them. And it’s very hard to imagine a memcpy implementation that actually wouldn’t work with exact overlap (maybe something that tried to just remap pages?).

I think we have are four options:

  1. We can relax llvm.memcpy to allow exact overlap. Practically, this would depend on memcpy being a no-op on exact overlap; we definitely wouldn’t want LLVM to have to start inserting control flow around memcpy calls.
  2. We can make Clang emit assignments with llvm.memmove. This would make assignment work even with non-exact overlap, which is unnecessary; I’m not sure the cost is that high.
  3. We can make Clang emit an explicit check and control flow around memcpy when there might be overlap. Clang IRGen should already maintain the right information to avoid doing this when e.g. initializing a variable.
  4. We can add a new intrinsic — or some sort of decoration of the existing one — that does a memcpy but allows exact overlap.

John.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Clang generates calls to llvm.memcpy with overlapping arguments, but LangRef requires the arguments to not overlap

Hollman, Daisy Sophia via cfe-dev
On Aug 25, 2020, at 21:16, Richard Smith <[hidden email]> wrote:

To be clear: the concern here is that the source and destination could be equal, not that they might partially overlap (which I'm pretty sure the relevant language rules disallow)?


Yes after going through your response and the discussion at  https://bugs.llvm.org/show_bug.cgi?id=11763 it seems like the only issue is that source & destination could be equal. 


On Aug 25, 2020, at 22:45, David Rector <[hidden email]> wrote:

This was also discussed in https://bugs.llvm.org/show_bug.cgi?id=11763, I’ll copy and paste James’ comment at the end, which seems reasonable and also confirms the partial-overlap case is not an issue:

***
James Y Knight 2018-07-19 11:19:11 PDT
In light of the last updates, it sounds like what's required here is just a documentation update.

1. LLVM requires that the platform provide a memcpy function which works when the two pointers are equal. That is not a requirement of the C standard for the user-visible memcpy function, but it is a requirement that LLVM has on platforms it compiles code for.

If there are any platform which provides a memcpy which does not satisfy that requirement, I believe that platform needs to be modified, not LLVM. I'd be surprised if there were any such platforms, since GCC has the same requirement.

2. Likewise, the same requirement exists for the llvm.memcpy IR intrinsic. The LangRef should be updated to state that.

3. Partially-overlapping calls from struct assignment are UB, so we're fine there, such issues correspond to actual bugs in user-code.
***

Unless anyone disagrees with that approach, or anything has changed since, Florian perhaps you could update the relevant documentation via the noalias patch I believe you were working on, since this is in that same family of issues?

Thanks for pointing out the existing, longstanding issue! I am not sure if the change is as simple as updating the LangRef for `llvm.memcpy` I am afraid, as we would have to also track down all the places in the LLVM codebase that use the assumption. But there are likely not too many (mostly alias analysis).


On Aug 25, 2020, at 23:06, John McCall <[hidden email]> wrote:

C allows overlapping assignments only when the overlap is exact, i.e. the addresses are exactly the same. I agree that just emitting memcpy isn’t strictly legal, because neither C nor LLVM allows even exact overlap. On the other hand, Clang would really like to avoid emitting extra control flow here, because exact overlap is uncommon enough that emitting a no-op memcpy (if it were indeed a no-op) is definitely the right trade-off in practice; and LLVM probably won’t reliably remove branches around a memcpy call if we add them. And it’s very hard to imagine a memcpy implementation that actually wouldn’t work with exact overlap (maybe something that tried to just remap pages?).

I think we have are four options:

  1. We can relax llvm.memcpy to allow exact overlap. Practically, this would depend on memcpy being a no-op on exact overlap; we definitely wouldn’t want LLVM to have to start inserting control flow around memcpy calls.
  2. We can make Clang emit assignments with llvm.memmove. This would make assignment work even with non-exact overlap, which is unnecessary; I’m not sure the cost is that high.
  3. We can make Clang emit an explicit check and control flow around memcpy when there might be overlap. Clang IRGen should already maintain the right information to avoid doing this when e.g. initializing a variable.
  4. We can add a new intrinsic — or some sort of decoration of the existing one — that does a memcpy but allows exact overlap.

Thanks for summarizing the options! I think 4. would probably the safest option, because we won’t have to track down the places that use the non-overlapping assumption. There might also be such cases somewhere downstream, which might be broken silently. If Clang already has the information when the operands cannot overlap (as mentioned in 3.), we only have to emit the non-overlapping version for those cases, causing the least amount of change for the generated IR.

1. Also seems possible (there probably are not too many places in LLVM that use the non-overlapping assumption), but I think in some cases the non-overlapping info can be a useful property to convey for frontends.

Cheers,
Florian


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Clang generates calls to llvm.memcpy with overlapping arguments, but LangRef requires the arguments to not overlap

Hollman, Daisy Sophia via cfe-dev


On Aug 26, 2020, at 16:40, Florian Hahn via llvm-dev <[hidden email]> wrote:
On Aug 25, 2020, at 23:06, John McCall <[hidden email]> wrote:

C allows overlapping assignments only when the overlap is exact, i.e. the addresses are exactly the same. I agree that just emitting memcpy isn’t strictly legal, because neither C nor LLVM allows even exact overlap. On the other hand, Clang would really like to avoid emitting extra control flow here, because exact overlap is uncommon enough that emitting a no-op memcpy (if it were indeed a no-op) is definitely the right trade-off in practice; and LLVM probably won’t reliably remove branches around a memcpy call if we add them. And it’s very hard to imagine a memcpy implementation that actually wouldn’t work with exact overlap (maybe something that tried to just remap pages?).

I think we have are four options:

  1. We can relax llvm.memcpy to allow exact overlap. Practically, this would depend on memcpy being a no-op on exact overlap; we definitely wouldn’t want LLVM to have to start inserting control flow around memcpy calls.
  2. We can make Clang emit assignments with llvm.memmove. This would make assignment work even with non-exact overlap, which is unnecessary; I’m not sure the cost is that high.
  3. We can make Clang emit an explicit check and control flow around memcpy when there might be overlap. Clang IRGen should already maintain the right information to avoid doing this when e.g. initializing a variable.
  4. We can add a new intrinsic — or some sort of decoration of the existing one — that does a memcpy but allows exact overlap.

Thanks for summarizing the options! I think 4. would probably the safest option, because we won’t have to track down the places that use the non-overlapping assumption. There might also be such cases somewhere downstream, which might be broken silently. If Clang already has the information when the operands cannot overlap (as mentioned in 3.), we only have to emit the non-overlapping version for those cases, causing the least amount of change for the generated IR.

1. Also seems possible (there probably are not too many places in LLVM that use the non-overlapping assumption), but I think in some cases the non-overlapping info can be a useful property to convey for frontends.


I took a look at the places in LLVM that make use of the non-overlapping guarantee and it appears BasicAliasAnalysis is the only place. I put up a patch https://reviews.llvm.org/D86815 to illustrate the impact on the code & tests. It seems quite manageable. It might be feasible to start with removing the guarantee and re-introduce a variant with the additional guarantee.

What do you think? Alternatively I can also put up a patch to add a new variant without the guarantee. In that case, I would appreciate some pointers on how to tell whether arguments are guaranteed to not overlap in Clang.

Cheers,
Florian

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Clang generates calls to llvm.memcpy with overlapping arguments, but LangRef requires the arguments to not overlap

Hollman, Daisy Sophia via cfe-dev


On Aug 28, 2020, at 21:49, Florian Hahn <[hidden email]> wrote:



On Aug 26, 2020, at 16:40, Florian Hahn via llvm-dev <[hidden email]> wrote:
On Aug 25, 2020, at 23:06, John McCall <[hidden email]> wrote:

C allows overlapping assignments only when the overlap is exact, i.e. the addresses are exactly the same. I agree that just emitting memcpy isn’t strictly legal, because neither C nor LLVM allows even exact overlap. On the other hand, Clang would really like to avoid emitting extra control flow here, because exact overlap is uncommon enough that emitting a no-op memcpy (if it were indeed a no-op) is definitely the right trade-off in practice; and LLVM probably won’t reliably remove branches around a memcpy call if we add them. And it’s very hard to imagine a memcpy implementation that actually wouldn’t work with exact overlap (maybe something that tried to just remap pages?).

I think we have are four options:

  1. We can relax llvm.memcpy to allow exact overlap. Practically, this would depend on memcpy being a no-op on exact overlap; we definitely wouldn’t want LLVM to have to start inserting control flow around memcpy calls.
  2. We can make Clang emit assignments with llvm.memmove. This would make assignment work even with non-exact overlap, which is unnecessary; I’m not sure the cost is that high.
  3. We can make Clang emit an explicit check and control flow around memcpy when there might be overlap. Clang IRGen should already maintain the right information to avoid doing this when e.g. initializing a variable.
  4. We can add a new intrinsic — or some sort of decoration of the existing one — that does a memcpy but allows exact overlap.

Thanks for summarizing the options! I think 4. would probably the safest option, because we won’t have to track down the places that use the non-overlapping assumption. There might also be such cases somewhere downstream, which might be broken silently. If Clang already has the information when the operands cannot overlap (as mentioned in 3.), we only have to emit the non-overlapping version for those cases, causing the least amount of change for the generated IR.

1. Also seems possible (there probably are not too many places in LLVM that use the non-overlapping assumption), but I think in some cases the non-overlapping info can be a useful property to convey for frontends.


I took a look at the places in LLVM that make use of the non-overlapping guarantee and it appears BasicAliasAnalysis is the only place. I put up a patch https://reviews.llvm.org/D86815 to illustrate the impact on the code & tests. It seems quite manageable. It might be feasible to start with removing the guarantee and re-introduce a variant with the additional guarantee.

What do you think? Alternatively I can also put up a patch to add a new variant without the guarantee. In that case, I would appreciate some pointers on how to tell whether arguments are guaranteed to not overlap in Clang.


Just a quick update. The behavior of llvm.memcpy has been adjusted and the alias analysis code has been updated:  https://reviews.llvm.org/rG1ddb3a369f7ebdf738486cd60261c3143658c0e6


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev