[RFC] ASM Goto With Output Constraints

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev

Now that ASM goto support has landed, Nick Desaulniers and I wrote up a document describing how to expand clang's implementation of ASM goto to support output constraints. The work should be straight-forward, but as always will need to be verified to work. Below is a copy of our whitepaper. Please take a look and offer any comments you have.

Share and enjoy!
-bw

Overview

Support for asm goto with output constraints is a feature that the Linux community is interested in having. Adding this new feature should give Clang a higher profile in the Linux community:


  • It demonstrates the Clang community's commitment to supporting Linux.

  • Developers are likely to adopt it on their own, which means they will need to use Clang in some fashion, either as a complete replacement for or in addition to GCC.

Current state

Clang's implementation of asm goto converts this code:


int vogon(unsigned a, unsigned b) {
  asm goto("poetry %0, %1" : : "r"(a), "r"(b) : : error);
  return a + b;

error:
  return -1;
}


into the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  callbr void asm sideeffect "poetry $0, $1", "r,r,X"
      (i32 %a, i32 %b, i8* blockaddress(@vogon, %return))
          to label %asm.fallthrough [label %return]

asm.fallthrough:
  %add = add i32 %b, %a
  br label %return

return:
  %retval.0 = phi i32 [ %add, %asm.fallthrough ], [ -1, %entry ]
  ret i32 %retval.0
}


Our proposal won't change LLVM's current behavior–i.e. a callbr without a return value will act in the same way as the current implementation.

Proposal

GCC restricts asm goto from having output constraints due to limitations in its internal representation–i.e. GCC's control transfer instructions cannot have outputs. For example:


int vogon(int a, int b) {
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
  return a + b;

error:
  return -1;
}


currently fails to compile in GCC with the following error:

<source>: In function 'vogon':
<source>:2:29: error: expected ':' before string constant
  2 |   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
    |                         ^~~~~
    |                         :

   

ToT Clang matches GCC's behavior:


<source>:2:30: error: 'asm goto' cannot have output constraints
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);


However, LLVM doesn't restrict control transfer instructions from having outputs (e.g. the invoke instruction). We propose changing LLVM's callbr instruction to allow return values, similar to how LLVM's implementation of inline assembly (via the call instruction) allows return values. Since there can potentially be zero to many output constraints, callbr would now return an aggregate which contains an element for each output constraint.  These values would then be extracted via extractvalue. With our proposal, the above C example will be converted to LLVM IR like this:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
      (i8* blockaddress(@vogon, %error))
          to label %asm.fallthrough [label %error]


asm.fallthrough:
  %asmresult.a = extractvalue { i32, i32 } %0, 0
  %asmresult.b = extractvalue { i32, i32 } %0, 1
  %result = add i32 %asmresult.a, %asmresult.b
  ret i32 %result

error:
  ret i32 -1
}


Note that unlike the invoke instruction, callbr's return values are assumed valid on all branches. The assumption is that the programmer knows what their inline assembly is doing and where its output constraints are valid. If the value isn't valid on a particular branch but is used there anyway, then the result is a poison value. (Also, if a callbr's return values affect a branch, it will be handled similarly to the invoke instruction's implementation.) Here's an example of how this would work:


int vogon(int a, int b) {
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
  if (a == 42)
    return 42 * b;
  return a + b;

error:
  return b - 42;
}


generates the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
      (i8* blockaddress(@vogon, %error))
          to label %asm.fallthrough [label %error]

asm.fallthrough:
  %asmresult.a = extractvalue { i32, i32 } %0, 0
  %tobool = icmp eq i32 %asmresult.a, 42
  br i1 %tobool, label %if.true, label %if.false 

if.true:
  %asmresult.b = extractvalue { i32, i32 } %0, 1
  %mul = mul i32 42, %asmresult.b
  ret i32 %mul

if.false:
  %asmresult.a.1 = extractvalue { i32, i32 } %0, 0
  %asmresult.b.1 = extractvalue { i32, i32 } %0, 1
  %result = add i32 %asmresult.a.1, %asmresult.b.1
  ret i32 %result

error:
  %asmresult.b.error = extractvalue { i32, i32 } %0, 1
  %error.result = sub i32 %asmresult.b.error, 42
  ret i32 %error.result
}

Implementation

Because LLVM's invoke instruction is a terminating instruction that may have return values, we can use it as a template for callbr's changes. The new functionality lies mostly in modifying Clang's front-end. In particular, we need to do the following:


  • Remove all error checks restricting asm goto from returning values, and

  • Generate the extractvalue instructions on callbr's branches.


LLVM's middle- and back-ends need to be audited to ensure there are no restrictions on callbr returning a value. We expect all passes to Just Work™ without modifications, but of course will be verified.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
+ CBL mailing list


On Thu, Jun 27, 2019 at 11:08 AM Bill Wendling <[hidden email]> wrote:
[Adding the correct cfe-dev mailing list address.]

On Thu, Jun 27, 2019 at 11:06 AM Bill Wendling <[hidden email]> wrote:

Now that ASM goto support has landed, Nick Desaulniers and I wrote up a document describing how to expand clang's implementation of ASM goto to support output constraints. The work should be straight-forward, but as always will need to be verified to work. Below is a copy of our whitepaper. Please take a look and offer any comments you have.

Share and enjoy!
-bw

Overview

Support for asm goto with output constraints is a feature that the Linux community is interested in having. Adding this new feature should give Clang a higher profile in the Linux community:


  • It demonstrates the Clang community's commitment to supporting Linux.

  • Developers are likely to adopt it on their own, which means they will need to use Clang in some fashion, either as a complete replacement for or in addition to GCC.

Current state

Clang's implementation of asm goto converts this code:


int vogon(unsigned a, unsigned b) {
  asm goto("poetry %0, %1" : : "r"(a), "r"(b) : : error);
  return a + b;

error:
  return -1;
}


into the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  callbr void asm sideeffect "poetry $0, $1", "r,r,X"
      (i32 %a, i32 %b, i8* blockaddress(@vogon, %return))
          to label %asm.fallthrough [label %return]

asm.fallthrough:
  %add = add i32 %b, %a
  br label %return

return:
  %retval.0 = phi i32 [ %add, %asm.fallthrough ], [ -1, %entry ]
  ret i32 %retval.0
}


Our proposal won't change LLVM's current behavior–i.e. a callbr without a return value will act in the same way as the current implementation.

Proposal

GCC restricts asm goto from having output constraints due to limitations in its internal representation–i.e. GCC's control transfer instructions cannot have outputs. For example:


int vogon(int a, int b) {
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
  return a + b;

error:
  return -1;
}


currently fails to compile in GCC with the following error:

<source>: In function 'vogon':
<source>:2:29: error: expected ':' before string constant
  2 |   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
    |                         ^~~~~
    |                         :

   

ToT Clang matches GCC's behavior:


<source>:2:30: error: 'asm goto' cannot have output constraints
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);


However, LLVM doesn't restrict control transfer instructions from having outputs (e.g. the invoke instruction). We propose changing LLVM's callbr instruction to allow return values, similar to how LLVM's implementation of inline assembly (via the call instruction) allows return values. Since there can potentially be zero to many output constraints, callbr would now return an aggregate which contains an element for each output constraint.  These values would then be extracted via extractvalue. With our proposal, the above C example will be converted to LLVM IR like this:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
      (i8* blockaddress(@vogon, %error))
          to label %asm.fallthrough [label %error]


asm.fallthrough:
  %asmresult.a = extractvalue { i32, i32 } %0, 0
  %asmresult.b = extractvalue { i32, i32 } %0, 1
  %result = add i32 %asmresult.a, %asmresult.b
  ret i32 %result

error:
  ret i32 -1
}


Note that unlike the invoke instruction, callbr's return values are assumed valid on all branches. The assumption is that the programmer knows what their inline assembly is doing and where its output constraints are valid. If the value isn't valid on a particular branch but is used there anyway, then the result is a poison value. (Also, if a callbr's return values affect a branch, it will be handled similarly to the invoke instruction's implementation.) Here's an example of how this would work:


int vogon(int a, int b) {
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
  if (a == 42)
    return 42 * b;
  return a + b;

error:
  return b - 42;
}


generates the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
      (i8* blockaddress(@vogon, %error))
          to label %asm.fallthrough [label %error]

asm.fallthrough:
  %asmresult.a = extractvalue { i32, i32 } %0, 0
  %tobool = icmp eq i32 %asmresult.a, 42
  br i1 %tobool, label %if.true, label %if.false 

if.true:
  %asmresult.b = extractvalue { i32, i32 } %0, 1
  %mul = mul i32 42, %asmresult.b
  ret i32 %mul

if.false:
  %asmresult.a.1 = extractvalue { i32, i32 } %0, 0
  %asmresult.b.1 = extractvalue { i32, i32 } %0, 1
  %result = add i32 %asmresult.a.1, %asmresult.b.1
  ret i32 %result

error:
  %asmresult.b.error = extractvalue { i32, i32 } %0, 1
  %error.result = sub i32 %asmresult.b.error, 42
  ret i32 %error.result
}

Implementation

Because LLVM's invoke instruction is a terminating instruction that may have return values, we can use it as a template for callbr's changes. The new functionality lies mostly in modifying Clang's front-end. In particular, we need to do the following:


  • Remove all error checks restricting asm goto from returning values, and

  • Generate the extractvalue instructions on callbr's branches.


LLVM's middle- and back-ends need to be audited to ensure there are no restrictions on callbr returning a value. We expect all passes to Just Work™ without modifications, but of course will be verified.



--
Thanks,
~Nick Desaulniers

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
What about SelectionDAG representation? Currently we expand callbr to INLINEASM_BR and BR. Both of which are terminators. But in order to support outputs we would need to put CopyFromReg nodes between them.

~Craig


On Thu, Jun 27, 2019 at 12:18 PM Nick Desaulniers via cfe-dev <[hidden email]> wrote:
+ CBL mailing list


On Thu, Jun 27, 2019 at 11:08 AM Bill Wendling <[hidden email]> wrote:
[Adding the correct cfe-dev mailing list address.]

On Thu, Jun 27, 2019 at 11:06 AM Bill Wendling <[hidden email]> wrote:

Now that ASM goto support has landed, Nick Desaulniers and I wrote up a document describing how to expand clang's implementation of ASM goto to support output constraints. The work should be straight-forward, but as always will need to be verified to work. Below is a copy of our whitepaper. Please take a look and offer any comments you have.

Share and enjoy!
-bw

Overview

Support for asm goto with output constraints is a feature that the Linux community is interested in having. Adding this new feature should give Clang a higher profile in the Linux community:


  • It demonstrates the Clang community's commitment to supporting Linux.

  • Developers are likely to adopt it on their own, which means they will need to use Clang in some fashion, either as a complete replacement for or in addition to GCC.

Current state

Clang's implementation of asm goto converts this code:


int vogon(unsigned a, unsigned b) {
  asm goto("poetry %0, %1" : : "r"(a), "r"(b) : : error);
  return a + b;

error:
  return -1;
}


into the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  callbr void asm sideeffect "poetry $0, $1", "r,r,X"
      (i32 %a, i32 %b, i8* blockaddress(@vogon, %return))
          to label %asm.fallthrough [label %return]

asm.fallthrough:
  %add = add i32 %b, %a
  br label %return

return:
  %retval.0 = phi i32 [ %add, %asm.fallthrough ], [ -1, %entry ]
  ret i32 %retval.0
}


Our proposal won't change LLVM's current behavior–i.e. a callbr without a return value will act in the same way as the current implementation.

Proposal

GCC restricts asm goto from having output constraints due to limitations in its internal representation–i.e. GCC's control transfer instructions cannot have outputs. For example:


int vogon(int a, int b) {
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
  return a + b;

error:
  return -1;
}


currently fails to compile in GCC with the following error:

<source>: In function 'vogon':
<source>:2:29: error: expected ':' before string constant
  2 |   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
    |                         ^~~~~
    |                         :

   

ToT Clang matches GCC's behavior:


<source>:2:30: error: 'asm goto' cannot have output constraints
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);


However, LLVM doesn't restrict control transfer instructions from having outputs (e.g. the invoke instruction). We propose changing LLVM's callbr instruction to allow return values, similar to how LLVM's implementation of inline assembly (via the call instruction) allows return values. Since there can potentially be zero to many output constraints, callbr would now return an aggregate which contains an element for each output constraint.  These values would then be extracted via extractvalue. With our proposal, the above C example will be converted to LLVM IR like this:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
      (i8* blockaddress(@vogon, %error))
          to label %asm.fallthrough [label %error]


asm.fallthrough:
  %asmresult.a = extractvalue { i32, i32 } %0, 0
  %asmresult.b = extractvalue { i32, i32 } %0, 1
  %result = add i32 %asmresult.a, %asmresult.b
  ret i32 %result

error:
  ret i32 -1
}


Note that unlike the invoke instruction, callbr's return values are assumed valid on all branches. The assumption is that the programmer knows what their inline assembly is doing and where its output constraints are valid. If the value isn't valid on a particular branch but is used there anyway, then the result is a poison value. (Also, if a callbr's return values affect a branch, it will be handled similarly to the invoke instruction's implementation.) Here's an example of how this would work:


int vogon(int a, int b) {
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
  if (a == 42)
    return 42 * b;
  return a + b;

error:
  return b - 42;
}


generates the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
      (i8* blockaddress(@vogon, %error))
          to label %asm.fallthrough [label %error]

asm.fallthrough:
  %asmresult.a = extractvalue { i32, i32 } %0, 0
  %tobool = icmp eq i32 %asmresult.a, 42
  br i1 %tobool, label %if.true, label %if.false 

if.true:
  %asmresult.b = extractvalue { i32, i32 } %0, 1
  %mul = mul i32 42, %asmresult.b
  ret i32 %mul

if.false:
  %asmresult.a.1 = extractvalue { i32, i32 } %0, 0
  %asmresult.b.1 = extractvalue { i32, i32 } %0, 1
  %result = add i32 %asmresult.a.1, %asmresult.b.1
  ret i32 %result

error:
  %asmresult.b.error = extractvalue { i32, i32 } %0, 1
  %error.result = sub i32 %asmresult.b.error, 42
  ret i32 %error.result
}

Implementation

Because LLVM's invoke instruction is a terminating instruction that may have return values, we can use it as a template for callbr's changes. The new functionality lies mostly in modifying Clang's front-end. In particular, we need to do the following:


  • Remove all error checks restricting asm goto from returning values, and

  • Generate the extractvalue instructions on callbr's branches.


LLVM's middle- and back-ends need to be audited to ensure there are no restrictions on callbr returning a value. We expect all passes to Just Work™ without modifications, but of course will be verified.



--
Thanks,
~Nick Desaulniers
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev
On Thu, Jun 27, 2019 at 12:18 PM Nick Desaulniers via cfe-dev <[hidden email]> wrote:
+ CBL mailing list
On Thu, Jun 27, 2019 at 11:08 AM Bill Wendling <[hidden email]> wrote:
[Adding the correct cfe-dev mailing list address.]

On Thu, Jun 27, 2019 at 11:06 AM Bill Wendling <[hidden email]> wrote:

<source>:2:30: error: 'asm goto' cannot have output constraints
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);


However, LLVM doesn't restrict control transfer instructions from having outputs (e.g. the invoke instruction). We propose changing LLVM's callbr instruction to allow return values, similar to how LLVM's implementation of inline assembly (via the call instruction) allows return values. Since there can potentially be zero to many output constraints, callbr would now return an aggregate which contains an element for each output constraint.  These values would then be extracted via extractvalue. With our proposal, the above C example will be converted to LLVM IR like this:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
      (i8* blockaddress(@vogon, %error))
          to label %asm.fallthrough [label %error]


asm.fallthrough:
  %asmresult.a = extractvalue { i32, i32 } %0, 0
  %asmresult.b = extractvalue { i32, i32 } %0, 1
  %result = add i32 %asmresult.a, %asmresult.b
  ret i32 %result

error:
  ret i32 -1
}


Note that unlike the invoke instruction, callbr's return values are assumed valid on all branches. The assumption is that the programmer knows what their inline assembly is doing and where its output constraints are valid. If the value isn't valid on a particular branch but is used there anyway, then the result is a poison value. (Also, if a callbr's return values affect a branch, it will be handled similarly to the invoke instruction's implementation.) Here's an example of how this would work:


Generally, I'd prefer if we didn't keep designing new features that assume the programmer knows what they're doing. Personally, I had been considering reworking LLVM's Windows EH representation to eliminate the catchswith instruction, which just exists to multiplex invoke unwind edges to multiple catch blocks. Instead, we'd use callbr, and I had been assuming it would have the normal behavior of producing the return value only along the normal path.

Do you think landingpad offers alternative inspiration for how to handle this? i.e. you could have a special EHPad-like instruction (must be first non-PHI instruction) that produces a value along abnormal paths.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev
On Thu, Jun 27, 2019 at 12:32 PM Craig Topper <[hidden email]> wrote:
What about SelectionDAG representation? Currently we expand callbr to INLINEASM_BR and BR. Both of which are terminators. But in order to support outputs we would need to put CopyFromReg nodes between them.

Is there a reason why callbr needs to be lowered to INLINEASM_BR and not a normal INLINEASM?

-bw

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev
I think this is fine, except that it stops at the point where things actually start to get interesting and tricky.

How will you actually handle the flow of values from the callbr into the error blocks? A callbr can specify requirements on where its outputs live. So, what if two callbr, in different branches of code, specify _different_ constraints for the same output, and list the same block as a possible error successor? How can the resulting phi be codegened?

It'd sure be a whole lot easier to not have the values valid on the secondary exit blocks. Can you present examples where preserving the values on the branches is be a requirement? (I feel like I've seen some before, but it'd be good to be reminded).

E.g., imagine code like this:

<<
entry:
  br i1 %cmp, label %true, label %false
true:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9},X" (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]
false:
  %1 = callbr { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11},X" (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]

error:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>>

Normally, if a common register cannot be found to use across relevant block transitions, it can simply fall back on storing values on the stack. But, that's not possible with callbr, since the location is fixed by the asm, and no code can be inserted after the values are written, before the branch (as both value writes and the branch are inside the asm blob). So what can be done, in that case?

One thing you might be able to do is to duplicate the error block so you have a different target for every callbr, but I'd consider that an invalid transform (because the address of the block is potentially being used as a value in the asm too).

Another thing you could perhaps do is reify the source-block-number as an actual value -- storing a "1" before the callbr in true, and storing a "2" before the callbr in "false". Then conditional-branch based on that...but that's real ugly...

On Thu, Jun 27, 2019 at 3:18 PM Nick Desaulniers via cfe-dev <[hidden email]> wrote:
+ CBL mailing list


On Thu, Jun 27, 2019 at 11:08 AM Bill Wendling <[hidden email]> wrote:
[Adding the correct cfe-dev mailing list address.]

On Thu, Jun 27, 2019 at 11:06 AM Bill Wendling <[hidden email]> wrote:

Now that ASM goto support has landed, Nick Desaulniers and I wrote up a document describing how to expand clang's implementation of ASM goto to support output constraints. The work should be straight-forward, but as always will need to be verified to work. Below is a copy of our whitepaper. Please take a look and offer any comments you have.

Share and enjoy!
-bw

Overview

Support for asm goto with output constraints is a feature that the Linux community is interested in having. Adding this new feature should give Clang a higher profile in the Linux community:


  • It demonstrates the Clang community's commitment to supporting Linux.

  • Developers are likely to adopt it on their own, which means they will need to use Clang in some fashion, either as a complete replacement for or in addition to GCC.

Current state

Clang's implementation of asm goto converts this code:


int vogon(unsigned a, unsigned b) {
  asm goto("poetry %0, %1" : : "r"(a), "r"(b) : : error);
  return a + b;

error:
  return -1;
}


into the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  callbr void asm sideeffect "poetry $0, $1", "r,r,X"
      (i32 %a, i32 %b, i8* blockaddress(@vogon, %return))
          to label %asm.fallthrough [label %return]

asm.fallthrough:
  %add = add i32 %b, %a
  br label %return

return:
  %retval.0 = phi i32 [ %add, %asm.fallthrough ], [ -1, %entry ]
  ret i32 %retval.0
}


Our proposal won't change LLVM's current behavior–i.e. a callbr without a return value will act in the same way as the current implementation.

Proposal

GCC restricts asm goto from having output constraints due to limitations in its internal representation–i.e. GCC's control transfer instructions cannot have outputs. For example:


int vogon(int a, int b) {
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
  return a + b;

error:
  return -1;
}


currently fails to compile in GCC with the following error:

<source>: In function 'vogon':
<source>:2:29: error: expected ':' before string constant
  2 |   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
    |                         ^~~~~
    |                         :

   

ToT Clang matches GCC's behavior:


<source>:2:30: error: 'asm goto' cannot have output constraints
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);


However, LLVM doesn't restrict control transfer instructions from having outputs (e.g. the invoke instruction). We propose changing LLVM's callbr instruction to allow return values, similar to how LLVM's implementation of inline assembly (via the call instruction) allows return values. Since there can potentially be zero to many output constraints, callbr would now return an aggregate which contains an element for each output constraint.  These values would then be extracted via extractvalue. With our proposal, the above C example will be converted to LLVM IR like this:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
      (i8* blockaddress(@vogon, %error))
          to label %asm.fallthrough [label %error]


asm.fallthrough:
  %asmresult.a = extractvalue { i32, i32 } %0, 0
  %asmresult.b = extractvalue { i32, i32 } %0, 1
  %result = add i32 %asmresult.a, %asmresult.b
  ret i32 %result

error:
  ret i32 -1
}


Note that unlike the invoke instruction, callbr's return values are assumed valid on all branches. The assumption is that the programmer knows what their inline assembly is doing and where its output constraints are valid. If the value isn't valid on a particular branch but is used there anyway, then the result is a poison value. (Also, if a callbr's return values affect a branch, it will be handled similarly to the invoke instruction's implementation.) Here's an example of how this would work:


int vogon(int a, int b) {
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);
  if (a == 42)
    return 42 * b;
  return a + b;

error:
  return b - 42;
}


generates the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
      (i8* blockaddress(@vogon, %error))
          to label %asm.fallthrough [label %error]

asm.fallthrough:
  %asmresult.a = extractvalue { i32, i32 } %0, 0
  %tobool = icmp eq i32 %asmresult.a, 42
  br i1 %tobool, label %if.true, label %if.false 

if.true:
  %asmresult.b = extractvalue { i32, i32 } %0, 1
  %mul = mul i32 42, %asmresult.b
  ret i32 %mul

if.false:
  %asmresult.a.1 = extractvalue { i32, i32 } %0, 0
  %asmresult.b.1 = extractvalue { i32, i32 } %0, 1
  %result = add i32 %asmresult.a.1, %asmresult.b.1
  ret i32 %result

error:
  %asmresult.b.error = extractvalue { i32, i32 } %0, 1
  %error.result = sub i32 %asmresult.b.error, 42
  ret i32 %error.result
}

Implementation

Because LLVM's invoke instruction is a terminating instruction that may have return values, we can use it as a template for callbr's changes. The new functionality lies mostly in modifying Clang's front-end. In particular, we need to do the following:


  • Remove all error checks restricting asm goto from returning values, and

  • Generate the extractvalue instructions on callbr's branches.


LLVM's middle- and back-ends need to be audited to ensure there are no restrictions on callbr returning a value. We expect all passes to Just Work™ without modifications, but of course will be verified.



--
Thanks,
~Nick Desaulniers
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev
On Thu, Jun 27, 2019 at 1:15 PM Reid Kleckner <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 12:18 PM Nick Desaulniers via cfe-dev <[hidden email]> wrote:
+ CBL mailing list
On Thu, Jun 27, 2019 at 11:08 AM Bill Wendling <[hidden email]> wrote:
[Adding the correct cfe-dev mailing list address.]

On Thu, Jun 27, 2019 at 11:06 AM Bill Wendling <[hidden email]> wrote:

<source>:2:30: error: 'asm goto' cannot have output constraints
  asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);


However, LLVM doesn't restrict control transfer instructions from having outputs (e.g. the invoke instruction). We propose changing LLVM's callbr instruction to allow return values, similar to how LLVM's implementation of inline assembly (via the call instruction) allows return values. Since there can potentially be zero to many output constraints, callbr would now return an aggregate which contains an element for each output constraint.  These values would then be extracted via extractvalue. With our proposal, the above C example will be converted to LLVM IR like this:


define i32 @vogon(i32 %a, i32 %b) {
entry:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"
      (i8* blockaddress(@vogon, %error))
          to label %asm.fallthrough [label %error]


asm.fallthrough:
  %asmresult.a = extractvalue { i32, i32 } %0, 0
  %asmresult.b = extractvalue { i32, i32 } %0, 1
  %result = add i32 %asmresult.a, %asmresult.b
  ret i32 %result

error:
  ret i32 -1
}


Note that unlike the invoke instruction, callbr's return values are assumed valid on all branches. The assumption is that the programmer knows what their inline assembly is doing and where its output constraints are valid. If the value isn't valid on a particular branch but is used there anyway, then the result is a poison value. (Also, if a callbr's return values affect a branch, it will be handled similarly to the invoke instruction's implementation.) Here's an example of how this would work:


Generally, I'd prefer if we didn't keep designing new features that assume the programmer knows what they're doing. Personally, I had been considering reworking LLVM's Windows EH representation to eliminate the catchswith instruction, which just exists to multiplex invoke unwind edges to multiple catch blocks. Instead, we'd use callbr, and I had been assuming it would have the normal behavior of producing the return value only along the normal path.

Wouldn't it be unnecessarily restrictive though to limit the valid return values only to the normal edge? (This is more for the generality of the callbr instruction and not necessarily related to the initial inspiration for "asm goto".)
 
Do you think landingpad offers alternative inspiration for how to handle this? i.e. you could have a special EHPad-like instruction (must be first non-PHI instruction) that produces a value along abnormal paths.

I haven't touched EH stuff for awhile so things have probably changed. It's an intriguing notion and may help alleviate the issues James mentioned. Could you write some pseudo-IR to show more what you're thinking? 

-bw 

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev
This was Chandler's proposal after observing the number of places I had to update in MachinePasses to understand the control flow change happening in the middle of the basic block. He thought just making it a terminator would make it simpler.

There is some special casing of exception handling in MachineIR passes to make the control flow for invoke work. Look for isEHPad() or hasEHPadSuccessor()

~Craig


On Thu, Jun 27, 2019 at 1:23 PM Bill Wendling <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 12:32 PM Craig Topper <[hidden email]> wrote:
What about SelectionDAG representation? Currently we expand callbr to INLINEASM_BR and BR. Both of which are terminators. But in order to support outputs we would need to put CopyFromReg nodes between them.

Is there a reason why callbr needs to be lowered to INLINEASM_BR and not a normal INLINEASM?

-bw

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev
On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <[hidden email]> wrote:
I think this is fine, except that it stops at the point where things actually start to get interesting and tricky.

How will you actually handle the flow of values from the callbr into the error blocks? A callbr can specify requirements on where its outputs live. So, what if two callbr, in different branches of code, specify _different_ constraints for the same output, and list the same block as a possible error successor? How can the resulting phi be codegened?

This is where I fall back on the statement about how "the programmer knows what they're doing". Perhaps I'm being too cavalier here? My concern, if you want to call it that, is that we don't be too restrictive on the new behavior. For example, the "asm goto" may set a register to an error value (made up on the spot; may not be a common use). But, if there's no real reason to have the value be valid on the abnormal path, then sure we can declare that it's not valid on the abnormal path.

It'd sure be a whole lot easier to not have the values valid on the secondary exit blocks. Can you present examples where preserving the values on the branches is be a requirement? (I feel like I've seen some before, but it'd be good to be reminded).

E.g., imagine code like this:

<<
entry:
  br i1 %cmp, label %true, label %false
true:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9},X" (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]
false:
  %1 = callbr { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11},X" (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]

error:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>>

Normally, if a common register cannot be found to use across relevant block transitions, it can simply fall back on storing values on the stack. But, that's not possible with callbr, since the location is fixed by the asm, and no code can be inserted after the values are written, before the branch (as both value writes and the branch are inside the asm blob). So what can be done, in that case?

One thing you might be able to do is to duplicate the error block so you have a different target for every callbr, but I'd consider that an invalid transform (because the address of the block is potentially being used as a value in the asm too).

Another thing you could perhaps do is reify the source-block-number as an actual value -- storing a "1" before the callbr in true, and storing a "2" before the callbr in "false". Then conditional-branch based on that...but that's real ugly...

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev
I see. I can think of a couple of ways to handle this, but they fall into the hacky category. It may just be that we need to bite the bullet and update all of the places in the MachinePasses. :-( Could you point to the comments made by you and Chandler? I'd like to familiarize myself with them.

-bw

On Thu, Jun 27, 2019 at 1:34 PM Craig Topper <[hidden email]> wrote:
This was Chandler's proposal after observing the number of places I had to update in MachinePasses to understand the control flow change happening in the middle of the basic block. He thought just making it a terminator would make it simpler.

There is some special casing of exception handling in MachineIR passes to make the control flow for invoke work. Look for isEHPad() or hasEHPadSuccessor()

~Craig


On Thu, Jun 27, 2019 at 1:23 PM Bill Wendling <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 12:32 PM Craig Topper <[hidden email]> wrote:
What about SelectionDAG representation? Currently we expand callbr to INLINEASM_BR and BR. Both of which are terminators. But in order to support outputs we would need to put CopyFromReg nodes between them.

Is there a reason why callbr needs to be lowered to INLINEASM_BR and not a normal INLINEASM?

-bw

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev


On 6/27/19 1:10 PM, Bill Wendling via cfe-dev wrote:

Now that ASM goto support has landed, Nick Desaulniers and I wrote up a document describing how to expand clang's implementation of ASM goto to support output constraints. The work should be straight-forward, but as always will need to be verified to work. Below is a copy of our whitepaper. Please take a look and offer any comments you have.


This all sounds fairly straightforward and removes an technically-unnecessary restriction to produce a more-general capability - LLVM terminators can have return values, and so we have no problem representing the underlying concept. There is no governing standard here, and we made a fairly invasive change to LLVM already to support this extension in the first place. We should leverage that work to make the extension as useful as possible.

 -Hal


Share and enjoy!
-bw

Overview

Support for asm goto with output constraints is a feature that the Linux community is interested in having. Adding this new feature should give Clang a higher profile in the Linux community:


  • It demonstrates the Clang community's commitment to supporting Linux.

  • Developers are likely to adopt it on their own, which means they will need to use Clang in some fashion, either as a complete replacement for or in addition to GCC.

Current state

Clang's implementation of asm goto converts this code:


int vogon(unsigned a, unsigned b) {   asm goto("poetry %0, %1" : : "r"(a), "r"(b) : : error);   return a + b; error:   return -1; }


into the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) { entry:   callbr void asm sideeffect "poetry $0, $1", "r,r,X"       (i32 %a, i32 %b, i8* blockaddress(@vogon, %return))           to label %asm.fallthrough [label %return] asm.fallthrough:   %add = add i32 %b, %a   br label %return return:   %retval.0 = phi i32 [ %add, %asm.fallthrough ], [ -1, %entry ]   ret i32 %retval.0 }


Our proposal won't change LLVM's current behavior–i.e. a callbr without a return value will act in the same way as the current implementation.

Proposal

GCC restricts asm goto from having output constraints due to limitations in its internal representation–i.e. GCC's control transfer instructions cannot have outputs. For example:


int vogon(int a, int b) {   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);   return a + b; error:   return -1; }


currently fails to compile in GCC with the following error:

<source>: In function 'vogon': <source>:2:29: error: expected ':' before string constant   2 |   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);     |                         ^~~~~     |                         :

   

ToT Clang matches GCC's behavior:


<source>:2:30: error: 'asm goto' cannot have output constraints   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);


However, LLVM doesn't restrict control transfer instructions from having outputs (e.g. the invoke instruction). We propose changing LLVM's callbr instruction to allow return values, similar to how LLVM's implementation of inline assembly (via the call instruction) allows return values. Since there can potentially be zero to many output constraints, callbr would now return an aggregate which contains an element for each output constraint.  These values would then be extracted via extractvalue. With our proposal, the above C example will be converted to LLVM IR like this:


define i32 @vogon(i32 %a, i32 %b) { entry:   %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"       (i8* blockaddress(@vogon, %error))           to label %asm.fallthrough [label %error]

asm.fallthrough:   %asmresult.a = extractvalue { i32, i32 } %0, 0   %asmresult.b = extractvalue { i32, i32 } %0, 1   %result = add i32 %asmresult.a, %asmresult.b   ret i32 %result error:   ret i32 -1 }


Note that unlike the invoke instruction, callbr's return values are assumed valid on all branches. The assumption is that the programmer knows what their inline assembly is doing and where its output constraints are valid. If the value isn't valid on a particular branch but is used there anyway, then the result is a poison value. (Also, if a callbr's return values affect a branch, it will be handled similarly to the invoke instruction's implementation.) Here's an example of how this would work:


int vogon(int a, int b) {   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);   if (a == 42)     return 42 * b;   return a + b; error:   return b - 42; }


generates the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) { entry:   %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"       (i8* blockaddress(@vogon, %error))           to label %asm.fallthrough [label %error] asm.fallthrough:   %asmresult.a = extractvalue { i32, i32 } %0, 0   %tobool = icmp eq i32 %asmresult.a, 42   br i1 %tobool, label %if.true, label %if.false  if.true:   %asmresult.b = extractvalue { i32, i32 } %0, 1   %mul = mul i32 42, %asmresult.b   ret i32 %mul if.false:   %asmresult.a.1 = extractvalue { i32, i32 } %0, 0   %asmresult.b.1 = extractvalue { i32, i32 } %0, 1   %result = add i32 %asmresult.a.1, %asmresult.b.1   ret i32 %result error:   %asmresult.b.error = extractvalue { i32, i32 } %0, 1   %error.result = sub i32 %asmresult.b.error, 42   ret i32 %error.result }

Implementation

Because LLVM's invoke instruction is a terminating instruction that may have return values, we can use it as a template for callbr's changes. The new functionality lies mostly in modifying Clang's front-end. In particular, we need to do the following:


  • Remove all error checks restricting asm goto from returning values, and

  • Generate the extractvalue instructions on callbr's branches.


LLVM's middle- and back-ends need to be audited to ensure there are no restrictions on callbr returning a value. We expect all passes to Just Work™ without modifications, but of course will be verified.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev


On 6/27/19 2:31 PM, Craig Topper via llvm-dev wrote:
What about SelectionDAG representation? Currently we expand callbr to INLINEASM_BR and BR. Both of which are terminators. But in order to support outputs we would need to put CopyFromReg nodes between them.


Or maybe we should support having terminators that define values? People ask about this from time to time, and that seems like the higher-overall-value extension to make to the MI representation.

 -Hal



~Craig


...

--
Thanks,
~Nick Desaulniers
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev
I believe at least some portion of the INLINEASM_BR decision is discussed here  https://reviews.llvm.org/D53765?id=184024#inline-508610 Anything that's on record anywhere should be in that review. I may have had some conversations with Chandler on IRC, but I'm not sure.


~Craig


On Thu, Jun 27, 2019 at 2:53 PM Finkel, Hal J. <[hidden email]> wrote:


On 6/27/19 2:31 PM, Craig Topper via llvm-dev wrote:
What about SelectionDAG representation? Currently we expand callbr to INLINEASM_BR and BR. Both of which are terminators. But in order to support outputs we would need to put CopyFromReg nodes between them.


Or maybe we should support having terminators that define values? People ask about this from time to time, and that seems like the higher-overall-value extension to make to the MI representation.

 -Hal



~Craig


On Thu, Jun 27, 2019 at 12:18 PM Nick Desaulniers via cfe-dev <[hidden email]> wrote:
+ CBL mailing list


On Thu, Jun 27, 2019 at 11:08 AM Bill Wendling <[hidden email]> wrote:
[Adding the correct cfe-dev mailing list address.]

On Thu, Jun 27, 2019 at 11:06 AM Bill Wendling <[hidden email]> wrote:

Now that ASM goto support has landed, Nick Desaulniers and I wrote up a document describing how to expand clang's implementation of ASM goto to support output constraints. The work should be straight-forward, but as always will need to be verified to work. Below is a copy of our whitepaper. Please take a look and offer any comments you have.

Share and enjoy!
-bw

Overview

Support for asm goto with output constraints is a feature that the Linux community is interested in having. Adding this new feature should give Clang a higher profile in the Linux community:


  • It demonstrates the Clang community's commitment to supporting Linux.

  • Developers are likely to adopt it on their own, which means they will need to use Clang in some fashion, either as a complete replacement for or in addition to GCC.

Current state

Clang's implementation of asm goto converts this code:


int vogon(unsigned a, unsigned b) {   asm goto("poetry %0, %1" : : "r"(a), "r"(b) : : error);   return a + b; error:   return -1; }


into the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) { entry:   callbr void asm sideeffect "poetry $0, $1", "r,r,X"       (i32 %a, i32 %b, i8* blockaddress(@vogon, %return))           to label %asm.fallthrough [label %return] asm.fallthrough:   %add = add i32 %b, %a   br label %return return:   %retval.0 = phi i32 [ %add, %asm.fallthrough ], [ -1, %entry ]   ret i32 %retval.0 }


Our proposal won't change LLVM's current behavior–i.e. a callbr without a return value will act in the same way as the current implementation.

Proposal

GCC restricts asm goto from having output constraints due to limitations in its internal representation–i.e. GCC's control transfer instructions cannot have outputs. For example:


int vogon(int a, int b) {   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);   return a + b; error:   return -1; }


currently fails to compile in GCC with the following error:

<source>: In function 'vogon': <source>:2:29: error: expected ':' before string constant   2 |   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);     |                         ^~~~~     |                         :

   

ToT Clang matches GCC's behavior:


<source>:2:30: error: 'asm goto' cannot have output constraints   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);


However, LLVM doesn't restrict control transfer instructions from having outputs (e.g. the invoke instruction). We propose changing LLVM's callbr instruction to allow return values, similar to how LLVM's implementation of inline assembly (via the call instruction) allows return values. Since there can potentially be zero to many output constraints, callbr would now return an aggregate which contains an element for each output constraint.  These values would then be extracted via extractvalue. With our proposal, the above C example will be converted to LLVM IR like this:


define i32 @vogon(i32 %a, i32 %b) { entry:   %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"       (i8* blockaddress(@vogon, %error))           to label %asm.fallthrough [label %error]

asm.fallthrough:   %asmresult.a = extractvalue { i32, i32 } %0, 0   %asmresult.b = extractvalue { i32, i32 } %0, 1   %result = add i32 %asmresult.a, %asmresult.b   ret i32 %result error:   ret i32 -1 }


Note that unlike the invoke instruction, callbr's return values are assumed valid on all branches. The assumption is that the programmer knows what their inline assembly is doing and where its output constraints are valid. If the value isn't valid on a particular branch but is used there anyway, then the result is a poison value. (Also, if a callbr's return values affect a branch, it will be handled similarly to the invoke instruction's implementation.) Here's an example of how this would work:


int vogon(int a, int b) {   asm goto("poetry %0, %1" : "=r"(a), "=r"(b) : : : error);   if (a == 42)     return 42 * b;   return a + b; error:   return b - 42; }


generates the following LLVM IR:


define i32 @vogon(i32 %a, i32 %b) { entry:   %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "=r,=r,X"       (i8* blockaddress(@vogon, %error))           to label %asm.fallthrough [label %error] asm.fallthrough:   %asmresult.a = extractvalue { i32, i32 } %0, 0   %tobool = icmp eq i32 %asmresult.a, 42   br i1 %tobool, label %if.true, label %if.false  if.true:   %asmresult.b = extractvalue { i32, i32 } %0, 1   %mul = mul i32 42, %asmresult.b   ret i32 %mul if.false:   %asmresult.a.1 = extractvalue { i32, i32 } %0, 0   %asmresult.b.1 = extractvalue { i32, i32 } %0, 1   %result = add i32 %asmresult.a.1, %asmresult.b.1   ret i32 %result error:   %asmresult.b.error = extractvalue { i32, i32 } %0, 1   %error.result = sub i32 %asmresult.b.error, 42   ret i32 %error.result }

Implementation

Because LLVM's invoke instruction is a terminating instruction that may have return values, we can use it as a template for callbr's changes. The new functionality lies mostly in modifying Clang's front-end. In particular, we need to do the following:


  • Remove all error checks restricting asm goto from returning values, and

  • Generate the extractvalue instructions on callbr's branches.


LLVM's middle- and back-ends need to be audited to ensure there are no restrictions on callbr returning a value. We expect all passes to Just Work™ without modifications, but of course will be verified.



--
Thanks,
~Nick Desaulniers
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev
On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <[hidden email]> wrote:
I think this is fine, except that it stops at the point where things actually start to get interesting and tricky.

How will you actually handle the flow of values from the callbr into the error blocks? A callbr can specify requirements on where its outputs live. So, what if two callbr, in different branches of code, specify _different_ constraints for the same output, and list the same block as a possible error successor? How can the resulting phi be codegened?

This is where I fall back on the statement about how "the programmer knows what they're doing". Perhaps I'm being too cavalier here? My concern, if you want to call it that, is that we don't be too restrictive on the new behavior. For example, the "asm goto" may set a register to an error value (made up on the spot; may not be a common use). But, if there's no real reason to have the value be valid on the abnormal path, then sure we can declare that it's not valid on the abnormal path.

I think I should explain my "programmer knows what they're doing" statement a bit better. I'm specifically referring to inline asm here. The more general "callbr" case may still need to be considered (see Reid's reply).

When a programmer uses inline asm, they're implicitly telling the compiler that they *do* know what they're doing  (I know this is common knowledge, but I wanted to reiterate it.). In particular, either they need to reference an instruction not readily available from the compiler (e.g. "cpuid") or the compiler isn't able to give them the needed performance in a critical section. I'm extending this sentiment to callbr with output constraints. Let's take your example below and write it as "normal" asm statements one on each branch of an if-then-else (please ignore any syntax errors):

if:
  br i1 %cmp, label %true, label %false

true:
  %0 = call { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9}" ()
  br label %end

false:
  %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11}" ()
  br label %end

end:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]

How is this handled in codegen? Is it an error or does the back-end handle it? Whatever's done today for "normal" inline asm is what I *think* should be the behavior for the inline asm callbr variant. If this doesn't seem sensible (and I realize that I may be thinking of an "in a perfect world" scenario), then we'll need to come up with a more sensible solution which may be to disallow the values on the error block until we can think of a better way to handle them.

-bw
 
It'd sure be a whole lot easier to not have the values valid on the secondary exit blocks. Can you present examples where preserving the values on the branches is be a requirement? (I feel like I've seen some before, but it'd be good to be reminded).

E.g., imagine code like this:

<<
entry:
  br i1 %cmp, label %true, label %false
true:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9},X" (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]
false:
  %1 = callbr { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11},X" (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]

error:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>>

Normally, if a common register cannot be found to use across relevant block transitions, it can simply fall back on storing values on the stack. But, that's not possible with callbr, since the location is fixed by the asm, and no code can be inserted after the values are written, before the branch (as both value writes and the branch are inside the asm blob). So what can be done, in that case?

One thing you might be able to do is to duplicate the error block so you have a different target for every callbr, but I'd consider that an invalid transform (because the address of the block is potentially being used as a value in the asm too).

Another thing you could perhaps do is reify the source-block-number as an actual value -- storing a "1" before the callbr in true, and storing a "2" before the callbr in "false". Then conditional-branch based on that...but that's real ugly...

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
On Fri, Jun 28, 2019 at 12:00 PM Bill Wendling via cfe-dev <[hidden email]> wrote:
I think I should explain my "programmer knows what they're doing" statement a bit better. I'm specifically referring to inline asm here. The more general "callbr" case may still need to be considered (see Reid's reply).

When a programmer uses inline asm, they're implicitly telling the compiler that they *do* know what they're doing  (I know this is common knowledge, but I wanted to reiterate it.). In particular, either they need to reference an instruction not readily available from the compiler (e.g. "cpuid") or the compiler isn't able to give them the needed performance in a critical section. I'm extending this sentiment to callbr with output constraints. Let's take your example below and write it as "normal" asm statements one on each branch of an if-then-else (please ignore any syntax errors):

if:
  br i1 %cmp, label %true, label %false

true:
  %0 = call { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9}" ()
  br label %end

false:
  %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11}" ()
  br label %end

end:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]

How is this handled in codegen? Is it an error or does the back-end handle it? Whatever's done today for "normal" inline asm is what I *think* should be the behavior for the inline asm callbr variant. If this doesn't seem sensible (and I realize that I may be thinking of an "in a perfect world" scenario), then we'll need to come up with a more sensible solution which may be to disallow the values on the error block until we can think of a better way to handle them.

I guess distinguishing between callbr and asm goto is reasonable. We can tolerate optionally initialized outputs for inline asm. It's just the same as having an output constraint register that you forget to write in the asm blob. However, it would be good if callbr had some way to represent whether the returned value is alive along any particular outgoing edge.

I mentioned that we could look to landingpad for inspiration here. I mention it because it is, essentially, the alternate exceptional return value of a possibly throwing call. The values it produces are carried in the usual X86 return registers, RAX:RDX, so they really are kind of an alternate return value. However, with asm goto, it's not possible to have different output constraints along different edges, so after thinking about it some more, I think this is overkill. It's just one way we could implement that live value indication, and I think it's probably not as good as changing callbr itself.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
In reply to this post by Nathan Ridge via cfe-dev


On Fri, Jun 28, 2019 at 3:00 PM Bill Wendling <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <[hidden email]> wrote:
I think this is fine, except that it stops at the point where things actually start to get interesting and tricky.

How will you actually handle the flow of values from the callbr into the error blocks? A callbr can specify requirements on where its outputs live. So, what if two callbr, in different branches of code, specify _different_ constraints for the same output, and list the same block as a possible error successor? How can the resulting phi be codegened?

This is where I fall back on the statement about how "the programmer knows what they're doing". Perhaps I'm being too cavalier here? My concern, if you want to call it that, is that we don't be too restrictive on the new behavior. For example, the "asm goto" may set a register to an error value (made up on the spot; may not be a common use). But, if there's no real reason to have the value be valid on the abnormal path, then sure we can declare that it's not valid on the abnormal path.

I think I should explain my "programmer knows what they're doing" statement a bit better. I'm specifically referring to inline asm here. The more general "callbr" case may still need to be considered (see Reid's reply).

When a programmer uses inline asm, they're implicitly telling the compiler that they *do* know what they're doing  (I know this is common knowledge, but I wanted to reiterate it.). In particular, either they need to reference an instruction not readily available from the compiler (e.g. "cpuid") or the compiler isn't able to give them the needed performance in a critical section. I'm extending this sentiment to callbr with output constraints. Let's take your example below and write it as "normal" asm statements one on each branch of an if-then-else (please ignore any syntax errors):

if:
  br i1 %cmp, label %true, label %false

true:
  %0 = call { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9}" ()
  br label %end

false:
  %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11}" ()
  br label %end

end:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]

How is this handled in codegen? Is it an error or does the back-end handle it? Whatever's done today for "normal" inline asm is what I *think* should be the behavior for the inline asm callbr variant. If this doesn't seem sensible (and I realize that I may be thinking of an "in a perfect world" scenario), then we'll need to come up with a more sensible solution which may be to disallow the values on the error block until we can think of a better way to handle them.

This example is no problem, because instructions can be emitted between what's emitted by "call asm" and the end of the block (be it a fallthrough, or a jump instruction. What gets emitted there is a move of the output register to another location -- either a register or to the stack. And therefore at the beginning of the "end" block, "%vals" is always in a consistent location, no matter how you got to that block.

But in the callbr case, there is not a location at which those moves can be emitted, after the callbr, before the jump to "error".

 
-bw
 
It'd sure be a whole lot easier to not have the values valid on the secondary exit blocks. Can you present examples where preserving the values on the branches is be a requirement? (I feel like I've seen some before, but it'd be good to be reminded).

E.g., imagine code like this:

<<
entry:
  br i1 %cmp, label %true, label %false
true:
  %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9},X" (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]
false:
  %1 = callbr { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11},X" (i8* blockaddress(@vogon, %error)) to label %asm.fallthrough [label %error]

error:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>>

Normally, if a common register cannot be found to use across relevant block transitions, it can simply fall back on storing values on the stack. But, that's not possible with callbr, since the location is fixed by the asm, and no code can be inserted after the values are written, before the branch (as both value writes and the branch are inside the asm blob). So what can be done, in that case?

One thing you might be able to do is to duplicate the error block so you have a different target for every callbr, but I'd consider that an invalid transform (because the address of the block is potentially being used as a value in the asm too).

Another thing you could perhaps do is reify the source-block-number as an actual value -- storing a "1" before the callbr in true, and storing a "2" before the callbr in "false". Then conditional-branch based on that...but that's real ugly...

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
On Fri, Jun 28, 2019 at 1:48 PM James Y Knight <[hidden email]> wrote:
On Fri, Jun 28, 2019 at 3:00 PM Bill Wendling <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <[hidden email]> wrote:
I think this is fine, except that it stops at the point where things actually start to get interesting and tricky.

How will you actually handle the flow of values from the callbr into the error blocks? A callbr can specify requirements on where its outputs live. So, what if two callbr, in different branches of code, specify _different_ constraints for the same output, and list the same block as a possible error successor? How can the resulting phi be codegened?

This is where I fall back on the statement about how "the programmer knows what they're doing". Perhaps I'm being too cavalier here? My concern, if you want to call it that, is that we don't be too restrictive on the new behavior. For example, the "asm goto" may set a register to an error value (made up on the spot; may not be a common use). But, if there's no real reason to have the value be valid on the abnormal path, then sure we can declare that it's not valid on the abnormal path.

I think I should explain my "programmer knows what they're doing" statement a bit better. I'm specifically referring to inline asm here. The more general "callbr" case may still need to be considered (see Reid's reply).

When a programmer uses inline asm, they're implicitly telling the compiler that they *do* know what they're doing  (I know this is common knowledge, but I wanted to reiterate it.). In particular, either they need to reference an instruction not readily available from the compiler (e.g. "cpuid") or the compiler isn't able to give them the needed performance in a critical section. I'm extending this sentiment to callbr with output constraints. Let's take your example below and write it as "normal" asm statements one on each branch of an if-then-else (please ignore any syntax errors):

if:
  br i1 %cmp, label %true, label %false

true:
  %0 = call { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9}" ()
  br label %end

false:
  %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11}" ()
  br label %end

end:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]

How is this handled in codegen? Is it an error or does the back-end handle it? Whatever's done today for "normal" inline asm is what I *think* should be the behavior for the inline asm callbr variant. If this doesn't seem sensible (and I realize that I may be thinking of an "in a perfect world" scenario), then we'll need to come up with a more sensible solution which may be to disallow the values on the error block until we can think of a better way to handle them.

This example is no problem, because instructions can be emitted between what's emitted by "call asm" and the end of the block (be it a fallthrough, or a jump instruction. What gets emitted there is a move of the output register to another location -- either a register or to the stack. And therefore at the beginning of the "end" block, "%vals" is always in a consistent location, no matter how you got to that block.

But in the callbr case, there is not a location at which those moves can be emitted, after the callbr, before the jump to "error".

I see what you mean. Let's say we create a pseudo-instruction (similar to landingpad, et al) that needs to be lowered by the backend in a reasonable manner. The EH stuff has an external process/library that performs the actual unwinding and which sets the values accordingly. We won't have this. What we could do instead is split the edges and insert the copy-to-<where ever> statements there. So something like:

>>>

bb1:

  callbr ... [label %asm.goto.dest]


bb2:

  callbr ... [label %asm.goto.dest]


asm.goto.dest:

  ...

<<<


converted to something like:

>>>

bb1:

  callbr ... [label %asm.goto.dest.bb1]


bb2:

  callbr ... [label %asm.goto.dest.bb2]


asm.goto.dest.bb1:

  %v.bb1 = extractvalue ...

  br label %asm.goto.dest


asm.goto.dest.bb2:

  %v.bb2 = extractvalue ...

  br label %asm.goto.dest


asm.goto.dest:

  %v = phi [%v.bb1, label %asm.goto.dest.bb1], [%v.bb2, label %asm.goto.bb2]

  ...

  ...

<<<


It's not 100% not barfy, but it's what the compiler does in similar situations.

-bw

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev


On Fri, Jun 28, 2019 at 5:53 PM Bill Wendling <[hidden email]> wrote:
On Fri, Jun 28, 2019 at 1:48 PM James Y Knight <[hidden email]> wrote:
On Fri, Jun 28, 2019 at 3:00 PM Bill Wendling <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <[hidden email]> wrote:
I think this is fine, except that it stops at the point where things actually start to get interesting and tricky.

How will you actually handle the flow of values from the callbr into the error blocks? A callbr can specify requirements on where its outputs live. So, what if two callbr, in different branches of code, specify _different_ constraints for the same output, and list the same block as a possible error successor? How can the resulting phi be codegened?

This is where I fall back on the statement about how "the programmer knows what they're doing". Perhaps I'm being too cavalier here? My concern, if you want to call it that, is that we don't be too restrictive on the new behavior. For example, the "asm goto" may set a register to an error value (made up on the spot; may not be a common use). But, if there's no real reason to have the value be valid on the abnormal path, then sure we can declare that it's not valid on the abnormal path.

I think I should explain my "programmer knows what they're doing" statement a bit better. I'm specifically referring to inline asm here. The more general "callbr" case may still need to be considered (see Reid's reply).

When a programmer uses inline asm, they're implicitly telling the compiler that they *do* know what they're doing  (I know this is common knowledge, but I wanted to reiterate it.). In particular, either they need to reference an instruction not readily available from the compiler (e.g. "cpuid") or the compiler isn't able to give them the needed performance in a critical section. I'm extending this sentiment to callbr with output constraints. Let's take your example below and write it as "normal" asm statements one on each branch of an if-then-else (please ignore any syntax errors):

if:
  br i1 %cmp, label %true, label %false

true:
  %0 = call { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9}" ()
  br label %end

false:
  %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11}" ()
  br label %end

end:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]

How is this handled in codegen? Is it an error or does the back-end handle it? Whatever's done today for "normal" inline asm is what I *think* should be the behavior for the inline asm callbr variant. If this doesn't seem sensible (and I realize that I may be thinking of an "in a perfect world" scenario), then we'll need to come up with a more sensible solution which may be to disallow the values on the error block until we can think of a better way to handle them.

This example is no problem, because instructions can be emitted between what's emitted by "call asm" and the end of the block (be it a fallthrough, or a jump instruction. What gets emitted there is a move of the output register to another location -- either a register or to the stack. And therefore at the beginning of the "end" block, "%vals" is always in a consistent location, no matter how you got to that block.

But in the callbr case, there is not a location at which those moves can be emitted, after the callbr, before the jump to "error".

I see what you mean. Let's say we create a pseudo-instruction (similar to landingpad, et al) that needs to be lowered by the backend in a reasonable manner. The EH stuff has an external process/library that performs the actual unwinding and which sets the values accordingly. We won't have this.

 
What we could do instead is split the edges and insert the copy-to-<where ever> statements there.

Exactly -- except that doing that is potentially an invalid transform, because the address is being used as a value, not simply a jump target. The label list is just a list of _possible_ jump targets, changing those won't actually affect anything. You'd instead need to change the blockaddress constant, but in the general case you don't know where that address came from -- (and it may therefore be required that you have the same address for two separate callbr instructions).

I guess this kinda touches on some of the same issues as in the other discussion about the handling of the blockaddress in callbr and inlining, etc...

I wonder if we could put some validity restrictions on the IR structure, rather than trying to fix things up after the fact by attempting to split blocks. E.g., we could state that it's invalid to have a phi which uses the value defined by a callbr, if it's conditioned on that same block as predecessor.  That is: it's valid to use _other_ values defined in the block ending in callbr, because they can be moved prior to the callbr. It's also valid to use the value defined by the callbr in a phi conditioned on some other intermediate block as predecessor, because then any required moves can happen in the intermediate block.

I believe such an IR restriction should be sufficient to make it possible to emit valid code from the IR in all cases, but I'm a bit afraid of how badly adding such odd edge-cases might screw up the rest of the compiler and optimizer.


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev


On 6/28/19 5:35 PM, James Y Knight via llvm-dev wrote:


On Fri, Jun 28, 2019 at 5:53 PM Bill Wendling <[hidden email]> wrote:
On Fri, Jun 28, 2019 at 1:48 PM James Y Knight <[hidden email]> wrote:
On Fri, Jun 28, 2019 at 3:00 PM Bill Wendling <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <[hidden email]> wrote:
On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <[hidden email]> wrote:
I think this is fine, except that it stops at the point where things actually start to get interesting and tricky.

How will you actually handle the flow of values from the callbr into the error blocks? A callbr can specify requirements on where its outputs live. So, what if two callbr, in different branches of code, specify _different_ constraints for the same output, and list the same block as a possible error successor? How can the resulting phi be codegened?

This is where I fall back on the statement about how "the programmer knows what they're doing". Perhaps I'm being too cavalier here? My concern, if you want to call it that, is that we don't be too restrictive on the new behavior. For example, the "asm goto" may set a register to an error value (made up on the spot; may not be a common use). But, if there's no real reason to have the value be valid on the abnormal path, then sure we can declare that it's not valid on the abnormal path.

I think I should explain my "programmer knows what they're doing" statement a bit better. I'm specifically referring to inline asm here. The more general "callbr" case may still need to be considered (see Reid's reply).

When a programmer uses inline asm, they're implicitly telling the compiler that they *do* know what they're doing  (I know this is common knowledge, but I wanted to reiterate it.). In particular, either they need to reference an instruction not readily available from the compiler (e.g. "cpuid") or the compiler isn't able to give them the needed performance in a critical section. I'm extending this sentiment to callbr with output constraints. Let's take your example below and write it as "normal" asm statements one on each branch of an if-then-else (please ignore any syntax errors):

if:
  br i1 %cmp, label %true, label %false

true:
  %0 = call { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9}" ()
  br label %end

false:
  %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11}" ()
  br label %end

end:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]

How is this handled in codegen? Is it an error or does the back-end handle it? Whatever's done today for "normal" inline asm is what I *think* should be the behavior for the inline asm callbr variant. If this doesn't seem sensible (and I realize that I may be thinking of an "in a perfect world" scenario), then we'll need to come up with a more sensible solution which may be to disallow the values on the error block until we can think of a better way to handle them.

This example is no problem, because instructions can be emitted between what's emitted by "call asm" and the end of the block (be it a fallthrough, or a jump instruction. What gets emitted there is a move of the output register to another location -- either a register or to the stack. And therefore at the beginning of the "end" block, "%vals" is always in a consistent location, no matter how you got to that block.

But in the callbr case, there is not a location at which those moves can be emitted, after the callbr, before the jump to "error".

I see what you mean. Let's say we create a pseudo-instruction (similar to landingpad, et al) that needs to be lowered by the backend in a reasonable manner. The EH stuff has an external process/library that performs the actual unwinding and which sets the values accordingly. We won't have this.

 
What we could do instead is split the edges and insert the copy-to-<where ever> statements there.

Exactly -- except that doing that is potentially an invalid transform, because the address is being used as a value, not simply a jump target. The label list is just a list of _possible_ jump targets, changing those won't actually affect anything. You'd instead need to change the blockaddress constant, but in the general case you don't know where that address came from -- (and it may therefore be required that you have the same address for two separate callbr instructions).

I guess this kinda touches on some of the same issues as in the other discussion about the handling of the blockaddress in callbr and inlining, etc...

I wonder if we could put some validity restrictions on the IR structure, rather than trying to fix things up after the fact by attempting to split blocks. E.g., we could state that it's invalid to have a phi which uses the value defined by a callbr, if it's conditioned on that same block as predecessor.  That is: it's valid to use _other_ values defined in the block ending in callbr, because they can be moved prior to the callbr. It's also valid to use the value defined by the callbr in a phi conditioned on some other intermediate block as predecessor, because then any required moves can happen in the intermediate block.

I believe such an IR restriction should be sufficient to make it possible to emit valid code from the IR in all cases, but I'm a bit afraid of how badly adding such odd edge-cases might screw up the rest of the compiler and optimizer.


I think that your fear is justified.

In any case, if we're going to support forming this kind of callbr in Clang, then Clang still needs a place to put the stack stores after the inline asm in order to represent the output constraints - which are specified in terms of source-level variables and those are always in stack locations when Clang is generating IR. I think that we can make all of this work if we say that the output constraints, and thus the outputs of the callbr, dominate only uses on the normal "fallthrough" branch. Then the compiler has a single place to put the stores (and, later, a place to put register copies, etc.).

 -Hal




_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [RFC] ASM Goto With Output Constraints

Nathan Ridge via cfe-dev
On Fri, Jun 28, 2019 at 5:39 PM Finkel, Hal J. <[hidden email]> wrote:

On 6/28/19 5:35 PM, James Y Knight via llvm-dev wrote:

On Fri, Jun 28, 2019 at 5:53 PM Bill Wendling <[hidden email]> wrote:
On Fri, Jun 28, 2019 at 1:48 PM James Y Knight <