CFG simplification question, and preservation of branching in the original code

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

CFG simplification question, and preservation of branching in the original code

Kristof Beyls via cfe-dev
Hi all,

For my custom architecture, I want to relax the CFG simplification pass, and any other passes replacing conditional branches.

I found that the replacement of conditional branches by “select" and other instructions is often too aggressive, and this causes inefficient code for my target as in most cases branches would be cheaper.

For example, considering the following c code:

long test (long a, long b)
{
  int neg = 0;
  long res;

  if (a < 0
  {
    a = -a; 
    neg = 1;
  }

  res = a*b;

  if (neg)
    res = -res;

  return res;
}


This code can be simplified in c, but it’s just an example to show the point.

The code above gets compiled like this (-Oz flag):

; Function Attrs: minsize norecurse nounwind optsize readnone
define dso_local i32 @test(i32 %a, i32 %b) local_unnamed_addr #0 {
entry:
  %cmp = icmp slt i32 %a, 0
  %sub = sub nsw i32 0, %a
  %a.addr.0 = select i1 %cmp, i32 %sub, i32 %a
  %mul = mul nsw i32 %a.addr.0, %b
  %sub2 = sub nsw i32 0, %mul
  %res.0 = select i1 %cmp, i32 %sub2, i32 %mul
  ret i32 %res.0
}


All branching was removed and replaced by ‘select’ instructions. For my architecture, it would be desirable to keep the original branches in most cases, because even simple 32 bit operations are too expensive to speculatively execute them, and branches are cheap.

Setting  'phi-node-folding-threshold’ to 1 or even 0 (instead of the default 2), definitely improves the situation in many cases, but Clang still creates many instances of ‘select’ instructions, which are detrimental to my target. I am unsure about where are they created, as I believe that the simplifycfg pass does not longer create them. 

So the question is: Are there any other hooks in clang, or custom code that I can implement, to relax the creation of ’select’ instructions and make it preserve branches in the original c code?

Thanks,

John




_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: CFG simplification question, and preservation of branching in the original code

Kristof Beyls via cfe-dev
On Sat, Sep 21, 2019 at 3:18 PM Joan Lluch via cfe-dev
<[hidden email]> wrote:

>
> Hi all,
>
> For my custom architecture, I want to relax the CFG simplification pass, and any other passes replacing conditional branches.
>
> I found that the replacement of conditional branches by “select" and other instructions is often too aggressive, and this causes inefficient code for my target as in most cases branches would be cheaper.
>
> For example, considering the following c code:
>
> long test (long a, long b)
> {
>   int neg = 0;
>   long res;
>
>   if (a < 0)
>   {
>     a = -a;
>     neg = 1;
>   }
>
>   res = a*b;
>
>   if (neg)
>     res = -res;
>
>   return res;
> }
>
>
> This code can be simplified in c, but it’s just an example to show the point.
>
> The code above gets compiled like this (-Oz flag):
>
> ; Function Attrs: minsize norecurse nounwind optsize readnone
> define dso_local i32 @test(i32 %a, i32 %b) local_unnamed_addr #0 {
> entry:
>   %cmp = icmp slt i32 %a, 0
>   %sub = sub nsw i32 0, %a
>   %a.addr.0 = select i1 %cmp, i32 %sub, i32 %a
>   %mul = mul nsw i32 %a.addr.0, %b
>   %sub2 = sub nsw i32 0, %mul
>   %res.0 = select i1 %cmp, i32 %sub2, i32 %mul
>   ret i32 %res.0
> }
>
>
> All branching was removed and replaced by ‘select’ instructions. For my architecture, it would be desirable to keep the original branches in most cases, because even simple 32 bit operations are too expensive to speculatively execute them, and branches are cheap.
>
> Setting  'phi-node-folding-threshold’ to 1 or even 0 (instead of the default 2), definitely improves the situation in many cases, but Clang still creates many instances of ‘select’ instructions, which are detrimental to my target. I am unsure about where are they created, as I believe that the simplifycfg pass does not longer create them.
You definitively can't ban llvm passes/clang from creating select's.

> So the question is: Are there any other hooks in clang, or custom code that I can implement, to relax the creation of ’select’ instructions and make it preserve branches in the original c code?
I think this is backwards.
Sure, you could maybe disable most of the folds that produce selects.
That may be good for final codegen, but will also affect other passes
since not everything deals with 2-node PHI as good as wit selects.

But, what happens if you still get the select-y IR?
Doesn't matter how, could be hand-written.

I think you might want to instead aggressively convert selects into
branches in backend.

> Thanks,
>
> John
Roman

> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: CFG simplification question, and preservation of branching in the original code

Kristof Beyls via cfe-dev
Actually, this should go to llvm-dev.

On Sat, Sep 21, 2019 at 3:48 PM Roman Lebedev <[hidden email]> wrote:

>
> On Sat, Sep 21, 2019 at 3:18 PM Joan Lluch via cfe-dev
> <[hidden email]> wrote:
> >
> > Hi all,
> >
> > For my custom architecture, I want to relax the CFG simplification pass, and any other passes replacing conditional branches.
> >
> > I found that the replacement of conditional branches by “select" and other instructions is often too aggressive, and this causes inefficient code for my target as in most cases branches would be cheaper.
> >
> > For example, considering the following c code:
> >
> > long test (long a, long b)
> > {
> >   int neg = 0;
> >   long res;
> >
> >   if (a < 0)
> >   {
> >     a = -a;
> >     neg = 1;
> >   }
> >
> >   res = a*b;
> >
> >   if (neg)
> >     res = -res;
> >
> >   return res;
> > }
> >
> >
> > This code can be simplified in c, but it’s just an example to show the point.
> >
> > The code above gets compiled like this (-Oz flag):
> >
> > ; Function Attrs: minsize norecurse nounwind optsize readnone
> > define dso_local i32 @test(i32 %a, i32 %b) local_unnamed_addr #0 {
> > entry:
> >   %cmp = icmp slt i32 %a, 0
> >   %sub = sub nsw i32 0, %a
> >   %a.addr.0 = select i1 %cmp, i32 %sub, i32 %a
> >   %mul = mul nsw i32 %a.addr.0, %b
> >   %sub2 = sub nsw i32 0, %mul
> >   %res.0 = select i1 %cmp, i32 %sub2, i32 %mul
> >   ret i32 %res.0
> > }
> >
> >
> > All branching was removed and replaced by ‘select’ instructions. For my architecture, it would be desirable to keep the original branches in most cases, because even simple 32 bit operations are too expensive to speculatively execute them, and branches are cheap.
> >
> > Setting  'phi-node-folding-threshold’ to 1 or even 0 (instead of the default 2), definitely improves the situation in many cases, but Clang still creates many instances of ‘select’ instructions, which are detrimental to my target. I am unsure about where are they created, as I believe that the simplifycfg pass does not longer create them.
> You definitively can't ban llvm passes/clang from creating select's.
>
> > So the question is: Are there any other hooks in clang, or custom code that I can implement, to relax the creation of ’select’ instructions and make it preserve branches in the original c code?
> I think this is backwards.
> Sure, you could maybe disable most of the folds that produce selects.
> That may be good for final codegen, but will also affect other passes
> since not everything deals with 2-node PHI as good as wit selects.
>
> But, what happens if you still get the select-y IR?
> Doesn't matter how, could be hand-written.
>
> I think you might want to instead aggressively convert selects into
> branches in backend.
>
> > Thanks,
> >
> > John
> Roman
>
> > _______________________________________________
> > LLVM Developers mailing list
> > [hidden email]
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > _______________________________________________
> > cfe-dev mailing list
> > [hidden email]
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: CFG simplification question, and preservation of branching in the original code

Kristof Beyls via cfe-dev
In reply to this post by Kristof Beyls via cfe-dev
Hi Roman,

Thank you for your reply. I understand your point. I just want to add something to clarify my original post in relation to your reply.

There are already implemented 8-bit and 16-bit backends, namely the AVR and the MSP430, which already "aggressively convert selects into branches”, which already benefit (as they are) from setting "phi-node-folding-threshold’ to 1 or zero. This is because otherwise Clang will generate several selects depending on the same “icmp”. These backends are unable to optimise that, and they just create a comparison and a conditional branch for every “select” in the IR code, in spite that the original C code was already written in a much better way. So the resulting effect is the presence of redundant comparisons and branches in the final code, with a detrimental of generated code quality.

The above gets improved by setting "phi-node-folding-threshold’ to 1 because some of these extra ‘selects' are no longer there so the backend stops generating redundant code.

John.




> On 21 Sep 2019, at 14:48, Roman Lebedev <[hidden email]> wrote:
>
> On Sat, Sep 21, 2019 at 3:18 PM Joan Lluch via cfe-dev
> <[hidden email]> wrote:
>>
>> Hi all,
>>
>> For my custom architecture, I want to relax the CFG simplification pass, and any other passes replacing conditional branches.
>>
>> I found that the replacement of conditional branches by “select" and other instructions is often too aggressive, and this causes inefficient code for my target as in most cases branches would be cheaper.
>>
>> For example, considering the following c code:
>>
>> long test (long a, long b)
>> {
>>  int neg = 0;
>>  long res;
>>
>>  if (a < 0)
>>  {
>>    a = -a;
>>    neg = 1;
>>  }
>>
>>  res = a*b;
>>
>>  if (neg)
>>    res = -res;
>>
>>  return res;
>> }
>>
>>
>> This code can be simplified in c, but it’s just an example to show the point.
>>
>> The code above gets compiled like this (-Oz flag):
>>
>> ; Function Attrs: minsize norecurse nounwind optsize readnone
>> define dso_local i32 @test(i32 %a, i32 %b) local_unnamed_addr #0 {
>> entry:
>>  %cmp = icmp slt i32 %a, 0
>>  %sub = sub nsw i32 0, %a
>>  %a.addr.0 = select i1 %cmp, i32 %sub, i32 %a
>>  %mul = mul nsw i32 %a.addr.0, %b
>>  %sub2 = sub nsw i32 0, %mul
>>  %res.0 = select i1 %cmp, i32 %sub2, i32 %mul
>>  ret i32 %res.0
>> }
>>
>>
>> All branching was removed and replaced by ‘select’ instructions. For my architecture, it would be desirable to keep the original branches in most cases, because even simple 32 bit operations are too expensive to speculatively execute them, and branches are cheap.
>>
>> Setting  'phi-node-folding-threshold’ to 1 or even 0 (instead of the default 2), definitely improves the situation in many cases, but Clang still creates many instances of ‘select’ instructions, which are detrimental to my target. I am unsure about where are they created, as I believe that the simplifycfg pass does not longer create them.
> You definitively can't ban llvm passes/clang from creating select's.
>
>> So the question is: Are there any other hooks in clang, or custom code that I can implement, to relax the creation of ’select’ instructions and make it preserve branches in the original c code?
> I think this is backwards.
> Sure, you could maybe disable most of the folds that produce selects.
> That may be good for final codegen, but will also affect other passes
> since not everything deals with 2-node PHI as good as wit selects.
>
> But, what happens if you still get the select-y IR?
> Doesn't matter how, could be hand-written.
>
> I think you might want to instead aggressively convert selects into
> branches in backend.
>
>> Thanks,
>>
>> John
> Roman
>
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: CFG simplification question, and preservation of branching in the original code

Kristof Beyls via cfe-dev
There is code in CodeGenPrepare.cpp that can turn selects into branches that tries to account for multiple selects sharing the same condition. It doesn't look like either AVR or MSP430 enable that code though.

~Craig


On Tue, Sep 24, 2019 at 11:27 PM Joan Lluch via cfe-dev <[hidden email]> wrote:
Hi Roman,

Thank you for your reply. I understand your point. I just want to add something to clarify my original post in relation to your reply.

There are already implemented 8-bit and 16-bit backends, namely the AVR and the MSP430, which already "aggressively convert selects into branches”, which already benefit (as they are) from setting "phi-node-folding-threshold’ to 1 or zero. This is because otherwise Clang will generate several selects depending on the same “icmp”. These backends are unable to optimise that, and they just create a comparison and a conditional branch for every “select” in the IR code, in spite that the original C code was already written in a much better way. So the resulting effect is the presence of redundant comparisons and branches in the final code, with a detrimental of generated code quality.

The above gets improved by setting "phi-node-folding-threshold’ to 1 because some of these extra ‘selects' are no longer there so the backend stops generating redundant code.

John.




> On 21 Sep 2019, at 14:48, Roman Lebedev <[hidden email]> wrote:
>
> On Sat, Sep 21, 2019 at 3:18 PM Joan Lluch via cfe-dev
> <[hidden email]> wrote:
>>
>> Hi all,
>>
>> For my custom architecture, I want to relax the CFG simplification pass, and any other passes replacing conditional branches.
>>
>> I found that the replacement of conditional branches by “select" and other instructions is often too aggressive, and this causes inefficient code for my target as in most cases branches would be cheaper.
>>
>> For example, considering the following c code:
>>
>> long test (long a, long b)
>> {
>>  int neg = 0;
>>  long res;
>>
>>  if (a < 0)
>>  {
>>    a = -a;
>>    neg = 1;
>>  }
>>
>>  res = a*b;
>>
>>  if (neg)
>>    res = -res;
>>
>>  return res;
>> }
>>
>>
>> This code can be simplified in c, but it’s just an example to show the point.
>>
>> The code above gets compiled like this (-Oz flag):
>>
>> ; Function Attrs: minsize norecurse nounwind optsize readnone
>> define dso_local i32 @test(i32 %a, i32 %b) local_unnamed_addr #0 {
>> entry:
>>  %cmp = icmp slt i32 %a, 0
>>  %sub = sub nsw i32 0, %a
>>  %a.addr.0 = select i1 %cmp, i32 %sub, i32 %a
>>  %mul = mul nsw i32 %a.addr.0, %b
>>  %sub2 = sub nsw i32 0, %mul
>>  %res.0 = select i1 %cmp, i32 %sub2, i32 %mul
>>  ret i32 %res.0
>> }
>>
>>
>> All branching was removed and replaced by ‘select’ instructions. For my architecture, it would be desirable to keep the original branches in most cases, because even simple 32 bit operations are too expensive to speculatively execute them, and branches are cheap.
>>
>> Setting  'phi-node-folding-threshold’ to 1 or even 0 (instead of the default 2), definitely improves the situation in many cases, but Clang still creates many instances of ‘select’ instructions, which are detrimental to my target. I am unsure about where are they created, as I believe that the simplifycfg pass does not longer create them.
> You definitively can't ban llvm passes/clang from creating select's.
>
>> So the question is: Are there any other hooks in clang, or custom code that I can implement, to relax the creation of ’select’ instructions and make it preserve branches in the original c code?
> I think this is backwards.
> Sure, you could maybe disable most of the folds that produce selects.
> That may be good for final codegen, but will also affect other passes
> since not everything deals with 2-node PHI as good as wit selects.
>
> But, what happens if you still get the select-y IR?
> Doesn't matter how, could be hand-written.
>
> I think you might want to instead aggressively convert selects into
> branches in backend.
>
>> Thanks,
>>
>> John
> Roman
>
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: CFG simplification question, and preservation of branching in the original code

Kristof Beyls via cfe-dev
Hi Craig,

Thank you for your reply. I have started looking at “CodeGenPrepare” and I assume you reffer to CodeGenPrepare::optimizeSelectInst. I will try to play a bit with that possibly later today. At first glance, it looks to me that for targets that do not support ’select’ at all, the fact that the function exits early for ‘OptSize’ can be detrimental, because this will just leave ALL existing selects in the code anyway. As said, I will try to play with that later, but right now it looks to me that maybe we should check  for TLI->isSelectSupported earlier in the function, to get some more opportunities to such targets without explicit ’select’ support? 

Thanks 

John


On 25 Sep 2019, at 08:59, Craig Topper <[hidden email]> wrote:

There is code in CodeGenPrepare.cpp that can turn selects into branches that tries to account for multiple selects sharing the same condition. It doesn't look like either AVR or MSP430 enable that code though.

~Craig


On Tue, Sep 24, 2019 at 11:27 PM Joan Lluch via cfe-dev <[hidden email]> wrote:
Hi Roman,

Thank you for your reply. I understand your point. I just want to add something to clarify my original post in relation to your reply.

There are already implemented 8-bit and 16-bit backends, namely the AVR and the MSP430, which already "aggressively convert selects into branches”, which already benefit (as they are) from setting "phi-node-folding-threshold’ to 1 or zero. This is because otherwise Clang will generate several selects depending on the same “icmp”. These backends are unable to optimise that, and they just create a comparison and a conditional branch for every “select” in the IR code, in spite that the original C code was already written in a much better way. So the resulting effect is the presence of redundant comparisons and branches in the final code, with a detrimental of generated code quality.

The above gets improved by setting "phi-node-folding-threshold’ to 1 because some of these extra ‘selects' are no longer there so the backend stops generating redundant code.

John.




> On 21 Sep 2019, at 14:48, Roman Lebedev <[hidden email]> wrote:
>
> On Sat, Sep 21, 2019 at 3:18 PM Joan Lluch via cfe-dev
> <[hidden email]> wrote:
>>
>> Hi all,
>>
>> For my custom architecture, I want to relax the CFG simplification pass, and any other passes replacing conditional branches.
>>
>> I found that the replacement of conditional branches by “select" and other instructions is often too aggressive, and this causes inefficient code for my target as in most cases branches would be cheaper.
>>
>> For example, considering the following c code:
>>
>> long test (long a, long b)
>> {
>>  int neg = 0;
>>  long res;
>>
>>  if (a < 0)
>>  {
>>    a = -a;
>>    neg = 1;
>>  }
>>
>>  res = a*b;
>>
>>  if (neg)
>>    res = -res;
>>
>>  return res;
>> }
>>
>>
>> This code can be simplified in c, but it’s just an example to show the point.
>>
>> The code above gets compiled like this (-Oz flag):
>>
>> ; Function Attrs: minsize norecurse nounwind optsize readnone
>> define dso_local i32 @test(i32 %a, i32 %b) local_unnamed_addr #0 {
>> entry:
>>  %cmp = icmp slt i32 %a, 0
>>  %sub = sub nsw i32 0, %a
>>  %a.addr.0 = select i1 %cmp, i32 %sub, i32 %a
>>  %mul = mul nsw i32 %a.addr.0, %b
>>  %sub2 = sub nsw i32 0, %mul
>>  %res.0 = select i1 %cmp, i32 %sub2, i32 %mul
>>  ret i32 %res.0
>> }
>>
>>
>> All branching was removed and replaced by ‘select’ instructions. For my architecture, it would be desirable to keep the original branches in most cases, because even simple 32 bit operations are too expensive to speculatively execute them, and branches are cheap.
>>
>> Setting  'phi-node-folding-threshold’ to 1 or even 0 (instead of the default 2), definitely improves the situation in many cases, but Clang still creates many instances of ‘select’ instructions, which are detrimental to my target. I am unsure about where are they created, as I believe that the simplifycfg pass does not longer create them.
> You definitively can't ban llvm passes/clang from creating select's.
>
>> So the question is: Are there any other hooks in clang, or custom code that I can implement, to relax the creation of ’select’ instructions and make it preserve branches in the original c code?
> I think this is backwards.
> Sure, you could maybe disable most of the folds that produce selects.
> That may be good for final codegen, but will also affect other passes
> since not everything deals with 2-node PHI as good as wit selects.
>
> But, what happens if you still get the select-y IR?
> Doesn't matter how, could be hand-written.
>
> I think you might want to instead aggressively convert selects into
> branches in backend.
>
>> Thanks,
>>
>> John
> Roman
>
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev