C++ AST Question

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

C++ AST Question

Chris Lattner
I'm trying to get this simple function to compile to decent code.  Lack of it is causing -O0 codegen to be monsterous:

struct DeclGroup {
  unsigned NumDecls;
};

int foo(DeclGroup D);
void bar(DeclGroup *D) {
  foo(*D);
}

The current issue that I'm fighting is that we get an extra pointless temporary and memcpy, which llvm-gcc isn't generating:

$ clang t.cc -S -o - -emit-llvm
...
define void @_Z3barP9DeclGroup(%struct.DeclGroup* %D) ssp {
entry:
  %D.addr = alloca %struct.DeclGroup*, align 8    ; <%struct.DeclGroup**> [#uses=2]
  %agg.tmp = alloca %struct.DeclGroup, align 4    ; <%struct.DeclGroup*> [#uses=2]
  store %struct.DeclGroup* %D, %struct.DeclGroup** %D.addr
  %tmp = load %struct.DeclGroup** %D.addr         ; <%struct.DeclGroup*> [#uses=1]
  %tmp1 = bitcast %struct.DeclGroup* %agg.tmp to i8* ; <i8*> [#uses=1]
  %tmp2 = bitcast %struct.DeclGroup* %tmp to i8*  ; <i8*> [#uses=1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* %tmp2, i64 4, i32 4, i1 false)
  %coerce.dive = getelementptr %struct.DeclGroup* %agg.tmp, i32 0, i32 0 ; <i32*> [#uses=1]
  %0 = load i32* %coerce.dive                     ; <i32> [#uses=1]
  %coerce.val.ii = zext i32 %0 to i64             ; <i64> [#uses=1]
  %call = call i32 @_Z3foo9DeclGroup(i64 %coerce.val.ii) ; <i32> [#uses=0]
  ret void
}

After diving into it, the memcpy is being made by CodeGenFunction::EmitCXXConstructorCall, so I came up with this patch:




However, though it is a cleanup, it doesn't help this case.  I think the problem here is actually the AST, which looks like this:

void bar(DeclGroup *D) (CompoundStmt 0x103a19080 <t.cc:7:24, line:9:1>
  (CallExpr 0x103a19040 <line:8:3, col:9> 'int'
    (ImplicitCastExpr 0x103a19000 <col:3> 'int (*)(struct DeclGroup)' <FunctionToPointerDecay>
      (DeclRefExpr 0x103a18fa0 <col:3> 'int (struct DeclGroup)' FunctionDecl='foo' 0x103a18d20))
    (CXXConstructExpr 0x103a191f0 <col:7, col:8> 'struct DeclGroup''void (struct DeclGroup const &)'
      (ImplicitCastExpr 0x103a191b0 <col:7, col:8> 'struct DeclGroup const' <NoOp> lvalue
        (UnaryOperator 0x103a18f70 <col:7, col:8> 'struct DeclGroup' prefix '*'
          (DeclRefExpr 0x103a18f40 <col:8> 'struct DeclGroup *' ParmVar='D' 0x103a18da0))))))

I don't see why the CXXConstructExpr exists at all in this case.  Is this a C++ Sema bug?  Also, is the patch above useful?

-Chris



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

cxxctorcall.patch (767 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: C++ AST Question

Eli Friedman
On Sun, Jun 27, 2010 at 12:44 AM, Chris Lattner <[hidden email]> wrote:

> I'm trying to get this simple function to compile to decent code.  Lack of it is causing -O0 codegen to be monsterous:
>
> struct DeclGroup {
>  unsigned NumDecls;
> };
>
> int foo(DeclGroup D);
> void bar(DeclGroup *D) {
>  foo(*D);
> }
>
> The current issue that I'm fighting is that we get an extra pointless temporary and memcpy, which llvm-gcc isn't generating:
>
> $ clang t.cc -S -o - -emit-llvm
> ...
> define void @_Z3barP9DeclGroup(%struct.DeclGroup* %D) ssp {
> entry:
>  %D.addr = alloca %struct.DeclGroup*, align 8    ; <%struct.DeclGroup**> [#uses=2]
>  %agg.tmp = alloca %struct.DeclGroup, align 4    ; <%struct.DeclGroup*> [#uses=2]
>  store %struct.DeclGroup* %D, %struct.DeclGroup** %D.addr
>  %tmp = load %struct.DeclGroup** %D.addr         ; <%struct.DeclGroup*> [#uses=1]
>  %tmp1 = bitcast %struct.DeclGroup* %agg.tmp to i8* ; <i8*> [#uses=1]
>  %tmp2 = bitcast %struct.DeclGroup* %tmp to i8*  ; <i8*> [#uses=1]
>  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* %tmp2, i64 4, i32 4, i1 false)
>  %coerce.dive = getelementptr %struct.DeclGroup* %agg.tmp, i32 0, i32 0 ; <i32*> [#uses=1]
>  %0 = load i32* %coerce.dive                     ; <i32> [#uses=1]
>  %coerce.val.ii = zext i32 %0 to i64             ; <i64> [#uses=1]
>  %call = call i32 @_Z3foo9DeclGroup(i64 %coerce.val.ii) ; <i32> [#uses=0]
>  ret void
> }
>
> After diving into it, the memcpy is being made by CodeGenFunction::EmitCXXConstructorCall, so I came up with this patch:

Stop right there; the behavior here is exactly the same in C mode,
where the CXXConstructorCall clearly doesn't exist.  You're not going
to get anywhere examining it...

Take a look at what happens in CodeGenFunction::EmitCall (the version
in CGExpr.cpp).  The copy already exists after EmitCallArgs runs:
since it isn't aware that the ABI is going to pass the struct by
value, and you can't just pass the pointer D into foo(), it
immediately make a copy to ensure correct behavior.

-Eli

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: C++ AST Question

Chris Lattner
On Jun 27, 2010, at 1:36 AM, Eli Friedman wrote:
>> After diving into it, the memcpy is being made by CodeGenFunction::EmitCXXConstructorCall, so I came up with this patch:
>
> Stop right there; the behavior here is exactly the same in C mode,
> where the CXXConstructorCall clearly doesn't exist.  You're not going
> to get anywhere examining it...

Good point, that occurred to me in the middle of the night :)

> Take a look at what happens in CodeGenFunction::EmitCall (the version
> in CGExpr.cpp).  The copy already exists after EmitCallArgs runs:
> since it isn't aware that the ABI is going to pass the struct by
> value, and you can't just pass the pointer D into foo(), it
> immediately make a copy to ensure correct behavior.

Ok, right.  Specifically, we end up down in the bowels of:

RValue CodeGenFunction::EmitCallArg(const Expr *E, QualType ArgType) {
  if (ArgType->isReferenceType())
    return EmitReferenceBindingToExpr(E, /*InitializedDecl=*/0);

  return EmitAnyExprToTemp(E);
}

EmitAnyExprToTemp(E) is emitting the extra copy.  The fix is to emit aggregates as l-values, but this won't work for the CallArgList data structure that Daniel has set up.  Daniel, what do you think is the right approach here?  Here's a simple C example:

struct DeclGroup {
  long NumDecls;
  long Y;
};

long foo(struct DeclGroup D);
void bar(struct DeclGroup *D) {
  foo(*D);
}

Compiles to:

define void @bar(%struct.DeclGroup* %D) {
entry:
  %D.addr = alloca %struct.DeclGroup*, align 8    ; <%struct.DeclGroup**> [#uses=2]
  %agg.tmp = alloca %struct.DeclGroup, align 8    ; <%struct.DeclGroup*> [#uses=1]
  %allocapt = bitcast i32 undef to i32            ; <i32> [#uses=0]
  store %struct.DeclGroup* %D, %struct.DeclGroup** %D.addr
  %tmp = load %struct.DeclGroup** %D.addr         ; <%struct.DeclGroup*> [#uses=1]
  %tmp1 = bitcast %struct.DeclGroup* %agg.tmp to i8* ; <i8*> [#uses=1]
  %tmp2 = bitcast %struct.DeclGroup* %tmp to i8*  ; <i8*> [#uses=1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* %tmp2, i64 16, i32 8, i1 false)
}

I want to kill off the extra temporary + memcpy.

-Chris
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev