Matching Clang's AST nodes to the LLVM IR instructions they produced.

Matthew Heinsen Egan
Hi all,

I need to be able to determine, from an LLVM IR instruction, the
specific node in the AST that produced that instruction. For example,
if I had a C program containing the following:

  // int a, b, c
  (a * b) + c

Which produced the following:

  %0 = load i32* %a.addr, align 4
  %1 = load i32* %b.addr, align 4
  %mul = mul nsw i32 %0, %1
  %2 = load i32* %c.addr, align 4
  %add = add nsw i32 %mul, %2

I would like to be able to determine that %mul represented the value
of (a * b). The IR will be unoptimized, and I'm only interested in C
language programs at this stage.

What is the best way to approach this? I was thinking of modifying
codegen to add some metadata that mapped back to the AST (similar to
EmitDeclMetadata), but I imagine that having to maintain the
modifications would cause difficulty with keeping up-to-date with
Clang. Ideally I would like to be able to use some sort of plugin, or
extend codegen from outside of Clang's source tree, so that I can
avoid these issues and use a pure version of Clang. Any pointers on
how to achieve this would be greatly appreciated.

Thanks for your time,
Matthew
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Re: Matching Clang's AST nodes to the LLVM IR instructions they produced.

John McCall
On Jan 16, 2012, at 6:30 PM, Matthew Heinsen Egan wrote:

> I need to be able to determine, from an LLVM IR instruction, the
> specific node in the AST that produced that instruction. For example,
> if I had a C program containing the following:
>
>  // int a, b, c
>  (a * b) + c
>
> Which produced the following:
>
>  %0 = load i32* %a.addr, align 4
>  %1 = load i32* %b.addr, align 4
>  %mul = mul nsw i32 %0, %1
>  %2 = load i32* %c.addr, align 4
>  %add = add nsw i32 %mul, %2
>
> I would like to be able to determine that %mul represented the value
> of (a * b). The IR will be unoptimized, and I'm only interested in C
> language programs at this stage.
>
> What is the best way to approach this? I was thinking of modifying
> codegen to add some metadata that mapped back to the AST (similar to
> EmitDeclMetadata), but I imagine that having to maintain the
> modifications would cause difficulty with keeping up-to-date with
> Clang. Ideally I would like to be able to use some sort of plugin, or
> extend codegen from outside of Clang's source tree, so that I can
> avoid these issues and use a pure version of Clang. Any pointers on
> how to achieve this would be greatly appreciated.

There is no way to do this without modifying the Clang source.

The main problem is going to be deciding what the current expression
is, but I think that's quite achievable.  IR-gen uses recursive descent;
all you really need to do is maintain a stack of what expressions are
currently being emitted.  That's relatively easily done with RAII
objects in all the major "dispatch" functions that switch over all
possible expression kinds.  That should be a pretty modest number
of modifications:
  - CodeGenFunction::EmitLValue
  - ScalarExprEmitter::Visit
  - ComplexExprEmitter::Visit
  - AggExprEmitter::Visit
  - probably somewhere in CGStmt.cpp
  - maybe a few other random places like the short-circuit evaluator
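As a rough illustration of the expression-stack idea, here is a minimal, self-contained sketch. It does not use Clang: Expr, EmissionStack, EmittingExprScope and simulateEmission are all hypothetical names invented for this example, and the nested scopes merely imitate the recursive descent that the real dispatch functions perform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for clang::Expr; only a label is needed here.
struct Expr {
  std::string label;
};

// Stack of the expressions currently being emitted, innermost last.
class EmissionStack {
public:
  void push(const Expr *e) { stack_.push_back(e); }
  void pop() {
    assert(!stack_.empty() && "unbalanced pop");
    stack_.pop_back();
  }
  // Innermost expression being emitted, or nullptr outside any expression.
  const Expr *current() const {
    return stack_.empty() ? nullptr : stack_.back();
  }
  std::size_t depth() const { return stack_.size(); }

private:
  std::vector<const Expr *> stack_;
};

// RAII guard: pushes on construction and pops on destruction, so every
// exit path from a dispatch function keeps the stack balanced.
class EmittingExprScope {
public:
  EmittingExprScope(EmissionStack &s, const Expr *e) : stack_(s) {
    stack_.push(e);
  }
  ~EmittingExprScope() { stack_.pop(); }
  EmittingExprScope(const EmittingExprScope &) = delete;
  EmittingExprScope &operator=(const EmittingExprScope &) = delete;

private:
  EmissionStack &stack_;
};

// Imitates emitting (a * b) + c and records which expression was
// innermost when each "instruction" was produced.
std::vector<std::string> simulateEmission() {
  EmissionStack stack;
  std::vector<std::string> log;
  Expr add{"(a * b) + c"};
  Expr mul{"a * b"};
  {
    EmittingExprScope addScope(stack, &add); // entering the '+' visitor
    {
      EmittingExprScope mulScope(stack, &mul); // recursing into '*'
      log.push_back("%mul <- " + stack.current()->label);
    }
    // Back in the '+' visitor: the stack top is the addition again.
    log.push_back("%add <- " + stack.current()->label);
  }
  return log;
}
```

In an actual patch the guard would presumably be constructed at the top of each dispatch function listed above, with the stack stored somewhere reachable from the function being emitted; that placement is an assumption, not something verified against the Clang source.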

Once you've got that, it's relatively easy to slap custom metadata
on every new instruction as it's inserted: just change the
CGBuilderTy typedef to make IRBuilder use a custom inserter class.
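The inserter idea can be sketched without LLVM as a policy class that the builder calls for every new instruction. Everything below (Expr, Instruction, DefaultInserter, TaggingInserter, Builder, TaggedBuilder, emitTagged) is a hypothetical stand-in; the real IRBuilder takes its inserter as a template parameter, but the exact hook signature differs between LLVM versions, and real metadata would be attached with something like Instruction::setMetadata rather than a raw pointer field.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for clang::Expr and llvm::Instruction.
struct Expr {
  std::string label;
};
struct Instruction {
  std::string name;
  const Expr *origin; // plays the role of the attached metadata
};

// Default policy: append the instruction unchanged.
struct DefaultInserter {
  void insert(Instruction inst, std::vector<Instruction> &block) const {
    block.push_back(std::move(inst));
  }
};

// Tagging policy: stamp each instruction with whatever expression is
// current at insertion time, then append it.
class TaggingInserter {
public:
  explicit TaggingInserter(const Expr *const *current) : current_(current) {}
  void insert(Instruction inst, std::vector<Instruction> &block) const {
    inst.origin = *current_;
    block.push_back(std::move(inst));
  }

private:
  const Expr *const *current_; // points at the "current expression" slot
};

// Builder parameterized on its inserter; every create() goes through it.
template <typename Inserter = DefaultInserter>
class Builder {
public:
  explicit Builder(std::vector<Instruction> &block,
                   Inserter inserter = Inserter())
      : block_(block), inserter_(std::move(inserter)) {}
  void create(const std::string &name) {
    inserter_.insert(Instruction{name, nullptr}, block_);
  }

private:
  std::vector<Instruction> &block_;
  Inserter inserter_;
};

// Analogous to repointing the CGBuilderTy typedef at a custom inserter.
using TaggedBuilder = Builder<TaggingInserter>;

// Emits two instructions while the "current expression" changes.
std::vector<Instruction> emitTagged() {
  // Static so the origin pointers stay valid after this function returns.
  static const Expr mul{"a * b"};
  static const Expr add{"(a * b) + c"};
  std::vector<Instruction> block;
  const Expr *current = nullptr;
  TaggedBuilder builder(block, TaggingInserter(&current));
  current = &mul;
  builder.create("%mul");
  current = &add;
  builder.create("%add");
  return block;
}
```

Because the tagging lives in the inserter, none of the call sites that create instructions need to change; only the builder's type does.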

John.