Dumping AST information to other formats

Dumping AST information to other formats

Oleg Smolsky via cfe-dev
Clang currently supports various -cc1 options that allow displaying
AST information (-ast-dump, -ast-print, -ast-list, etc.), but these
options are not convenient for third-party tools to consume. GrammaTech
has ongoing research efforts where we would like to output some
information from the AST to a more open machine-consumable format
(such as JSON or s-expressions). We propose adding an optional output
format to the -ast-dump command allowing the user to select from
either the default or JSON formats. If the output format is not
explicitly specified, it will continue to default to the same textual
representation it uses today. For example: clang -cc1 -ast-dump=json foo.c.
This feature is intended to output a safe subset of AST information
that is considered crucial rather than an implementation detail (like
the name of a NamedDecl object and the SourceRange for the name), so
the output is expected to be mostly stable between releases.
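
As a rough illustration of the "safe subset" idea, the sketch below
serializes only a declaration's kind, name, and name range to JSON and
leaves out an internal pointer. Everything in it is hypothetical:
ToyNamedDecl, SourcePos, jsonEscape, toJson, and the JSON key names are
made up for this example and are neither Clang types nor the proposed
output schema.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-ins for AST data; these are NOT Clang types.
struct SourcePos {
  unsigned line;
  unsigned col;
};

struct ToyNamedDecl {
  std::string kind;        // e.g. "VarDecl"
  std::string name;        // the declared name
  SourcePos begin;         // start of the name's source range
  SourcePos end;           // end of the name's source range
  const void *internalPtr; // implementation detail, deliberately not emitted
};

// Minimal JSON string escaping: handles only quote, backslash, newline, tab.
std::string jsonEscape(const std::string &s) {
  std::string out;
  for (char c : s) {
    switch (c) {
    case '"':  out += "\\\""; break;
    case '\\': out += "\\\\"; break;
    case '\n': out += "\\n";  break;
    case '\t': out += "\\t";  break;
    default:   out += c;
    }
  }
  return out;
}

// Emit only the fields treated as stable; the key names are assumptions.
std::string toJson(const ToyNamedDecl &d) {
  std::ostringstream os;
  os << "{\"kind\":\"" << jsonEscape(d.kind) << "\","
     << "\"name\":\"" << jsonEscape(d.name) << "\","
     << "\"range\":{"
     << "\"begin\":{\"line\":" << d.begin.line << ",\"col\":" << d.begin.col << "},"
     << "\"end\":{\"line\":" << d.end.line << ",\"col\":" << d.end.col << "}}}";
  return os.str();
}
```

Calling toJson on a ToyNamedDecl for a variable named x yields one JSON
object with kind, name, and range keys; the pointer field never reaches
the output, which is the property that would keep such a format stable.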

Once upon a time, there was -ast-print-xml. This -cc1 option was
dropped because it was frequently out of sync with the AST data. It is
right to ask: why would JSON, etc be any different? This is still an
open question, but a goal of this implementation will be to ensure
it's easier to maintain as the AST evolves. However, this feature is
intended to output a safe subset of AST information, so I don't think
this feature will require any more burden to support than -ast-dump
already requires (which is extremely limited). If AST information is
found to be missing from the output, it seems reasonable to have a
discussion as to whether it is stable information or an implementation
detail, so missing information is to be expected rather than a cause
for concern. That said, GrammaTech is able to commit to maintaining this
code for at least the next 1-2 years and possibly beyond, as it is useful
functionality for our research efforts.

I wanted to see if there were concerns or implementation ideas the
community wanted to share before beginning the implementation phase of
this feature.

~Aaron
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: Dumping AST information to other formats

Oleg Smolsky via cfe-dev
On Tue, Nov 27, 2018 at 4:50 AM Stephen Kelly via cfe-commits
<[hidden email]> wrote:

>
> On 26/11/2018 19:20, Aaron Ballman via cfe-commits wrote:
> > Once upon a time, there was -ast-print-xml. This -cc1 option was
> > dropped because it was frequently out of sync with the AST data. It is
> > right to ask: why would JSON, etc be any different? This is still an
> > open question, but a goal of this implementation will be to ensure
> > it's easier to maintain as the AST evolves. However, this feature is
> > intended to output a safe subset of AST information, so I don't think
> > this feature will require any more burden to support than -ast-dump
> > already requires (which is extremely limited).
>
> > I wanted to see if there were concerns or implementation ideas the
> > community wanted to share before beginning the implementation phase of
> > this feature.
>
> Hi Aaron,
>
> As you know, I've already done some work in this area.
>
> I split the ASTDumper.cpp into multiple classes so that the traversal of
> the AST is separate to the printing of it to the output stream. You can
> see the proof of concept here:
>
>   https://github.com/steveire/clang/commits/extract-AST-dumping
>
> though it is not ready for a real review. I just extracted it to a
> branch for the purpose of this mailing list reply.
>
> In my branch there are two implementations of outputter - one detailed
> and one 'simplified' AST. You can see the difference here:
>
>   http://ec2-52-14-16-249.us-east-2.compute.amazonaws.com:10240/z/JuAvs8
>
> Because the traversal is in a separate class, it should be possible to
> port ASTMatchFinder.cpp to use it, which will eliminate the class of
> bugs that arise due to ASTMatchFinder.cpp currently using RAV. Here is
> one such bug:
>
>   https://bugs.llvm.org/show_bug.cgi?id=37629
>
> but there are others for example relating to getting to a
> CXXConstructorDecl from a CXXCtorInitializer, so it is a class of bug
> rather than a single bug.
>
> Because the traversal is in a separate class, you should be able to also
> implement it for different output formats without a significant
> maintenance burden.
>
> Using this approach, the output formats will not get out of sync and
> fall to the same fate as the XML output feature.
>
> Let me know if you're interested.

Thank you for passing this along -- it's actually somewhat aligned
with what I was envisioning. I very much like splitting out the
traversal and the printing mechanisms.
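
The split between traversal and printing discussed above can be
sketched on a toy tree. This is only an illustration under stated
assumptions: Node, NodeWriter, traverse, TextWriter, and SExprWriter are
hypothetical names invented here, and the code is not taken from
ASTDumper.cpp or from the extract-AST-dumping branch.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Toy tree node; a hypothetical stand-in, not a Clang AST class.
struct Node {
  std::string kind;
  std::vector<Node> children;
};

// Interface that every output format implements.
// The traversal talks only to this interface.
class NodeWriter {
public:
  virtual ~NodeWriter() = default;
  virtual void enter(const Node &n, unsigned depth) = 0;
  virtual void leave(const Node &n, unsigned depth) = 0;
};

// The single traversal shared by all output formats.
void traverse(const Node &n, NodeWriter &w, unsigned depth = 0) {
  w.enter(n, depth);
  for (const Node &child : n.children)
    traverse(child, w, depth + 1);
  w.leave(n, depth);
}

// Format 1: one node per line, indented two spaces per level.
class TextWriter : public NodeWriter {
  std::ostringstream os;

public:
  void enter(const Node &n, unsigned depth) override {
    os << std::string(depth * 2, ' ') << n.kind << '\n';
  }
  void leave(const Node &, unsigned) override {}
  std::string str() const { return os.str(); }
};

// Format 2: nested s-expressions.
class SExprWriter : public NodeWriter {
  std::ostringstream os;

public:
  void enter(const Node &n, unsigned depth) override {
    if (depth > 0)
      os << ' ';
    os << '(' << n.kind;
  }
  void leave(const Node &, unsigned) override { os << ')'; }
  std::string str() const { return os.str(); }
};
```

Passing the same tree to traverse with a TextWriter or an SExprWriter
produces two renderings from one walk. Adding another format means
adding another NodeWriter, while any change to how the tree is walked
is made once in traverse, which is why the formats cannot drift apart
in what they visit.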

Would you like to be included on the review thread when I submit a patch?

~Aaron

Re: Dumping AST information to other formats

Oleg Smolsky via cfe-dev
Hi Aaron,

You might find the recent work we have done on stable identifiers for the AST useful:
the Stmt and Decl classes now have a "getID" method,
which returns an identifier that is stable across different runs (at least on the same architecture; probably not across different ones).

George
