Quantcast

AST Representation

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

AST Representation

Philip Reames via cfe-dev
As the AST is not really a Tree as it seems to have circular references, working with the AST is sometimes a bit messy (eg. cloning).

A while ago Stroustroup pointed me to Gabriel Dos Reis' work on a different approach to represent C++-AST: https://github.com/GabrielDosReis/ipr

Did anyone try to integrate his work into clang or has an opinion to share ?

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AST Representation

Philip Reames via cfe-dev
Yes, the clang "abstract syntax tree" is often jokingly referred to as the "concrete syntax graph". We try to provide a generally useful representation, but being a fast production C++ compiler comes first. Clang's AST is very concrete. You have to know a lot about it to navigate it. There is no "Node" base class that you can use as a cursor to navigate around Decls, Exprs, Types, and TemplateArguments. Template instantiation, the closest thing I can think of to cloning, is done relatively manually with TreeTransform.

I think it would be nice to revisit the design of clang's AST to simplify it, normalize it, and abstract it, but it is not a task to be taken lightly, and I don't expect it to happen in the near future.

On Wed, Feb 1, 2017 at 8:44 AM, Gaetano Checinski via cfe-dev <[hidden email]> wrote:
As the AST is not really a Tree as it seems to have circular references, working with the AST is sometimes a bit messy (eg. cloning).

A while ago Stroustroup pointed me to Gabriel Dos Reis' work on a different approach to represent C++-AST: https://github.com/GabrielDosReis/ipr

Did anyone try to integrate his work into clang or has an opinion to share ?

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AST Representation

Philip Reames via cfe-dev
I haven't looked too deeply into it, but from talking to various clang developers, the common theme is disbelieve that a high-level C++ AST can be created that is both useful and not overly specific to a single compiler.

From the paper referenced in the project, I see multiple things that make it seem not interesting to me from a point of refactoring and semantic analysis:
- The IPR does not handle macros before their expansion in the preprocessor.
- does not mimic C++ language irregularities; general rules are used, rather than long lists of special cases
- Unfortunately, a program cannot fully automate the generation of “skeletons.” If our aim is portability, we still need to (by hand) eliminate non-standard additions to the contents of header file.

Generally, I would not trust a representation that I can't generate code from to be correct enough for tools.

On Wed, Feb 1, 2017 at 8:32 PM Reid Kleckner via cfe-dev <[hidden email]> wrote:
Yes, the clang "abstract syntax tree" is often jokingly referred to as the "concrete syntax graph". We try to provide a generally useful representation, but being a fast production C++ compiler comes first. Clang's AST is very concrete. You have to know a lot about it to navigate it. There is no "Node" base class that you can use as a cursor to navigate around Decls, Exprs, Types, and TemplateArguments. Template instantiation, the closest thing I can think of to cloning, is done relatively manually with TreeTransform.

I think it would be nice to revisit the design of clang's AST to simplify it, normalize it, and abstract it, but it is not a task to be taken lightly, and I don't expect it to happen in the near future.

On Wed, Feb 1, 2017 at 8:44 AM, Gaetano Checinski via cfe-dev <[hidden email]> wrote:
As the AST is not really a Tree as it seems to have circular references, working with the AST is sometimes a bit messy (eg. cloning).

A while ago Stroustroup pointed me to Gabriel Dos Reis' work on a different approach to represent C++-AST: https://github.com/GabrielDosReis/ipr

Did anyone try to integrate his work into clang or has an opinion to share ?

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AST Representation

Philip Reames via cfe-dev
Your comment is very surprising to me. 
The points you make sound to me like advantages not disadvantages:

C++ AST can be created that is both useful and not overly specific to a single compiler.
Maybe, but what more important is an AST you can build, analyse and transform easily.

The IPR does not handle macros [...]
Why should it ? - what would it be good for, besides having source-locations ? - how is this currently handled in clang ?
 As far as i know clang's AST has no notion of macros.


I imagine it could be very handy to have multiple ASTs:
One having only nodes for Preprocessor (ppAST) constructs and one for C++ (cppAST).
The Preprocessor could process ppAST and return cppAST alongside with a sourcemap.
We could even go a step further and have separate ASTs for C++ with and without templates.
This would make the codebase more functional and composable.

To get a proper source-location we then just need to follow the sourcemaps.

Dealing with real (immutable) ASTs  would make (de-)serialization, cloning and comparing an easy task.
As far as i can see, the preprocessor is a big liability:
The preprocessor is stateful and macros can transform the sourcefile in almost any imaginable way.

As far as i can see the c++-parser could be paralyzed. 
However this is only possible if the parser is decoupled from the preprocessor.


> - does not mimic C++ language irregularities; general rules are used, rather than long lists of special cases
We still can write an validator to be sure the AST conforms to a specific c++-standard.
I think that having a nice and regular AST would simplify working with the AST. 
Writing visitors and pattern matchers for semantic analysis might become easier.  



Generally, I would not trust a representation that I can't generate code from to be correct enough for tools.
Well, my goal would definitely be to generate code from the AST. If information are missing in the representation then we need to improve it.
IMHO the main message of the paper is that there is that you can build an AST representation that is more regular and still complete.


- Unfortunately, a program cannot fully automate the generation of “skeletons.” If our aim is portability, we still need to (by hand) eliminate non-standard additions to the contents of header file.

can you elaborate? - what do you mean by "skeletons" and  to which header files are you referring to?

2017-02-06 15:45 GMT+00:00 Manuel Klimek <[hidden email]>:
I haven't looked too deeply into it, but from talking to various clang developers, the common theme is disbelieve that a high-level C++ AST can be created that is both useful and not overly specific to a single compiler.

From the paper referenced in the project, I see multiple things that make it seem not interesting to me from a point of refactoring and semantic analysis:
- The IPR does not handle macros before their expansion in the preprocessor.
- does not mimic C++ language irregularities; general rules are used, rather than long lists of special cases
- Unfortunately, a program cannot fully automate the generation of “skeletons.” If our aim is portability, we still need to (by hand) eliminate non-standard additions to the contents of header file.

Generally, I would not trust a representation that I can't generate code from to be correct enough for tools.

On Wed, Feb 1, 2017 at 8:32 PM Reid Kleckner via cfe-dev <[hidden email]> wrote:
Yes, the clang "abstract syntax tree" is often jokingly referred to as the "concrete syntax graph". We try to provide a generally useful representation, but being a fast production C++ compiler comes first. Clang's AST is very concrete. You have to know a lot about it to navigate it. There is no "Node" base class that you can use as a cursor to navigate around Decls, Exprs, Types, and TemplateArguments. Template instantiation, the closest thing I can think of to cloning, is done relatively manually with TreeTransform.

I think it would be nice to revisit the design of clang's AST to simplify it, normalize it, and abstract it, but it is not a task to be taken lightly, and I don't expect it to happen in the near future.

On Wed, Feb 1, 2017 at 8:44 AM, Gaetano Checinski via cfe-dev <[hidden email]> wrote:
As the AST is not really a Tree as it seems to have circular references, working with the AST is sometimes a bit messy (eg. cloning).

A while ago Stroustroup pointed me to Gabriel Dos Reis' work on a different approach to represent C++-AST: https://github.com/GabrielDosReis/ipr

Did anyone try to integrate his work into clang or has an opinion to share ?

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev




Sent with Mailtrack

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AST Representation

Philip Reames via cfe-dev


On Mon, Feb 6, 2017 at 6:18 PM Gaetano Checinski <[hidden email]> wrote:
Your comment is very surprising to me. 
The points you make sound to me like advantages not disadvantages:

C++ AST can be created that is both useful and not overly specific to a single compiler.
Maybe, but what more important is an AST you can build, analyse and transform easily.

Well, "useful" seems like the most important part to me :)
 
The IPR does not handle macros [...]
Why should it ? - what would it be good for, besides having source-locations ? - how is this currently handled in clang ?
 As far as i know clang's AST has no notion of macros.

Having full information about what source locations each part of the AST was generated from would be enough; that's what clang provides. Transformations for C++ code need to take that into account, though.
 
I imagine it could be very handy to have multiple ASTs:
One having only nodes for Preprocessor (ppAST) constructs and one for C++ (cppAST).
The Preprocessor could process ppAST and return cppAST alongside with a sourcemap.
We could even go a step further and have separate ASTs for C++ with and without templates.
This would make the codebase more functional and composable.

To get a proper source-location we then just need to follow the sourcemaps.

Dealing with real (immutable) ASTs  would make (de-)serialization, cloning and comparing an easy task.
As far as i can see, the preprocessor is a big liability:
The preprocessor is stateful and macros can transform the sourcefile in almost any imaginable way.

As far as i can see the c++-parser could be paralyzed. 
However this is only possible if the parser is decoupled from the preprocessor.

Preprocessing is fast; parallelization is made hard by C++ (the language), not by the preprocessor.
 
> - does not mimic C++ language irregularities; general rules are used, rather than long lists of special cases
We still can write an validator to be sure the AST conforms to a specific c++-standard.
I think that having a nice and regular AST would simplify working with the AST. 
Writing visitors and pattern matchers for semantic analysis might become easier.  

Yes, the question is what you do when you want to match some arcane detail of the language, and the information is not there?
 
Generally, I would not trust a representation that I can't generate code from to be correct enough for tools.
Well, my goal would definitely be to generate code from the AST. If information are missing in the representation then we need to improve it.
IMHO the main message of the paper is that there is that you can build an AST representation that is more regular and still complete.

So, if somebody builds it and maintains it that's great (I'd find that very cool), but it sound like a rather large project :)
 
- Unfortunately, a program cannot fully automate the generation of “skeletons.” If our aim is portability, we still need to (by hand) eliminate non-standard additions to the contents of header file.

can you elaborate? - what do you mean by "skeletons" and  to which header files are you referring to?

It means that the authors admit that their representation cannot be generated for system headers, so manual work to model system headers is necessary. This is problematic at scale, unless I'm misunderstanding something.

Don't get me wrong, I'd love it if it was possible to create a simplified AST for C++ that's still useful (-> efficient to generate, memory efficient, has all necessary information).
Somebody has to prove that it's possible by writing it, I guess. Time will tell. Given the information I have today, I'm not holding my breath, and it seems like there are higher priority things to spend time on.
 
2017-02-06 15:45 GMT+00:00 Manuel Klimek <[hidden email]>:
I haven't looked too deeply into it, but from talking to various clang developers, the common theme is disbelieve that a high-level C++ AST can be created that is both useful and not overly specific to a single compiler.

From the paper referenced in the project, I see multiple things that make it seem not interesting to me from a point of refactoring and semantic analysis:
- The IPR does not handle macros before their expansion in the preprocessor.
- does not mimic C++ language irregularities; general rules are used, rather than long lists of special cases
- Unfortunately, a program cannot fully automate the generation of “skeletons.” If our aim is portability, we still need to (by hand) eliminate non-standard additions to the contents of header file.

Generally, I would not trust a representation that I can't generate code from to be correct enough for tools.

On Wed, Feb 1, 2017 at 8:32 PM Reid Kleckner via cfe-dev <[hidden email]> wrote:
Yes, the clang "abstract syntax tree" is often jokingly referred to as the "concrete syntax graph". We try to provide a generally useful representation, but being a fast production C++ compiler comes first. Clang's AST is very concrete. You have to know a lot about it to navigate it. There is no "Node" base class that you can use as a cursor to navigate around Decls, Exprs, Types, and TemplateArguments. Template instantiation, the closest thing I can think of to cloning, is done relatively manually with TreeTransform.

I think it would be nice to revisit the design of clang's AST to simplify it, normalize it, and abstract it, but it is not a task to be taken lightly, and I don't expect it to happen in the near future.

On Wed, Feb 1, 2017 at 8:44 AM, Gaetano Checinski via cfe-dev <[hidden email]> wrote:
As the AST is not really a Tree as it seems to have circular references, working with the AST is sometimes a bit messy (eg. cloning).

A while ago Stroustroup pointed me to Gabriel Dos Reis' work on a different approach to represent C++-AST: https://github.com/GabrielDosReis/ipr

Did anyone try to integrate his work into clang or has an opinion to share ?

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev




Sent with Mailtrack

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Loading...