new design of RecursiveASTVisitor

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

new design of RecursiveASTVisitor

Zhanyong Wan (λx.x x)
Hi,

While working on improving the RecursiveASTVisitor class, some of us saw an opportunity for a different design that's cleaner and easier to understand and use.  Here's the proposal.  Comments and suggestions welcome!

Clang AST Visitor Design

This document discusses the design of a customizable visitor class for the AST generated by the Clang C++ parser.

Background

Clang comes with two AST visitor classes: ASTVisitor and RecursiveASTVisitor. The ideas behind the two are similar. Both are implemented using the CRTP pattern. The former is considered an internal class and not for use in tools developed on top of Clang. The latter is supposed to be a public API.

At this time, neither class is complete (i.e. they'll miss some AST nodes) and there are known bugs (e.g. some AST nodes may be visited more than once). We are working on improving RecursiveASTVisitor. Our experience shows that its design has much space for improvement. Therefore it would be a worthwhile exercise to see what we can come up with if we do it from scratch, without being constrained or influenced by the current RecursiveASTVisitor design.

Concepts and Terminology

Muddy concepts lead to poor, confusing designs. Before we explore the design space, it's important to clarify some key concepts.

  • An AST is a tree-shaped data structure. A node in an AST can have 0 or many other nodes as its children.
  • There a three categories of AST nodes in Clang: statements, declarations/definitions, and types. They are derived from class Stmt, Decl, and TypeLoc, respectively. In general, a node may have children of any or all of the three categories. The class hierarchies for Stmt, Decl, and TypeLoc can be found here.
  • The AST visitor has three distinct tasks:
    1. traverse the entire AST (i.e. go to every node),
    2. at each node, walk up the class hierarchy, starting with the dynamic type of the node,
    3. for a given (node, type) combination (where 'type' is some base class of the dynamic type of 'node'), call a user-overridable function to actually "visit" the node.

Design Goals

  • easy-to-understand API
  • performs a depth-first, preorder traversal on the AST (no child of a node can be visited until all work on the node has been done)
  • complete coverage: each node in the AST must be visited
  • no duplication: a node cannot be visited more than once
  • common use cases should be easy
  • fast

Our Design

The distinction between the AST visitor's three tasks (see the "Concepts and Terminology" section above) is important for understanding how the AST visitor works and using it effectively. Therefore we use naming conventions to highlight the role of a method of the AST visitor:

  • TraverseFoo(Foo* x) does task #1. It traveses (i.e. recursively visits) the sub-tree rooted at x, where Foo is *x's dynamic type (it's the caller's responsibility to ensure the correct type).
  • Traverse(Decl* x) simply dispatches (i.e. forwards) to TraverseFoo(Foo* x) where Foo is the dynamic type of *x. Traverse(Stmt* x) and Traverse(TypeLoc* x) work similarly.
  • WalkUpFromFoo(Foo* x) (better name welcome) does task #2. It does not try to visit any child of x. Instead, it first calls WalkUpFromBar(x) where Bar is the direct parent class of Foo (unless Foo has no parent), and then calls VisitFoo(x) (see the next bullet).
  • VisitFoo(Foo* x) does task #3.

Here's some pesudo code to show what these methods' implementation looks like:

// A function returns false if it wants the traversal to abort (note that
// the current RecursiveASTVisitor class returns true for abortion - we
// feel that 'false' is much more intuitive).

// Therefore, each caller needs to check the return code and propagate
// it up appropriately.  Such plumbing code is omitted here for clarify.

// The entry point for visiting an AST rooted at x.
bool Traverse(Decl* x) {
 
switch(x->getKind()) {
 
case kFoo:
   
TraverseFoo((Foo*)x);
   
break;
 
case kBar:
   
TraverseBar((Bar*)x);
   
break;
 
...
 
}
}

bool TraverseFoo(Foo* x) {
 
WalkUpFromFoo(x);
 
for each child node n of x {
   
Traverse(n)
 
}
}

bool WalkUpFromFoo(Foo* x) {
 
WalkUpFromBar(x);  // Bar is Foo's parent.
 
VisitFoo(x);
}

bool WalkUpFromDecl(Decl* x) {
 
// Decl has no parent, so no more walking up.
 
VisitDecl(x);
}

// All Visit*() methods do nothing by default.
bool VisitFoo(Foo* x) { return true; }

Usually, a user shouldn't override a Traverse*() or WalkUpFrom*() function - they are the "skeleton" of the AST visitor and do the mundane plumbing that a normal user is not interested in. We recommend that only people truly understand how the traversal works can override them.

A user is expected to override the Visit*() methods as needed. The default implementation of them does nothing.

Returning a bool to indicate whether the traversal needs to be aborted requires much tedious, error-prone plumbing in the implementation. Exceptions are the most natural solution here, but unfortunately they are banned from Clang code.

One advantage of this design over the existing RecursiveASTVisitor is that when a user overrides VisitFoo(), he only needs to do what he's interested in, without worrying about calling VisitFoo() or any other methods in the base AST visitor class (so less chance to shoot his own feet).

With this design, all Visit*() methods on an AST ndoe are called before any Visit*() method is called on a child node of it.  So the trace of the Visit*() calls is easy to understand and reason about.

One thing people might not like about the design is that it uses more methods. I actually think it's a good thing that we have separate families of methods for separate roles. RecursiveASTVisitor's approach of lumping all logic (traversal, walking up the type hierarchy, and the custom visiting logic) in the same Visit*() method makes the implementation more tricky and error-prone - that's probably why there was so much confusion when we were trying to improve RecursiveASTVisitor.

To ensure performance, we would use CRTP as the existing AST visitors do. That means two things:

  1. No need for virtual functions.
  2. The entire code is templated s.t. the compiler can easily inline trivial functions and do cross-function optimization.

Use Cases

Let's see how some typical use cases can be implemented using our AST visitor.

Dump the AST

Only 3 methods need to be overridden:

VisitDecl(Decl* x) {
 
print information about x;
}

VisitStmt(Stmt* x) {
 
print information about x;
}

VisitTypeLoc(TypeLoc* x) {
 
print information about x;
}

Search for an AST Node of Type Foo

The tool would override VisitFoo(Foo* x) to return false if x is the node it's looking for.

Find the Parents of an AST Node

The tool would maintain a stack of the nodes that lead to the current node. It can build the stack by overriding exactly 3 methods (plumbing of the return code omitted):

Traverse(Decl* d) {
 
Push(d);
 
Base::Traverse(d);
 
Pop();
}

Traverse(Stmt* s) {
 
Push(s);
 
Base::Traverse(s);
 
Pop();
}

Traverse(TypeLoc* t) {
 
Push(t);
 
Base::Traverse(t);
 
Pop();
}

--
Zhanyong


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: new design of RecursiveASTVisitor

Abramo Bagnara-2

For proper implementation of tree finite automata I'd suggest you to
consider the (needed) distinction from node entering and node exiting.

  call Enter node handler;
  visit node childs;
  call Exit Node handler;

Another very useful thing you should consider is to add a child
identifier to handler arguments that inform handler about *which* parent
child is currently visited: this information is not otherwise deducible.

To better understand this try to imagine an application that want to
check lhs of assign binary operator.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: new design of RecursiveASTVisitor

Zhanyong Wan (λx.x x)
Thanks for the comments.

On Sat, Jun 5, 2010 at 12:01 AM, Abramo Bagnara
<[hidden email]> wrote:
>
> For proper implementation of tree finite automata I'd suggest you to
> consider the (needed) distinction from node entering and node exiting.
>
>  call Enter node handler;
>  visit node childs;
>  call Exit Node handler;

This is beyond the scope of what I'd like to do now.  The generality
you proposed can be useful (e.g. for implementing postorder
traversal), but all the use cases I have in mind only need preorder
traversal (as stated in my design goal).  Note that the "exit node
handler" can always be added later without breaking existing clients,
so we can incorporate your proposal later, when there's actual need
for it.

> Another very useful thing you should consider is to add a child
> identifier to handler arguments that inform handler about *which* parent
> child is currently visited: this information is not otherwise deducible.

Sorry, I don't understand what you mean by "which parent child is
currently visited".  Care to clarify?

> To better understand this try to imagine an application that want to
> check lhs of assign binary operator.

--
Zhanyong

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: new design of RecursiveASTVisitor

Abramo Bagnara-2
>> Another very useful thing you should consider is to add a child
>> identifier to handler arguments that inform handler about *which* parent
>> child is currently visited: this information is not otherwise deducible.
>
> Sorry, I don't understand what you mean by "which parent child is
> currently visited".  Care to clarify?
>
>> To better understand this try to imagine an application that want to
>> check lhs of assign binary operator.

As you've noted in your documentation it might be useful to know the
ancestor chain of current node and for this aim it is possibile to save
a stack of node entered, but what about to know *which* child of its
parent current node is?

This info is not available or deducible *unless* the parent visit (in
its general purpose code) pass down this information.

This make possibile to node visit to answer to the following questions:

"I'm the lhs or the rhs expression?", "I'm the return type of function
decl or the full function type?",  "I'm the return type of function type
or an argument type?", etc.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: new design of RecursiveASTVisitor

Zhanyong Wan (λx.x x)
On Saturday, June 5, 2010, Abramo Bagnara <[hidden email]> wrote:

>>> Another very useful thing you should consider is to add a child
>>> identifier to handler arguments that inform handler about *which* parent
>>> child is currently visited: this information is not otherwise deducible.
>>
>> Sorry, I don't understand what you mean by "which parent child is
>> currently visited".  Care to clarify?
>>
>>> To better understand this try to imagine an application that want to
>>> check lhs of assign binary operator.
>
> As you've noted in your documentation it might be useful to know the
> ancestor chain of current node and for this aim it is possibile to save
> a stack of node entered, but what about to know *which* child of its
> parent current node is?
>
> This info is not available or deducible *unless* the parent visit (in

It is easily deducible.  Just search for the current node (a pointer)
in the parent's children.

> its general purpose code) pass down this information.
>
> This make possibile to node visit to answer to the following questions:
>
> "I'm the lhs or the rhs expression?", "I'm the return type of function
> decl or the full function type?",  "I'm the return type of function type
> or an argument type?", etc.

Just compare the pointer to the current node with the parent node's
lhs expression child, etc.

--
Zhanyong

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: new design of RecursiveASTVisitor

Jordy Rose
In reply to this post by Zhanyong Wan (λx.x x)

Having just written a small tree-walker myself, I want to ask what
returning "false" means. There's lots of ways you might want to "stop" in
the middle of a traversal:

1. Stop traversing altogether. This is what the original proposal says, if
I'm reading it correctly.

2. Stop walking the class hierarchy, but still traverse the children. This
doesn't seem very useful -- a client could probably get around this without
any real trouble.

3. Skip this node's children and move on to the next node. Would have to
override TraverseFoo() for this right now, but not so hard since you just
call WalkUpFromFoo() and don't look at the children.

4. Skip the rest of the /parent/'s children. This is definitely trickier.

5. Jump an arbitrary number of levels back up (i.e. #4 recursively).
Probably not very common, but powerful enough to implement #3, #4, and #1,
with each one wrapped in simple API.


Do you (or does anyone) think there are enough uses of #3, #4, or #5 to
warrant implementing something like this? A nice way to do it could be a
simple push()/pop() system -- when you say abort, it pop()s back to the
last saved node. Alternately, you could return a special value that says
how far back to pop() (such as a particular Stmt*), with special constants
for "Continue" and "Abort".

Overall, though, I like the separation of responsibility. I'm wondering a
little bit about Abramo's point of pre- and post- visiting -- sometimes you
might need the children first before you can finish the parent. (For
example, the type of an expression depends on its children...not that we'd
be using RecursiveASTVisitor to figure out types.)

Jordy
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: new design of RecursiveASTVisitor

Zhanyong Wan (λx.x x)
On Sat, Jun 5, 2010 at 4:20 PM, Jordy Rose <[hidden email]> wrote:
>
> Having just written a small tree-walker myself, I want to ask what
> returning "false" means. There's lots of ways you might want to "stop" in
> the middle of a traversal:
>
> 1. Stop traversing altogether. This is what the original proposal says, if
> I'm reading it correctly.

Yes, that's what I meant.  Think of it as throwing an exception.

> 2. Stop walking the class hierarchy, but still traverse the children. This
> doesn't seem very useful -- a client could probably get around this without
> any real trouble.

Agreed.

> 3. Skip this node's children and move on to the next node. Would have to
> override TraverseFoo() for this right now, but not so hard since you just
> call WalkUpFromFoo() and don't look at the children.

Agreed.  It's easy for a user to do this.

> 4. Skip the rest of the /parent/'s children. This is definitely trickier.

You can override TraverseParent().  Doable.

> 5. Jump an arbitrary number of levels back up (i.e. #4 recursively).
> Probably not very common, but powerful enough to implement #3, #4, and #1,
> with each one wrapped in simple API.
>
>
> Do you (or does anyone) think there are enough uses of #3, #4, or #5 to
> warrant implementing something like this? A nice way to do it could be a
> simple push()/pop() system -- when you say abort, it pop()s back to the
> last saved node. Alternately, you could return a special value that says
> how far back to pop() (such as a particular Stmt*), with special constants
> for "Continue" and "Abort".

I suspect we are getting a bit ahead of ourselves here.  I like to
make the design as general as possible too - as long as we don't
introduce unnecessary complexity.  I think the use cases you described
are rare, and I'm fine with not supporting them directly.  The user
can always override enough methods to make the AST visitor do whatever
he wants to do.  Or he may just create a new AST visitor class if that
makes more sense.

> Overall, though, I like the separation of responsibility. I'm wondering a
> little bit about Abramo's point of pre- and post- visiting -- sometimes you
> might need the children first before you can finish the parent. (For
> example, the type of an expression depends on its children...not that we'd
> be using RecursiveASTVisitor to figure out types.)

Support for postorder traversal can be added later without breaking
existing users.  We just need to add a bunch of PostVisitFoo() methods
and change WalkUpFromFoo()'s body to:

{
  VisitFoo(x);
  WalkUpFromBar(x);
  PostVisitFoo(x);  // This is new.
}

Therefore I don't intend to bloat the API with this upfront.

Thanks!

> Jordy
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>



--
Zhanyong
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: new design of RecursiveASTVisitor

Jordy Rose

On Sat, 5 Jun 2010 23:07:40 -0700, Zhanyong Wan (λx.x x) <[hidden email]>
wrote:
> On Sat, Jun 5, 2010 at 4:20 PM, Jordy Rose <[hidden email]>
wrote:
>>
>> Having just written a small tree-walker myself, I want to ask what
>> returning "false" means. There's lots of ways you might want to "stop"
in
>> the middle of a traversal:
>>
>> 1. Stop traversing altogether. This is what the original proposal says,
>> if
>> I'm reading it correctly.
>
> Yes, that's what I meant.  Think of it as throwing an exception.

<snip>

>> Do you (or does anyone) think there are enough uses of #3, #4, or #5 to
>> warrant implementing something like this? A nice way to do it could be
a
>> simple push()/pop() system -- when you say abort, it pop()s back to the
>> last saved node. Alternately, you could return a special value that
says

>> how far back to pop() (such as a particular Stmt*), with special
>> constants
>> for "Continue" and "Abort".
>
> I suspect we are getting a bit ahead of ourselves here.  I like to
> make the design as general as possible too - as long as we don't
> introduce unnecessary complexity.  I think the use cases you described
> are rare, and I'm fine with not supporting them directly.  The user
> can always override enough methods to make the AST visitor do whatever
> he wants to do.  Or he may just create a new AST visitor class if that
> makes more sense.

Heh, I suppose so. I started out writing with the intent to make sure
"stop" was well-defined (and eventually, clearly documented), but then got
carried away with all the possible uses. Of course it's easy to modify
behavior by subclasing RecursiveASTVisitor.

Thanks,
Jordy
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: new design of RecursiveASTVisitor

Olaf Krzikalla
In reply to this post by Zhanyong Wan (λx.x x)
I've build my AST traversal on top of llvm's df_iterator.

To get an idea, check

https://gforge.zih.tu-dresden.de/scm/viewvc.php/trunk/Scout/clangAddons/include/clang/ASTProcessing/StmtTraversal.h?root=hicfd&view=markup

stmt_iterator enables the traversal over a particular type, which at
least for me is sufficient in most cases. For more complex tasks there
is visit_df taking a StmtVisitor.

Best regards
Olaf Krzikalla
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: new design of RecursiveASTVisitor

Zhanyong Wan (λx.x x)
In reply to this post by Zhanyong Wan (λx.x x)
Hi, all,

Craig and I implemented most of the new design.  I uploaded the patch
to http://codereview.appspot.com/1607042 for easy reviewing - you can
download it from
http://codereview.appspot.com/download/issue1607042_1.diff

In the implementation we made a slight change to the proposed design:
instead of overloading Traverse() for Stmt, Type, and Decl, we prefer
to have separate TraverseStmt, TraverseType, and TraverseDecl
functions.  We think that makes the code clearer.

Chandler plan to review it after 3pm today.  Doug, would you have some
time to look at this today?  Thanks!

On Fri, Jun 4, 2010 at 4:35 PM, Zhanyong Wan (λx.x x) <[hidden email]> wrote:

> Hi,
>
> While working on improving the RecursiveASTVisitor class, some of us saw an
> opportunity for a different design that's cleaner and easier to understand
> and use.  Here's the proposal.  Comments and suggestions welcome!
>
> Clang AST Visitor Design
>
> This document discusses the design of a customizable visitor class for the
> AST generated by the Clang C++ parser.
>
> Background
>
> Clang comes with two AST visitor classes: ASTVisitor and
> RecursiveASTVisitor. The ideas behind the two are similar. Both are
> implemented using the CRTP pattern. The former is considered an internal
> class and not for use in tools developed on top of Clang. The latter is
> supposed to be a public API.
>
> At this time, neither class is complete (i.e. they'll miss some AST nodes)
> and there are known bugs (e.g. some AST nodes may be visited more than
> once). We are working on improving RecursiveASTVisitor. Our experience shows
> that its design has much space for improvement. Therefore it would be a
> worthwhile exercise to see what we can come up with if we do it from
> scratch, without being constrained or influenced by the current
> RecursiveASTVisitor design.
>
> Concepts and Terminology
>
> Muddy concepts lead to poor, confusing designs. Before we explore the design
> space, it's important to clarify some key concepts.
>
> An AST is a tree-shaped data structure. A node in an AST can have 0 or many
> other nodes as its children.
> There a three categories of AST nodes in Clang: statements,
> declarations/definitions, and types. They are derived from class Stmt, Decl,
> and TypeLoc, respectively. In general, a node may have children of any or
> all of the three categories. The class hierarchies for Stmt, Decl, and
> TypeLoc can be found here.
> The AST visitor has three distinct tasks:
>
> traverse the entire AST (i.e. go to every node),
> at each node, walk up the class hierarchy, starting with the dynamic type of
> the node,
> for a given (node, type) combination (where 'type' is some base class of the
> dynamic type of 'node'), call a user-overridable function to actually
> "visit" the node.
>
> Design Goals
>
> easy-to-understand API
> performs a depth-first, preorder traversal on the AST (no child of a node
> can be visited until all work on the node has been done)
> complete coverage: each node in the AST must be visited
> no duplication: a node cannot be visited more than once
> common use cases should be easy
> fast
>
> Our Design
>
> The distinction between the AST visitor's three tasks (see the "Concepts and
> Terminology" section above) is important for understanding how the AST
> visitor works and using it effectively. Therefore we use naming conventions
> to highlight the role of a method of the AST visitor:
>
> TraverseFoo(Foo* x) does task #1. It traveses (i.e. recursively visits) the
> sub-tree rooted at x, where Foo is *x's dynamic type (it's the caller's
> responsibility to ensure the correct type).
> Traverse(Decl* x) simply dispatches (i.e. forwards) to TraverseFoo(Foo* x)
> where Foo is the dynamic type of *x. Traverse(Stmt* x) and Traverse(TypeLoc*
> x) work similarly.
> WalkUpFromFoo(Foo* x) (better name welcome) does task #2. It does not try to
> visit any child of x. Instead, it first calls WalkUpFromBar(x) where Bar is
> the direct parent class of Foo (unless Foo has no parent), and then calls
> VisitFoo(x) (see the next bullet).
> VisitFoo(Foo* x) does task #3.
>
> Here's some pesudo code to show what these methods' implementation looks
> like:
>
> // A function returns false if it wants the traversal to abort (note that
> // the current RecursiveASTVisitor class returns true for abortion - we
> // feel that 'false' is much more intuitive).
>
> // Therefore, each caller needs to check the return code and propagate
> // it up appropriately.  Such plumbing code is omitted here for clarify.
>
>
> // The entry point for visiting an AST rooted at x.
> bool Traverse(Decl* x) {
>
>   switch(x->getKind()) {
>
>   case kFoo:
>     TraverseFoo((Foo*)x);
>
>     break;
>   case kBar:
>     TraverseBar((Bar*)x);
>
>     break;
>   ...
>   }
> }
>
>
> bool TraverseFoo(Foo* x) {
>
>   WalkUpFromFoo(x);
>   for each child node n of x {
>
>     Traverse(n)
>   }
> }
>
>
> bool WalkUpFromFoo(Foo* x) {
>
>   WalkUpFromBar(x);  // Bar is Foo's parent.
>
>   VisitFoo(x);
> }
>
> bool WalkUpFromDecl(Decl* x) {
>
>   // Decl has no parent, so no more walking up.
>   VisitDecl(x);
>
> }
>
> // All Visit*() methods do nothing by default.
> bool VisitFoo(Foo* x) { return true; }
>
> Usually, a user shouldn't override a Traverse*() or WalkUpFrom*() function -
> they are the "skeleton" of the AST visitor and do the mundane plumbing that
> a normal user is not interested in. We recommend that only people truly
> understand how the traversal works can override them.
>
> A user is expected to override the Visit*() methods as needed. The default
> implementation of them does nothing.
>
> Returning a bool to indicate whether the traversal needs to be aborted
> requires much tedious, error-prone plumbing in the implementation.
> Exceptions are the most natural solution here, but unfortunately they are
> banned from Clang code.
>
> One advantage of this design over the existing RecursiveASTVisitor is that
> when a user overrides VisitFoo(), he only needs to do what he's interested
> in, without worrying about calling VisitFoo() or any other methods in the
> base AST visitor class (so less chance to shoot his own feet).
>
> With this design, all Visit*() methods on an AST ndoe are called before any
> Visit*() method is called on a child node of it.  So the trace of the
> Visit*() calls is easy to understand and reason about.
>
> One thing people might not like about the design is that it uses more
> methods. I actually think it's a good thing that we have separate families
> of methods for separate roles. RecursiveASTVisitor's approach of lumping all
> logic (traversal, walking up the type hierarchy, and the custom visiting
> logic) in the same Visit*() method makes the implementation more tricky and
> error-prone - that's probably why there was so much confusion when we were
> trying to improve RecursiveASTVisitor.
>
> To ensure performance, we would use CRTP as the existing AST visitors do.
> That means two things:
>
> No need for virtual functions.
> The entire code is templated s.t. the compiler can easily inline trivial
> functions and do cross-function optimization.
>
> Use Cases
>
> Let's see how some typical use cases can be implemented using our AST
> visitor.
>
> Dump the AST
>
> Only 3 methods need to be overridden:
>
> VisitDecl(Decl* x) {
>
>   print information about x;
> }
>
> VisitStmt(Stmt* x) {
>
>   print information about x;
> }
>
> VisitTypeLoc(TypeLoc* x) {
>
>   print information about x;
> }
>
> Search for an AST Node of Type Foo
>
> The tool would override VisitFoo(Foo* x) to return false if x is the node
> it's looking for.
>
> Find the Parents of an AST Node
>
> The tool would maintain a stack of the nodes that lead to the current node.
> It can build the stack by overriding exactly 3 methods (plumbing of the
> return code omitted):
>
> Traverse(Decl* d) {
>
>   Push(d);
>   Base::Traverse(d);
>
>   Pop();
> }
>
> Traverse(Stmt* s) {
>
>   Push(s);
>   Base::Traverse(s);
>
>   Pop();
> }
>
> Traverse(TypeLoc* t) {
>
>   Push(t);
>   Base::Traverse(t);
>
>   Pop();
> }
>
> --
> Zhanyong
>
>



--
Zhanyong

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: new design of RecursiveASTVisitor

Douglas Gregor

On Jun 8, 2010, at 2:44 PM, Zhanyong Wan (λx.x x) wrote:

> Hi, all,
>
> Craig and I implemented most of the new design.  I uploaded the patch
> to http://codereview.appspot.com/1607042 for easy reviewing - you can
> download it from
> http://codereview.appspot.com/download/issue1607042_1.diff
>
> In the implementation we made a slight change to the proposed design:
> instead of overloading Traverse() for Stmt, Type, and Decl, we prefer
> to have separate TraverseStmt, TraverseType, and TraverseDecl
> functions.  We think that makes the code clearer.
>
> Chandler plan to review it after 3pm today.  Doug, would you have some
> time to look at this today?  Thanks!

I've added my comments up on codereview.appspot.com. Looks good!

        - Doug

> On Fri, Jun 4, 2010 at 4:35 PM, Zhanyong Wan (λx.x x) <[hidden email]> wrote:
>> Hi,
>>
>> While working on improving the RecursiveASTVisitor class, some of us saw an
>> opportunity for a different design that's cleaner and easier to understand
>> and use.  Here's the proposal.  Comments and suggestions welcome!
>>
>> Clang AST Visitor Design
>>
>> This document discusses the design of a customizable visitor class for the
>> AST generated by the Clang C++ parser.
>>
>> Background
>>
>> Clang comes with two AST visitor classes: ASTVisitor and
>> RecursiveASTVisitor. The ideas behind the two are similar. Both are
>> implemented using the CRTP pattern. The former is considered an internal
>> class and not for use in tools developed on top of Clang. The latter is
>> supposed to be a public API.
>>
>> At this time, neither class is complete (i.e. they'll miss some AST nodes)
>> and there are known bugs (e.g. some AST nodes may be visited more than
>> once). We are working on improving RecursiveASTVisitor. Our experience shows
>> that its design has much space for improvement. Therefore it would be a
>> worthwhile exercise to see what we can come up with if we do it from
>> scratch, without being constrained or influenced by the current
>> RecursiveASTVisitor design.
>>
>> Concepts and Terminology
>>
>> Muddy concepts lead to poor, confusing designs. Before we explore the design
>> space, it's important to clarify some key concepts.
>>
>> An AST is a tree-shaped data structure. A node in an AST can have 0 or many
>> other nodes as its children.
>> There a three categories of AST nodes in Clang: statements,
>> declarations/definitions, and types. They are derived from class Stmt, Decl,
>> and TypeLoc, respectively. In general, a node may have children of any or
>> all of the three categories. The class hierarchies for Stmt, Decl, and
>> TypeLoc can be found here.
>> The AST visitor has three distinct tasks:
>>
>> traverse the entire AST (i.e. go to every node),
>> at each node, walk up the class hierarchy, starting with the dynamic type of
>> the node,
>> for a given (node, type) combination (where 'type' is some base class of the
>> dynamic type of 'node'), call a user-overridable function to actually
>> "visit" the node.
>>
>> Design Goals
>>
>> easy-to-understand API
>> performs a depth-first, preorder traversal on the AST (no child of a node
>> can be visited until all work on the node has been done)
>> complete coverage: each node in the AST must be visited
>> no duplication: a node cannot be visited more than once
>> common use cases should be easy
>> fast
>>
>> Our Design
>>
>> The distinction between the AST visitor's three tasks (see the "Concepts and
>> Terminology" section above) is important for understanding how the AST
>> visitor works and using it effectively. Therefore we use naming conventions
>> to highlight the role of a method of the AST visitor:
>>
>> TraverseFoo(Foo* x) does task #1. It traveses (i.e. recursively visits) the
>> sub-tree rooted at x, where Foo is *x's dynamic type (it's the caller's
>> responsibility to ensure the correct type).
>> Traverse(Decl* x) simply dispatches (i.e. forwards) to TraverseFoo(Foo* x)
>> where Foo is the dynamic type of *x. Traverse(Stmt* x) and Traverse(TypeLoc*
>> x) work similarly.
>> WalkUpFromFoo(Foo* x) (better name welcome) does task #2. It does not try to
>> visit any child of x. Instead, it first calls WalkUpFromBar(x) where Bar is
>> the direct parent class of Foo (unless Foo has no parent), and then calls
>> VisitFoo(x) (see the next bullet).
>> VisitFoo(Foo* x) does task #3.
>>
>> Here's some pesudo code to show what these methods' implementation looks
>> like:
>>
>> // A function returns false if it wants the traversal to abort (note that
>> // the current RecursiveASTVisitor class returns true for abortion - we
>> // feel that 'false' is much more intuitive).
>>
>> // Therefore, each caller needs to check the return code and propagate
>> // it up appropriately.  Such plumbing code is omitted here for clarify.
>>
>>
>> // The entry point for visiting an AST rooted at x.
>> bool Traverse(Decl* x) {
>>
>>   switch(x->getKind()) {
>>
>>   case kFoo:
>>     TraverseFoo((Foo*)x);
>>
>>     break;
>>   case kBar:
>>     TraverseBar((Bar*)x);
>>
>>     break;
>>   ...
>>   }
>> }
>>
>>
>> bool TraverseFoo(Foo* x) {
>>
>>   WalkUpFromFoo(x);
>>   for each child node n of x {
>>
>>     Traverse(n)
>>   }
>> }
>>
>>
>> bool WalkUpFromFoo(Foo* x) {
>>
>>   WalkUpFromBar(x);  // Bar is Foo's parent.
>>
>>   VisitFoo(x);
>> }
>>
>> bool WalkUpFromDecl(Decl* x) {
>>
>>   // Decl has no parent, so no more walking up.
>>   VisitDecl(x);
>>
>> }
>>
>> // All Visit*() methods do nothing by default.
>> bool VisitFoo(Foo* x) { return true; }
>>
>> Usually, a user shouldn't override a Traverse*() or WalkUpFrom*() function -
>> they are the "skeleton" of the AST visitor and do the mundane plumbing that
>> a normal user is not interested in. We recommend that only people truly
>> understand how the traversal works can override them.
>>
>> A user is expected to override the Visit*() methods as needed. The default
>> implementation of them does nothing.
>>
>> Returning a bool to indicate whether the traversal needs to be aborted
>> requires much tedious, error-prone plumbing in the implementation.
>> Exceptions are the most natural solution here, but unfortunately they are
>> banned from Clang code.
>>
>> One advantage of this design over the existing RecursiveASTVisitor is that
>> when a user overrides VisitFoo(), he only needs to do what he's interested
>> in, without worrying about calling VisitFoo() or any other methods in the
>> base AST visitor class (so less chance to shoot his own feet).
>>
>> With this design, all Visit*() methods on an AST ndoe are called before any
>> Visit*() method is called on a child node of it.  So the trace of the
>> Visit*() calls is easy to understand and reason about.
>>
>> One thing people might not like about the design is that it uses more
>> methods. I actually think it's a good thing that we have separate families
>> of methods for separate roles. RecursiveASTVisitor's approach of lumping all
>> logic (traversal, walking up the type hierarchy, and the custom visiting
>> logic) in the same Visit*() method makes the implementation more tricky and
>> error-prone - that's probably why there was so much confusion when we were
>> trying to improve RecursiveASTVisitor.
>>
>> To ensure performance, we would use CRTP as the existing AST visitors do.
>> That means two things:
>>
>> No need for virtual functions.
>> The entire code is templated s.t. the compiler can easily inline trivial
>> functions and do cross-function optimization.
>>
>> Use Cases
>>
>> Let's see how some typical use cases can be implemented using our AST
>> visitor.
>>
>> Dump the AST
>>
>> Only 3 methods need to be overridden:
>>
>> VisitDecl(Decl* x) {
>>
>>   print information about x;
>> }
>>
>> VisitStmt(Stmt* x) {
>>
>>   print information about x;
>> }
>>
>> VisitTypeLoc(TypeLoc* x) {
>>
>>   print information about x;
>> }
>>
>> Search for an AST Node of Type Foo
>>
>> The tool would override VisitFoo(Foo* x) to return false if x is the node
>> it's looking for.
>>
>> Find the Parents of an AST Node
>>
>> The tool would maintain a stack of the nodes that lead to the current node.
>> It can build the stack by overriding exactly 3 methods (plumbing of the
>> return code omitted):
>>
>> Traverse(Decl* d) {
>>
>>   Push(d);
>>   Base::Traverse(d);
>>
>>   Pop();
>> }
>>
>> Traverse(Stmt* s) {
>>
>>   Push(s);
>>   Base::Traverse(s);
>>
>>   Pop();
>> }
>>
>> Traverse(TypeLoc* t) {
>>
>>   Push(t);
>>   Base::Traverse(t);
>>
>>   Pop();
>> }
>>
>> --
>> Zhanyong
>>
>>
>
>
>
> --
> Zhanyong


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: new design of RecursiveASTVisitor

Zhanyong Wan (λx.x x)
Thanks for the excellent comments, Doug and Chandler!  I've uploaded a
new patch to the codereview app.

On Tue, Jun 8, 2010 at 5:33 PM, Douglas Gregor <[hidden email]> wrote:

>
> On Jun 8, 2010, at 2:44 PM, Zhanyong Wan (λx.x x) wrote:
>
>> Hi, all,
>>
>> Craig and I implemented most of the new design.  I uploaded the patch
>> to http://codereview.appspot.com/1607042 for easy reviewing - you can
>> download it from
>> http://codereview.appspot.com/download/issue1607042_1.diff
>>
>> In the implementation we made a slight change to the proposed design:
>> instead of overloading Traverse() for Stmt, Type, and Decl, we prefer
>> to have separate TraverseStmt, TraverseType, and TraverseDecl
>> functions.  We think that makes the code clearer.
>>
>> Chandler plan to review it after 3pm today.  Doug, would you have some
>> time to look at this today?  Thanks!
>
> I've added my comments up on codereview.appspot.com. Looks good!
>
>        - Doug
>
>> On Fri, Jun 4, 2010 at 4:35 PM, Zhanyong Wan (λx.x x) <[hidden email]> wrote:
>>> Hi,
>>>
>>> While working on improving the RecursiveASTVisitor class, some of us saw an
>>> opportunity for a different design that's cleaner and easier to understand
>>> and use.  Here's the proposal.  Comments and suggestions welcome!
>>>
>>> Clang AST Visitor Design
>>>
>>> This document discusses the design of a customizable visitor class for the
>>> AST generated by the Clang C++ parser.
>>>
>>> Background
>>>
>>> Clang comes with two AST visitor classes: ASTVisitor and
>>> RecursiveASTVisitor. The ideas behind the two are similar. Both are
>>> implemented using the CRTP pattern. The former is considered an internal
>>> class and not for use in tools developed on top of Clang. The latter is
>>> supposed to be a public API.
>>>
>>> At this time, neither class is complete (i.e. they'll miss some AST nodes)
>>> and there are known bugs (e.g. some AST nodes may be visited more than
>>> once). We are working on improving RecursiveASTVisitor. Our experience shows
>>> that its design has much space for improvement. Therefore it would be a
>>> worthwhile exercise to see what we can come up with if we do it from
>>> scratch, without being constrained or influenced by the current
>>> RecursiveASTVisitor design.
>>>
>>> Concepts and Terminology
>>>
>>> Muddy concepts lead to poor, confusing designs. Before we explore the design
>>> space, it's important to clarify some key concepts.
>>>
>>> An AST is a tree-shaped data structure. A node in an AST can have 0 or many
>>> other nodes as its children.
>>> There a three categories of AST nodes in Clang: statements,
>>> declarations/definitions, and types. They are derived from class Stmt, Decl,
>>> and TypeLoc, respectively. In general, a node may have children of any or
>>> all of the three categories. The class hierarchies for Stmt, Decl, and
>>> TypeLoc can be found here.
>>> The AST visitor has three distinct tasks:
>>>
>>> traverse the entire AST (i.e. go to every node),
>>> at each node, walk up the class hierarchy, starting with the dynamic type of
>>> the node,
>>> for a given (node, type) combination (where 'type' is some base class of the
>>> dynamic type of 'node'), call a user-overridable function to actually
>>> "visit" the node.
>>>
>>> Design Goals
>>>
>>> easy-to-understand API
>>> performs a depth-first, preorder traversal on the AST (no child of a node
>>> can be visited until all work on the node has been done)
>>> complete coverage: each node in the AST must be visited
>>> no duplication: a node cannot be visited more than once
>>> common use cases should be easy
>>> fast
>>>
>>> Our Design
>>>
>>> The distinction between the AST visitor's three tasks (see the "Concepts and
>>> Terminology" section above) is important for understanding how the AST
>>> visitor works and using it effectively. Therefore we use naming conventions
>>> to highlight the role of a method of the AST visitor:
>>>
>>> TraverseFoo(Foo* x) does task #1. It traveses (i.e. recursively visits) the
>>> sub-tree rooted at x, where Foo is *x's dynamic type (it's the caller's
>>> responsibility to ensure the correct type).
>>> Traverse(Decl* x) simply dispatches (i.e. forwards) to TraverseFoo(Foo* x)
>>> where Foo is the dynamic type of *x. Traverse(Stmt* x) and Traverse(TypeLoc*
>>> x) work similarly.
>>> WalkUpFromFoo(Foo* x) (better name welcome) does task #2. It does not try to
>>> visit any child of x. Instead, it first calls WalkUpFromBar(x) where Bar is
>>> the direct parent class of Foo (unless Foo has no parent), and then calls
>>> VisitFoo(x) (see the next bullet).
>>> VisitFoo(Foo* x) does task #3.
>>>
>>> Here's some pesudo code to show what these methods' implementation looks
>>> like:
>>>
>>> // A function returns false if it wants the traversal to abort (note that
>>> // the current RecursiveASTVisitor class returns true for abortion - we
>>> // feel that 'false' is much more intuitive).
>>>
>>> // Therefore, each caller needs to check the return code and propagate
>>> // it up appropriately.  Such plumbing code is omitted here for clarify.
>>>
>>>
>>> // The entry point for visiting an AST rooted at x.
>>> bool Traverse(Decl* x) {
>>>
>>>   switch(x->getKind()) {
>>>
>>>   case kFoo:
>>>     TraverseFoo((Foo*)x);
>>>
>>>     break;
>>>   case kBar:
>>>     TraverseBar((Bar*)x);
>>>
>>>     break;
>>>   ...
>>>   }
>>> }
>>>
>>>
>>> bool TraverseFoo(Foo* x) {
>>>
>>>   WalkUpFromFoo(x);
>>>   for each child node n of x {
>>>
>>>     Traverse(n)
>>>   }
>>> }
>>>
>>>
>>> bool WalkUpFromFoo(Foo* x) {
>>>
>>>   WalkUpFromBar(x);  // Bar is Foo's parent.
>>>
>>>   VisitFoo(x);
>>> }
>>>
>>> bool WalkUpFromDecl(Decl* x) {
>>>
>>>   // Decl has no parent, so no more walking up.
>>>   VisitDecl(x);
>>>
>>> }
>>>
>>> // All Visit*() methods do nothing by default.
>>> bool VisitFoo(Foo* x) { return true; }
>>>
>>> Usually, a user shouldn't override a Traverse*() or WalkUpFrom*() function -
>>> they are the "skeleton" of the AST visitor and do the mundane plumbing that
>>> a normal user is not interested in. We recommend that only people truly
>>> understand how the traversal works can override them.
>>>
>>> A user is expected to override the Visit*() methods as needed. The default
>>> implementation of them does nothing.
>>>
>>> Returning a bool to indicate whether the traversal needs to be aborted
>>> requires much tedious, error-prone plumbing in the implementation.
>>> Exceptions are the most natural solution here, but unfortunately they are
>>> banned from Clang code.
>>>
>>> One advantage of this design over the existing RecursiveASTVisitor is that
>>> when a user overrides VisitFoo(), he only needs to do what he's interested
>>> in, without worrying about calling VisitFoo() or any other methods in the
>>> base AST visitor class (so less chance to shoot his own feet).
>>>
>>> With this design, all Visit*() methods on an AST ndoe are called before any
>>> Visit*() method is called on a child node of it.  So the trace of the
>>> Visit*() calls is easy to understand and reason about.
>>>
>>> One thing people might not like about the design is that it uses more
>>> methods. I actually think it's a good thing that we have separate families
>>> of methods for separate roles. RecursiveASTVisitor's approach of lumping all
>>> logic (traversal, walking up the type hierarchy, and the custom visiting
>>> logic) in the same Visit*() method makes the implementation more tricky and
>>> error-prone - that's probably why there was so much confusion when we were
>>> trying to improve RecursiveASTVisitor.
>>>
>>> To ensure performance, we would use CRTP as the existing AST visitors do.
>>> That means two things:
>>>
>>> No need for virtual functions.
>>> The entire code is templated s.t. the compiler can easily inline trivial
>>> functions and do cross-function optimization.
>>>
>>> Use Cases
>>>
>>> Let's see how some typical use cases can be implemented using our AST
>>> visitor.
>>>
>>> Dump the AST
>>>
>>> Only 3 methods need to be overridden:
>>>
>>> VisitDecl(Decl* x) {
>>>
>>>   print information about x;
>>> }
>>>
>>> VisitStmt(Stmt* x) {
>>>
>>>   print information about x;
>>> }
>>>
>>> VisitTypeLoc(TypeLoc* x) {
>>>
>>>   print information about x;
>>> }
>>>
>>> Search for an AST Node of Type Foo
>>>
>>> The tool would override VisitFoo(Foo* x) to return false if x is the node
>>> it's looking for.
>>>
>>> Find the Parents of an AST Node
>>>
>>> The tool would maintain a stack of the nodes that lead to the current node.
>>> It can build the stack by overriding exactly 3 methods (plumbing of the
>>> return code omitted):
>>>
>>> Traverse(Decl* d) {
>>>
>>>   Push(d);
>>>   Base::Traverse(d);
>>>
>>>   Pop();
>>> }
>>>
>>> Traverse(Stmt* s) {
>>>
>>>   Push(s);
>>>   Base::Traverse(s);
>>>
>>>   Pop();
>>> }
>>>
>>> Traverse(TypeLoc* t) {
>>>
>>>   Push(t);
>>>   Base::Traverse(t);
>>>
>>>   Pop();
>>> }
>>>
>>> --
>>> Zhanyong
>>>
>>>
>>
>>
>>
>> --
>> Zhanyong
>
>



--
Zhanyong

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev