Understanding Clang parsing

kalyan ponnala
Hi,

I am trying to understand how the lexer and parser work inside the Clang/LLVM solution. I am using a CMake-generated Visual Studio 2008 solution file. I tried stepping into the lexer to understand how Clang's lexer works.

1. Can anyone tell me in a short sentence how the parser is designed? I would like to concentrate on how the AST works inside this parser design.

I think this will help me understand Clang better. It's hard for me to understand this design with all those interconnected classes.

Thanks
--
Kalyan Ponnala
phone: 8163772059

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Re: Understanding Clang parsing

Charles Davis-3
On 3/3/10 7:40 PM, kalyan ponnala wrote:
> Hi,
>
> I am trying to understand how the lexer and parser work inside
> Clang/LLVM solution file. I am using a cmake generated visual studio
> 2008 solution file. I tried stepping into the lexer and understand the
> way clang's lexer works.
>
> 1. Can anyone tell me how the parser is designed in a short sentence. I
> would like to concentrate on how the AST works inside this parser design.
The Parser takes tokens read by the Lexer and assigns meaning to them.
It's responsible for determining what 'int' means, for example. The
parser uses this information to build the AST.

I understand your concern. Parts of Clang have me confused, too :). But
don't worry; there are plenty of people around here (including me) who are
willing to help you understand how Clang works.

Chip

Re: Understanding Clang parsing

Salman Pervez
This is something I would like to learn more about as well. For  
instance, the file lib/Parse/ParseObj.c contains an entire list of  
tokens e.g. 'kw_if', 'kw_new'. I am assuming the lexer reads these  
tokens and prepares them for the parser. Could someone point me to  
where these 'kw_*'  enums are defined?

What would be really helpful is if someone could give a brief overview  
of how I would go about adding a new expression to C. So far what I've  
learned is this...

- I would have to add the relevant token so the lexer can recognize it.
- I would have to add parser code in lib/Parse/Parser.cpp?
- I would have to construct the relevant AST for this expr.

If I could just get the names of the files/directories where these
changes would need to be made, that would be a great starting point.
Thanks,

Salman

Re: Understanding Clang parsing

kalyan ponnala
Thanks for the support, Charles.
Could you tell me a better way to understand how the parser works? Or is stepping into the code the only solid way to understand it?
I would like to know which important files inside the solution are responsible for building the AST.
Thanks.

Re: Understanding Clang parsing

Charles Davis-3
On 3/3/10 7:57 PM, kalyan ponnala wrote:
> Thanks for the support, Charles.
> Could you tell me a better way to understand how the parser works? Or is
> stepping into the code the only solid way to understand it?
Clang's Parser class works like this:
1. Someone calls ParseTopLevelDecl() to parse a top-level declaration
from the source.
2. ParseTopLevelDecl() calls other methods inside the Parser class to
parse fragments of the source according to the grammar as defined in the
C and C++ standards. Most functions get called indirectly; for example,
when the Parser encounters a function declaration, the
ParseFunctionDeclaration() method gets called.
3. These parsing methods notify some object through the "Action"
interface every time something gets parsed. The default Action object
does nothing. The Semantic Analyzer (another really important part of
the compiler), or just "Sema" for short, provides an implementation
which not only builds the AST but also annotates it with types and
checks the semantics of the program specified by the source code.
4. Steps 1-3 are repeated for every top-level declaration in the
translation unit (a source file plus all its included headers).
5. When it hits the end of the unit, ParseTopLevelDecl() calls
Action::ActOnEndOfTranslationUnit(). This lets the object on the other
side know that the translation unit has ended and that it is time to do
any end-of-unit work. The Sema implementation invokes an ASTConsumer
object with this information.
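
To make the five steps above concrete, here is a minimal sketch of the same control flow. It is not Clang's code: ToyAction, ToySema, ToyParser and parseTranslationUnit are hypothetical names, and the "tokens" are just declaration names. It only illustrates the idea of a parser that reports what it parses through a do-nothing-by-default callback interface, with a Sema-like subclass that records the results.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for the Action interface.
// The real interface has many more callbacks with richer signatures.
struct ToyAction {
  virtual ~ToyAction() = default;
  // Called once per parsed top-level declaration. The default does nothing.
  virtual void actOnDeclaration(const std::string & /*name*/) {}
  // Called when the parser reaches the end of the translation unit.
  virtual void actOnEndOfTranslationUnit() {}
};

// Toy "Sema": reacts to the callbacks by recording AST-like entries.
struct ToySema : ToyAction {
  std::vector<std::string> decls;
  bool finished = false;

  void actOnDeclaration(const std::string &name) override {
    decls.push_back(name);
  }
  void actOnEndOfTranslationUnit() override { finished = true; }
};

// Toy parser: each input string stands for one top-level declaration.
class ToyParser {
public:
  ToyParser(std::vector<std::string> names, ToyAction &actions)
      : names_(std::move(names)), actions_(actions) {}

  // Parses one top-level declaration and notifies the action object.
  // Returns true once the end of the (toy) translation unit is reached.
  bool parseTopLevelDecl() {
    if (pos_ == names_.size()) {
      actions_.actOnEndOfTranslationUnit();
      return true;
    }
    actions_.actOnDeclaration(names_[pos_++]);
    return false;
  }

private:
  std::vector<std::string> names_;
  std::size_t pos_ = 0;
  ToyAction &actions_;
};

// Driver loop: repeat until the parser reports the end of the unit.
inline void parseTranslationUnit(ToyParser &parser) {
  while (!parser.parseTopLevelDecl()) {
  }
}
```

Passing a plain ToyAction instead of a ToySema runs the same parse but builds nothing, which is the point of keeping the parser and the AST-building code on opposite sides of the interface.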
> I would like to know some important files inside the solution file which
> are responsible for building the AST.
The Sema library is your best bet. Study it; that's where I started with
Clang. Trust me: it's one of the easiest parts of the compiler to
understand. Like most of Clang, it was designed to be relatively easy to
understand, even for someone with little or no prior experience with
compiler technology. In particular, look at lib/Sema/SemaDecl.cpp and
lib/Sema/SemaExpr.cpp .

Also, take a look at include/clang/Parse/Action.h . This is the file
that defines the Action interface.
> Thanks.
No problem.

Chip


Re: Understanding Clang parsing

kalyan ponnala
Hi again,

Could you explain in detail what a QualType is? How does it save space when representing different types?

Thanks

Re: Understanding Clang parsing

Charles Davis-3
On 3/3/10 8:07 PM, Salman Pervez wrote:
> This is something I would like to learn more about as well. For  
> instance, the file lib/Parse/ParseObj.c contains an entire list of  
> tokens e.g. 'kw_if', 'kw_new'. I am assuming the lexer reads these  
> tokens and prepares them for the parser. Could someone point me to  
> where these 'kw_*'  enums are defined?
Believe it or not, they're defined as part of the Basic library. See
include/clang/Basic/TokenKinds.def.
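
A .def file like this is typically consumed with the "X-macro" technique: the list of tokens is written once and expanded several times with different macro definitions. The sketch below shows the general technique only. The list is inlined as a macro so the example is self-contained, and TOY_TOKEN_LIST, the toytok namespace and classifyWord are hypothetical names, not what Clang uses.

```cpp
#include <cassert>
#include <string>

// Stand-in for a .def file: one entry per token kind, written once.
#define TOY_TOKEN_LIST(TOK, KEYWORD) \
  TOK(unknown)                       \
  TOK(eof)                           \
  TOK(identifier)                    \
  KEYWORD(if)                        \
  KEYWORD(new)

namespace toytok {

// First expansion: build the enum. Keywords get a kw_ prefix.
enum TokenKind {
#define TOY_TOK(X) X,
#define TOY_KEYWORD(X) kw_##X,
  TOY_TOKEN_LIST(TOY_TOK, TOY_KEYWORD)
#undef TOY_TOK
#undef TOY_KEYWORD
  NUM_TOKENS
};

// Second expansion: build a parallel table of printable names.
inline const char *getTokenName(TokenKind kind) {
  static const char *const names[] = {
#define TOY_TOK(X) #X,
#define TOY_KEYWORD(X) #X,
      TOY_TOKEN_LIST(TOY_TOK, TOY_KEYWORD)
#undef TOY_TOK
#undef TOY_KEYWORD
  };
  assert(kind < NUM_TOKENS);
  return names[kind];
}

// Third expansion: map a spelling to its keyword kind, or to
// `identifier` when the word is not in the keyword list.
inline TokenKind classifyWord(const std::string &word) {
#define TOY_TOK(X)
#define TOY_KEYWORD(X) \
  if (word == #X) return kw_##X;
  TOY_TOKEN_LIST(TOY_TOK, TOY_KEYWORD)
#undef TOY_TOK
#undef TOY_KEYWORD
  return identifier;
}

}  // namespace toytok
```

With this layout, adding a keyword means adding one line to the list; the enum, the name table and the lookup all pick it up on the next build.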
>
> What would be really helpful is if someone could give a brief overview  
> of how I would go about adding a new expression to C. So far what I've  
> learned is this...
>
> - I would have to add the relevant token so the lexer can recognize it.
Only if you have a new keyword or some such to add.
> - I would have to add parser code in lib/Parse/Parser.cpp?
Not there, but to the relevant source file--probably
lib/Parse/ParseExpr.cpp.
> - I would have to construct the relevant AST for this expr.
Look at the AST library--particularly lib/AST/Expr.cpp and friends.
>
> If I could just get the names of the files/directories where these  
> changes would need to be made, that would be a great starting point.  
You'll also have to add a new action to the Action interface
(include/clang/Parse/Action.h), and you'll also have to modify Sema to
understand the new expression (if you intend to use Sema; see
lib/Sema/SemaExpr.cpp). If you want to generate IR from it, you may also
have to modify CodeGen (lib/CodeGen/CGExpr.cpp) to understand the new
AST node. If you want to do static analysis, you may need to modify the
Analysis library, etc.
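
As a rough picture of how those layers fit together, here is a toy end-to-end sketch for a made-up expression toy_max(a, b). Everything in it is hypothetical (the ToyExpr classes, ToyExprActions, ToyExprParser, and the expression itself), the input is assumed to be already split into tokens, and evaluate() merely stands in for what CodeGen would do with the node. It shows one change per layer: a new AST node, a new action callback, and a new parser branch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// AST layer: a base node plus the node for the new expression.
struct ToyExpr {
  virtual ~ToyExpr() = default;
  virtual int evaluate() const = 0;  // stands in for a CodeGen visit
};
using ToyExprPtr = std::unique_ptr<ToyExpr>;

struct ToyIntLiteral : ToyExpr {
  int value;
  explicit ToyIntLiteral(int v) : value(v) {}
  int evaluate() const override { return value; }
};

// The hypothetical new expression: toy_max(a, b).
struct ToyMaxExpr : ToyExpr {
  ToyExprPtr lhs, rhs;
  ToyMaxExpr(ToyExprPtr l, ToyExprPtr r)
      : lhs(std::move(l)), rhs(std::move(r)) {
    assert(lhs && rhs);
  }
  int evaluate() const override {
    return std::max(lhs->evaluate(), rhs->evaluate());
  }
};

// Action/Sema layer: one callback per construct the parser recognizes.
struct ToyExprActions {
  ToyExprPtr actOnIntLiteral(int v) {
    return std::make_unique<ToyIntLiteral>(v);
  }
  ToyExprPtr actOnMaxExpr(ToyExprPtr l, ToyExprPtr r) {
    return std::make_unique<ToyMaxExpr>(std::move(l), std::move(r));
  }
};

// Parser layer: recognizes the syntax and reports it via the actions.
class ToyExprParser {
public:
  ToyExprParser(std::vector<std::string> toks, ToyExprActions &actions)
      : toks_(std::move(toks)), actions_(actions) {}

  // expr := INTEGER | 'toy_max' '(' expr ',' expr ')'
  // Returns nullptr on a syntax error.
  ToyExprPtr parseExpr() {
    if (peek() == "toy_max") {
      ++pos_;
      if (!consume("(")) return nullptr;
      ToyExprPtr lhs = parseExpr();
      if (!lhs || !consume(",")) return nullptr;
      ToyExprPtr rhs = parseExpr();
      if (!rhs || !consume(")")) return nullptr;
      return actions_.actOnMaxExpr(std::move(lhs), std::move(rhs));
    }
    const std::string tok = peek();
    if (tok.empty() ||
        tok.find_first_not_of("0123456789") != std::string::npos)
      return nullptr;
    ++pos_;
    return actions_.actOnIntLiteral(std::stoi(tok));
  }

private:
  std::string peek() const {
    return pos_ < toks_.size() ? toks_[pos_] : std::string();
  }
  bool consume(const std::string &expected) {
    if (peek() != expected) return false;
    ++pos_;
    return true;
  }

  std::vector<std::string> toks_;
  std::size_t pos_ = 0;
  ToyExprActions &actions_;
};
```

The real work in each layer is of course much larger (source locations, types, diagnostics), but the division of labor is the same: the parser only recognizes syntax, the action object decides what node to build, and a later pass consumes the node.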

Chip

Re: Understanding Clang parsing

Charles Davis-3
On 3/3/10 9:15 PM, kalyan ponnala wrote:
> Hi again,
>
> Could you explain in detail what a QualType is? How does it save space
> when representing different types?
A QualType holds a pointer to a Type object as well as qualifiers such
as 'const', 'volatile', and 'restrict'. This way, we don't have to have
separate Type objects for 'int', 'const int', 'volatile int', 'const
volatile int', etc.

Some of the qualifiers are stored in the lower bits of the pointer
itself (on the assumption that it is always 8-byte aligned). Other
qualifiers have to be stored elsewhere. The ones that are stored in the
pointer are called 'fast' qualifiers, and the others are called 'slow'
qualifiers.

You can get the underlying Type pointer by calling getTypePtr(), and you
can read all the qualifiers by getting a Qualifiers object from the
QualType with getQualifiers(). You can read only the ones that don't
come from typedefs with the getLocalQualifiers() method, and you can
query particular qualifiers with the isXxxQualified() and
isLocalXxxQualified() methods.
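
The pointer-packing trick described above can be sketched in a few lines. This is an illustration under assumptions, not the real class: ToyType and ToyQualType are hypothetical, only the three "fast" qualifiers are modeled, and the "slow" qualifiers and typedef handling are left out entirely. The key point is that an 8-byte-aligned pointer always has its three low bits clear, so they are free to hold flags.

```cpp
#include <cassert>
#include <cstdint>

// Toy type object. The alignment guarantees that the low three bits of
// any ToyType pointer are zero.
struct alignas(8) ToyType {
  const char *name;
};

class ToyQualType {
public:
  enum Qualifier : std::uintptr_t {
    Const = 0x1,
    Restrict = 0x2,
    Volatile = 0x4,
    FastMask = 0x7
  };

  // Packs the qualifier bits into the unused low bits of the pointer.
  ToyQualType(const ToyType *type, std::uintptr_t quals)
      : bits_(reinterpret_cast<std::uintptr_t>(type) | (quals & FastMask)) {
    assert((reinterpret_cast<std::uintptr_t>(type) & FastMask) == 0);
  }

  // Masks the qualifier bits off again to recover the original pointer.
  const ToyType *getTypePtr() const {
    return reinterpret_cast<const ToyType *>(
        bits_ & ~static_cast<std::uintptr_t>(FastMask));
  }

  std::uintptr_t getLocalFastQualifiers() const { return bits_ & FastMask; }
  bool isLocalConstQualified() const { return (bits_ & Const) != 0; }
  bool isLocalRestrictQualified() const { return (bits_ & Restrict) != 0; }
  bool isLocalVolatileQualified() const { return (bits_ & Volatile) != 0; }

private:
  std::uintptr_t bits_;  // pointer and qualifiers share one word
};

// The qualified handle is no bigger than a plain pointer (on the usual
// platforms where uintptr_t and void * have the same size).
static_assert(sizeof(ToyQualType) == sizeof(void *),
              "qualifiers should not cost extra space");
```

In this sketch 'int', 'const int' and 'const volatile int' are three different ToyQualType values that all point at the same single ToyType object, which is where the space saving comes from.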

The definition of the QualType class is in include/clang/AST/Type.h.
Take a look.

Chip

Re: Understanding Clang parsing

Chris Lattner

On Mar 3, 2010, at 8:44 PM, Charles Davis wrote:

> On 3/3/10 9:15 PM, kalyan ponnala wrote:
>> Hi again,
>>
>> Could you explain in detail what a QualType is? How does it save space
>> when representing different types?
> A QualType holds a pointer to a Type object as well as qualifiers such
> as 'const', 'volatile', and 'restrict'. This way, we don't have to have
> separate Type objects for 'int', 'const int', 'volatile int', 'const
> volatile int', etc.

This is a useful resource:
http://clang.llvm.org/docs/InternalsManual.html

-Chris
