Introduction and Help Request for Scan, Parse, Codegen


Introduction and Help Request for Scan, Parse, Codegen

Fangrui Song via cfe-dev

Greetings, my name is Allyn Shell. I am the instructor of the Compiler Design with LLVM course at Johns Hopkins University Engineering for Professionals in the CS Department. (I have included my short bio as an introduction at the end of this email.)

 

I spent the last two years updating the Compiler Design course to include LLVM, which required me to learn “everything” about the LLVM Project. There are several difficult steps that I could not overcome in two years of studying LLVM. I am writing to request help from the community to fill in these missing points, which basically involve using clang to create a compiler for an imperative C-like language (for teaching students how to do this). It is easy to create a scanner and a parser using public domain tools, but when I reached the codegen to LLVM’s IR I quickly ran out of good guidance. The biggest help came from the LLVM Tutorials, but they are a little shallow and a little hard to follow at times due to what appears to be generations of updates.

 

Are there LLVM-based tools available to make the transition from AST to IR, and where do I find them?

 

Is the transition between AST and IR mapped in a way that is compatible with the mappings used for debug and the IR to Machine Specific Object Code? Where can I find that information?

 

Is there an LLVM-specific scanner/parser pair that integrates with the AST to IR mapping? Where do I find information about them?

 

 - - - - - - - - - - - - - - - -

Bio for Allyn Shell:

I am the son of Dr. Donald L. Shell (author of the Shellsort). I am a Computer Scientist (semi-retired) and an Educator. I worked (indirectly) for NASA for 25 years building ground stations and simulators. I worked in missile defense for 10 years building simulators. Within industry I have taught seminars and short courses on a variety of topics including: Introductory Ada, Introductory C++, Advanced C++, Object Oriented Software Development, Introduction to Java, and Management of Ada Software Development. I am the author of the NASA/GSFC Standard Ada Pretty Printer (NASA/GSFC DSTL-88-003, May 1988) and am the co-author of the Ada Style Guide (NASA/GSFC SEL-87-002, May 1987). Recently, I authored the paper “RISC Hardware and Simplified Software, Part 1: the Hardware,” and am currently writing “RISC Hardware and Simplified Software, Part 2: the Software.” I have four U.S. patents, as well as an EPO (European patent application) for a Multi-Level Marketing Computer Network Server (US Patents #6134533, 6415265, 6408281, 6691093 and EPO Patent Application # 97119108.5). I have a bachelor’s degree from Michigan Technical University in Applied Physics and a master’s degree in Computer Science from Johns Hopkins University. I am currently teaching the Foundations of Computer Architecture course and the Compiler Design with LLVM course.

 

Allyn Shell



_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: Introduction and Help Request for Scan, Parse, Codegen

Fangrui Song via cfe-dev
On Tue, May 19, 2020 at 11:11 AM Allyn Shell via cfe-dev <[hidden email]> wrote:

Greetings, my name is Allyn Shell. I am the instructor of the Compiler Design with LLVM course at Johns Hopkins University Engineering for Professionals in the CS Department. (I have included my short bio as an introduction at the end of this email.)


Welcome to the LLVM community!
 
(for context I mostly work on LLVM's Debug Info (DWARF) support - so I won't know everything about what you need to know, but I'll try to give some rough ideas, at least)

 I spent the last two years updating the Compiler Design course to include LLVM


Sounds great!
 

which required me to learn “everything” about the LLVM Project. There are several difficult steps that I could not overcome in two years of studying LLVM. I am writing to request help from the community to fill in these missing points, which basically involve using clang to create a compiler for an imperative C-like language (for teaching students how to do this).


Is Clang the right foundation for this, rather than LLVM? Clang is pretty complicated by the needs of a production-focussed C++ compiler, and might not be the best place to go tinkering to add new language support.
 

It is easy to create a scanner and a parser using public domain tools, but when I reached the codegen to LLVM’s IR I quickly ran out of good guidance. The biggest help came from the LLVM Tutorials, but they are a little shallow and a little hard to follow at times due to what appears to be generations of updates.


Yep, documentation's rarely kept up to date or revisited wholesale to give it consistent polish, etc.

If there are any particular bits you think might be improved - patches for the documentation are most appreciated so the next person might have an easier time than you did.
 

 Are there LLVM-based tools available to make the transition from AST to IR, and where do I find them?


Generic tools for custom AST to IR? Nothing I know of. The closest to "tooling" that exists, certainly inside the LLVM project itself, is IRBuilder - a helper API for building LLVM IR.
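
To make the size of that gap concrete, here is a minimal sketch (not from the original message) of what "AST to IR" amounts to for a toy expression language. Everything in it is an assumption made for illustration: the `Expr`, `Num`, `Bin`, and `Emitter` names are hypothetical, and the code deliberately calls no LLVM API. It writes textual IR by string concatenation so that it stays dependency-free. In a real front end, IRBuilder calls would take the place of the string building.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical toy AST (not from LLVM): integer literals and binary + or *.
struct Expr {
  virtual ~Expr() = default;
};

struct Num : Expr {
  int value;
  explicit Num(int v) : value(v) {}
};

struct Bin : Expr {
  char op;  // '+' or '*'
  std::unique_ptr<Expr> lhs, rhs;
  Bin(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
      : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
};

// Walks the AST and appends one textual IR instruction per binary node.
class Emitter {
  std::ostringstream body;
  int nextTemp = 0;

  // Returns the operand text (a constant or a %tN temporary) for `e`.
  std::string emit(const Expr &e) {
    if (const auto *n = dynamic_cast<const Num *>(&e))
      return std::to_string(n->value);
    const auto *b = dynamic_cast<const Bin *>(&e);
    assert(b && "unknown AST node kind");
    assert((b->op == '+' || b->op == '*') && "unsupported operator");
    std::string l = emit(*b->lhs);
    std::string r = emit(*b->rhs);
    std::string t = "%t" + std::to_string(nextTemp++);
    body << "  " << t << " = " << (b->op == '+' ? "add" : "mul")
         << " i32 " << l << ", " << r << "\n";
    return t;
  }

public:
  // Wraps the lowered expression in a zero-argument i32 function.
  std::string function(const std::string &name, const Expr &e) {
    body.str("");
    nextTemp = 0;
    std::string result = emit(e);
    return "define i32 @" + name + "() {\nentry:\n" + body.str() +
           "  ret i32 " + result + "\n}\n";
  }
};
```

Calling `Emitter().function("f", tree)` on the tree for `(1 + 2) * 3` yields a function whose body is an `add`, then a `mul` on the temporary, then a `ret`. The post-order walk that returns the name of the value it just computed is the part that carries over to an IRBuilder-based front end, where each step would return a value handle instead of a string.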
 

 Is the transition between AST and IR mapped in a way that is compatible with the mappings used for debug and the IR to Machine Specific Object Code?


I'm not sure I understand the question, could you rephrase it? Myself and Eric Christopher presented a tutorial at the LLVM developers meeting several years ago about how to generate LLVM IR that includes debug information that is then used by LLVM's middle (IR optimizations) and backend (Machine Specific Object Code generation) to create DWARF (or Windows CodeView) debug information, usable by a debugger like gdb, etc.
 

Where can I find that information?


The written version of that tutorial is here: https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl09.html 
 

 

Is there an LLVM-specific scanner/parser pair that integrates with the AST to IR mapping?


Nope - LLVM doesn't really offer anything above the IR. Some utilities (like IRBuilder and DIBuilder) to help create it - but what you use to create it is entirely up to you.
 

Where do I find information about them?

 



Re: Scan, Parse, IR Codegen & Debug compatibility

Fangrui Song via cfe-dev
Thanks David,


From: David Blaikie <[hidden email]>
Sent: Thursday, May 21, 2020 5:54 PM
To: Allyn Shell <[hidden email]>
Cc: via cfe-dev <[hidden email]>
Subject: Re: [cfe-dev] Scan, Parse, IR Codegen, Debug
 
 

 Is the transition between AST and IR mapped in a way that is compatible with the mappings used for debug and the IR to Machine Specific Object Code?


I'm not sure I understand the question, could you rephrase it? Myself and Eric Christopher presented a tutorial at the LLVM developers meeting several years ago about how to generate LLVM IR that includes debug information that is then used by LLVM's middle (IR optimizations) and backend (Machine Specific Object Code generation) to create DWARF (or Windows CodeView) debug information, usable by a debugger like gdb, etc.

AMS: I apologize for not having a reference, but while I was reading all the LLVM documentation, I read that TableGen and the DWARF debug format had mappings that somebody recommended be made consistent with the target-independent code generation so that there would only be one mapping for these things instead of different mappings at the different levels. I simply wanted to know if that effort toward uniformity was being advanced and if the AST to IR code generation was planned to be part of the uniformity. (And if not, should I consider trying to be compatible with some ongoing effort to create uniformity of mappings across phases of LLVM while I add front end features?)

Allyn Shell

 

Where can I find that information?

The written version of that tutorial is here: https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl09.html 
 

Re: Scan, Parse, IR Codegen & Debug compatibility

Fangrui Song via cfe-dev


On Fri, May 22, 2020 at 12:51 PM Allyn Shell <[hidden email]> wrote:
Thanks David,


From: David Blaikie <[hidden email]>
Sent: Thursday, May 21, 2020 5:54 PM
To: Allyn Shell <[hidden email]>
Cc: via cfe-dev <[hidden email]>
Subject: Re: [cfe-dev] Scan, Parse, IR Codegen, Debug
 
 

 Is the transition between AST and IR mapped in a way that is compatible with the mappings used for debug and the IR to Machine Specific Object Code?


I'm not sure I understand the question, could you rephrase it? Myself and Eric Christopher presented a tutorial at the LLVM developers meeting several years ago about how to generate LLVM IR that includes debug information that is then used by LLVM's middle (IR optimizations) and backend (Machine Specific Object Code generation) to create DWARF (or Windows CodeView) debug information, usable by a debugger like gdb, etc.

AMS: I apologize for not having a reference, but while I was reading all the LLVM documentation, I read that TableGen and the DWARF debug format had mappings that somebody recommended be made consistent with the target-independent code generation so that there would only be one mapping for these things instead of different mappings at the different levels.

Hmm - I'm honestly not sure what consistencies or inconsistencies that might've been in reference to... so I'm not sure what efforts there might be around this issue. If you happen to find any references/pointers I might be able to provide more information on that.
 
I simply wanted to know if that effort toward uniformity was being advanced and if the AST to IR code generation was planned to be part of the uniformity. (And if not, should I consider trying to be compatible with some ongoing effort to create uniformity of mappings across phases of LLVM while I add front end features?)

Allyn Shell

 

Where can I find that information?

The written version of that tutorial is here: https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl09.html 
 