Lenient lexing/parsing of code snippets

Will Hawkins
Hello awesome clang community!

The combination of clang/llvm is so powerful that I'm sure what I'm
about to ask is simple. Unfortunately I've worked for about a month
and gotten nowhere. After RTFMing as much as possible, I'm reaching
out here.

I'm trying to use cfe to lex/parse code snippets. The snippets will be
complete functions but they will be taken mostly out of their original
context. I'd like to generate an AST for the code. I've tried several
different ways of doing this on my own. The major problem is that
when taken out of context, most of the variables are undefined. I've
tried iteratively compiling the code and modifying it after each
compilation by adding declarations for the missing variables (which I
catch using a DiagnosticConsumer). This is cumbersome and, actually,
mostly just does not work.

Is there a way to do this? Did I miss some obvious set of
documentation somewhere? If this isn't already doable, can any of the
experts here recommend some best practices for something like this?

Any help would be greatly appreciated. Thanks for everything that you
are doing to make the CFE as great a toolset as it is!

Will
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Re: Lenient lexing/parsing of code snippets

David Chisnall-4
On 7 Dec 2014, at 02:55, Will Hawkins <[hidden email]> wrote:

> I'm trying to use cfe to lex/parse code snippets. The snippets will be
> complete functions but they will be taken mostly out of their original
> context. I'd like to generate an AST for the code

This problem doesn't sound possible. When you have a code snippet taken out of any context, you have no information about the types of any variable (or the definitions of most of the types). This makes constructing an AST impossible. You don't say what language you're using, but if it's C++ then you can't even tell from a snippet whether a+b is an arithmetic operation or a method invocation; even if it's C, you don't know what the type promotion should be.

I think, to answer the question that you want to be asking, we need to know what you want to do with the AST.

David



Re: Lenient lexing/parsing of code snippets

Guilherme
Take a look at this interesting Summer of Code project. I think it goes
in the same direction as yours: flexible parsing of incomplete code.

https://www.google-melange.com/gsoc/project/details/google/gsoc2014/kapf/5643440998055936


Re: Lenient lexing/parsing of code snippets

Will Hawkins
Thank you Guilherme and David for your responses!

As you rightly pointed out, David, this is not a "normal" thing to do.
And, again as you rightly deciphered, the target language is C++.

All might not be lost, though. My ultimate goal does not require
knowing every detail of the program. In fact, David,
your example is quite illustrative of what I do NOT need to know.

Given

y = x + b;

I don't particularly care whether that is a method invocation
or which type conversions apply. I am mostly concerned with doing analysis
on the names of variables and very high-level control flow
calculations.

Ideally, I would really just like to have a parse tree, although I've
yet to find a way to get that without clang knowing much more about
the types than I can possibly tell it.

I will look very closely at the GSOC output, Guilherme. Based on what
little I've seen so far, it looks incredibly promising. Thank you for
digging that up for me.

Based on the additional information that I sent here (which I should
have provided in my first message, sorry!), please let me know if I am
missing something obvious.

Again, I appreciate your willingness to let me tap into the collective
expertise on this list!

Will


Re: Lenient lexing/parsing of code snippets

Nikola Smiljanic
Hi Will, I don't think you're missing anything obvious. Clang's AST is the result of semantic analysis, at which point a + b is fully resolved (type promotion, overload resolution, etc.). Clang knows how to skip function bodies, for example, but that's much more clearly defined than what you're trying to do.
