Quantcast

Keeping invalid AST nodes

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Keeping invalid AST nodes

Erik Verbruggen-2
When using libclang for syntax highlighting, I noticed that no AST nodes get built for e.g. invalid expressions. For example, in the following C++ code:

int func(int i) {
    int j = undefinedFunction(i) + i;
    return j;
}

the whole initialisation for int j gets dropped, including the (somewhat) valid references to i. I understand that this is done to keep the AST valid after parsing, but an IDE or refactoring tools cannot help out with fixing the problem in that snippet. My question: is there some way to keep the invalid AST nodes, or is somebody already working on this?

Regards,
Erik.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Keeping invalid AST nodes

Eli Friedman
On Mon, Nov 5, 2012 at 4:18 AM, Erik Verbruggen <[hidden email]> wrote:
> When using libclang for syntax highlighting, I noticed that no AST nodes get built for e.g. invalid expressions. For example, in the following C++ code:
>
> int func(int i) {
>     int j = undefinedFunction(i) + i;
>     return j;
> }
>
> the whole initialisation for int j gets dropped, including the (somewhat) valid references to i. I understand that this is done to keep the AST valid after parsing, but an IDE or refactoring tools cannot help out with fixing the problem in that snippet. My question: is there some way to keep the invalid AST nodes, or is somebody already working on this?

No, there's no way to keep them, and AFAIK nobody is working on this.
I'm also not sure it's worthwhile: it's a ton of work for a use case
which doesn't seem very important, given that the vast majority of
code people care about is valid.

-Eli

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Keeping invalid AST nodes

Sean Silva-2
> No, there's no way to keep them, and AFAIK nobody is working on this.
> I'm also not sure it's worthwhile: it's a ton of work for a use case
> which doesn't seem very important, given that the vast majority of
> code people care about is valid.

However, there is an important use case where the code is almost
always invalid: IDE "intellisense" while you are typing new code.

Xcode seems to handle this fairly well. I wonder how it does it?

-- Sean Silva

On Mon, Nov 5, 2012 at 1:02 PM, Eli Friedman <[hidden email]> wrote:

> On Mon, Nov 5, 2012 at 4:18 AM, Erik Verbruggen <[hidden email]> wrote:
>> When using libclang for syntax highlighting, I noticed that no AST nodes get built for e.g. invalid expressions. For example, in the following C++ code:
>>
>> int func(int i) {
>>     int j = undefinedFunction(i) + i;
>>     return j;
>> }
>>
>> the whole initialisation for int j gets dropped, including the (somewhat) valid references to i. I understand that this is done to keep the AST valid after parsing, but an IDE or refactoring tools cannot help out with fixing the problem in that snippet. My question: is there some way to keep the invalid AST nodes, or is somebody already working on this?
>
> No, there's no way to keep them, and AFAIK nobody is working on this.
> I'm also not sure it's worthwhile: it's a ton of work for a use case
> which doesn't seem very important, given that the vast majority of
> code people care about is valid.
>
> -Eli
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Keeping invalid AST nodes

Douglas Gregor
In reply to this post by Eli Friedman

On Nov 5, 2012, at 10:02 AM, Eli Friedman <[hidden email]> wrote:

> On Mon, Nov 5, 2012 at 4:18 AM, Erik Verbruggen <[hidden email]> wrote:
>> When using libclang for syntax highlighting, I noticed that no AST nodes get built for e.g. invalid expressions. For example, in the following C++ code:
>>
>> int func(int i) {
>>    int j = undefinedFunction(i) + i;
>>    return j;
>> }
>>
>> the whole initialisation for int j gets dropped, including the (somewhat) valid references to i. I understand that this is done to keep the AST valid after parsing, but an IDE or refactoring tools cannot help out with fixing the problem in that snippet. My question: is there some way to keep the invalid AST nodes, or is somebody already working on this?
>
> No, there's no way to keep them, and AFAIK nobody is working on this.

I agree with this…

> I'm also not sure it's worthwhile: it's a ton of work for a use case
> which doesn't seem very important, given that the vast majority of
> code people care about is valid.


… but not necessarily this. You won't get any sympathy from me if you want to, say, refactor invalid code, since you could never validate the results. But syntax highlighting and related code queries do matter, and it makes sense to try to keep invalid AST nodes around.

There are several caveats. Obviously, we don't want to slow the compiler down. Moreover, we need to be careful not to keep invalid ASTs around for template instantiations, where we encounter invalid ASTs all the time but don't want to hold on to them.

        - Doug
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Keeping invalid AST nodes

Erik Verbruggen

On Nov 5, 2012, at 19:38, Douglas Gregor <[hidden email]> wrote:

>
> On Nov 5, 2012, at 10:02 AM, Eli Friedman <[hidden email]> wrote:
>
>> On Mon, Nov 5, 2012 at 4:18 AM, Erik Verbruggen <[hidden email]> wrote:
>>> When using libclang for syntax highlighting, I noticed that no AST nodes get built for e.g. invalid expressions. For example, in the following C++ code:
>>>
>>> int func(int i) {
>>>   int j = undefinedFunction(i) + i;
>>>   return j;
>>> }
>>>
>>> the whole initialisation for int j gets dropped, including the (somewhat) valid references to i. I understand that this is done to keep the AST valid after parsing, but an IDE or refactoring tools cannot help out with fixing the problem in that snippet. My question: is there some way to keep the invalid AST nodes, or is somebody already working on this?
>>
>> No, there's no way to keep them, and AFAIK nobody is working on this.
>
> I agree with this…
>
>> I'm also not sure it's worthwhile: it's a ton of work for a use case
>> which doesn't seem very important, given that the vast majority of
>> code people care about is valid.
>
>
> … but not necessarily this. You won't get any sympathy from me if you want to, say, refactor invalid code, since you could never validate the results. But syntax highlighting and related code queries do matter, and it makes sense to try to keep invalid AST nodes around.
>
> There are several caveats. Obviously, we don't want to slow the compiler down. Moreover, we need to be careful not to keep invalid ASTs around for template instantiations, where we encounter invalid ASTs all the time but don't want to hold on to them.

Ok, I now understand that I phrased my question a bit too broad. I do agree that for a compiler, invalid AST nodes are not interesting, and should not be retained. Also, in C++ with templates, one has to be really careful about retaining invalid nodes. Also, clang does already do a great job with fix-its, including the one catching misspelled identifiers.

So I will narrow it down to the use-cases I am currently interested in, and that is for the use in an IDE. More specific, for files that are opened, possibly even unsaved. (So not for indexing, but in cases where a (lib)clang client could easily pass a flag.) And yes, quite a bit of that is incorrect code, maybe because it's not finished yet, or because of a typo, or possibly some misunderstanding or another human error.

1. Highlighting, finding declarations, definitions and/or usages: most of the "surprises" here seem to be caused by the choice to drop whole expressions when parts are invalid. For example, if an lhs in a binary expression is invalid, the rhs gets dropped too. Same for unary expression operators, and parameter lists in call expressions. With the exception of parameter lists, I don't think that the memory foot-print would increase too much if only the invalid piece of the AST would be dropped. Apart from the possibly visual aspect, an won't be able to help with finding back things like (defined but invalid) parameters, incompatible rhsses, etc.

2. Probably a specific case of the above: if, in C++, an expression consists of only a function call, and that function  is undeclared and undefined, all information about it gets dropped. If an IDE would want to offer a declare-/define-function fix-it, it cannot get back to any information about the call.

3. Invalid method/function calls: first of all, overloaded methods seem to do fine. But if a method is not declared/defined, and is accessed through a member access operator or a qualifier, then the valid parts of the member access and the valid parts of the qualifier get dropped. So the same as for item 2, offering an "implement it" is, well, tedious. Also, finding the definition of the qualifier/lhs is also impossible, so a user cannot "jump there to see what is available".

4. If a C++ method is defined without being declared, the qualifier gets dropped (!). Implementing an insert-declaration-from-definition fix-it would require some re-parsing and quite some very fuzzy logic.

5. Not fully verified, but it looks like clang drops at least some information when a method definition does not match up with any declaration, or vice-versa. Apart from the previous item/case, this can happen when the parameter list of a method gets changed. Now you could say that changing the method/function signature is a proper refactoring action, most people "just do it and fix the other afterwards". I admit that this one is very, erm, fuzy, but it happens so often that if an IDE can help out here, it is considered as a "cool thing".

A note about templates: all of the items mentioned above are without any templates involved. I know that templates do complicate things. But when talking with users, the general consensus seems to be that if templates cannot be handled for the cases mentioned above, it still covers more than three-quarters of the use-cases. And most seem to understand the complexity when templates are involved.

For anything else than item #1, I think that some flag is needed in Stmt, which indicates that, although a Stmt is invalid and utterly useless for compilation, it might still hold some useful information for (lib)clang clients although the node is erroneous and should not be used for code generation or proper refactoring.

So, again the question: feedback? :-) Where did I miss details, or are these too narrow use-cases to be useful to cover with clang as a front-end?

-- Erik.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Keeping invalid AST nodes

Vassil Vassilev-2
In reply to this post by Erik Verbruggen-2

On 11/5/12 6:18 AM, Erik Verbruggen wrote:
When using libclang for syntax highlighting, I noticed that no AST nodes get built for e.g. invalid expressions. For example, in the following C++ code:

int func(int i) {
    int j = undefinedFunction(i) + i;
    return j;
}

the whole initialisation for int j gets dropped, including the (somewhat) valid references to i. I understand that this is done to keep the AST valid after parsing, but an IDE or refactoring tools cannot help out with fixing the problem in that snippet. My question: is there some way to keep the invalid AST nodes, or is somebody already working on this?
Have you considered flagging the broken (invalid) nodes as dependent? There is a hook in clang::ExternalSemaSource:: LookupUnqualified, which gets called by clang just before complaining about an error and dropping the node. At that point I believe you can declare the name as dependent (which will lead to creating a broken AST) and continue compilation.

Does that make sense to you?

Vassil

Regards,
Erik.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Keeping invalid AST nodes

Etienne Ollivier
Hello,
Could you describe what you mean by declaring the name as dependent?
I am trying to implement your solution with ExternalSemaSource::LookupUnqualified but I didn't succeed yet.

I would like to keep the invalid nodes in the AST since I am working on a project where I try to get metrics from files.
For example sometimes the declaration of a function is in another file that I don't have but I would still need to count the times where it is called.

Thanks for your help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Keeping invalid AST nodes

Vassil Vassilev-2
On 24/03/14 20:25, Etienne Ollivier wrote:
> Hello,
> Could you describe what you mean by declaring the name as dependent?
> I am trying to implement your solution with
> ExternalSemaSource::LookupUnqualified but I didn't succeed yet.
You will get a call to this function, just before clang issues an error.
Say upon lookup of unknown identifier. There you are given the chance to
do something with this unknown name. For example, one could define the
unknown name and return it so that the compiler can carry on parsing.
The issue is that if you declare it as a concrete type, many semantic
checks will be performed. One way to avoid this is to declare the name
as of dependent type, which by definition turns of further semantic
checks. This will produce more 'dependent' nodes which then one needs to
fix up by himself/herself.
>
> I would like to keep the invalid nodes in the AST since I am working on a
> project where I try to get metrics from files.
As long as you don't do codegen and you don't need *very* concrete AST
it should be fine.

I hope this helps.
Vassil

> For example sometimes the declaration of a function is in another file that
> I don't have but I would still need to count the times where it is called.
>
> Thanks for your help
>
>
>
> --
> View this message in context: http://clang-developers.42468.n3.nabble.com/Keeping-invalid-AST-nodes-tp4027991p4038587.html
> Sent from the Clang Developers mailing list archive at Nabble.com.
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Keeping invalid AST nodes

Etienne Ollivier
Hello,
I tried again to implement your solution but unfortunately did not succed.
My main problem is that the function LookupUnqualified is indeed called before an error is issued, but apparently also in other cases where everything is fine.
The only difference is that LookupUnqualified will be called several times before the error is issued.
Then I can not know exactly if there is an error.
I tried to see if we could get some information about it in the Scope or in LookupResult but did not find anything that could help me.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Keeping invalid AST nodes

Vassil Vassilev-2
On 28/03/14 00:15, Etienne Ollivier wrote:

> Hello,
> I tried again to implement your solution but unfortunately did not succed.
> My main problem is that the function LookupUnqualified is indeed called
> before an error is issued, but apparently also in other cases where
> everything is fine.
> The only difference is that LookupUnqualified will be called several times
> before the error is issued.
> Then I can not know exactly if there is an error.
> I tried to see if we could get some information about it in the Scope or in
> LookupResult but did not find anything that could help me.
I use the LookupNameKind to filter the false positives (+ some extra
rules based on the DeclContext, which you can get from the Scope)
Vassil

>
>
>
> --
> View this message in context: http://clang-developers.42468.n3.nabble.com/Keeping-invalid-AST-nodes-tp4027991p4038647.html
> Sent from the Clang Developers mailing list archive at Nabble.com.
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Keeping invalid AST nodes

Etienne Ollivier
Hello,
I still didn't succeed to use your solution.
Could you tell me which rules I could use from the DeclContext? I couldn't find some that would work.
I also don't know how to declare the name of the type as of dependent type from LookupResult.
I tried another solution by using the function ExternalSemaSource::CorrectTypo. It was interesting because always called before an error, but I didn't succeed modifying the AST from it.
Thank you for your help,
Etienne
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Keeping invalid AST nodes

hemant
This post has NOT been accepted by the mailing list yet.
In reply to this post by Eli Friedman
Hi Erik,

I have somewhat similar requirement as yours but I am using libtooling for parsing.
Have you found the solution to your problem?
If yes, would you please help me achieve the same thing that is keeping the invalid nodes in AST.

Thanks,
Hemant
Loading...