[RFC] [GSoC] AST differencing

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[RFC] [GSoC] AST differencing

Xin Wang via cfe-dev
Hi,

I have been working on creating a syntax tree differencing library.
So far, my focus has been on getting the core algorithm right, some
parts are missing.

I have submitted a patch [1]. I already received some great feedback and
I am always open to suggestions for improvement.

Additionally, if you have ideas to discuss about this, please let me
know, then I can keep it in mind while designing.

I think it should be fairly easy to create a tool that uses the same
output format as diff while only showing semantic changes. More advanced
tools seem to be quite possible also.

One difficulty I am facing so far is that extracting values of
individual AST nodes requires me to manually combine all relevant
attributes. The value of a node is all things that make it semantically
different from other nodes of the same kind, except for its children.

After an edit script has been computed it would be possible to apply it,
but in order such an edit script it will be necessary to implement a way
to do the opposite of extracting node information: updating the source
code based on the previously extracted information (node value). It is
probably not particularly difficult but it will require qutie some
effort to do it in a way that it can work for any node.


Johannes

[1] https://reviews.llvm.org/D34329
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] [GSoC] AST differencing

Xin Wang via cfe-dev
Hi Johannes,

I think this is a great area of research. For Java, for example, refaster (http://errorprone.info/docs/refaster) exists which lets you express simple code changes with "before / after" samples. If we can get something like this based on Clang's AST, that would be most awesome - is this a direction you're working towards?

Cheers,
/Manuel

On Thu, Jun 22, 2017 at 2:50 AM Johannes Altmanninger via cfe-dev <[hidden email]> wrote:
Hi,

I have been working on creating a syntax tree differencing library.
So far, my focus has been on getting the core algorithm right, some
parts are missing.

I have submitted a patch [1]. I already received some great feedback and
I am always open to suggestions for improvement.

Additionally, if you have ideas to discuss about this, please let me
know, then I can keep it in mind while designing.

I think it should be fairly easy to create a tool that uses the same
output format as diff while only showing semantic changes. More advanced
tools seem to be quite possible also.

One difficulty I am facing so far is that extracting values of
individual AST nodes requires me to manually combine all relevant
attributes. The value of a node is all things that make it semantically
different from other nodes of the same kind, except for its children.

After an edit script has been computed it would be possible to apply it,
but in order such an edit script it will be necessary to implement a way
to do the opposite of extracting node information: updating the source
code based on the previously extracted information (node value). It is
probably not particularly difficult but it will require qutie some
effort to do it in a way that it can work for any node.


Johannes

[1] https://reviews.llvm.org/D34329
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] [GSoC] AST differencing

Xin Wang via cfe-dev
Manuel Klimek <[hidden email]> writes:

> Hi Johannes,
>
> I think this is a great area of research. For Java, for example, refaster (
> http://errorprone.info/docs/refaster) exists which lets you express simple
> code changes with "before / after" samples. If we can get something like
> this based on Clang's AST, that would be most awesome - is this a direction
> you're working towards?

Hi Manuel,

Yes, this is definitely something that can be done, I will design for
that also. I am in the process of modularizing the API, so that it will
allow this kind of pattern matching.

So the main advantage of such template based refactoring rules is that
they are easy to write (compared to an AST-matchers based solution). I
think it is a very nice way of writing refactoring, I am working on a
prototype.

For the AST matching / diffing API I am thinking of getting rid of the
custom Node class and instead just let the user pass Decl or Expr
nodes. Also, it should be possible to pass a custom predicate for
matching nodes, this seems to be the best way to enable exact pattern
based matches - as opposed to the gumtree algorithm, which is based on a
heuristic.

Johannes

>
> Cheers,
> /Manuel
>
> On Thu, Jun 22, 2017 at 2:50 AM Johannes Altmanninger via cfe-dev <
> [hidden email]> wrote:
>
>> Hi,
>>
>> I have been working on creating a syntax tree differencing library.
>> So far, my focus has been on getting the core algorithm right, some
>> parts are missing.
>>
>> I have submitted a patch [1]. I already received some great feedback and
>> I am always open to suggestions for improvement.
>>
>> Additionally, if you have ideas to discuss about this, please let me
>> know, then I can keep it in mind while designing.
>>
>> I think it should be fairly easy to create a tool that uses the same
>> output format as diff while only showing semantic changes. More advanced
>> tools seem to be quite possible also.
>>
>> One difficulty I am facing so far is that extracting values of
>> individual AST nodes requires me to manually combine all relevant
>> attributes. The value of a node is all things that make it semantically
>> different from other nodes of the same kind, except for its children.
>>
>> After an edit script has been computed it would be possible to apply it,
>> but in order such an edit script it will be necessary to implement a way
>> to do the opposite of extracting node information: updating the source
>> code based on the previously extracted information (node value). It is
>> probably not particularly difficult but it will require qutie some
>> effort to do it in a way that it can work for any node.
>>
>>
>> Johannes
>>
>> [1] https://reviews.llvm.org/D34329
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] [GSoC] AST differencing

Xin Wang via cfe-dev

On Fri, Jun 23, 2017 at 6:21 PM Johannes Altmanninger <[hidden email]> wrote:
Manuel Klimek <[hidden email]> writes:

> Hi Johannes,
>
> I think this is a great area of research. For Java, for example, refaster (
> http://errorprone.info/docs/refaster) exists which lets you express simple
> code changes with "before / after" samples. If we can get something like
> this based on Clang's AST, that would be most awesome - is this a direction
> you're working towards?

Hi Manuel,

Yes, this is definitely something that can be done, I will design for
that also. I am in the process of modularizing the API, so that it will
allow this kind of pattern matching.

So the main advantage of such template based refactoring rules is that
they are easy to write (compared to an AST-matchers based solution). I
think it is a very nice way of writing refactoring, I am working on a
prototype.

For the AST matching / diffing API I am thinking of getting rid of the
custom Node class and instead just let the user pass Decl or Expr
nodes.

I'm not sure what you mean here - the custom Node class is mainly there to make it easier to write code that works on all entities in the AST. If you want Decl and Expr, you'll probably at least also want Type. Usually you'll also want some for of locs, because often you'll change NestedNameSpecifiers which do not have location information per se.
 
Also, it should be possible to pass a custom predicate for
matching nodes, this seems to be the best way to enable exact pattern
based matches - as opposed to the gumtree algorithm, which is based on a
heuristic.

Johannes

>
> Cheers,
> /Manuel
>
> On Thu, Jun 22, 2017 at 2:50 AM Johannes Altmanninger via cfe-dev <
> [hidden email]> wrote:
>
>> Hi,
>>
>> I have been working on creating a syntax tree differencing library.
>> So far, my focus has been on getting the core algorithm right, some
>> parts are missing.
>>
>> I have submitted a patch [1]. I already received some great feedback and
>> I am always open to suggestions for improvement.
>>
>> Additionally, if you have ideas to discuss about this, please let me
>> know, then I can keep it in mind while designing.
>>
>> I think it should be fairly easy to create a tool that uses the same
>> output format as diff while only showing semantic changes. More advanced
>> tools seem to be quite possible also.
>>
>> One difficulty I am facing so far is that extracting values of
>> individual AST nodes requires me to manually combine all relevant
>> attributes. The value of a node is all things that make it semantically
>> different from other nodes of the same kind, except for its children.
>>
>> After an edit script has been computed it would be possible to apply it,
>> but in order such an edit script it will be necessary to implement a way
>> to do the opposite of extracting node information: updating the source
>> code based on the previously extracted information (node value). It is
>> probably not particularly difficult but it will require qutie some
>> effort to do it in a way that it can work for any node.
>>
>>
>> Johannes
>>
>> [1] https://reviews.llvm.org/D34329
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Loading...