[RFC] Adding refactoring support to Clang

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang [2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from `clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF: https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.

This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename` over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Cheers,
Alex



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev
(+Manuel +Alex K.)

Hi Alex, thanks for the proposal! This sounds awesome! I'm looking forward to the new engine!

On Fri, Jun 16, 2017 at 1:57 AM Alex Lorenz via cfe-dev <[hidden email]> wrote:

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang [2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

It seems that the code you committed to the swift-clang repo is using some home-grown refactoring infrastructure like RefactoringReplacement [e1]. Do you plan to switch to use the existing tooling infrastructure in clang (e.g. clang::tooling::Replacement) in the future? 

Just to let you know that we also have some plans to improve clang's tooling infrastructure (both clang tooling and refactoring-specific). For example, the new AtomicChange [e2] for better representation of refactoring changes and an ongoing redesign of ClangTool interface [e3] to ease multi-TU/codebase-wide refactoring. I hope our work can converge in the future :) 

# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from `clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF: https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.
I think clang-rename should eventually be a module of clang-refactor.
 

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.


This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename` over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Cheers,
Alex


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev
In reply to this post by Xin Wang via cfe-dev
Hi Alex,

That's a huge amount of work. Here are some questions from clangd side:

- Does the code for the refactoring engine use ASTUnit? Can it be used without that?
  The reason I ask is because we currently have a plan to replace ASTUnit with manual preamble/ASTContext management in clangd.

- Does refactoring engine modify files directly of just provide a list of "diffs" (in some form) to be applied by the editors?
  If they modify the files directly, is there a way to 

- Does refactoring engine have a separate component for testing correctness of the refactoring? Does it have an API to provide conflicts?

- What data does refactoring engine requires from Indexing API specifically?

- Why is 'Generate missing definitions' considered a cross-TU action? You actually touch only one file (i.e. the .cpp file where you put the definitions) and the AST for it should be enough for generating definitions.



On Fri, Jun 16, 2017 at 1:57 AM, Alex Lorenz via cfe-dev <[hidden email]> wrote:

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang [2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from `clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF: https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.

This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename` over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Cheers,
Alex



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev




--
Regards,
Ilya Biryukov

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev
In reply to this post by Xin Wang via cfe-dev


On Jun 16, 2017, at 4:38 AM, Eric Liu <[hidden email]> wrote:

(+Manuel +Alex K.)

Hi Alex, thanks for the proposal! This sounds awesome! I'm looking forward to the new engine!

On Fri, Jun 16, 2017 at 1:57 AM Alex Lorenz via cfe-dev <[hidden email]> wrote:

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang [2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

It seems that the code you committed to the swift-clang repo is using some home-grown refactoring infrastructure like RefactoringReplacement [e1]. Do you plan to switch to use the existing tooling infrastructure in clang (e.g. clang::tooling::Replacement) in the future? 

Yes, I would like to switch to something like `tooling::Replacement`, as we shouldn’t have two different structures that do the same thing in the code at llvm.org. I didn’t use it in Xcode 9 because it stores the source range using the decomposed form (filename + offsets), so I found it easier to just use another structure that stored `SourceRange`s directly. I also had to attach additional information to some replacements (e.g. the range of the newly introduced name in the replacements for “extract”).

Btw, our implementation of rename actually doesn’t use replacements, but just source ranges, as Xcode doesn't actually know the new name when its gathering the source ranges for the renamed symbols. I think that we shouldn’t use `tooling::Replacement` directly for the implementation of rename. Instead we should gather occurrences (source range + kind) first. Then we can have an additional layer that maps those occurrences to  the set of `tooling::Replacement`s.


Just to let you know that we also have some plans to improve clang's tooling infrastructure (both clang tooling and refactoring-specific). For example, the new AtomicChange [e2] for better representation of refactoring changes and an ongoing redesign of ClangTool interface [e3] to ease multi-TU/codebase-wide refactoring. I hope our work can converge in the future :) 

That’s awesome! I think that it should be possible for us to converge. I’ll try to think about this when working on the patches.


# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from `clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF: https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.
I think clang-rename should eventually be a module of clang-refactor.
 

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.


This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename` over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Cheers,
Alex


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev
In reply to this post by Xin Wang via cfe-dev


On Jun 16, 2017, at 4:40 AM, Ilya Biryukov <[hidden email]> wrote:

Hi Alex,

That's a huge amount of work. Here are some questions from clangd side:

- Does the code for the refactoring engine use ASTUnit? Can it be used without that?
  The reason I ask is because we currently have a plan to replace ASTUnit with manual preamble/ASTContext management in clangd.

Not really. The core engine itself doesn’t really care about the ASTUnit. As long as it gets the ASTContext from somewhere it should work :) 


- Does refactoring engine modify files directly of just provide a list of "diffs" (in some form) to be applied by the editors?
  If they modify the files directly, is there a way to 

No, just the diffs. Xcode actually performs all the modifications.


- Does refactoring engine have a separate component for testing correctness of the refactoring? Does it have an API to provide conflicts?

Do you mean the correctness of the transformed source (i.e. it should be semantically identical to the original source) or something else? No, we don’t have a component that verifies semantic correctness. Furthermore, our implementation of “extract”s doesn’t provide any semantic correctness guarantees, as there are cases when the semantics of the program will change (we could probably diagnose them, but we don’t).


- What data does refactoring engine requires from Indexing API specifically?

What do you mean by the Indexing API? There’s no direct interaction with Clang’s indexing API and the refactoring engine. For global rename, Xcode’s indexer passes in the source location and symbol kinds for the previously indexed occurrences, but that relies on new refactoring APIs.


- Why is 'Generate missing definitions' considered a cross-TU action? You actually touch only one file (i.e. the .cpp file where you put the definitions) and the AST for it should be enough for generating definitions.

The action is typically initiated in the header file, so we treat it as a separate TU. Also, C++ classes can be scattered across many CPP files, so even if we are initiating in a CPP file that contains the declaration of the class, it might not be the right CPP file (we put the out-of-line functions to the CPP file that already has the majority of the out-of-line methods from that class).




On Fri, Jun 16, 2017 at 1:57 AM, Alex Lorenz via cfe-dev <[hidden email]> wrote:

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang [2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from `clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF: https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.

This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename` over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Cheers,
Alex



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev




--
Regards,
Ilya Biryukov


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev
- Does the code for the refactoring engine use ASTUnit? Can it be used without that?
Not really. The core engine itself doesn’t really care about the ASTUnit. As long as it gets the ASTContext from somewhere it should work :) 
Good to know, then it shouldn't be a problem to reuse your code in clangd even if we move away from ASTUnit. 
- Does refactoring engine have a separate component for testing correctness of the refactoring? Does it have an API to provide conflicts?
Do you mean the correctness of the transformed source (i.e. it should be semantically identical to the original source) or something else? No, we don’t have a component that verifies semantic correctness. Furthermore, our implementation of “extract”s doesn’t provide any semantic correctness guarantees, as there are cases when the semantics of the program will change (we could probably diagnose them, but we don’t).
I.e. a common check for rename would be to resolve all occurrences again after rename is completed and check that they actually resolve to the renamed entity. (I.e. there might've been a parameter with the same name and some references may now resolves to a parameter, rather than some class you were renaming. In case there are errors, we may want to notify users about those (that's what I call conflicts).

- What data does refactoring engine requires from Indexing API specifically?
What do you mean by the Indexing API? There’s no direct interaction with Clang’s indexing API and the refactoring engine. For global rename, Xcode’s indexer passes in the source location and symbol kinds for the previously indexed occurrences, but that relies on new refactoring APIs.
I was asking about the data you pass from Xcode's indexer to refactoring engine. Got it, thanks.
- Why is 'Generate missing definitions' considered a cross-TU action? You actually touch only one file (i.e. the .cpp file where you put the definitions) and the AST for it should be enough for generating definitions.
The action is typically initiated in the header file, so we treat it as a separate TU. Also, C++ classes can be scattered across many CPP files, so even if we are initiating in a CPP file that contains the declaration of the class, it might not be the right CPP file (we put the out-of-line functions to the CPP file that already has the majority of the out-of-line methods from that class).
Got it.
Have you thought about actions that would require referring to classes/functions that aren't available in the TU you're changing (i.e. require includes to added in order to become available)?
Say, we want to implement an action/refactoring to create a missing declaration inside class A (see code below):
// Foo.h
struct A {
  int foo();
};

// Foo.cpp
#include "Foo.h"
#include <string>

using namespace std;
int A::bar(string a) {  // <-- action available here to create bar() in Foo.h
  return 10;
}

Can the current refactoring infrastructure be used to implement that? (i.e. add '#include <string>' to Foo.h, qualify string with 'std::' when creating a declaration?)
If not, maybe you've already thought how we could make that possible?
 

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev


On Jun 16, 2017, at 11:38 AM, Ilya Biryukov <[hidden email]> wrote:

- Does the code for the refactoring engine use ASTUnit? Can it be used without that?
Not really. The core engine itself doesn’t really care about the ASTUnit. As long as it gets the ASTContext from somewhere it should work :) 
Good to know, then it shouldn't be a problem to reuse your code in clangd even if we move away from ASTUnit. 
- Does refactoring engine have a separate component for testing correctness of the refactoring? Does it have an API to provide conflicts?
Do you mean the correctness of the transformed source (i.e. it should be semantically identical to the original source) or something else? No, we don’t have a component that verifies semantic correctness. Furthermore, our implementation of “extract”s doesn’t provide any semantic correctness guarantees, as there are cases when the semantics of the program will change (we could probably diagnose them, but we don’t).
I.e. a common check for rename would be to resolve all occurrences again after rename is completed and check that they actually resolve to the renamed entity. (I.e. there might've been a parameter with the same name and some references may now resolves to a parameter, rather than some class you were renaming. In case there are errors, we may want to notify users about those (that's what I call conflicts).

Yes, this is a potential issue. We don’t have any code that performs this kind of verification though.


- What data does refactoring engine requires from Indexing API specifically?
What do you mean by the Indexing API? There’s no direct interaction with Clang’s indexing API and the refactoring engine. For global rename, Xcode’s indexer passes in the source location and symbol kinds for the previously indexed occurrences, but that relies on new refactoring APIs.
I was asking about the data you pass from Xcode's indexer to refactoring engine. Got it, thanks.
- Why is 'Generate missing definitions' considered a cross-TU action? You actually touch only one file (i.e. the .cpp file where you put the definitions) and the AST for it should be enough for generating definitions.
The action is typically initiated in the header file, so we treat it as a separate TU. Also, C++ classes can be scattered across many CPP files, so even if we are initiating in a CPP file that contains the declaration of the class, it might not be the right CPP file (we put the out-of-line functions to the CPP file that already has the majority of the out-of-line methods from that class).
Got it.
Have you thought about actions that would require referring to classes/functions that aren't available in the TU you're changing (i.e. require includes to added in order to become available)?
Say, we want to implement an action/refactoring to create a missing declaration inside class A (see code below):
// Foo.h
struct A {
  int foo();
};

// Foo.cpp
#include "Foo.h"
#include <string>

using namespace std;
int A::bar(string a) {  // <-- action available here to create bar() in Foo.h
  return 10;
}

Can the current refactoring infrastructure be used to implement that? (i.e. add '#include <string>' to Foo.h, qualify string with 'std::' when creating a declaration?)
If not, maybe you've already thought how we could make that possible?
 

Good question. It should be possible to implement that using the new engine, you’d just have to teach it about <includes>. In your particular example we already have the “foo.h” included into “foo.cpp”, so we can do everything in one TU. I can think that an approach like this will work:

- Gather declarations from imported/included headers an associate them with those headers (in that particular example `A` is from Foo.h and `std::string` is from string). 
- Somehow determine their declaration status at the point of insertion (`A` is available, `std::string` isn’t). If a declaration isn’t available, we should include a header.

I think that it should be possible to support this kind of workflow for cross-TU operation as well, we’ d just have to persist the pointers to declarations and their associated headers across TUs.

I think the `string` will be qualified with `std` if you print it out, so I think the problem that we have here is actually the reverse - if you already include <string> and use namespace std in the header, then the `std::` prefix can be dropped. I believe there’s already existing code in lib/Tooling/Core that deals with that sort of thing, but I haven’t looked that much into it.






_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev
Hello Alex,
 
This is great effort! I will monitor it closely for my employer in-house projects and will consider particitpation if find my help applicable. Do you plan any protection from incorrect source positions reported by clang in AST for some entities? I recently implemented c++ indexer and found at least one case where SourceLocation information doesn't match (https://bugs.llvm.org/show_bug.cgi?id=26195). I tried to find name reported in AST at SourceLoc attached to the entity. In some cases names are articificial and don't naturally match, but the case in question is real issue. There may be other cases (I don't index absolutely everyting in AST), so, probably, some protection logic is needed agains similar issue. I have fix for this particular issue with test exhibiting the offending behavior using c-index-test on trunk clang (https://reviews.llvm.org/D32439), so you may look and see how it is relevant to your development.
 
Thak you,
-- 
Serge Preis
 
 
 
 
17.06.2017, 02:00, "Alex Lorenz via cfe-dev" <[hidden email]>:
 
 
On Jun 16, 2017, at 11:38 AM, Ilya Biryukov <[hidden email]> wrote:
 
- Does the code for the refactoring engine use ASTUnit? Can it be used without that?
Not really. The core engine itself doesn’t really care about the ASTUnit. As long as it gets the ASTContext from somewhere it should work :) 
Good to know, then it shouldn't be a problem to reuse your code in clangd even if we move away from ASTUnit. 
- Does refactoring engine have a separate component for testing correctness of the refactoring? Does it have an API to provide conflicts?
Do you mean the correctness of the transformed source (i.e. it should be semantically identical to the original source) or something else? No, we don’t have a component that verifies semantic correctness. Furthermore, our implementation of “extract”s doesn’t provide any semantic correctness guarantees, as there are cases when the semantics of the program will change (we could probably diagnose them, but we don’t).
I.e. a common check for rename would be to resolve all occurrences again after rename is completed and check that they actually resolve to the renamed entity. (I.e. there might've been a parameter with the same name and some references may now resolves to a parameter, rather than some class you were renaming. In case there are errors, we may want to notify users about those (that's what I call conflicts).
 
Yes, this is a potential issue. We don’t have any code that performs this kind of verification though.
 
 
- What data does refactoring engine requires from Indexing API specifically?
What do you mean by the Indexing API? There’s no direct interaction with Clang’s indexing API and the refactoring engine. For global rename, Xcode’s indexer passes in the source location and symbol kinds for the previously indexed occurrences, but that relies on new refactoring APIs.
I was asking about the data you pass from Xcode's indexer to refactoring engine. Got it, thanks.
- Why is 'Generate missing definitions' considered a cross-TU action? You actually touch only one file (i.e. the .cpp file where you put the definitions) and the AST for it should be enough for generating definitions.
The action is typically initiated in the header file, so we treat it as a separate TU. Also, C++ classes can be scattered across many CPP files, so even if we are initiating in a CPP file that contains the declaration of the class, it might not be the right CPP file (we put the out-of-line functions to the CPP file that already has the majority of the out-of-line methods from that class).
Got it.
Have you thought about actions that would require referring to classes/functions that aren't available in the TU you're changing (i.e. require includes to added in order to become available)?
Say, we want to implement an action/refactoring to create a missing declaration inside class A (see code below):
// Foo.h
struct A {
  int foo();
};
 
// Foo.cpp
#include "Foo.h"
#include <string>
 
using namespace std;
int A::bar(string a) {  // <-- action available here to create bar() in Foo.h
  return 10;
}
 
Can the current refactoring infrastructure be used to implement that? (i.e. add '#include <string>' to Foo.h, qualify string with 'std::' when creating a declaration?)
If not, maybe you've already thought how we could make that possible?
 
Good question. It should be possible to implement that using the new engine, you’d just have to teach it about <includes>. In your particular example we already have the “foo.h” included into “foo.cpp”, so we can do everything in one TU. I can think that an approach like this will work:
 
- Gather declarations from imported/included headers an associate them with those headers (in that particular example `A` is from Foo.h and `std::string` is from string). 
- Somehow determine their declaration status at the point of insertion (`A` is available, `std::string` isn’t). If a declaration isn’t available, we should include a header.
 
I think that it should be possible to support this kind of workflow for cross-TU operation as well, we’ d just have to persist the pointers to declarations and their associated headers across TUs.
 
I think the `string` will be qualified with `std` if you print it out, so I think the problem that we have here is actually the reverse - if you already include <string> and use namespace std in the header, then the `std::` prefix can be dropped. I believe there’s already existing code in lib/Tooling/Core that deals with that sort of thing, but I haven’t looked that much into it.
 
 
 
 
,

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev


On Jun 19, 2017, at 5:02 AM, Сергей Прейс <[hidden email]> wrote:

Hello Alex,
 
This is great effort! I will monitor it closely for my employer in-house projects and will consider particitpation if find my help applicable. Do you plan any protection from incorrect source positions reported by clang in AST for some entities?

In the context of “Global Rename”, the engine shouldn't rename an identifier whose name is not the same as the name of the renamed symbol. This does not catch all cases though, as the incorrect source position might point to the same identifier. The number of incorrect source positions should be zero though, so we should just fix these issues.

My stress-test tool that I mentioned in the RFC can actually help find such bugs, as it can detect cases when our AST misses a location for some symbol (by trying to initiate a rename at every non-keyword identifier token). Unfortunately it has a lot of false positives, so I haven’t had time to look into it that much.

I recently implemented c++ indexer and found at least one case where SourceLocation information doesn't match (https://bugs.llvm.org/show_bug.cgi?id=26195). I tried to find name reported in AST at SourceLoc attached to the entity. In some cases names are articificial and don't naturally match, but the case in question is real issue. There may be other cases (I don't index absolutely everyting in AST), so, probably, some protection logic is needed agains similar issue. I have fix for this particular issue with test exhibiting the offending behavior using c-index-test on trunk clang (https://reviews.llvm.org/D32439), so you may look and see how it is relevant to your development.
 

Thanks! I’ll take a look this week.

Thak you,
-- 
Serge Preis
 
 
 
 
17.06.2017, 02:00, "Alex Lorenz via cfe-dev" <[hidden email]>:
 
 
On Jun 16, 2017, at 11:38 AM, Ilya Biryukov <[hidden email]> wrote:
 
- Does the code for the refactoring engine use ASTUnit? Can it be used without that?
Not really. The core engine itself doesn’t really care about the ASTUnit. As long as it gets the ASTContext from somewhere it should work :) 
Good to know, then it shouldn't be a problem to reuse your code in clangd even if we move away from ASTUnit. 
- Does refactoring engine have a separate component for testing correctness of the refactoring? Does it have an API to provide conflicts?
Do you mean the correctness of the transformed source (i.e. it should be semantically identical to the original source) or something else? No, we don’t have a component that verifies semantic correctness. Furthermore, our implementation of “extract”s doesn’t provide any semantic correctness guarantees, as there are cases when the semantics of the program will change (we could probably diagnose them, but we don’t).
I.e. a common check for rename would be to resolve all occurrences again after rename is completed and check that they actually resolve to the renamed entity. (I.e. there might've been a parameter with the same name and some references may now resolves to a parameter, rather than some class you were renaming. In case there are errors, we may want to notify users about those (that's what I call conflicts).
 
Yes, this is a potential issue. We don’t have any code that performs this kind of verification though.
 
 
- What data does refactoring engine requires from Indexing API specifically?
What do you mean by the Indexing API? There’s no direct interaction with Clang’s indexing API and the refactoring engine. For global rename, Xcode’s indexer passes in the source location and symbol kinds for the previously indexed occurrences, but that relies on new refactoring APIs.
I was asking about the data you pass from Xcode's indexer to refactoring engine. Got it, thanks.
- Why is 'Generate missing definitions' considered a cross-TU action? You actually touch only one file (i.e. the .cpp file where you put the definitions) and the AST for it should be enough for generating definitions.
The action is typically initiated in the header file, so we treat it as a separate TU. Also, C++ classes can be scattered across many CPP files, so even if we are initiating in a CPP file that contains the declaration of the class, it might not be the right CPP file (we put the out-of-line functions to the CPP file that already has the majority of the out-of-line methods from that class).
Got it.
Have you thought about actions that would require referring to classes/functions that aren't available in the TU you're changing (i.e. require includes to added in order to become available)?
Say, we want to implement an action/refactoring to create a missing declaration inside class A (see code below):
// Foo.h
struct A {
  int foo();
};
 
// Foo.cpp
#include "Foo.h"
#include <string>
 
using namespace std;
int A::bar(string a) {  // <-- action available here to create bar() in Foo.h
  return 10;
}
 
Can the current refactoring infrastructure be used to implement that? (i.e. add '#include <string>' to Foo.h, qualify string with 'std::' when creating a declaration?)
If not, maybe you've already thought how we could make that possible?
 
Good question. It should be possible to implement that using the new engine, you’d just have to teach it about <includes>. In your particular example we already have the “foo.h” included into “foo.cpp”, so we can do everything in one TU. I can think that an approach like this will work:
 
- Gather declarations from imported/included headers an associate them with those headers (in that particular example `A` is from Foo.h and `std::string` is from string). 
- Somehow determine their declaration status at the point of insertion (`A` is available, `std::string` isn’t). If a declaration isn’t available, we should include a header.
 
I think that it should be possible to support this kind of workflow for cross-TU operation as well, we’ d just have to persist the pointers to declarations and their associated headers across TUs.
 
I think the `string` will be qualified with `std` if you print it out, so I think the problem that we have here is actually the reverse - if you already include <string> and use namespace std in the header, then the `std::` prefix can be dropped. I believe there’s already existing code in lib/Tooling/Core that deals with that sort of thing, but I haven’t looked that much into it.
 
 
 
 
,

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev
In reply to this post by Xin Wang via cfe-dev
First, thanks for proposing this - this fits pretty much exactly with what we had envisioned / slowly started working towards, so big thumbs up :) 

Others have already made a lot of interesting observations / points, adding a couple of thoughts inline.

On Fri, Jun 16, 2017 at 1:57 AM Alex Lorenz <[hidden email]> wrote:

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang [2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from `clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF: https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

I think the harder design space will be how to run something on "all files containing references to <X>". What we definitely need is an interface that allows the ops to be sharded out across a fleet of machines (we use C++ ;)  based on the expected amount of work. This has some restrictions on the design, specifically also is why we use SourceManager independent classes to handle replacements etc.

Generally, I fully agree that we can get an interface here that is indexer independent.
 

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

Ok, already bikeshedding: can we call it a range instead of a selection? 
 
In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.

This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

Nice! Additionally, I'd also suggest to use unit tests, especially for small dedicated regression tests where the AST is complex (might be more of a problem for C++).
 

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename` over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Sounds generally good. Thanks!
 

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev


On 20 Jun 2017, at 09:27, Manuel Klimek <[hidden email]> wrote:

First, thanks for proposing this - this fits pretty much exactly with what we had envisioned / slowly started working towards, so big thumbs up :) 

Others have already made a lot of interesting observations / points, adding a couple of thoughts inline.

On Fri, Jun 16, 2017 at 1:57 AM Alex Lorenz <[hidden email]> wrote:

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang [2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from `clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF: https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

I think the harder design space will be how to run something on "all files containing references to <X>". What we definitely need is an interface that allows the ops to be sharded out across a fleet of machines (we use C++ ;)  based on the expected amount of work. This has some restrictions on the design, specifically also is why we use SourceManager independent classes to handle replacements etc.

I see. I haven’t thought that much about sharding across machines. However, I do think that we should aim to design structures (for this part of the API in particular) in a manner that can be easily serialized and passed between different processes and machines.


Generally, I fully agree that we can get an interface here that is indexer independent.
 

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

Ok, already bikeshedding: can we call it a range instead of a selection? 

Sure. The keyword is not what's important though ;)

 
In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.

This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

Nice! Additionally, I'd also suggest to use unit tests, especially for small dedicated regression tests where the AST is complex (might be more of a problem for C++).

I generally agree, I think that our code relied on end-to-end regression tests too much. Some self-contained components will be tested with unit tests instead in the new engine.

 

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename` over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Sounds generally good. Thanks!
 


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev
On Tue, Jun 20, 2017 at 11:26 AM Alex Lorenz <[hidden email]> wrote:

On 20 Jun 2017, at 09:27, Manuel Klimek <[hidden email]> wrote:

First, thanks for proposing this - this fits pretty much exactly with what we had envisioned / slowly started working towards, so big thumbs up :) 

Others have already made a lot of interesting observations / points, adding a couple of thoughts inline.

On Fri, Jun 16, 2017 at 1:57 AM Alex Lorenz <[hidden email]> wrote:

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang [2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from `clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF: https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

I think the harder design space will be how to run something on "all files containing references to <X>". What we definitely need is an interface that allows the ops to be sharded out across a fleet of machines (we use C++ ;)  based on the expected amount of work. This has some restrictions on the design, specifically also is why we use SourceManager independent classes to handle replacements etc.

I see. I haven’t thought that much about sharding across machines. However, I do think that we should aim to design structures (for this part of the API in particular) in a manner that can be easily serialized and passed between different processes and machines.

Yep, exactly. This is basically a prediction that folks from our side will put some eye on these things during code review, as we have some experience what does (and does not ;) work here.

Looking forward to the patches :D
 
Generally, I fully agree that we can get an interface here that is indexer independent.
 

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

Ok, already bikeshedding: can we call it a range instead of a selection? 

Sure. The keyword is not what's important though ;)

 
In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.

This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

Nice! Additionally, I'd also suggest to use unit tests, especially for small dedicated regression tests where the AST is complex (might be more of a problem for C++).

I generally agree, I think that our code relied on end-to-end regression tests too much. Some self-contained components will be tested with unit tests instead in the new engine.

 

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename` over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Sounds generally good. Thanks!
 

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev
In reply to this post by Xin Wang via cfe-dev

Hi Alex,


Thanks a lot for doing this! I think what you're proposing sounds very good. Just a few questions.


- Have you given some thoughts on how to handle refactorings in the presence or inactive preprocessor directives? For example, renaming a symbol that is also referenced in an inactive section. I don't know if it's already something that is handled by clang-rename? That might mean having the concept of "potential matches" that the user would have to review and unselect. Otherwise, To get fully accurate semantic renaming, it would mean having the index built for all possible configurations and query all configurations during the rename, which seems excessive.


- You mentioned a few times that the refactoring support would be indexer-agnostic and that any external indexer could be used. That lead me to wonder if it was ever considered in the past to put Xcode's indexer in the Clang codebase? Since Clangd will have to implement its own (although reusing some lib/Index things), I think that it would be ideal to collaborate on the same one. I

bring this up because it seems to me that it's the only missing bigger piece in an otherwise pretty complete solution.


Cheers,

Marc-André


From: [hidden email] <[hidden email]> on behalf of Alex Lorenz <[hidden email]>
Sent: Thursday, June 15, 2017 7:57:14 PM
To: clang developer list
Cc: Duncan P. N. Exon Smith; Vedant Kumar; Argyrios Kyrtzidis; Ben Langmuir; Marc-André Laperle; [hidden email]; [hidden email]
Subject: [RFC] Adding refactoring support to Clang
 

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang [2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from `clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF: https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.

This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename` over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Cheers,
Alex



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev


On Jun 21, 2017, at 8:58 PM, Marc-André Laperle <[hidden email]> wrote:

Hi Alex,

Thanks a lot for doing this! I think what you're proposing sounds very good. Just a few questions.

- Have you given some thoughts on how to handle refactorings in the presence or inactive preprocessor directives? For example, renaming a symbol that is also referenced in an inactive section. I don't know if it's already something that is handled by clang-rename? That might mean having the concept of "potential matches" that the user would have to review and unselect. Otherwise, To get fully accurate semantic renaming, it would mean having the index built for all possible configurations and query all configurations during the rename, which seems excessive.

I think we could probably use textual in the inactive code blocks and report textual occurrences to the user (that won’t be renamed by default, so the user will have to explicitly approve them). Clang currently doesn't find such occurrences at all, but Swift can. Although Swift doesn’t use a preprocessor, its `#if` directives can still produce inactive code blocks (that are parsed but not type checked). I believe for such cases Swift simply uses textual search inside the inactive code blocks. Clang’s “global rename" could do something similar. We would have to record the skipped PP ranges while indexing first (we currently don’t store them) and then feed them into the global renaming engine.


- You mentioned a few times that the refactoring support would be indexer-agnostic and that any external indexer could be used. That lead me to wonder if it was ever considered in the past to put Xcode's indexer in the Clang codebase? Since Clangd will have to implement its own (although reusing some lib/Index things), I think that it would be ideal to collaborate on the same one. I 
bring this up because it seems to me that it's the only missing bigger piece in an otherwise pretty complete solution.

I don't know the answer to this question. Argyrios or Duncan are probably the right people for it.


Cheers,
Marc-André

From: [hidden email] <[hidden email]> on behalf of Alex Lorenz <[hidden email]>
Sent: Thursday, June 15, 2017 7:57:14 PM
To: clang developer list
Cc: Duncan P. N. Exon Smith; Vedant Kumar; Argyrios Kyrtzidis; Ben Langmuir; Marc-André Laperle; [hidden email]; [hidden email]
Subject: [RFC] Adding refactoring support to Clang
 

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang[2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from`clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF:https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.

This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename`over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Cheers,
Alex



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [RFC] Adding refactoring support to Clang

Xin Wang via cfe-dev

"We would have to record the skipped PP ranges while indexing first (we currently don’t store them) and then feed them into the global renaming engine."


That sounds like a good approach. Thanks!


From: [hidden email] <[hidden email]> on behalf of Alex Lorenz <[hidden email]>
Sent: Thursday, June 22, 2017 10:57:32 AM
To: Marc-André Laperle
Cc: clang developer list; Duncan P. N. Exon Smith; Vedant Kumar; Argyrios Kyrtzidis; Ben Langmuir; [hidden email]; [hidden email]
Subject: Re: [RFC] Adding refactoring support to Clang
 


On Jun 21, 2017, at 8:58 PM, Marc-André Laperle <[hidden email]> wrote:

Hi Alex,

Thanks a lot for doing this! I think what you're proposing sounds very good. Just a few questions.

- Have you given some thoughts on how to handle refactorings in the presence or inactive preprocessor directives? For example, renaming a symbol that is also referenced in an inactive section. I don't know if it's already something that is handled by clang-rename? That might mean having the concept of "potential matches" that the user would have to review and unselect. Otherwise, To get fully accurate semantic renaming, it would mean having the index built for all possible configurations and query all configurations during the rename, which seems excessive.

I think we could probably use textual in the inactive code blocks and report textual occurrences to the user (that won’t be renamed by default, so the user will have to explicitly approve them). Clang currently doesn't find such occurrences at all, but Swift can. Although Swift doesn’t use a preprocessor, its `#if` directives can still produce inactive code blocks (that are parsed but not type checked). I believe for such cases Swift simply uses textual search inside the inactive code blocks. Clang’s “global rename" could do something similar. We would have to record the skipped PP ranges while indexing first (we currently don’t store them) and then feed them into the global renaming engine.


- You mentioned a few times that the refactoring support would be indexer-agnostic and that any external indexer could be used. That lead me to wonder if it was ever considered in the past to put Xcode's indexer in the Clang codebase? Since Clangd will have to implement its own (although reusing some lib/Index things), I think that it would be ideal to collaborate on the same one. I 
bring this up because it seems to me that it's the only missing bigger piece in an otherwise pretty complete solution.

I don't know the answer to this question. Argyrios or Duncan are probably the right people for it.


Cheers,
Marc-André

From: [hidden email] <[hidden email]> on behalf of Alex Lorenz <[hidden email]>
Sent: Thursday, June 15, 2017 7:57:14 PM
To: clang developer list
Cc: Duncan P. N. Exon Smith; Vedant Kumar; Argyrios Kyrtzidis; Ben Langmuir; Marc-André Laperle; [hidden email]; [hidden email]
Subject: [RFC] Adding refactoring support to Clang
 

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang[2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from`clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF:https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.

This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename`over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Cheers,
Alex



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Loading...