Unbundling key parts of libIndex/IndexingContext

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Unbundling key parts of libIndex/IndexingContext

Kristóf Umann via cfe-dev
TL;DR: I'd really like to extract the part that goes from AST node -> Decl + Role, but I don't know how to do it incrementally. Current plan would be to copy it to something parallel, and maybe try to port libIndex later. That sounds likely to end in permanent duplication though. Help!

--

Hi tooling folks,

There are several pieces in libIndex that are important for indexing, but also useful for other things. clangd (and I would guess libclang) has been (ab)using these, e.g. implementing "go to definition" by indexing the file and looking for decl references that overlap the cursor.
But performance, flexibility, and understanding tend to suffer, as the APIs don't always suit the problem space.

I think IndexingAction/IndexingContext do the following things:
A) expose macros while parsing/recorded in PP through a common interface
B) traverse the AST (according to some policy) to find AST nodes
C) determine which tokens in the source refer to those nodes
D) determine which decls AST nodes refer to, and with which roles
E) find decls related to an AST node (e.g. overrides)
There's also F) getSymbolInfo(), which is already split out.

These seem mostly[1] disjoint to me, and can each be replaced/dropped in some scenarios. examples:
- finding AST nodes (B) is sometimes better done with ASTSelection/SelectionTree/RecursiveASTVisitor.
- hover-to-evaluate-constant needs C but not D 
- analyses like extract-variable care about D but not C or E

AST Node -> Decl is the largest and most useful chunk. I've built a standalone prototype which does pretty well on C++, and porting more of the logic from libIndex could bring it up to par.
The API is something like
  vector<pair<const Decl*, SymbolRoleSet>> findDecl(DynTypedNode,
     ReferencePolicy, // lexical references only, or semantic ones like `decltype(...)` too?
     AliasPolicy, // for typedefs etc, return the alias, the underlying decl, or both?
     TemplatePolicy, // return specialization, the relevant pattern, the primary template, or some combination?
  )
The Policy options allow us to more directly express features and fix some bugs.

My main concern is that I'm not sure how to eventually (ideally incrementally!) use this in libIndex.
The natural implementation is a set of mutually recursive functions to handle Decl/Type/Stmt etc (much like libIndex's structure), so before calling any of them, all must be complete and correct.
And SymbolRelations seem to sometimes need information from the AST traversal, and sometimes information from the node->decl walk, so it's hard to see how they can be separated.

So though it seems insane to have two copies of this logic, I find myself leaning toward a separate implementation. Is there a better way out than this? How much value is there in putting it in Tooling vs locally in clangd?

Cheers, Sam

[1] B and E seem fairly tangled, actually.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Unbundling key parts of libIndex/IndexingContext

Kristóf Umann via cfe-dev
Hi Sam,

I think it would be great if the current implementation was modularized in a way that would allow Clang and related tools to remove recursive tree visitors that are either abused for other purposes, or just duplicate work that some other visitors are doing, either partially or fully.
 
I think starting out with a new implementation is not a bad approach. That said, I think it would be really good if the new implementation it was adopted by libIndex from the start, as this would allow:
- A clear path towards convergence of the new implementation and the old implementation 
- More coverage for testing 
The primary risk of not integrating this into libIndex directly is the high cost of transition to a unified implementation at a later stage by other adopters. Also, there's a real downside of not unifying code from the start (which we know too well unfortunately :( ): the general build and code size cost of additional tree visitor code in Clang and its tools. 

I think while it's desirable to change implementation in an incremental way from the perspective of reviews and code history, sometimes it's not entirely practical as it's actually harder to see the full picture, and to qualify the change by doing additional testing by other adopters. I think for a change like this a patch that entirely rewrites a lot of the code in libIndex is acceptable. So my suggestion would be to:
- Work on unified implementation with libIndex from the start
- Migrate libIndex in one patch
- The reviewers or you might find ways to split up the patch into smaller, incremental changes during the review
- The reviewers will be able to test one patch only 

Thanks,
Alex








On Fri, 9 Aug 2019 at 10:26, Sam McCall <[hidden email]> wrote:
TL;DR: I'd really like to extract the part that goes from AST node -> Decl + Role, but I don't know how to do it incrementally. Current plan would be to copy it to something parallel, and maybe try to port libIndex later. That sounds likely to end in permanent duplication though. Help!

--

Hi tooling folks,

There are several pieces in libIndex that are important for indexing, but also useful for other things. clangd (and I would guess libclang) has been (ab)using these, e.g. implementing "go to definition" by indexing the file and looking for decl references that overlap the cursor.
But performance, flexibility, and understanding tend to suffer, as the APIs don't always suit the problem space.

I think IndexingAction/IndexingContext do the following things:
A) expose macros while parsing/recorded in PP through a common interface
B) traverse the AST (according to some policy) to find AST nodes
C) determine which tokens in the source refer to those nodes
D) determine which decls AST nodes refer to, and with which roles
E) find decls related to an AST node (e.g. overrides)
There's also F) getSymbolInfo(), which is already split out.

These seem mostly[1] disjoint to me, and can each be replaced/dropped in some scenarios. examples:
- finding AST nodes (B) is sometimes better done with ASTSelection/SelectionTree/RecursiveASTVisitor.
- hover-to-evaluate-constant needs C but not D 
- analyses like extract-variable care about D but not C or E

AST Node -> Decl is the largest and most useful chunk. I've built a standalone prototype which does pretty well on C++, and porting more of the logic from libIndex could bring it up to par.
The API is something like
  vector<pair<const Decl*, SymbolRoleSet>> findDecl(DynTypedNode,
     ReferencePolicy, // lexical references only, or semantic ones like `decltype(...)` too?
     AliasPolicy, // for typedefs etc, return the alias, the underlying decl, or both?
     TemplatePolicy, // return specialization, the relevant pattern, the primary template, or some combination?
  )
The Policy options allow us to more directly express features and fix some bugs.

My main concern is that I'm not sure how to eventually (ideally incrementally!) use this in libIndex.
The natural implementation is a set of mutually recursive functions to handle Decl/Type/Stmt etc (much like libIndex's structure), so before calling any of them, all must be complete and correct.
And SymbolRelations seem to sometimes need information from the AST traversal, and sometimes information from the node->decl walk, so it's hard to see how they can be separated.

So though it seems insane to have two copies of this logic, I find myself leaning toward a separate implementation. Is there a better way out than this? How much value is there in putting it in Tooling vs locally in clangd?

Cheers, Sam

[1] B and E seem fairly tangled, actually.

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev