RFC: Adding index-while-building support to Clang

Eric Fiselier via cfe-dev
Hey everyone,

Xcode 9 shipped with index-while-building functionality based on enhancements to Clang that we’d like to upstream. Key among them is a new option, -index-store-path, that causes Clang, in addition to producing its usual outputs, to write out indexing data at the supplied path with minimal overhead.

While the current implementation is available at https://github.com/apple/swift-clang, we’d like to start by getting feedback on the high-level design, which you can read about here: https://docs.google.com/document/d/1cH2sTpgSnJZCkZtJl1aY-rzy4uGPcrI-6RrUpdATO2Q/edit?usp=sharing

Please let us know of any concerns, comments or questions you have regarding this feature.

Thanks!
Nathan

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: RFC: Adding index-while-building support to Clang

Eric Fiselier via cfe-dev
Hi, thanks for the detailed design; I'm really looking forward to having this :)

On a high level, I think this all makes a lot of sense.
I have one question about how things get deleted: will record files for headers be deleted at some point, or will they be kept around, so that you have to look at all unit files to see whether a given record file is still valid?

Cheers,
/Manuel



Re: RFC: Adding index-while-building support to Clang

Eric Fiselier via cfe-dev
Hi,

Thanks for the design.

1. You briefly mention an in-memory mapping on top of the index to support efficient lookup of index records.
Is it part of the available implementation? Do you think it would make sense to reuse it in other tools?

I'm asking because that seems like something most editors and IDEs are also interested in. Proper reuse of this implementation could prove useful.

2. In clangd, we're not controlling the build step, instead building ASTs in-memory. We would rather store the indexing information in-memory or consume it on the go while building ASTs.
Do you have suggestions on which parts of the API we should look at?
We could implement our own IndexASTConsumer, but are there more opportunities for reusing other parts of your implementation, such as the code for collecting indexing dependencies or the definitions of high-level record structures (e.g. symbol definitions)?





--
Regards,
Ilya Biryukov


Re: RFC: Adding index-while-building support to Clang

Eric Fiselier via cfe-dev

Btw, I'd assume in clangd we'll want both: indexing as part of the build, and updated (possibly overlaid) indexing as part of the AST reparsing.
 



Re: RFC: Adding index-while-building support to Clang

Eric Fiselier via cfe-dev
Btw, I'd assume in clangd we'll want both: indexing as part of the build, and updated (possibly overlaid) indexing as part of the AST reparsing.

Right, ideally even updated, overlaid indexing of changed but unopened files.

--
Regards,
Ilya Biryukov


Re: RFC: Adding index-while-building support to Clang

Eric Fiselier via cfe-dev


On Aug 29, 2017, at 12:34 AM, Manuel Klimek <[hidden email]> wrote:

Hi, thanks for the detailed design; I'm really looking forward to having this :)

On a high level, I think this all makes a lot of sense.
I have one question about how things get deleted: will record files for headers be deleted at some point, or will they be kept around, so that you have to look at all unit files to see whether a given record file is still valid?

Sorry, I was a bit light on detail in this area. We have a currently unimplemented API in libIndexStore that’s intended to purge unit files whose corresponding main source files no longer exist (e.g. after a .cpp file is renamed), along with any unreferenced record files, but this is intended more as a periodic cleanup operation.

In the current design we expect stale units/records to exist in the store and leave it as the responsibility of the index store client to subscribe to the unit added/removed/modified events in order to track enough information to identify and ignore them, e.g. by maintaining reference counts for each record file. When to actually remove them is also left to the client – the index store just provides APIs for deleting individual records/units.
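The client-side bookkeeping described above could be sketched roughly as follows. This is a minimal illustration in Python, not libIndexStore's API: the `RecordTracker` class, its method names, and the idea that each unit event arrives with the list of record files the unit references are all assumptions made for the example.

```python
from collections import Counter


class RecordTracker:
    """Hypothetical client-side tracker: counts how many units reference each record."""

    def __init__(self):
        self.unit_records = {}      # unit name -> set of record names it references
        self.refcounts = Counter()  # record name -> number of units referencing it

    def unit_added_or_modified(self, unit, records):
        # Drop the unit's old references first so a modified unit is not double counted.
        self.unit_removed(unit)
        self.unit_records[unit] = set(records)
        for record in self.unit_records[unit]:
            self.refcounts[record] += 1

    def unit_removed(self, unit):
        for record in self.unit_records.pop(unit, set()):
            self.refcounts[record] -= 1
            if self.refcounts[record] == 0:
                del self.refcounts[record]

    def is_live(self, record):
        # A record that no unit references is stale; it can be ignored now and purged later.
        return self.refcounts[record] > 0
```

With this shape, deciding when to actually delete a stale record stays a separate policy decision for the client, which matches the split of responsibilities described above.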

Cheers,
Nathan



Re: RFC: Adding index-while-building support to Clang

Eric Fiselier via cfe-dev
Hi Ilya,

On Aug 29, 2017, at 1:55 AM, Ilya Biryukov via cfe-dev <[hidden email]> wrote:

Hi,

Thanks for the design.

1. You briefly mention an in-memory mapping on top of the index to support efficient lookup of index records.
Is it part of the available implementation? Do you think it would make sense to reuse it in other tools?

I'm asking because that seems like something most editors and IDEs are also interested in. Proper reuse of this implementation could prove useful.

In our implementation we use LMDB (https://symas.com/lightning-memory-mapped-database). It is a key-value data-store that we use for cross-referencing queries, similarly to the example that Nathan provides in the document.
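To make the cross-referencing idea concrete, here is a rough sketch that uses plain Python dictionaries as a stand-in for the key-value store. It does not use LMDB's API, and the two mappings (symbol identifier to record names, record name to unit names), the `CrossRefIndex` class, and the identifiers in it are assumptions for illustration rather than a description of the actual implementation.

```python
from collections import defaultdict


class CrossRefIndex:
    """Toy key-value mapping; a real client would persist these tables."""

    def __init__(self):
        self.symbol_to_records = defaultdict(set)  # symbol id -> records mentioning it
        self.record_to_units = defaultdict(set)    # record -> units that include it

    def add_unit(self, unit, records):
        # `records` maps each record name to the symbol ids that occur in it.
        for record, symbols in records.items():
            self.record_to_units[record].add(unit)
            for symbol in symbols:
                self.symbol_to_records[symbol].add(record)

    def units_mentioning(self, symbol):
        # Two lookups answer "which translation units touch this symbol?"
        units = set()
        for record in self.symbol_to_records.get(symbol, ()):
            units |= self.record_to_units.get(record, set())
        return sorted(units)
```

The point of the sketch is only that each query is a small number of keyed lookups, which is the access pattern a persistent key-value store serves well.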

Is this something that we could accept into the clang project (e.g. in clang-tools-extra)? Note that it is essentially a single header and implementation file.


2. In clangd, we're not controlling the build step, instead building ASTs in-memory. We would rather store the indexing information in-memory or consume it on the go while building ASTs.
Do you have suggestions on which parts of the API we should look at?
We could implement our own IndexASTConsumer, but are there more opportunities for reusing other parts of your implementation, such as the code for collecting indexing dependencies or the definitions of high-level record structures (e.g. symbol definitions)?

There are a few ways to go about this:

- Have ASTs in-memory, but indexing works on the file system. It’s not ideal but it is simple and works fairly well in practice, particularly since on our platform, files open in Xcode can be saved to disk even without the user explicitly saving them.
- Update clang’s raw index data store using the in-memory buffers and ASTs. The advantage is that symbol info comes from one place only, but the complexity is that you have raw data on disk that reflects in-memory-only sources.
- The layer on top of clang's raw index data store is enhanced to treat the raw data on disk as one source of symbol info, and in-memory ASTs as another. For example, if using LMDB, you could have it record whether info about a symbol comes from the raw data on disk or from an in-memory AST.




Re: RFC: Adding index-while-building support to Clang

Eric Fiselier via cfe-dev
Hi Argyrios,

AFAIK, LLVM's policy on dependencies is pretty tight. Is it hard to isolate the DB layer, or is it tightly coupled to the implementation?
If it's possible, we could have a DB-agnostic API in cfe or clang-tools-extra and an alternative implementation of the storage layer.
+klimek, +bkramer, maybe you could comment on adding new third-party dependencies to LLVM? Is it possible?

Thanks. We probably want some combination of all options. We would definitely benefit from reading the on-disk indexes if they are there. But those may be outdated, so we could have our own indexing as a layer on top of that for the modified files. Then we could dispatch all requests to both layers and combine the results. I wonder whether it's possible to make that work and how much effort it would be.
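One way to picture the two-layer lookup described above, as a hedged sketch only: a query goes to both an on-disk layer and an in-memory overlay, and results from the disk layer are dropped for files that the overlay has re-indexed. `LayeredIndex`, the dictionary-based layers, and the file and symbol names are hypothetical and are not part of clangd or the index store.

```python
class LayeredIndex:
    """Combines a possibly stale on-disk index with a fresh in-memory overlay."""

    def __init__(self, disk_layer):
        # Each layer maps file path -> {symbol name -> list of line numbers}.
        self.disk_layer = disk_layer
        self.overlay = {}

    def update_overlay(self, path, symbols):
        # Called after re-parsing a modified file; shadows the on-disk data for it.
        self.overlay[path] = symbols

    def find_occurrences(self, symbol):
        results = []
        for source, layer in (("memory", self.overlay), ("disk", self.disk_layer)):
            for path, symbols in layer.items():
                if source == "disk" and path in self.overlay:
                    continue  # outdated: the overlay has newer data for this file
                for line in symbols.get(symbol, []):
                    results.append((path, line, source))
        return sorted(results)
```

Tagging each result with its source keeps the provenance visible to the caller, so on-disk and in-memory information can be told apart after the results are merged.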


Re: RFC: Adding index-while-building support to Clang

Eric Fiselier via cfe-dev

On Aug 31, 2017, at 1:26 AM, Ilya Biryukov <[hidden email]> wrote:

AFAIK, LLVM's policy on dependencies is pretty tight. Is it hard to isolate the DB layer, or is it tightly coupled to the implementation?
If it's possible, we could have a DB-agnostic API in cfe or clang-tools-extra and an alternative implementation of the storage layer.
+klimek, +bkramer, maybe you could comment on adding new third-party dependencies to LLVM? Is it possible?

The license is BSD-like (see https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/LICENSE), which I think makes it compatible. And it would only be a new dependency added in clang-tools-extra.

I think it would be beneficial to focus on one implementation (at least at the beginning).
- Assuming that it starts with an in-memory implementation of a key-value store, at some point it will be natural to want to add persistence, and at that point you end up implementing what LMDB already provides.
- Having one implementation in-tree and another out-of-tree is not ideal; some usage patterns may be fine for one but problematic for the other. We may evolve multiple implementations later on, if the need arises, but ideally they would be in-tree.


Thanks. We probably want some combination of all options. We would definitely benefit from reading the on-disk indexes if they are there. But those may be outdated, so we could have our own indexing as a layer on top of that for the modified files. Then we could dispatch all requests to both layers and combine the results. I wonder whether it's possible to make that work and how much effort it would be.

FYI, for updating out-of-date files (without needing to build), we have the ‘background indexing’ mechanism, which invokes “clang -fsyntax-only -index-store-path …” for the main files that are out-of-date (or include header files that are out-of-date), and brings the index-store up-to-date.
This does have the complexity of maintaining a “mini-build-system-like” mechanism, and the associated scheduling logic that comes with it.
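The out-of-date check behind such a background indexer might look roughly like this. It is only a sketch: the function names, the timestamp dictionaries, and the include map are hypothetical, and only the `-fsyntax-only` and `-index-store-path` flags come from the message above. A main file is selected for re-indexing if it, or any header it includes, changed after the file was last indexed.

```python
def out_of_date_main_files(includes, modified, indexed):
    """Return main files whose own mtime, or that of an included header,
    is newer than the time they were last indexed.

    includes: main file -> list of headers it includes
    modified: file -> last modification time
    indexed:  main file -> time of last indexing (missing means never indexed)
    """
    stale = []
    for main, headers in includes.items():
        last_indexed = indexed.get(main, float("-inf"))
        if any(modified[f] > last_indexed for f in [main, *headers]):
            stale.append(main)
    return sorted(stale)


def index_command(main_file, store_path):
    # Argument list for a syntax-only indexing run; a real invocation would also
    # need the file's usual compile flags.
    return ["clang", "-fsyntax-only", "-index-store-path", store_path, main_file]
```

Scheduling these commands (priorities, parallelism, cancellation when a file changes again) is the "mini-build-system-like" part and is deliberately left out of the sketch.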


Re: RFC: Adding index-while-building support to Clang

Eric Fiselier via cfe-dev

Hi,

This was asked before, but what would be the process for getting liblmdb into clang-tools-extra? I've started prototyping with it and it is quite useful and small. I had a small library (ClangdIndexDataStorage + BTree) filling the same role before and I *think* I'll be able to fully replace it with liblmdb.

One concern I had with the library at first is that, because it uses memory mapping, it was not clear to me how we could control its memory usage. But I had in mind a single DB that included *everything*, i.e. all symbols and occurrences. After reading the index-while-building proposal, I like the idea of producing records and units and having a mapping that refers to those.

There is a part of the proposal that I want to make sure I understood: "Background indexing still occurs with this setup, but instead of being based on a call to libclang, is achieved by invoking Clang with both the -index-store-path option and -fsyntax-only". I assume this background indexing by invoking 'clang -index-store-path -fsyntax-only' is mainly for a scenario where a unit has not been built yet?

What are the next steps in upstreaming this "index-while-building" support? I think it makes perfect sense for Clangd to use this support and use a similar indexing strategy. I think there's a nice opportunity for collaboration.


Marc-André Laperle



Re: RFC: Adding index-while-building support to Clang

Eric Fiselier via cfe-dev

On Sep 18, 2017, at 1:54 PM, Marc-André Laperle <[hidden email]> wrote:


There is a part of the proposal that I want to make sure I understood: "Background indexing still occurs with this setup, but instead of being based on a call to libclang, is achieved by invoking Clang with both the -index-store-path option and -fsyntax-only". I assume this background indexing by invoking 'clang -index-store-path -fsyntax-only' is mainly for a scenario where a unit has not been built yet?

Sorry for the delay in getting back to you here.
Right, it’s for getting index data for source that hasn’t ever been built and for updating the index data of source files that have been modified since the last build.


What are the next steps in upstreaming this "index-while-building" support? I think it makes perfect sense for Clangd to use this support and use a similar indexing strategy. I think there's a nice opportunity for collaboration.

I’ve just put a patch up for review with the -index-store-path option for writing out the record and unit files, and libIndexStore (for reading it):


We’re also now working towards open sourcing a basic indexer service built on top of libIndexStore (using LMDB to persist the mappings for efficient lookup of index records) along with simple plugins for the vim, emacs and sublime editors. I’ll hopefully have something for you all to look at in the next few weeks.

Cheers,
Nathan

