LibTooling performance windows question

LibTooling performance windows question

Radu Angelescu via cfe-dev
Hello,

First, I want to congratulate the developers on a beautiful piece of software. Not only is it fast, but it is also beautifully written, and the code is remarkably readable (even the comments are good :D).

The second point of this email: I am trying to build an automatic include fixer with LibTooling. Everything seems to work so far, but I am running it on a project where I need to parse around 2000 files (with deeply nested includes), and it takes a long time, so I am trying to make it faster.
My current code uses a single, fairly simple preprocessor action. It relies only on the InclusionDirective callback and will never need anything more (such as the AST). On a single thread, the fixer takes 20 minutes. To improve on that, I tried handing the source files to multiple threads in batches:

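    // Split the sources into at most c_num_threads batches. Ceiling division
    // keeps batch_size from becoming 0 when there are fewer files than threads
    // (which would make the loop below spin forever) and avoids an extra
    // remainder batch. Needs <algorithm> for std::min / std::max.
    const size_t total = allInterestingSources.size();
    const size_t batch_size = std::max<size_t>(1, (total + c_num_threads - 1) / c_num_threads);
    std::vector<std::vector<std::string>> batches;
    for (size_t i = 0; i < total; i += batch_size) {
        const size_t last = std::min(total, i + batch_size);
        batches.emplace_back(allInterestingSources.begin() + i, allInterestingSources.begin() + last);
    }
    auto start = std::chrono::high_resolution_clock::now();
    // One RefactoringTool per batch, each batch handled by one OpenMP thread.
#pragma omp parallel for num_threads(c_num_threads)
    for (int i = 0; i < static_cast<int>(batches.size()); i++)
    {
        RefactoringTool tool(db, batches[i]);
        tool.run(newFrontendActionFactory<IncludeFinderAction>().get());
    }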
So basically I create one RefactoringTool per batch and run it. The timing looked promising (around 5 minutes), but there is a problem:
With c_num_threads set to 1, everything works as it should, but it takes 20 minutes. With 10 to 16 threads, the tool reports errors when opening files that are included from other files. (I suspect the tool opens some included files exclusively.)
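
The batching step described above can also be sketched in isolation. This is only an illustration under stated assumptions: makeBatches is a hypothetical helper, not a LibTooling function, and it spreads the remainder evenly instead of using a fixed batch size, so it never produces more batches than threads and copes with having fewer files than threads.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of LibTooling: split `sources` into at most
// `maxBatches` contiguous batches whose sizes differ by at most one.
std::vector<std::vector<std::string>> makeBatches(
    const std::vector<std::string>& sources, std::size_t maxBatches) {
    std::vector<std::vector<std::string>> batches;
    if (sources.empty() || maxBatches == 0) return batches;
    // Never create more batches than there are files.
    const std::size_t count =
        maxBatches < sources.size() ? maxBatches : sources.size();
    const std::size_t base = sources.size() / count;   // minimum batch size (>= 1)
    const std::size_t extra = sources.size() % count;  // first `extra` batches get one more file
    auto first = sources.begin();
    for (std::size_t b = 0; b < count; ++b) {
        const auto len = static_cast<std::ptrdiff_t>(base + (b < extra ? 1 : 0));
        batches.emplace_back(first, first + len);
        first += len;
    }
    assert(first == sources.end());  // every file landed in exactly one batch
    return batches;
}
```

With 10 files and 4 threads this yields batches of 3, 3, 2 and 2 files; with 2 files and 16 threads it yields 2 batches of one file each.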

Important info: I am using Windows and some of the files I am parsing are read-only.

My debugging attempt: from what I can see in Path.inc, all files are opened with a FILE_SHARE... attribute, but I don't know whether I am missing some other implementation, or whether something more deeply Windows-specific is involved.

My questions:
- Do you know why I am running into this multithreaded file-open issue?
- Are there any tips and tricks for making this even faster? (Maybe skipping some compiler steps, since I only need the preprocessor ones; maybe it already does that. I am a beginner with LibTooling and would appreciate some advice.)

Note: I am using a freshly compiled version: git clone https://github.com/llvm/llvm-project.git

Thanks,
Radu

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: LibTooling performance windows question

Radu Angelescu via cfe-dev
Hello,
It seems the multithreading problem was caused by the tool class changing the working directory. After calling the appropriate function, tool.setRestoreWorkingDir(false); (which actually has a comment about this), it now works. (I am replying to my own mail so that anybody who runs into the same problem can see the fix.)
As for parsing optimizations to make it faster, I will probably investigate DependencyScanner and this Clang example: https://github.com/llvm-mirror/clang/blob/master/tools/clang-scan-deps/ClangScanDeps.cpp . If anybody has a simpler idea, feel free to let me know.
Here is the video that put me on the right track: https://www.youtube.com/watch?v=Ptr6e4CVTd4 . (For anybody who may be reading this email.)
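
To illustrate why a working-directory change is a problem with several tools in one process, here is a minimal sketch. It assumes, as reported above, that each tool changes the process working directory, and it replays one possible interleaving on a single thread so the outcome is deterministic. The function name, the directories and "header.h" are hypothetical; nothing here calls LibTooling.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Replays one possible interleaving of two tools that each switch to "their"
// directory before resolving a relative include path. Returns the directory
// against which tool A ends up resolving the path.
fs::path directoryToolAResolvesAgainst(const fs::path& dirA, const fs::path& dirB) {
    assert(fs::is_directory(dirA) && fs::is_directory(dirB));
    const fs::path saved = fs::current_path();
    fs::current_path(dirA);  // tool A enters its directory
    fs::current_path(dirB);  // tool B (another thread in the real program) runs in between
    // Tool A now resolves a relative path; the working directory is shared by
    // the whole process, so it no longer points at dirA.
    const fs::path resolved = fs::absolute("header.h").parent_path();
    fs::current_path(saved);  // put the original working directory back
    return resolved;
}
```

In this interleaving tool A looks for header.h in dirB, not dirA, which is consistent with the "cannot open included file" errors that only showed up with more than one thread.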

Cheers,
Radu


On Sat, 23 May 2020 at 08:39, Radu Angelescu <[hidden email]> wrote:

Re: LibTooling performance windows question

Vassil Vassilev via cfe-dev
Have you considered https://include-what-you-use.org/?
It’s pretty good and fast, and it has bindings for all sorts of build systems.

On 26 May 2020, at 20:35, Radu Angelescu via cfe-dev <[hidden email]> wrote:

