PCH/preamble files that exceeds 512M

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

PCH/preamble files that exceeds 512M

suyash singh via cfe-dev
Hi all,

I worked on crashes and asserts on clangd and found that Clang doesn't support PCH files that exceed 512M. It happens because PCH uses *bit* offsets from the beginning of the file for some indexes and other data structures. The simplest example is SOURCE_LOCATION_OFFSETS in source manager block. But there are other similar cases. I identified some cases and made them 64bit to double check my hypotheses. It indeed helped in my case but I see other places where uint32_t is used for storing bit offsets in the file. I see two possible approaches to fix this issue:

1. Use uint64_t in all places where Clang needs bit offset from the beginning of the files. It is relatively straight forwarded approach but it increases file size even if uint32_t is enough. In my case I saw about 4% increase for 700Mb preamble file. Such increase sounds modest to me and I implemented it in https://reviews.llvm.org/D76295

2. Store offsets from the beginning of corresponding data structure i.e. it will give 512M size limit on individual blocks instead of whole file. It won't increase file size much bit will add some complication of loading/storing logic and may still require 64 bit offsets if it is not possible to find good anchors for relative offsets.

The question is which approach do you think is the best?

Dmitry

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: PCH/preamble files that exceeds 512M

suyash singh via cfe-dev
Subscribing because I'm interested and adding Richard for visibility.

On Tue, Mar 17, 2020 at 9:46 AM Dmitry Polukhin via cfe-dev <[hidden email]> wrote:
Hi all,

I worked on crashes and asserts on clangd and found that Clang doesn't support PCH files that exceed 512M. It happens because PCH uses *bit* offsets from the beginning of the file for some indexes and other data structures. The simplest example is SOURCE_LOCATION_OFFSETS in source manager block. But there are other similar cases. I identified some cases and made them 64bit to double check my hypotheses. It indeed helped in my case but I see other places where uint32_t is used for storing bit offsets in the file. I see two possible approaches to fix this issue:

1. Use uint64_t in all places where Clang needs bit offset from the beginning of the files. It is relatively straight forwarded approach but it increases file size even if uint32_t is enough. In my case I saw about 4% increase for 700Mb preamble file. Such increase sounds modest to me and I implemented it in https://reviews.llvm.org/D76295

2. Store offsets from the beginning of corresponding data structure i.e. it will give 512M size limit on individual blocks instead of whole file. It won't increase file size much bit will add some complication of loading/storing logic and may still require 64 bit offsets if it is not possible to find good anchors for relative offsets.

The question is which approach do you think is the best?

Dmitry
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: PCH/preamble files that exceeds 512M

suyash singh via cfe-dev
+clangd-dev, this is definitely interesting! We hadn't seen such large preambles before.

I have to confess from the clangd side we've mostly been reusing the modules/preamble format without deeply probing into it, so I don't have an informed opinion (both your options seem reasonable to me, with 2 sounding nicer but more work).

On Tue, Mar 17, 2020 at 6:18 PM David Blaikie via cfe-dev <[hidden email]> wrote:
Subscribing because I'm interested and adding Richard for visibility.

On Tue, Mar 17, 2020 at 9:46 AM Dmitry Polukhin via cfe-dev <[hidden email]> wrote:
Hi all,

I worked on crashes and asserts on clangd and found that Clang doesn't support PCH files that exceed 512M. It happens because PCH uses *bit* offsets from the beginning of the file for some indexes and other data structures. The simplest example is SOURCE_LOCATION_OFFSETS in source manager block. But there are other similar cases. I identified some cases and made them 64bit to double check my hypotheses. It indeed helped in my case but I see other places where uint32_t is used for storing bit offsets in the file. I see two possible approaches to fix this issue:

1. Use uint64_t in all places where Clang needs bit offset from the beginning of the files. It is relatively straight forwarded approach but it increases file size even if uint32_t is enough. In my case I saw about 4% increase for 700Mb preamble file. Such increase sounds modest to me and I implemented it in https://reviews.llvm.org/D76295

2. Store offsets from the beginning of corresponding data structure i.e. it will give 512M size limit on individual blocks instead of whole file. It won't increase file size much bit will add some complication of loading/storing logic and may still require 64 bit offsets if it is not possible to find good anchors for relative offsets.

The question is which approach do you think is the best?

Dmitry
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [clangd-dev] PCH/preamble files that exceeds 512M

suyash singh via cfe-dev
+Michael, Bruno, and Volodymyr

I agree with Sam that option 2 seems cleaner.  I'm not sure whether 4% (option 1) would be a blocker for us, but probably it would be fine.

On Mar 17, 2020, at 10:30, Sam McCall via clangd-dev <[hidden email]> wrote:

+clangd-dev, this is definitely interesting! We hadn't seen such large preambles before.

I have to confess from the clangd side we've mostly been reusing the modules/preamble format without deeply probing into it, so I don't have an informed opinion (both your options seem reasonable to me, with 2 sounding nicer but more work).

On Tue, Mar 17, 2020 at 6:18 PM David Blaikie via cfe-dev <[hidden email]> wrote:
Subscribing because I'm interested and adding Richard for visibility.

On Tue, Mar 17, 2020 at 9:46 AM Dmitry Polukhin via cfe-dev <[hidden email]> wrote:
Hi all,

I worked on crashes and asserts on clangd and found that Clang doesn't support PCH files that exceed 512M. It happens because PCH uses *bit* offsets from the beginning of the file for some indexes and other data structures. The simplest example is SOURCE_LOCATION_OFFSETS in source manager block. But there are other similar cases. I identified some cases and made them 64bit to double check my hypotheses. It indeed helped in my case but I see other places where uint32_t is used for storing bit offsets in the file. I see two possible approaches to fix this issue:

1. Use uint64_t in all places where Clang needs bit offset from the beginning of the files. It is relatively straight forwarded approach but it increases file size even if uint32_t is enough. In my case I saw about 4% increase for 700Mb preamble file. Such increase sounds modest to me and I implemented it in https://reviews.llvm.org/D76295

2. Store offsets from the beginning of corresponding data structure i.e. it will give 512M size limit on individual blocks instead of whole file. It won't increase file size much bit will add some complication of loading/storing logic and may still require 64 bit offsets if it is not possible to find good anchors for relative offsets.

The question is which approach do you think is the best?

Dmitry
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
clangd-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/clangd-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [clangd-dev] PCH/preamble files that exceeds 512M

suyash singh via cfe-dev


On Tue, Mar 17, 2020 at 12:57 PM Duncan P. N. Exon Smith <[hidden email]> wrote:
>
> +Michael, Bruno, and Volodymyr
>
> I agree with Sam that option 2 seems cleaner.

+1

> I'm not sure whether 4% (option 1) would be a blocker for us, but probably it would be fine.

Ultimately I believe it would be fine.

--
Bruno Cardoso Lopes
http://www.brunocardoso.cc

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [clangd-dev] PCH/preamble files that exceeds 512M

suyash singh via cfe-dev
OK, it seems that there is kind of consensus that second option is better and additional complexity worths it. My main concern about second option is that it is kind of temporary solution that just delays the point then 32bit won't be enough for bit offsets.
I'll explore second option in more details and post results here.

On Wed, Mar 18, 2020 at 1:13 AM Bruno Cardoso Lopes <[hidden email]> wrote:


On Tue, Mar 17, 2020 at 12:57 PM Duncan P. N. Exon Smith <[hidden email]> wrote:
>
> +Michael, Bruno, and Volodymyr
>
> I agree with Sam that option 2 seems cleaner.

+1

> I'm not sure whether 4% (option 1) would be a blocker for us, but probably it would be fine.

Ultimately I believe it would be fine.

--
Bruno Cardoso Lopes
http://www.brunocardoso.cc

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [clangd-dev] PCH/preamble files that exceeds 512M

suyash singh via cfe-dev
https://reviews.llvm.org/D76594 implements relative offsets for 32-bit bit offsets. I tried to find all potentially problematic places and found one more that I didn't identify initially PPD_ENTITIES_OFFSETS.

On Wed, Mar 18, 2020 at 5:42 PM Dmitry Polukhin <[hidden email]> wrote:
OK, it seems that there is kind of consensus that second option is better and additional complexity worths it. My main concern about second option is that it is kind of temporary solution that just delays the point then 32bit won't be enough for bit offsets.
I'll explore second option in more details and post results here.

On Wed, Mar 18, 2020 at 1:13 AM Bruno Cardoso Lopes <[hidden email]> wrote:


On Tue, Mar 17, 2020 at 12:57 PM Duncan P. N. Exon Smith <[hidden email]> wrote:
>
> +Michael, Bruno, and Volodymyr
>
> I agree with Sam that option 2 seems cleaner.

+1

> I'm not sure whether 4% (option 1) would be a blocker for us, but probably it would be fine.

Ultimately I believe it would be fine.

--
Bruno Cardoso Lopes
http://www.brunocardoso.cc

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev