Limits on AST depth and stack usage?

Limits on AST depth and stack usage?

Sam McCall via cfe-dev
This program creates an AST of depth 10000:

constexpr 
template <class T, T...V> struct seq {
  constexpr bool zero() { return (true && ... && (V == 0)); };
};
constexpr unsigned N = 10000;
auto x = __make_integer_seq<seq, int, N>{};
static_assert(!x.zero(), "");

On my machine, clang-10 parses this successfully (slowly!) but crashes if I increase to 100000.
I suppose -ftemplate-depth doesn't apply as there's no actual recursion here (just fold-expressions), but I feel like some implementation limit *should* apply: clang relies heavily on being able to traverse the AST recursively, and will inevitably crash if the tree is too tall.
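To make the depth claim concrete, here's a small standalone sketch (not clang code; foldExpansion/maxNesting are invented names) showing that the expanded form of the left fold nests one level per pack element:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Build the expanded source form of (true && ... && (V == 0)) for a pack
// of n elements: each element wraps the previous expression in one more
// BinaryOperator, i.e. one more level of nesting.
std::string foldExpansion(std::size_t n) {
  std::string expr = "true";
  for (std::size_t i = 0; i < n; ++i)
    expr = "(" + expr + " && (V" + std::to_string(i) + " == 0))";
  return expr;
}

// Maximum parenthesis nesting depth, a proxy for the height of the
// expression subtree the compiler must build and traverse.
std::size_t maxNesting(const std::string &expr) {
  std::size_t depth = 0, best = 0;
  for (char c : expr) {
    if (c == '(')
      best = std::max(best, ++depth);
    else if (c == ')')
      --depth;
  }
  return best;
}
```

With N pack elements the nesting depth is N+1, so the program above really does force a tree roughly 10000 nodes tall.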

Should there be a limit here? It seems tempting to say one of:
 -ftemplate-depth should constrain the number of arguments a pack can represent
 -ftemplate-depth should constrain the size of pack that can be unfolded
 -some limit should constrain the overall AST height

---

The immediate problem we're dealing with is that https://reviews.llvm.org/D85621 increased the size of DynTypedNode from 32 to 56 bytes, and we're seeing clang-tidy crashes due to stack overflow in the recursive traversal MatchASTVisitor::memoizedMatchesAncestorOfRecursively.

However, this is on a unit test very similar to the above (with N=2000), so before trying to squeeze stack usage I'd like to understand what we should be aiming to support.

(Tangentially related: Richard, if you think there are good ways to squeeze TemplateArgumentLoc, that'd be interesting to know. Currently it's by-value and holds TemplateArgument and TemplateArgumentLocInfo which are 24 bytes each)

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Re: Limits on AST depth and stack usage?

Sam McCall via cfe-dev
(doh, obviously that first "constexpr" shouldn't be there)

Re: Limits on AST depth and stack usage?

David Rector via cfe-dev
RecursiveASTVisitor::TraverseStmt has a `DataRecursionQueue *Queue` in its signature to prevent stack overflow by default (as Richard once helpfully explained to me).  ASTMatchers uses the DataRecursionQueue, so that searching for matches in Stmt children should never overflow the stack, I believe (though it does confuse users who expect a more orderly traversal…but that’s another matter).

But `memoizedMatchesAncestorOfRecursively` does *not* use data recursion, as you've noted. In other words, ASTMatchers uses data recursion when traversing the descendants of deeply nested Exprs, but not when traversing the ancestors of the nodes at the bottom.

So, I think it could just be a limited problem with a simple solution: there should be a non-recursive, loop-based version of memoizedMatchesAncestorOfRecursively specific to traversing the ancestors of Stmts/Exprs, mirroring what RecursiveASTVisitor::TraverseStmt does.
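A minimal sketch of that idea, with invented Node/parent-map types standing in for DynTypedNode and the ASTContext parent map (this is not the actual ASTMatchers code):

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Node {
  bool matches = false;  // stand-in for "the ancestor matcher accepts this node"
};

using ParentMap = std::unordered_map<const Node *, const Node *>;
using Memo = std::unordered_map<const Node *, bool>;

// Loop-based equivalent of a recursive "does any ancestor match?" query:
// walk parent links iteratively, stopping at a memoized answer, a matching
// ancestor, or the root, then memoize the result for every node visited.
// Stack usage stays constant no matter how deep the AST is.
bool anyAncestorMatches(const Node *n, const ParentMap &parents, Memo &memo) {
  std::vector<const Node *> path;
  bool result = false;
  for (const Node *cur = n;;) {
    auto hit = memo.find(cur);
    if (hit != memo.end()) { result = hit->second; break; }
    path.push_back(cur);
    auto p = parents.find(cur);
    if (p == parents.end()) { result = false; break; }  // root: no ancestors left
    if (p->second->matches) { result = true; break; }   // found a matching ancestor
    cur = p->second;
  }
  for (const Node *m : path)
    memo[m] = result;
  return result;
}

// Demo: a parent chain far deeper than the call stack could survive with
// naive recursion; only the root node matches.
bool demoDeepChain(std::size_t depth) {
  std::vector<Node> nodes(depth);
  nodes[0].matches = true;  // root
  ParentMap parents;
  for (std::size_t i = 1; i < depth; ++i)
    parents[&nodes[i]] = &nodes[i - 1];
  Memo memo;
  return anyAncestorMatches(&nodes[depth - 1], parents, memo);
}
```

The memoization step mirrors what the recursive version gets for free from its unwinding: every node on the walked path learns its answer in one pass.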

Whether there are more general issues with, e.g., evaluating extremely deep ASTs is beyond my own depth, though...

Dave


Re: Limits on AST depth and stack usage?

Richard Smith via cfe-dev
Since you asked about more general issues:

In general terms, it would be nice if we could accept arbitrarily-deep ASTs. We make some effort to do so, using three major techniques (in addition to expecting 8MB of stack space in each thread that might run Clang):

1) Data recursion. RecursiveASTVisitor does this by default, and the constant evaluator does it when evaluating integer-valued expressions. But this is limited: you only get it where you apply it, and RAV's automatic data recursion turns itself off when it thinks the difference might be observable in subclasses.
2) Minimizing stack space during recursion. For example, during "regular" function template instantiation at the end of the translation unit, we try to keep the size of the recursive frame as small as possible.
3) The "runWithSufficientStackSpace" mechanism that tries to check that "enough" stack space is available and pops out a diagnostic (and as a slow-but-correct fallback, allocates a new stack) if we're getting close to running out.
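For illustration, here is technique (1) in miniature, with a toy Expr type rather than clang's (a sketch, not the RecursiveASTVisitor implementation):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Expr {
  std::vector<Expr *> children;
};

// Data recursion: an explicit worklist replaces the call stack, so the
// traversal depth is bounded by heap size, not by the 8MB thread stack.
std::size_t countNodes(Expr *root) {
  std::size_t n = 0;
  std::vector<Expr *> worklist{root};
  while (!worklist.empty()) {
    Expr *e = worklist.back();
    worklist.pop_back();
    ++n;
    for (Expr *c : e->children)
      worklist.push_back(c);
  }
  return n;
}

// Build a degenerate chain of `depth` nodes; a naive recursive visitor
// would overflow the stack long before depth reaches one million.
std::size_t demoDeepTree(std::size_t depth) {
  std::vector<std::unique_ptr<Expr>> storage;
  storage.reserve(depth);
  for (std::size_t i = 0; i < depth; ++i) {
    storage.push_back(std::make_unique<Expr>());
    if (i > 0)
      storage[i - 1]->children.push_back(storage[i].get());
  }
  return countNodes(storage.front().get());
}
```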

Applying these techniques (largely in the above order) should be encouraged, and if we can make memoizedMatchesAncestorOfRecursively do so, that seems like a good directed approach for the local problem. But it's unlikely that we'll ever cover enough ground here to consistently prevent ever running out of stack in all cases -- not unless we fully move away from recursion as an implementation technique, which seems implausible, or add "runWithSufficientStackSpace" calls to every call graph SCC, which seems expensive. So this is currently done pragmatically: we fix stack usage issues as we see them become a problem. And in the remaining cases, we crash :-( Not the end of the world for a compiler, but not great for more or less any other user of Clang as a library.

For standards-conformance, our backstop here is the set of so-called "implementation limits" -- we're permitted to bail out if things get too complex, and we largely get to define what that means. C has some minimal requirements here -- for example, that we can support at least one program with 63 levels of nested parentheses; C++ has only suggestions and no hard requirements. But in either case, if our limits are violated, there are no constraints on our behavior; we're allowed to crash or whatever. So that puts us in quality-of-implementation land.

Now, we do have some defined implementation limits (and there has been some work done towards improving and formalizing them: https://reviews.llvm.org/D72053). In the specific case of expanding a fold-expression, the relevant implementation limit is LangOptions::BracketDepth, which limits the total number of parentheses, braces, or square brackets that can be open at once, but that limit is currently only applied when parsing, not during template instantiation. So in this specific case, as a bare minimum, I think we're missing a check for that language option during fold-expression expansion (each level of a fold-expression introduces a set of parentheses, so even a pedantic interpretation of the rules has us covered reusing the same limit here).

As for limiting the size of packs in general: I think it might make sense to impose a limit -- not to avoid excessive recursion; we should have other mechanisms to deal with that -- but to avoid huge memory usage during compilation due to forming overly-large packs. E.g., passing a very large integer to the __make_integer_seq builtin can easily cause Clang to fall into a hole from which it will never escape, assuming you don't run out of address space and just crash.

Setting an implementation limit on the overall depth of the AST could be interesting, if there's a way we can practically do it. This seems like a hard problem -- there are just too many different places and ways in which AST nodes are created, many of them aren't equipped to cope with the operation failing in some kind of gracefully-recoverable way, and just keeping track of the AST depth without paying a code complexity and memory usage cost seems like a challenge -- and we'd need to be sure the cost of solving it is worth the benefit.

On Mon, 31 Aug 2020 at 16:35, David Rector <[hidden email]> wrote:
(Tangentially related: Richard, if you think there are good ways to squeeze TemplateArgumentLoc, that'd be interesting to know. Currently it's by-value and holds TemplateArgument and TemplateArgumentLocInfo which are 24 bytes each)
I think it should be possible to squeeze TemplateArgument down to 16 bytes, but it'll be a little fiddly. All representations are already 16 bytes (including the kind field) except for the Declaration representation, and that one is only two pointers plus a kind. We have too many kinds to fit into the bottom bits of a pointer (we have 9 kinds currently), but we can handle that: all the other representations have a full 4 bytes for the "kind" field, so we can use, for example, bottom bit == 0 to mean it's the Declaration kind, and bottom bit == 1 to mean the remaining 31 bits are the actual kind. (Or we can get down to 8 kinds by representing "null" as -- for example -- a null type template argument, and then pack the kind in the first 3 bits.)

The lower-hanging fruit is TemplateArgumentLocInfo, for which we could heap-allocate the information for the template name case and reduce the class to 8 bytes. Given that template template arguments are the rarest kind, that's very likely to be a net memory usage reduction.
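The low-bit discriminator idea above can be sketched in isolation (hypothetical TaggedArg/Decl types, not the real TemplateArgument layout): pointers to objects with alignment of at least 2 always have a zero bottom bit, which frees that bit to distinguish "pointer payload" from "explicit kind":

```cpp
#include <cstdint>

struct Decl { int data; };  // stand-in; alignof(Decl) >= 4, so low pointer bits are zero

class TaggedArg {
  std::uintptr_t bits;

public:
  // Declaration kind: store the pointer as-is; its bottom bit is already 0.
  explicit TaggedArg(Decl *d) : bits(reinterpret_cast<std::uintptr_t>(d)) {}

  // Every other kind: set the bottom bit to 1 and keep the kind above it.
  explicit TaggedArg(unsigned kind) : bits((std::uintptr_t(kind) << 1) | 1) {}

  bool isDeclaration() const { return (bits & 1) == 0; }
  Decl *getDecl() const { return reinterpret_cast<Decl *>(bits); }
  unsigned getKind() const { return static_cast<unsigned>(bits >> 1); }
};

// Round-trip check for both representations.
bool taggedArgSelfTest() {
  static Decl d{42};
  TaggedArg asDecl(&d);
  TaggedArg asKind(7u);
  return asDecl.isDeclaration() && asDecl.getDecl() == &d &&
         !asKind.isDeclaration() && asKind.getKind() == 7;
}
```

LLVM already packages this pattern generically (e.g. PointerIntPair), which is presumably what an actual implementation would reach for.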

Re: Limits on AST depth and stack usage?

Sam McCall via cfe-dev
Thanks very much both, the data recursion was the biggest thing I was missing, but all the extra details are very helpful.
(Richard: I might try to paraphrase the "3 techniques" section of your email into some documentation if that's OK)

It sounds like the low-hanging fruit here is:
A) make BracketDepth limit fold-expressions (to 256 by default)
B) have ASTMatchFinder use explicit iteration rather than recursion when walking back up the tree (similar to data recursion)
C) heap-allocate TemplateArgumentLocInfo::Template (introducing a discriminator enum in the low bits)

Haojian and I will take these on, I think we understand well enough what needs to be done.

Any one of these would fix our clang-tidy problems. Side benefits: A should make tools safer by limiting ways to create huge ASTs, and C will mean DynTypedNode is 40 bytes (rather than 56 now and 32 before), will reduce overall memory usage, and will let us add a few member-of-union asserts while there.

> As for limiting the size of packs in general: I think it might make sense to impose a limit -- not to avoid excessive recursion; we should have other mechanisms to deal with that -- but to avoid huge memory usage during compilation due to forming overly-large packs. 

This makes sense, though I might not pick this up. I don't have much sense of the programs likely to be affected (large packs that are neither recursively expanded nor unfolded...)
