Tentatively parsing a parameter-declaration-clause

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Tentatively parsing a parameter-declaration-clause

Nathan Ridge via cfe-dev
[CC: key people having touched that code]

Dear cfe-dev,

I'm working on fixing https://llvm.org/PR42851, a segfault that manifests when parsing a lambda with a default argument that's a lambda:

    int main() {
        auto apply = [](auto, void(*)() = []{}) { };
        apply(1); //                      ^^^^ BOOM
    }

Long story short, what's happening is that when we parse the parameter-declaration-clause of the lambda, we don't record that we're in a dependent context, i.e. that we're basically parsing a template for the closure type's operator(). We don't know that we're in a template until we actually hit the first parameter declared with 'auto', but by then we've already done some things wrong (e.g. we've injected declarations into a parse scope that is not marked as being a template).

So what I've come to believe is that we should "tentatively parse" the parameter-declaration-clause of the lambda and try to determine whether it's going to be a generic lambda before we actually go ahead and parse it "for real", with a parse scope that's correctly marked as being dependent (or not). However, we must do that tentative parsing without calling into Sema, because calling into Sema has side effects that we can't undo easily. This means I can't reuse most of the existing machinery for parsing a parameter-declaration-clause.

I found TentativeParsingAction and I'm aware of a few tentative parsing functions in ParseTentative.cpp (like TryParseParameterDeclarationClause, which doesn't do quite what I need). However, before I invest any more time down that path, I'd like to know whether others think I'm going down the right path.

Any insights welcome,
Louis


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Tentatively parsing a parameter-declaration-clause

Nathan Ridge via cfe-dev
We already have code to consume the token sequence of a default argument without parsing it (we do that when parsing a class definition). So we could delay parsing default arguments until after we parse the parameters and determine whether we are in a generic lambda.

But... I don't think that's enough. There could be a lambda-expression in the type of the parameter, and we might need to parse it and constant-evaluate a call to it in order to determine a parameter's type -- or just to parse at all in the case of naming a member template.

I'm... not certain it's even possible to handle this correctly in general.

Consider:

[](A<B<[]{ return Y<struct S>; }()>::C < D > (E),

At this point, we don't know whether we're still in a template argument list or not -- it depends on whether C is a template. And that depends on the evaluated value of the inner lambda.

We must parse and evaluate the inner lambda in order to keep parsing, but its meaning depends on whether the outer lambda is generic: we can't instantiate Y if the local struct S is a dependent type, which it is if the outer lambda is a generic lambda.

Now, we do know we're either at the start of a template argument or a function parameter, and a type containing a placeholder type is not valid in a template argument, so we could maybe keep going and assume that if we see a placeholder type, we must be in a function parameter so we must be parsing a generic lambda (but be careful about trailing return types...). It's not clear to me if there's some problem that prevents even that strategy from working, but I wouldn't be surprised (note that you can't do any name lookups, because you don't know if prior declaration-shaped things actually introduced any names in general).

I think our first port of call should be the core reflector, to see if maybe we can tweak the rules to make this easier.

One possible approach: lambda expressions shall not appear in a parameter declaration of a generic lambda unless there is a prior auto-typed parameter or the outer lambda has an explicit template parameter list. (That way, we can enter the dependent context when we finish parsing the first auto-typed parameter, and use the normal delay-parsing mechanism to handle default arguments.)

On Fri, 2 Aug 2019, 10:59 Louis Dionne via cfe-dev, <[hidden email]> wrote:
[CC: key people having touched that code]

Dear cfe-dev,

I'm working on fixing https://llvm.org/PR42851, a segfault that manifests when parsing a lambda with a default argument that's a lambda:

    int main() {
        auto apply = [](auto, void(*)() = []{}) { };
        apply(1); //                      ^^^^ BOOM
    }

Long story short, what's happening is that when we parse the parameter-declaration-clause of the lambda, we don't record that we're in a dependent context, i.e. that we're basically parsing a template for the closure type's operator(). We don't know that we're in a template until we actually hit the first parameter declared with 'auto', but by then we've already done some things wrong (e.g. we've injected declarations into a parse scope that is not marked as being a template).

So what I've come to believe is that we should "tentatively parse" the parameter-declaration-clause of the lambda and try to determine whether it's going to be a generic lambda before we actually go ahead and parse it "for real", with a parse scope that's correctly marked as being dependent (or not). However, we must do that tentative parsing without calling into Sema, because calling into Sema has side effects that we can't undo easily. This means I can't reuse most of the existing machinery for parsing a parameter-declaration-clause.

I found TentativeParsingAction and I'm aware of a few tentative parsing functions in ParseTentative.cpp (like TryParseParameterDeclarationClause, which doesn't do quite what I need). However, before I invest any more time down that path, I'd like to know whether others think I'm going down the right path.

Any insights welcome,
Louis

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Tentatively parsing a parameter-declaration-clause

Nathan Ridge via cfe-dev


On Aug 2, 2019, at 15:05, Richard Smith <[hidden email]> wrote:

We already have code to consume the token sequence of a default argument without parsing it (we do that when parsing a class definition). So we could delay parsing default arguments until after we parse the parameters and determine whether we are in a generic lambda.

But... I don't think that's enough. There could be a lambda-expression in the type of the parameter, and we might need to parse it and constant-evaluate a call to it in order to determine a parameter's type -- or just to parse at all in the case of naming a member template.

I'm... not certain it's even possible to handle this correctly in general.

Consider:

[](A<B<[]{ return Y<struct S>; }()>::C < D > (E),

At this point, we don't know whether we're still in a template argument list or not -- it depends on whether C is a template. And that depends on the evaluated value of the inner lambda.

We must parse and evaluate the inner lambda in order to keep parsing, but its meaning depends on whether the outer lambda is generic: we can't instantiate Y if the local struct S is a dependent type, which it is if the outer lambda is a generic lambda.

Now, we do know we're either at the start of a template argument or a function parameter, and a type containing a placeholder type is not valid in a template argument, so we could maybe keep going and assume that if we see a placeholder type, we must be in a function parameter so we must be parsing a generic lambda (but be careful about trailing return types...). It's not clear to me if there's some problem that prevents even that strategy from working, but I wouldn't be surprised (note that you can't do any name lookups, because you don't know if prior declaration-shaped things actually introduced any names in general).

I think our first port of call should be the core reflector, to see if maybe we can tweak the rules to make this easier.

One possible approach: lambda expressions shall not appear in a parameter declaration of a generic lambda unless there is a prior auto-typed parameter or the outer lambda has an explicit template parameter list. (That way, we can enter the dependent context when we finish parsing the first auto-typed parameter, and use the normal delay-parsing mechanism to handle default arguments.)

In that case, would you introduce a new parse scope marked as a TemplateParamScope when you encounter the first 'auto', or would you "change" the current parse scope to be a TemplateParamScope? Because we'll already have injected some declarations into the current parse scope when we encounter the first 'auto', so naively introducing a new (nested) TemplateParamScope doesn't work out of the box. I did try that first approach and got a bunch of problems. I wasn't sure whether it meant the existing code had to be tweaked to correct broken assumptions, or whether it meant that was fundamentally the wrong approach (but you might know).

Louis


On Fri, 2 Aug 2019, 10:59 Louis Dionne via cfe-dev, <[hidden email]> wrote:
[CC: key people having touched that code]

Dear cfe-dev,

I'm working on fixing https://llvm.org/PR42851, a segfault that manifests when parsing a lambda with a default argument that's a lambda:

    int main() {
        auto apply = [](auto, void(*)() = []{}) { };
        apply(1); //                      ^^^^ BOOM
    }

Long story short, what's happening is that when we parse the parameter-declaration-clause of the lambda, we don't record that we're in a dependent context, i.e. that we're basically parsing a template for the closure type's operator(). We don't know that we're in a template until we actually hit the first parameter declared with 'auto', but by then we've already done some things wrong (e.g. we've injected declarations into a parse scope that is not marked as being a template).

So what I've come to believe is that we should "tentatively parse" the parameter-declaration-clause of the lambda and try to determine whether it's going to be a generic lambda before we actually go ahead and parse it "for real", with a parse scope that's correctly marked as being dependent (or not). However, we must do that tentative parsing without calling into Sema, because calling into Sema has side effects that we can't undo easily. This means I can't reuse most of the existing machinery for parsing a parameter-declaration-clause.

I found TentativeParsingAction and I'm aware of a few tentative parsing functions in ParseTentative.cpp (like TryParseParameterDeclarationClause, which doesn't do quite what I need). However, before I invest any more time down that path, I'd like to know whether others think I'm going down the right path.

Any insights welcome,
Louis

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Tentatively parsing a parameter-declaration-clause

Nathan Ridge via cfe-dev
On Fri, 2 Aug 2019 at 12:17, Louis Dionne via cfe-dev <[hidden email]> wrote:


On Aug 2, 2019, at 15:05, Richard Smith <[hidden email]> wrote:

We already have code to consume the token sequence of a default argument without parsing it (we do that when parsing a class definition). So we could delay parsing default arguments until after we parse the parameters and determine whether we are in a generic lambda.

But... I don't think that's enough. There could be a lambda-expression in the type of the parameter, and we might need to parse it and constant-evaluate a call to it in order to determine a parameter's type -- or just to parse at all in the case of naming a member template.

I'm... not certain it's even possible to handle this correctly in general.

Consider:

[](A<B<[]{ return Y<struct S>; }()>::C < D > (E),

At this point, we don't know whether we're still in a template argument list or not -- it depends on whether C is a template. And that depends on the evaluated value of the inner lambda.

We must parse and evaluate the inner lambda in order to keep parsing, but its meaning depends on whether the outer lambda is generic: we can't instantiate Y if the local struct S is a dependent type, which it is if the outer lambda is a generic lambda.

Now, we do know we're either at the start of a template argument or a function parameter, and a type containing a placeholder type is not valid in a template argument, so we could maybe keep going and assume that if we see a placeholder type, we must be in a function parameter so we must be parsing a generic lambda (but be careful about trailing return types...). It's not clear to me if there's some problem that prevents even that strategy from working, but I wouldn't be surprised (note that you can't do any name lookups, because you don't know if prior declaration-shaped things actually introduced any names in general).

I think our first port of call should be the core reflector, to see if maybe we can tweak the rules to make this easier.

One possible approach: lambda expressions shall not appear in a parameter declaration of a generic lambda unless there is a prior auto-typed parameter or the outer lambda has an explicit template parameter list. (That way, we can enter the dependent context when we finish parsing the first auto-typed parameter, and use the normal delay-parsing mechanism to handle default arguments.)

In that case, would you introduce a new parse scope marked as a TemplateParamScope when you encounter the first 'auto', or would you "change" the current parse scope to be a TemplateParamScope? Because we'll already have injected some declarations into the current parse scope when we encounter the first 'auto', so naively introducing a new (nested) TemplateParamScope doesn't work out of the box. I did try that first approach and got a bunch of problems. I wasn't sure whether it meant the existing code had to be tweaked to correct broken assumptions, or whether it meant that was fundamentally the wrong approach (but you might know).

Maybe the simplest thing to do would be to always create an enclosing (placeholder) scope and convert that to a TemplateParamScope when we find that it needs to be one, and then update the flags on the inner function parameter scope to match. But it seems messy regardless of what we do.
 
Louis


On Fri, 2 Aug 2019, 10:59 Louis Dionne via cfe-dev, <[hidden email]> wrote:
[CC: key people having touched that code]

Dear cfe-dev,

I'm working on fixing https://llvm.org/PR42851, a segfault that manifests when parsing a lambda with a default argument that's a lambda:

    int main() {
        auto apply = [](auto, void(*)() = []{}) { };
        apply(1); //                      ^^^^ BOOM
    }

Long story short, what's happening is that when we parse the parameter-declaration-clause of the lambda, we don't record that we're in a dependent context, i.e. that we're basically parsing a template for the closure type's operator(). We don't know that we're in a template until we actually hit the first parameter declared with 'auto', but by then we've already done some things wrong (e.g. we've injected declarations into a parse scope that is not marked as being a template).

So what I've come to believe is that we should "tentatively parse" the parameter-declaration-clause of the lambda and try to determine whether it's going to be a generic lambda before we actually go ahead and parse it "for real", with a parse scope that's correctly marked as being dependent (or not). However, we must do that tentative parsing without calling into Sema, because calling into Sema has side effects that we can't undo easily. This means I can't reuse most of the existing machinery for parsing a parameter-declaration-clause.

I found TentativeParsingAction and I'm aware of a few tentative parsing functions in ParseTentative.cpp (like TryParseParameterDeclarationClause, which doesn't do quite what I need). However, before I invest any more time down that path, I'd like to know whether others think I'm going down the right path.

Any insights welcome,
Louis

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev