clang, unknown identifiers, and ahead of time compilation

Axel Naumann
Hi,

let's try again with a hopefully clearer description of what we want to
do :-) Apologies for the lengthy email, it's hopefully justified by the
complexity of the problem.

Our future interpreter will need to support variables that are defined
from some context at runtime. They are unknown at compile time. Like this:

int f() {
   int ret = 0;
   ret = h->D(); // h is unknown at compile time
   return ret;
}

Our plan is to "escape" them into a runtime invocation of sema (and
actually, runtime codegen), sort of like this:

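int f() {
   int ret = 0;
   float arg = 42.f;
   {
     // context built by the compiler: names mapped to the locals in scope
     DelayedCompiler::Context __ctx; __ctx.add("arg", arg);
     // the escaped statement is compiled and evaluated at runtime
     ret = DelayedCompiler::Compile("h->D(arg);", __ctx);
   }
   return ret;
}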

where DelayedCompiler::Compile() is the invocation of the compiler at
runtime, which will compile the argument given some context. So we are
actually talking about ahead-of-time compiled f() with a runtime
compiled "ret = h->D();".
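
To make the shape of this escape concrete, here is a minimal sketch, assuming a lot: the thread never defines DelayedCompiler, so the Context class, the snippets() table, the Hist type and runDemo() below are all hypothetical. In particular, Compile() does not invoke any compiler here; it only looks the source text up in a table of callables, as a stand-in for runtime sema and codegen.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

namespace DelayedCompiler {

// Hypothetical context: maps a name to the address of a caller's variable.
class Context {
public:
    template <typename T>
    void add(const std::string& name, T& var) { vars_[name] = &var; }

    template <typename T>
    T& get(const std::string& name) const {
        auto it = vars_.find(name);
        assert(it != vars_.end() && "variable was never added to the context");
        return *static_cast<T*>(it->second);
    }

private:
    std::map<std::string, void*> vars_;
};

// Stand-in for runtime compilation: a table from source text to a callable.
using Snippet = std::function<int(const Context&)>;
inline std::map<std::string, Snippet>& snippets() {
    static std::map<std::string, Snippet> table;
    return table;
}

// A real version would parse and codegen `source`; this one only looks it up.
inline int Compile(const std::string& source, const Context& ctx) {
    auto it = snippets().find(source);
    if (it == snippets().end())
        throw std::runtime_error("cannot resolve at runtime: " + source);
    return it->second(ctx);
}

}  // namespace DelayedCompiler

// Hypothetical runtime object behind the name `h`.
struct Hist {
    int D(float x) const { return static_cast<int>(x) + 1; }
};

// Ahead-of-time compiled f(): only the escaped statement is deferred.
int f() {
    int ret = 0;
    float arg = 42.f;
    {
        DelayedCompiler::Context ctx;
        ctx.add("arg", arg);
        ret = DelayedCompiler::Compile("h->D(arg);", ctx);
    }
    return ret;
}

// Simulates the runtime environment that makes `h` known, then calls f().
int runDemo() {
    static Hist hist;
    const Hist* h = &hist;
    DelayedCompiler::snippets()["h->D(arg);"] =
        [h](const DelayedCompiler::Context& ctx) {
            return h->D(ctx.get<float>("arg"));
        };
    return f();
}
```

The point of the sketch is only the division of labour: f() itself needs no knowledge of h, and an unknown statement surfaces as a runtime error rather than a compile-time one.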

We found that Sema::ActOnIdExpression() is a function we like. It knows
when a lookup fails ("h"), and it allows us to "escape" the expression
by converting it into DelayedCompiler::Compile("h"), returning an
ExpressionResult of our choice.

The problem is that we need to escape the whole statement, and as such
ActOnIdExpression() is both insufficient and at the wrong level.

I see two options (and I am positive that I overlooked some :-):

1. extract the statement from the parser, walk up the AST to the
statement node, and replace it with our context building and
DelayedCompiler::Compile() invocation using the statement as the parser
saw it. This will probably involve some re-invocation of the lexer etc,
to analyze the statement and extract the necessary context.

2. poison the node, and use (add?) a mechanism to build a poisoned AST
statement from poisoned expression nodes. I.e. if a child node is
poisoned then the parent node is poisoned, too - but at least the AST
builds. Then in a second pass we can replace the poisoned statement by
our invocation to DelayedCompiler::Compile() - but this time we have the
parsed nodes already (yes, no valid AST, but lexed source that we can
use to build the DelayedCompiler::Compile() invocation).
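
The poisoning idea in option 2 can be sketched on a toy tree. This is not clang's AST: Node, addChild(), poisoned() and escapePoisoned() are hypothetical names, and the sketch assumes that a failed lookup is recorded on the identifier node and that the second pass rewrites the innermost statement containing it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST node; not a clang class.
struct Node {
    std::string text;           // source text the node was parsed from
    bool isStatement = false;   // statement node, as opposed to expression
    bool lookupFailed = false;  // set where identifier lookup failed
    std::vector<std::unique_ptr<Node>> children;
};

// Appends a child to `parent` and returns a reference to it.
Node& addChild(Node& parent, std::string text,
               bool isStatement = false, bool lookupFailed = false) {
    parent.children.push_back(std::make_unique<Node>());
    Node& child = *parent.children.back();
    child.text = std::move(text);
    child.isStatement = isStatement;
    child.lookupFailed = lookupFailed;
    return child;
}

// A node is poisoned if its own lookup failed or any child is poisoned.
bool poisoned(const Node& n) {
    if (n.lookupFailed) return true;
    for (const auto& child : n.children)
        if (poisoned(*child)) return true;
    return false;
}

// Second pass: visit children first, so the innermost poisoned statement
// is the one replaced by the escape call built from its source text.
void escapePoisoned(Node& n) {
    for (auto& child : n.children) escapePoisoned(*child);
    if (n.isStatement && poisoned(n)) {
        n.text = "DelayedCompiler::Compile(\"" + n.text + "\", __ctx);";
        n.children.clear();
        n.lookupFailed = false;
        assert(!poisoned(n));
    }
}

// Builds the body of f() from the example, with `h` marked as unknown.
Node buildExample() {
    Node body;
    body.text = "{ ... }";
    body.isStatement = true;
    addChild(body, "int ret = 0;", true);
    Node& assign = addChild(body, "ret = h->D();", true);
    addChild(assign, "ret");
    addChild(assign, "h", false, true);  // lookup of h failed
    addChild(body, "return ret;", true);
    return body;
}
```

Because children are visited before their parent, the enclosing compound statement is clean again once the assignment has been replaced, so only the one statement is escaped.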

If it helps, we know that all the unknown ids are pointers, and we can
limit the use cases we support (e.g. the candidate list for overload
resolution must have exactly one entry etc).

Any comments, recommendations, ideas?

Cheers, Axel.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Re: clang, unknown identifiers, and ahead of time compilation

Sebastian Redl

On Sep 1, 2010, at 12:54 PM, Axel Naumann wrote:

> Our future interpreter will need to support variables that are defined
> from some context at runtime. They are unknown at compile time. Like this:
>
> int f() {
>  int ret = 0;
>  ret = h->D(); // h is unknown at compile time
>  return ret;
> }

It all comes down to C++ making it impossible to unambiguously parse past identifiers of unknown kind. That is, unless you know whether the identifier in question names a variable, type or template, you simply cannot continue parsing.
For example, the simple expression a(b) is a function call if a is a function or object, but an explicit cast if it's a type. Sema::isTypeName() is called by the parser to make the distinction.
Another example: a<b>(c)+d can be read in several ways, depending on what the identifiers name:
1. If all identifiers are variables, it is a rather weird but valid expression: test whether a is less than b, then test whether the result is greater than c plus d.
2. If a is a function template, b is a type or constant, and c and d are variables, it is a call to a function template followed by an addition.
3. If a is a class template, b is a type or constant, and c and d are variables, it is an explicit type cast followed by an addition.
4. If a, b and d are variables and c is a type, it is another weird expression: test whether a is less than b, then compare the result to d after the unary + operator has been applied and its result cast to c.
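
The ambiguity described above can be reproduced with a standard compiler. In the sketch below the same token sequences, a(b) and a<b>(c)+d, are compiled in separate namespaces in which the identifiers name different kinds of entities. The namespace names and the chosen values are illustrative assumptions; some compilers warn about the chained comparisons, which is expected.

```cpp
#include <cassert>

// a(b) with a naming a function: a function call.
namespace a_is_function {
int a(int x) { return x + 1; }
int b = 1;
int eval() { return a(b); }
}

// a(b) with a naming a type: an explicit (functional) cast.
namespace a_is_type {
using a = double;
int b = 1;
double eval() { return a(b); }
}

// All identifiers are variables: parsed as (a < b) > (c + d).
namespace all_variables {
int a = 1, b = 2, c = 5, d = 0;
bool eval() { return a<b>(c)+d; }
}

// a is a function template, b is a type: call a<b>(c), then add d.
namespace function_template {
template <typename T> int a(int x) { return x * 10; }
using b = long;
int c = 3, d = 4;
int eval() { return a<b>(c)+d; }
}

// a is a class template, b is a type: cast c to a<b>, then add d.
namespace class_template {
template <typename T> struct a {
    int v;
    explicit a(int x) : v(x) {}
};
using b = long;
int c = 3, d = 4;
int operator+(a<b> lhs, int rhs) { return lhs.v + rhs; }
int eval() { return a<b>(c)+d; }
}

// c is a type: parsed as (a < b) > (c)(+d).
namespace c_is_type {
int a = 1, b = 2, d = 0;
using c = int;
bool eval() { return a<b>(c)+d; }
}
```

Every eval() body for a<b>(c)+d is character-for-character identical; only the preceding declarations decide which parse the compiler picks, which is why it cannot continue past an identifier of unknown kind.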

Of course, you could just define any unknown identifier to be a variable name. In addition, you could tell the AST that its type is dependent, to suppress all type checks. Finally, add your runtime-evaluation bit and it would probably work. Just remember that you could be suppressing diagnostics in a way that results in VERY unexpected behavior. (In the above example, what if the user meant to refer to the type C, not the runtime variable c?)
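
As an analogy for what a dependent type buys you (this is plain C++, not the proposed change to clang), consider what already happens inside a template. The sketch assumes a hypothetical Hist type and demo() function; the body of f mirrors the example from the original mail.

```cpp
#include <cassert>

// Hypothetical runtime type; any type with a suitable D() would do.
struct Hist {
    int D() const { return 7; }
};

// h has a dependent type, so `h->D()` is parsed as a member call on a
// variable, but lookup of D and all type checks are postponed until f
// is instantiated with a concrete H.
template <typename H>
int f(H h) {
    int ret = 0;
    ret = h->D();
    return ret;
}

// Instantiation is the point where the checks finally run.
int demo() {
    Hist hist;
    int result = f(&hist);
    assert(result == 7);
    return result;
}
```

Calling f(42) would be rejected only at instantiation, since int has no operator->. The proposal moves that late check one step further, from instantiation time to run time, which is also where the suppressed diagnostics would show up.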

Sebastian

Re: clang, unknown identifiers, and ahead of time compilation

Axel Naumann
Hi Sebastian,

thanks for your quick answer!

Sebastian Redl wrote on 09/02/2010 01:00 AM:

> On Sep 1, 2010, at 12:54 PM, Axel Naumann wrote:
>> Our future interpreter will need to support variables that are
>> defined from some context at runtime. They are unknown at compile
>> time. Like this:
>>
>> int f() {
>>   int ret = 0;
>>   ret = h->D(); // h is unknown at compile time
>>   return ret;
>> }
>
> It all comes down to C++ making it impossible to unambiguously parse
> past identifiers of unknown kind.

I know, otherwise it would be trivial to implement what we need :-)

> That is, unless you know whether
> the identifier in question names a variable, type or template, you
> simply cannot continue parsing. For example, the simple expression
> a(b) is a function call if a is a function or object, but an explicit
> cast if it's a type. Sema::isTypeName() is called by the parser to
> make the distinction.

As I said, all unknown IDs we can have a definition for at runtime are
pointers. So at (ahead-of-time) compile time we assume they are
pointers; if they are not, the expression will be evaluated as invalid
at runtime anyway.

> Another example, a<b>(c)+d could be a rather weird but valid expression
[...]

Yes, I once asked Bjarne Stroustrup whether he cares about how difficult
it is to parse C++. Guess his answer :-)

> Of course, you could just define any unknown identifier to be a
> variable name. In addition, you could tell the AST that its type is
> dependent, to suppress all type checks.

Fantastic! Brilliant idea, Sebastian. We'll give that a try. I am sure
we will have follow-up questions ;-)

> Just remember that
> you could be suppressing diagnostics in a way that results in VERY
> unexpected behavior. (In the above example, what if the user meant to
> refer to the type C, not the runtime variable c?)

Yes, we know it's ugly. It's all (well, some of :-) the pain of Python
combined with the powers of C++. But at least people will get an error -
only it'll be at run time.

Cheers, Axel.