Constexpr evaluation speed

Constexpr evaluation speed

David Rector via cfe-dev
I was reading http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1240r1.pdf, which describes proposed reflection features currently under advanced consideration.  

On page 5, the authors give their rationale for defining `reflexpr(x)` to return an object of an opaque builtin type called `meta::info`, which is useful only when passed to other builtin functions able to access its various properties for different reflected entities.  This design is favored over the alternative of defining it to return an AST-class-like object specific to each reflected entity (ReflectedDeclaration, ReflectedStatement etc.).  

In other words, given this design, the user must write e.g. `meta::parameters_of(reflexpr(somefunc))` instead of e.g. `reflexpr(somefunc)->parameters()`.

One rationale the authors give for this choice is that they found that accessing subobjects of a constexpr class object is significantly slower than accessing values which are not subobjects, all else being equal.  The authors present an example on pp 5-6.  

I tried to reproduce this example and their results on Compiler Explorer, and was shocked at the apparent orders-of-magnitude differences between GCC (by far the fastest), Clang, and MSVC (by far the slowest) on both constant-evaluation tasks.

Example A (NB `f()` deals only with complete objects):
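The Compiler Explorer sources did not survive the archive. As a placeholder, a minimal sketch of the kind of test described, assuming a long constant-evaluated loop over a complete scalar value; every name, constant, and iteration count here is a guess, not the original code:

```
// Hypothetical stand-in for the lost Example A: `f()` touches only a
// complete object (a plain unsigned long long), never a subobject.
// Real runs may need the evaluation limits raised, e.g. Clang's
// -fconstexpr-steps or GCC's -fconstexpr-ops-limit.
constexpr unsigned long long f(unsigned long long seed) {
  unsigned long long acc = seed;
  for (int i = 0; i < 1000000; ++i)
    acc = acc * 6364136223846793005ULL + 1442695040888963407ULL;
  return acc;
}
constexpr auto result = f(42);  // forces constant evaluation
```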

Example B (same as A, except `f()` now defined to dig through subobjects to get the data it needs):
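Again a hedged stand-in rather than the lost original: the same workload, but with the accumulator wrapped in a struct so every access goes through a subobject:

```
// Hypothetical stand-in for the lost Example B: identical arithmetic, but
// the state lives in a struct member, so each step reads and writes a
// subobject rather than a complete object.
struct State { unsigned long long acc; };

constexpr unsigned long long f(unsigned long long seed) {
  State s{seed};
  for (int i = 0; i < 1000000; ++i)
    s.acc = s.acc * 6364136223846793005ULL + 1442695040888963407ULL;
  return s.acc;
}
constexpr auto result = f(42);  // forces constant evaluation
```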

Results of 5 trials (compilation times in ms; binary size in parentheses):

A
GCC   (1793B):   337,   408,   325,   435,   242
Clang  (218B):  9361,  5173,  4066,  5698,  4263
MSVC   (306B): 21850, 24616, 24957, 24925, 32323

B
GCC   (2295B):   471,   319,   403,   309,   323
Clang  (218B): 17073, 15540, 17281, 13385, 18540
MSVC    (n/a): always >60,000

Takeaways:
1. Clang performs constant evaluation 10-50 times slower than GCC.
2. While Clang runs B about 3 times slower than A, it is not clear that GCC is likewise affected by having to dig through subobjects (if it is, the effect is slight).

Questions:
1. What am I missing?  Are there flags which might improve Clang's performance?  Is GCC somehow gaining an unfair advantage?  (Potential clue: the executable is the same small size, 218 bytes, for each of Clang's results, but it is larger and differs between A and B for GCC…meaningful?)
2. Given the "constexpr all the things" zeitgeist, and the constant evaluation speeds GCC has apparently realized, should the design of ExprConstant.cpp/APValue/etc. be reconsidered?
3. If ExprConstant.cpp etc. were overhauled for optimal speed, will it still ultimately be true that programs which dig through subobjects of compile-time objects are necessarily slower than equivalent programs which deal only with complete compile-time objects?

Not a pressing matter, but maybe worthy of some thought.  Thanks,

Dave

Re: Constexpr evaluation speed

Richard Smith via cfe-dev
On Tue, 2 Mar 2021 at 16:46, David Rector via cfe-dev <[hidden email]> wrote:
[...]
Questions:
1. What am I missing?  Are there flags which might improve Clang's performance?  Is GCC somehow gaining an unfair advantage?  (Potential clue: the executable is the same small size, 218 bytes, for each of Clang's results, but it is larger and differs between A and B for GCC…meaningful?)

Yes, GCC is "cheating" (you're not testing what you think you are). GCC memoizes constant evaluations (at least when it's correct to do so). Here's a slightly modified version of your A that doesn't permit memoization: https://godbolt.org/z/3q3TYr

5 trials with that and GCC (1793B): 12340, 11106, 10204, 9983, 10771 ms

(Times for Clang and MSVC seem similar to your measurements.) I don't know if Compiler Explorer uses the same machines for all compiles, or if all compilers are built in fully-optimized mode, but if so, that suggests that Clang is about 2x faster than GCC for this particular testcase.
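Not the actual modification (the godbolt link above is authoritative), but to illustrate what "doesn't permit memoization" means: a cache keyed on (function, arguments) only pays off when an identical call is evaluated more than once, so giving every evaluation a distinct argument defeats it. A minimal sketch:

```
// Sketch only: each instantiation evaluates work() with a different seed,
// so no evaluation can be answered from a memoization cache.
constexpr unsigned long long work(unsigned long long seed) {
  unsigned long long acc = seed;
  for (int i = 0; i < 100000; ++i)
    acc = acc * 6364136223846793005ULL + 1442695040888963407ULL;
  return acc;
}

template <int N> struct Force { static constexpr auto v = work(N); };

constexpr auto a = Force<0>::v;  // distinct evaluations, none a repeat
constexpr auto b = Force<1>::v;
```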

2. Given the "constexpr all the things" zeitgeist, and the constant evaluation speeds GCC has apparently realized, should the design of ExprConstant.cpp/APValue/etc. be reconsidered?

Ignoring the part about GCC, yes. We have a -fexperimental-new-constant-interpreter flag that enables a new interpreter, which was built to be substantially faster. Unfortunately it's not complete yet (the version in trunk doesn't support any looping constructs yet) but the early indications are very promising.
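(For anyone wanting to try it: this is an ordinary driver flag, so something like `clang++ -std=c++17 -fexperimental-new-constant-interpreter -fsyntax-only test.cpp` exercises the new interpreter on a testcase, assuming a recent enough Clang build.)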

3. If ExprConstant.cpp etc. were overhauled for optimal speed, will it still ultimately be true that programs which dig through subobjects of compile-time objects are necessarily slower than equivalent programs which deal only with complete compile-time objects?

I don't think that would necessarily be the case. I've also mentioned that in committee, but "one compiler can do X" doesn't necessarily translate into "everyone will do X", and folks representing other compilers have indicated they expect the more "bare-bones" approach will remain faster for their implementations.

Re: Constexpr evaluation speed

David Rector via cfe-dev


On Mar 2, 2021, at 10:03 PM, Richard Smith <[hidden email]> wrote:

[...]

Thanks Richard, very thorough answer; that adjustment does indeed turn the tables on GCC.  Clang takes the gold, and that's before the fancy new interpreter is even deployed.  Looking forward to it.

Assuming Compiler Explorer runs the various compilers on comparable machines, I think we can all hazard a pretty good guess at this point as to which particular compiler really benefits from keeping things as bare-bones as possible ;)

Dave


Re: Constexpr evaluation speed

David Rector via cfe-dev
One more vein of thought re constexpr evaluation speed, the new constexpr interpreter, and in particular how the C++ standard could be tweaked to allow us to use existing LLVM optimizations to really turbocharge code with heavy constexpr usage.

Caveat: my knowledge is mostly about the AST; I know very little about the ABI/calling conventions/LLVM etc., which are implicated here.

Motivation: it seems to me that constexpr evaluations will get very complex in the future, and we should plan for it.  In particular, I have been fighting a bit of a battle on the SG7 list to ensure the reflection + injection facilities will be sufficiently general to allow most design patterns to be rendered obsolete via metafunctions, and I think we came to a rough agreement that this is indeed a worthy and viable goal.  

But this will involve some very complex metaprogramming.  Indeed, constexpr programming may well become more complex than non-constexpr programming once reflection + injection get involved, and in fact *that would be the ideal*.  Let the user automate the tasks which make C++ programming complex, but via customizable libraries from which they can pick and choose and modify, rather than endless fixed appendages to the language.

But as users constexpr all the things, so must compilers consider how best to constexpr all their optimizations.  Why rewrite, specifically for constexpr evaluation, optimizations which have already been written into LLVM?

The RFC for the new constexpr interpreter has some good discussion, and seems to present an opportunity.  In particular I'm looking at the comments here:
https://lists.llvm.org/pipermail/cfe-dev/2019-July/062807.html, and specifically this response to Richard from Nandor:

> >  * The current APValue representation is extremely bloated. Each instance is 72 bytes, and every subobject of an object is stored as a distinct APValue, so for instance a single char[128] variable will often occupy 9288 bytes of storage and incurs 128 distinct memory allocations.
Arrays and structures will be stored in a compact, contiguous form in memory, so we can save a lot of space here.

Suppose we go a short step further and store that data in compliance with the ABI, just as if it were run-time data (with the proper alignment, offsets, etc. of subobjects).  (The arithmetic above checks out, incidentally: one APValue for the array plus one per element is 129 × 72 = 9288 bytes.)

Then, suppose this were permissible code:

```
// foo.h
struct A { constexpr A() {} … };
constexpr int foo(A a);

// foo.cpp
constexpr int foo(A a) { 
  // A complicated function body, with lots of loops etc., which
  // can be well-optimized by LLVM…
}

// main.cpp (must be compiled AFTER foo.cpp, or error)
#include "foo.h"
template<int N> class Dummy {};
int main() {
  constexpr A a{};
  Dummy<foo(a)> d;
}
```

Upon being called to evaluate `foo(a)`, the interpreter would determine that this function has already been fully compiled into binary, *and* was marked constexpr (so we know there is no funny business in the definition), and therefore it can simply call that function directly, in whatever manner LLVM would call such a function, i.e. passing the raw data in accordance with the ABI.
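A heavily hypothetical sketch of that dispatch; none of these names exist in Clang, and it only illustrates the shape of the fallback logic described above:

```
#include <cstdint>

// `NativeConstexprFn` stands in for a function pointer resolved from a
// prebuilt "constexpr library"; `interpret` is the ordinary interpreter path.
using NativeConstexprFn = std::int64_t (*)(const void *abi_args);

struct ConstexprCallSite {
  const void *abi_args;      // arguments already laid out per the ABI
  NativeConstexprFn native;  // non-null if a compiled constexpr definition
                             // for this function was found
};

std::int64_t evaluateCall(const ConstexprCallSite &cs,
                          std::int64_t (*interpret)(const void *)) {
  if (cs.native)
    return cs.native(cs.abi_args);  // direct native call on ABI-laid-out data
  return interpret(cs.abi_args);    // fall back to interpretation
}
```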

This would seemingly allow us to benefit from optimizations written only for lowered versions of that code, for particularly complicated constexpr code that warrants it: the user need only a) put the definitions in separate .cpp files/libraries, and b) ensure they are built before any translation units which depend on them.  For trivial constexpr evaluation, nothing need change.

Thoughts?  Is this remotely viable/sensible?

Dave

On Mar 2, 2021, at 11:28 PM, David Rector <[hidden email]> wrote:

[...]

Re: Constexpr evaluation speed

Balázs Benics via cfe-dev
I don't think you can easily substitute constexpr evaluation with simply calling a compiled counterpart. Unfortunately, they differ in semantics.
During constexpr evaluation, AFAIK we should diagnose undefined behavior. Simply substituting the constexpr evaluation with a call to the binary version would not diagnose that - assuming that we don't want to pessimize the runtime version of the function with e.g. sanitizers. The issue is the same for optimizations. The only way you could safely do this is if you could prove that for the given input this function will not exhibit undefined behavior. For a given value, interpretation is a way of checking this, but for **all** possible values, it seems really tough.
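A minimal illustration of that semantic gap (an example, not from the thread): constant evaluation must reject the out-of-bounds read below, while a compiled binary would silently return whatever the optimizer produces:

```
constexpr int read_past_end(int i) {
  int a[4] = {1, 2, 3, 4};
  return a[i];  // i == 4 reads past the end: undefined behavior
}

constexpr int bad = read_past_end(4);  // ill-formed: the interpreter
                                       // diagnoses the UB at compile time
int runtime_bad = read_past_end(4);    // compiles fine; the binary version
                                       // just reads adjacent stack garbage
```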

How does constexpr evaluation differ from JavaScript evaluation in browsers?  Browsers do just-in-time compilation; maybe that is the future of constexpr evaluation.  Hal Finkel experimented with similar ideas for C++, if I recall correctly.
He might have some thoughts about this.
Could someone CC him?

David Rector via cfe-dev <[hidden email]> wrote (Thu, 4 Mar 2021, 17:15):
[...]

Re: Constexpr evaluation speed

David Rector via cfe-dev


On Mar 4, 2021, at 2:26 PM, Balázs Benics <[hidden email]> wrote:

I don't think you can easily substitute constexpr evaluation with simply calling a compiled counterpart. [...]

Thanks for this input, you're right: e.g. `constexpr void f(A *a)` might or might not be able to handle `f(nullptr)`; it would require proof, and there things probably get too complex.
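A tiny illustration of that proof burden (again a sketch, not from the thread): whether `f(nullptr)` is fine depends on a property of the body that the compiler would have to establish before substituting a native call:

```
struct A { int n = 0; };

constexpr void f(A *a) {
  if (a)     // f(nullptr) is safe only because of this guard; remove it and
    ++a->n;  // f(nullptr) is UB, which the interpreter catches but a
}            // compiled binary would not
```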

In the constexpr interpreter RFC, Nandor and JF present this interpreter as the basis on which a JIT could later be added.
(CC'd Hal and JF; would CC Nandor but can't find an email.)

In particular this is the conclusion, from JF:
> A JIT is therefore something to keep in mind, but definitely not something to prioritize. It’ll only start making sense *if* people start writing extremely complex constexpr code. If that happens, the interpreter is the right starting point.
I suppose the only additional point here is that I suspect we will indeed eventually have to handle extremely complex constexpr code, and so I would be interested to hear whether Hal agrees with this conclusion.


David Rector via cfe-dev <[hidden email]> wrote (Thu, 4 Mar 2021, 17:15):

[...]

Re: Constexpr evaluation speed

Arthur O'Dwyer via cfe-dev
On Thu, Mar 4, 2021 at 3:25 PM David Rector via cfe-dev <[hidden email]> wrote:
On Mar 4, 2021, at 2:26 PM, Balázs Benics <[hidden email]> wrote:

I don't think you can easily substitute constexpr evaluation with simply calling a compiled counterpart. [...]

Thanks for this input, you’re right — e.g. `constexpr void f(A *a)` might or might not be able to handle `f(nullptr)`; it would require proof, and there things probably get too complex.  

The other thing you have to keep in mind is cross-compilers. JIT interpreters do not have to deal with cross-compilation.
Clang could certainly invest in JIT-compiled codepaths for its constexpr evaluation, and that might make the usual desktop-compilation path blazing fast... but then you'd be in this weird situation where Boost-or-whatever compiled in a reasonable amount of time for x86-64, but building an ARM binary would take days. (Until you bought an M1 Mac, I guess, and then the situation would flip-flop.)

my $.02,
Arthur

Re: Constexpr evaluation speed

David Rector via cfe-dev


On Mar 4, 2021, at 5:34 PM, Arthur O'Dwyer <[hidden email]> wrote:

[...]

The other thing you have to keep in mind is cross-compilers. JIT interpreters do not have to deal with cross-compilation. [...]

Yes, cross-compiling would be another complexity of such an approach; I suppose you would probably need to separately build your "constexpr binaries" for the current machine, in addition to whatever building you do for the target platform.

Even with these complexities, I can't quite let the possibility go.  A single expensive constexpr metafunction might be run millions or billions of times on different inputs.  If we could analyze it just once, and guarantee that it safely handles its inputs (or some subset of possible inputs that could be encoded along with it), it is tantalizing to consider how fast builds could get if we could call the binary version of it, bypassing all those expensive checks and the interpreter infrastructure during those calls.

Whether these static analysis challenges are surmountable is of course well beyond my depth.

And there are other problems with the approach; anything absolutely requiring constant evaluation (e.g. reflected meta::infos, or injection statements) could not directly be compiled to binary, or at least would require a lot of additional thought.

And, perhaps a JIT approach would not be far off in performance anyway.  Perhaps the interpreter alone will ultimately prove sufficient.  

Concluding thought:

C++ allows lightning-fast run-time performance…at the expense of expertise, labor, and maintenance.

Reflection + injection can solve these issues, allowing the user to delegate that expertise, labor, and maintenance to metafunctions/metaclasses…but at the expense of increased build times due to the challenges of constexpr evaluation.

This last issue may require additional focus in the medium/long term.

Thanks all,

Dave

Re: Constexpr evaluation speed

Paul Robinson via cfe-dev
Yes, cross-compiling would be another complexity of such an approach; I suppose you would probably need to separately build your "constexpr binaries" for the current machine, in addition to whatever building you do for the target platform.

One can easily imagine (say) Apple building an x86-hosted Clang that includes only ARM targets, for building code to run on mobile devices; it would not have a native target, so building "for the current machine" would be infeasible.  Building natively for the host can only be an optimization; it can't be the general solution.

--paulr

 

From: cfe-dev <[hidden email]> On Behalf Of David Rector via cfe-dev
Sent: Friday, March 5, 2021 11:53 AM
To: Arthur O'Dwyer <[hidden email]>
Cc: clang developer list <[hidden email]>
Subject: Re: [cfe-dev] Constexpr evaluation speed

[...]

Re: Constexpr evaluation speed

David Rector via cfe-dev


On Mar 5, 2021, at 12:00 PM, [hidden email] wrote:

[...]

One can easily imagine (say) Apple building an x86-hosted Clang that includes only ARM targets, for building code to run on mobile devices; it would not have a native target, so building "for the current machine" would be infeasible.  Building natively for the host can only be an optimization; it can't be the general solution.

I suppose what I mean is that it would be up to the user to build and link the desired constexpr/"meta" libraries on their own via a prior build step, with a different compiler if necessary (as it would be in the case you describe), then pass the locations of those libraries to the current compiler via some independent flag ("--meta-library-directory" or something).  But these matters are already well beyond my breadth/depth, so I am sure I am missing still further complexities, and this certainly would be one important consideration among many.

Re: Constexpr evaluation speed

David Rector via cfe-dev


On Mar 5, 2021, at 1:13 PM, David Rector <[hidden email]> wrote:



[...]

I suppose what I mean is that it would be up to the user to build and link the desired constexpr/"meta" libraries on their own via a prior build step, with a different compiler if necessary (as it would be in the case you describe), then pass the locations of those libraries to the current compiler via some independent flag ("--meta-library-directory" or something).

One last insight: it occurs to me that just requiring the user to *always* build a separate meta library, i.e. one that is only ever passed via --meta-library-directory and never via --library-directory, *even when building for the host*, might solve the issue Balázs raised previously: sanitizers etc. could be added only to functions in the meta library, as necessary to match the needed constexpr semantics; the semantics of, and pessimizations in, the "constexpr binaries" need never intrude on the ordinary non-constexpr binaries.

To be sure though, there is enough complexity here, and not yet sufficient need, that all this is just food for thought.
