__PRETTY_FUNCTION__ output loses enum type info for non-enumerated values

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

__PRETTY_FUNCTION__ output loses enum type info for non-enumerated values

Yvan Roux via cfe-dev


Pretty function output, for auto... args
(variadic just so that outputs display together):
  scoped   E: 3 -> (E)3 or E(3) or E{3}
Will it be acceptable to change the output for non-enumerated values as so?
Either C-style cast (e)3  or  function-cast / type-constructor e(3) - seem ok
(gcc uses C-style casts and Clang uses this same style in source labels).

For the duplicate-valued labels case there is no loss of type info,
but there is a loss of information about the label used for the auto arg.
Presumably, the code works from the supplied underlying-type value
and picks the first label with the given value. I'd guess that Clang knows
the label used in a direct instantiation and could return it in that case.
(c.f. named member-pointer args have to retain the id for reporting).






_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: __PRETTY_FUNCTION__ output loses enum type info for non-enumerated values

Yvan Roux via cfe-dev
On Tue, 4 Sep 2018 at 09:09, will wray via cfe-dev <[hidden email]> wrote:


Pretty function output, for auto... args
(variadic just so that outputs display together):
  scoped   E: 3 -> (E)3 or E(3) or E{3}
Will it be acceptable to change the output for non-enumerated values as so?

In the case where the template parameter involves a deduced type (`auto` or `decltype(auto)` or eventually a deduced template specialization type), it would make a lot of sense to include the actual type in the printed template argument if it differs from the type of the formatted argument string. But I think we should leave out the type in all other cases, as it would make the template argument string more verbose without (necessarily) providing any more information.

Either C-style cast (e)3  or  function-cast / type-constructor e(3) - seem ok
(gcc uses C-style casts and Clang uses this same style in source labels).

For the duplicate-valued labels case there is no loss of type info,
but there is a loss of information about the label used for the auto arg.
Presumably, the code works from the supplied underlying-type value
and picks the first label with the given value. I'd guess that Clang knows
the label used in a direct instantiation and could return it in that case.
(c.f. named member-pointer args have to retain the id for reporting).

In the case of duplicated enumerators, there simply is no distinction between 'b' and 'c' -- they are both merely names for the value 1 of type 'e'. auto_name<b>() and auto_name<c>() are the same function and therefore must return the same string.
 

No, per the above, we cannot and must not do that.


This is printIntegral, at the top of lib/AST/TemplateBase.cpp. The "else" case at the bottom would need to print out additional text representing the cast when needed.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: __PRETTY_FUNCTION__ output loses enum type info for non-enumerated values

Yvan Roux via cfe-dev
Thanks Richard.

Ok; ignore Question 2 - you are absolutely right (what I was thinking).
Printing different labels for the same type defeats the point of this request.

OTOH, Question 1 is a request for Clang to print different output for different types.
The linked bug report is an example of where it is important to distinguish types.

Here's a summary of suggested output vs current output for Clang, GCC, MSVC.

Plain, unscoped enum  
   enum e { a, b, c=b };
   auto_name<a, e(1), c, e(3)>

  Suggested: a, b, b, (e)3
      Clang: a, b, b, 3
        GCC: (e)0, (e)1, (e)1, (e)3
       MSCV: 0, 1, 1, 3

Scoped / opaque enum: 
   enum class E { a, b, c=b };
   auto_name<E::a, E{1}, E::c, E{3}>

  Suggested:  E::a, E::b, E::b, (E)3  
      Clang: 
E::a, E::b, E::b, 3  
        GCC: 
(E)0, (E)1, (E)1, (E)3  
       MSCV: 0, 1, 1, 3

None of these are ideal -
- Clang loses type info for non-enumerated values.
- GCC does not distinguish enumerated / non-enumerated.
- MSVC loses type all info, does not distinguish.

Interestingly, it turns out that GCC's behaviour is a bug!
(The *intent* of gcc code is to print the suggested output.
 I intend to submit a bug report and a patch.)

You are concerned that this suggested change would
make the template argument string more verbose without (necessarily) providing any more information.
 
Yet the information is necessary in order to distinguish different types.
This change is needed for correctness. Verbosity is a secondary concern.
Clang's current behaviour here could be treated as a bug
(it has been described as "seriously suboptimal" by a gcc dev).
 
For Clang, the increased verbosity is only for non-enumerated values.
Enum-templates would be expected to mostly use enumerated values.
Also, verbosity becomes more similar for enumerated / non-enumerated
the number of characters in the identifier label vs the number of digits.

I'm now preferring C-style cast notation over C++ function-style
as it simplifies output and parsing code.

Please reflect on this and consider - is this suggested change acceptable?


On Tue, Sep 4, 2018 at 4:30 PM Richard Smith <[hidden email]> wrote:
On Tue, 4 Sep 2018 at 09:09, will wray via cfe-dev <[hidden email]> wrote:


Pretty function output, for auto... args
(variadic just so that outputs display together):
  scoped   E: 3 -> (E)3 or E(3) or E{3}
Will it be acceptable to change the output for non-enumerated values as so?

In the case where the template parameter involves a deduced type (`auto` or `decltype(auto)` or eventually a deduced template specialization type), it would make a lot of sense to include the actual type in the printed template argument if it differs from the type of the formatted argument string. But I think we should leave out the type in all other cases, as it would make the template argument string more verbose without (necessarily) providing any more information.

Either C-style cast (e)3  or  function-cast / type-constructor e(3) - seem ok
(gcc uses C-style casts and Clang uses this same style in source labels).

For the duplicate-valued labels case there is no loss of type info,
but there is a loss of information about the label used for the auto arg.
Presumably, the code works from the supplied underlying-type value
and picks the first label with the given value. I'd guess that Clang knows
the label used in a direct instantiation and could return it in that case.
(c.f. named member-pointer args have to retain the id for reporting).

In the case of duplicated enumerators, there simply is no distinction between 'b' and 'c' -- they are both merely names for the value 1 of type 'e'. auto_name<b>() and auto_name<c>() are the same function and therefore must return the same string.
 

No, per the above, we cannot and must not do that.


This is printIntegral, at the top of lib/AST/TemplateBase.cpp. The "else" case at the bottom would need to print out additional text representing the cast when needed.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: __PRETTY_FUNCTION__ output loses enum type info for non-enumerated values

Yvan Roux via cfe-dev
On Mon, 10 Sep 2018 at 09:47, will wray via cfe-dev <[hidden email]> wrote:
Thanks Richard.

Ok; ignore Question 2 - you are absolutely right (what I was thinking).
Printing different labels for the same type defeats the point of this request.

OTOH, Question 1 is a request for Clang to print different output for different types.

I think you might be under the impression that I said that we shouldn't address this, which was very much not my intent. What I tried to say was that we *should* emit a cast in the case of a deduced template parameter type, when the type of the argument expression we'd otherwise print is different from that of the parameter. That should be sufficient to ensure that Clang prints different outputs for different types. Note that special-casing enums is the wrong way to fix that problem, as the problem is not specific to enums. Consider, for example:

template<auto> struct X {};
X<(short)0> xa;
X<(long)0> xb;

The types of xa and xb both pretty-print as X<0> currently.

There are (at least) two natural ways to fix this: by always printing a cast when the type of the argument isn't the same as that of the parameter (which is the unnecessary verbosity that I was concerned with, and is what we'd get if we generalized the proposed approach for enumerations to address the whole problem), or by doing something smarter that depends on whether the template parameter was deduced (which is what I'm suggesting is the right way to address this).

The linked bug report is an example of where it is important to distinguish types.
Here's a summary of suggested output vs current output for Clang, GCC, MSVC.

Plain, unscoped enum  
   enum e { a, b, c=b };
   auto_name<a, e(1), c, e(3)>

  Suggested: a, b, b, (e)3
      Clang: a, b, b, 3
        GCC: (e)0, (e)1, (e)1, (e)3
       MSCV: 0, 1, 1, 3

Scoped / opaque enum: 
   enum class E { a, b, c=b };
   auto_name<E::a, E{1}, E::c, E{3}>

  Suggested:  E::a, E::b, E::b, (E)3  
      Clang: 
E::a, E::b, E::b, 3  
        GCC: 
(E)0, (E)1, (E)1, (E)3  
       MSCV: 0, 1, 1, 3

None of these are ideal -
- Clang loses type info for non-enumerated values.
- GCC does not distinguish enumerated / non-enumerated.
- MSVC loses type all info, does not distinguish.

Interestingly, it turns out that GCC's behaviour is a bug!
(The *intent* of gcc code is to print the suggested output.
 I intend to submit a bug report and a patch.)

You are concerned that this suggested change would
make the template argument string more verbose without (necessarily) providing any more information.
 
Yet the information is necessary in order to distinguish different types.
This change is needed for correctness.

For what it's worth, I don't see a correctness concern here. Anything assuming our pretty-printed type names are unique is wrong. The bug you referenced is a bug caused by a tool getting this wrong and using the pretty type name string inappropriately, and the contributors on that bug report seem happy to acknowledge that.

That said, as a quality of implementation matter, we should try to ensure our pretty-printed type names are unique anyway wherever possible.

Verbosity is a secondary concern.
Clang's current behaviour here could be treated as a bug
(it has been described as "seriously suboptimal" by a gcc dev).
 
For Clang, the increased verbosity is only for non-enumerated values.
Enum-templates would be expected to mostly use enumerated values.
Also, verbosity becomes more similar for enumerated / non-enumerated
the number of characters in the identifier label vs the number of digits.

I'm now preferring C-style cast notation over C++ function-style
as it simplifies output and parsing code.

Please reflect on this and consider - is this suggested change acceptable?

As noted above, the problem you described, while valid, does not justify the solution you're suggesting. However, for enums specifically, I think there's another argument to consider (one that I think you've not made yet): passing a plain integer as a template argument to an enumeration template parameter won't compile, so our pretty-printed type is not valid C++. It's reasonable to consider how valuable it is to fix that, compared with the value of having a shorter pretty-printed type name, but as you point out, we're already paying the cost of including the enum type name in the template argument in the case where an scoped enumeration is named, and an integer cast to enum type would be of comparable length. So I think the change you suggested is a good idea for that reason.

On Tue, Sep 4, 2018 at 4:30 PM Richard Smith <[hidden email]> wrote:
On Tue, 4 Sep 2018 at 09:09, will wray via cfe-dev <[hidden email]> wrote:


Pretty function output, for auto... args
(variadic just so that outputs display together):
  scoped   E: 3 -> (E)3 or E(3) or E{3}
Will it be acceptable to change the output for non-enumerated values as so?

In the case where the template parameter involves a deduced type (`auto` or `decltype(auto)` or eventually a deduced template specialization type), it would make a lot of sense to include the actual type in the printed template argument if it differs from the type of the formatted argument string. But I think we should leave out the type in all other cases, as it would make the template argument string more verbose without (necessarily) providing any more information.

Either C-style cast (e)3  or  function-cast / type-constructor e(3) - seem ok
(gcc uses C-style casts and Clang uses this same style in source labels).

For the duplicate-valued labels case there is no loss of type info,
but there is a loss of information about the label used for the auto arg.
Presumably, the code works from the supplied underlying-type value
and picks the first label with the given value. I'd guess that Clang knows
the label used in a direct instantiation and could return it in that case.
(c.f. named member-pointer args have to retain the id for reporting).

In the case of duplicated enumerators, there simply is no distinction between 'b' and 'c' -- they are both merely names for the value 1 of type 'e'. auto_name<b>() and auto_name<c>() are the same function and therefore must return the same string.
 

No, per the above, we cannot and must not do that.


This is printIntegral, at the top of lib/AST/TemplateBase.cpp. The "else" case at the bottom would need to print out additional text representing the cast when needed.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: __PRETTY_FUNCTION__ output loses enum type info for non-enumerated values

Yvan Roux via cfe-dev
Thanks again Richard for a second more detailed response setting out the broader view
(I had failed to fully interpret / misinterpreted your first response as mild pushback
 apologies for that; I should've asked for the clarification)
(also, I shouldn't have argued 'correctness' here; that was disingenuous).

So, in the broader view, the question is:
 > How best to pretty print general non-type template args.

Now, as non-types may be subsumed by LiteralType literals in C++20
we should look ahead to pretty printing of such general literal types
as the broader context for discussion or decisions on changes.

  Non-type template args
  Until C++17:
    - an integral type;
    - an enumeration type;
    - std::nullptr_t; (since C++11)
    - a pointer type (to object or to function);
    - a pointer to member type (to member object or to member function);
    - lvalue reference type (to object or to function);
  For C++20:
    - A type that satisfies LiteralType, has strong structural equality ...

Pretty-print design goals:
  General: minimise special-casing
  Round-trip: the printed value can be plugged back into code
  Low-verbosity: avoid unnecessary info or repetition
  more?

The general literal type can be one of the C++17 built-ins
as above (all scalar types, apart from lvalue reference I think)
or an aggregate or class that satisfies the LiteralType requirements.

So, in general, we will now have to deal with non-scalar types
implying multiple initializers and init-lists.

Question: is there a single 'canonical' init-list for a LiteralType?

This is a place where the proposal to use parens for aggregates will help,
as special-casing initialization syntax, for brace or paren, is awkward:
P0960 Allow initializing aggregates from a parenthesized list of values
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0960r1.html
If this proposal is accepted then that would suggest to prefer paren-init.

Literals of the existing types remain awkward non-types -
'function-style' paren-init doesn't work directly for multi-token types
(like 'long long') or for compound types (pointer/ref).
An added decltype can solve that. Sticking with C-style cast is simpler.

Here's a suggestion (assuming P0960 for non-scalars):

  Scalar:
    Known type T; print as 'value'
    Deduced type; print as '(T)value'
  Non-scalar:
    Known type T; print as '(init...)'
    Deduced type; print as 'T(init...)'

'Known type' means non-deduced or T implicit in value
(for values of scalar type like enumerator or pointer-to-member).
(lvalue reference remains a special case).

In practice, I don't know if compilers can easily distinguish
whether a value's type is known or deduced, at the leaf-level.
If not then this change will be more involved than just updating
low-level printing functions.

Now let's focus back on integral and enumeration types.

Integral / enum types
---------------------
Clang, MSVC and GCC all output plain digits for Integral.
Clang, MSVC and GCC* all output enum id for enumerators.
Clang, MSVC output plain digits for non-enumerated values,
while GCC outputs a C-style cast for non-enumerated values.

*GCC enum: I submitted a bug report with a patch and tests:
(comments, subscribers or reviews will be much appreciated).
With this fix GCC now prints enum id for enumerators and
continues to print non-enumerated values by C-style cast.
I plan to further investigate the Integral output for GCC.


How best to print the actual integral values?
  MSVC currently prints as '0xhhh' where h are hex digits,
  GCC and Clang print as 'ddd' where d are decimal digits.
Decimal is less verbose, until the millions, but can be slower.
Hex simplifies round-trip as it is easy to generate and parse.
Hex is already best for reporting addresses (pointer/lval-ref).

It'd be good to get some consensus for Integral/enum output.
Ideally we can see a coordinated change in Clang, GCC & MSVC.

If the suggestion here looks acceptable then I can attempt
to update Clang printing of Integral and non-enumerated enums.




On Tue, Sep 11, 2018 at 7:56 PM Richard Smith <[hidden email]> wrote:
On Mon, 10 Sep 2018 at 09:47, will wray via cfe-dev <[hidden email]> wrote:
Thanks Richard.

Ok; ignore Question 2 - you are absolutely right (what I was thinking).
Printing different labels for the same type defeats the point of this request.

OTOH, Question 1 is a request for Clang to print different output for different types.

I think you might be under the impression that I said that we shouldn't address this, which was very much not my intent. What I tried to say was that we *should* emit a cast in the case of a deduced template parameter type, when the type of the argument expression we'd otherwise print is different from that of the parameter. That should be sufficient to ensure that Clang prints different outputs for different types. Note that special-casing enums is the wrong way to fix that problem, as the problem is not specific to enums. Consider, for example:

template<auto> struct X {};
X<(short)0> xa;
X<(long)0> xb;

The types of xa and xb both pretty-print as X<0> currently.

There are (at least) two natural ways to fix this: by always printing a cast when the type of the argument isn't the same as that of the parameter (which is the unnecessary verbosity that I was concerned with, and is what we'd get if we generalized the proposed approach for enumerations to address the whole problem), or by doing something smarter that depends on whether the template parameter was deduced (which is what I'm suggesting is the right way to address this).

The linked bug report is an example of where it is important to distinguish types.
Here's a summary of suggested output vs current output for Clang, GCC, MSVC.

Plain, unscoped enum  
   enum e { a, b, c=b };
   auto_name<a, e(1), c, e(3)>

  Suggested: a, b, b, (e)3
      Clang: a, b, b, 3
        GCC: (e)0, (e)1, (e)1, (e)3
       MSCV: 0, 1, 1, 3

Scoped / opaque enum: 
   enum class E { a, b, c=b };
   auto_name<E::a, E{1}, E::c, E{3}>

  Suggested:  E::a, E::b, E::b, (E)3  
      Clang: 
E::a, E::b, E::b, 3  
        GCC: 
(E)0, (E)1, (E)1, (E)3  
       MSCV: 0, 1, 1, 3

None of these are ideal -
- Clang loses type info for non-enumerated values.
- GCC does not distinguish enumerated / non-enumerated.
- MSVC loses type all info, does not distinguish.

Interestingly, it turns out that GCC's behaviour is a bug!
(The *intent* of gcc code is to print the suggested output.
 I intend to submit a bug report and a patch.)

You are concerned that this suggested change would
make the template argument string more verbose without (necessarily) providing any more information.
 
Yet the information is necessary in order to distinguish different types.
This change is needed for correctness.

For what it's worth, I don't see a correctness concern here. Anything assuming our pretty-printed type names are unique is wrong. The bug you referenced is a bug caused by a tool getting this wrong and using the pretty type name string inappropriately, and the contributors on that bug report seem happy to acknowledge that.

That said, as a quality of implementation matter, we should try to ensure our pretty-printed type names are unique anyway wherever possible.

Verbosity is a secondary concern.
Clang's current behaviour here could be treated as a bug
(it has been described as "seriously suboptimal" by a gcc dev).
 
For Clang, the increased verbosity is only for non-enumerated values.
Enum-templates would be expected to mostly use enumerated values.
Also, verbosity becomes more similar for enumerated / non-enumerated
the number of characters in the identifier label vs the number of digits.

I'm now preferring C-style cast notation over C++ function-style
as it simplifies output and parsing code.

Please reflect on this and consider - is this suggested change acceptable?

As noted above, the problem you described, while valid, does not justify the solution you're suggesting. However, for enums specifically, I think there's another argument to consider (one that I think you've not made yet): passing a plain integer as a template argument to an enumeration template parameter won't compile, so our pretty-printed type is not valid C++. It's reasonable to consider how valuable it is to fix that, compared with the value of having a shorter pretty-printed type name, but as you point out, we're already paying the cost of including the enum type name in the template argument in the case where an scoped enumeration is named, and an integer cast to enum type would be of comparable length. So I think the change you suggested is a good idea for that reason.

On Tue, Sep 4, 2018 at 4:30 PM Richard Smith <[hidden email]> wrote:
On Tue, 4 Sep 2018 at 09:09, will wray via cfe-dev <[hidden email]> wrote:


Pretty function output, for auto... args
(variadic just so that outputs display together):
  scoped   E: 3 -> (E)3 or E(3) or E{3}
Will it be acceptable to change the output for non-enumerated values as so?

In the case where the template parameter involves a deduced type (`auto` or `decltype(auto)` or eventually a deduced template specialization type), it would make a lot of sense to include the actual type in the printed template argument if it differs from the type of the formatted argument string. But I think we should leave out the type in all other cases, as it would make the template argument string more verbose without (necessarily) providing any more information.

Either C-style cast (e)3  or  function-cast / type-constructor e(3) - seem ok
(gcc uses C-style casts and Clang uses this same style in source labels).

For the duplicate-valued labels case there is no loss of type info,
but there is a loss of information about the label used for the auto arg.
Presumably, the code works from the supplied underlying-type value
and picks the first label with the given value. I'd guess that Clang knows
the label used in a direct instantiation and could return it in that case.
(c.f. named member-pointer args have to retain the id for reporting).

In the case of duplicated enumerators, there simply is no distinction between 'b' and 'c' -- they are both merely names for the value 1 of type 'e'. auto_name<b>() and auto_name<c>() are the same function and therefore must return the same string.
 

No, per the above, we cannot and must not do that.


This is printIntegral, at the top of lib/AST/TemplateBase.cpp. The "else" case at the bottom would need to print out additional text representing the cast when needed.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: __PRETTY_FUNCTION__ output loses enum type info for non-enumerated values

Yvan Roux via cfe-dev
Answering my own question:

    Is there a single 'canonical' init-list for a LiteralType?
 => No.

If LiteralType was further required to be structured-bindable,
and to have a corresponding constructor for the bound types,
then the compiler could select that as the canonical constructor:

Given a value v of type T
   auto [a,b,c...]{v}  structured binding
   T(a,b,c...)         canonical construction, reconstructs v

This provides a unique default choice for pretty-print output.

The 'reconstructing' constructor is likely to optimise nicely.

A requirement to be structured-bindable seems natural
(despite structured-binding not being constexpr yet).

Aggregates satisfy the reconstructing constructor requirement.
It feels more onerous to require general LiteralType classes to
provide such a constructor.



On Mon, Sep 24, 2018 at 3:11 PM will wray <[hidden email]> wrote:
Thanks again Richard for a second more detailed response setting out the broader view
(I had failed to fully interpret / misinterpreted your first response as mild pushback
 apologies for that; I should've asked for the clarification)
(also, I shouldn't have argued 'correctness' here; that was disingenuous).

So, in the broader view, the question is:
 > How best to pretty print general non-type template args.

Now, as non-types may be subsumed by LiteralType literals in C++20
we should look ahead to pretty printing of such general literal types
as the broader context for discussion or decisions on changes.

  Non-type template args
  Until C++17:
    - an integral type;
    - an enumeration type;
    - std::nullptr_t; (since C++11)
    - a pointer type (to object or to function);
    - a pointer to member type (to member object or to member function);
    - lvalue reference type (to object or to function);
  For C++20:
    - A type that satisfies LiteralType, has strong structural equality ...

Pretty-print design goals:
  General: minimise special-casing
  Round-trip: the printed value can be plugged back into code
  Low-verbosity: avoid unnecessary info or repetition
  more?

The general literal type can be one of the C++17 built-ins
as above (all scalar types, apart from lvalue reference I think)
or an aggregate or class that satisfies the LiteralType requirements.

So, in general, we will now have to deal with non-scalar types
implying multiple initializers and init-lists.

Question: is there a single 'canonical' init-list for a LiteralType?

This is a place where the proposal to use parens for aggregates will help,
as special-casing initialization syntax, for brace or paren, is awkward:
P0960 Allow initializing aggregates from a parenthesized list of values
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0960r1.html
If this proposal is accepted then that would suggest to prefer paren-init.

Literals of the existing types remain awkward non-types -
'function-style' paren-init doesn't work directly for multi-token types
(like 'long long') or for compound types (pointer/ref).
An added decltype can solve that. Sticking with C-style cast is simpler.

Here's a suggestion (assuming P0960 for non-scalars):

  Scalar:
    Known type T; print as 'value'
    Deduced type; print as '(T)value'
  Non-scalar:
    Known type T; print as '(init...)'
    Deduced type; print as 'T(init...)'

'Known type' means non-deduced or T implicit in value
(for values of scalar type like enumerator or pointer-to-member).
(lvalue reference remains a special case).

In practice, I don't know if compilers can easily distinguish
whether a value's type is known or deduced, at the leaf-level.
If not then this change will be more involved than just updating
low-level printing functions.

Now let's focus back on integral and enumeration types.

Integral / enum types
---------------------
Clang, MSVC and GCC all output plain digits for Integral.
Clang, MSVC and GCC* all output enum id for enumerators.
Clang, MSVC output plain digits for non-enumerated values,
while GCC outputs a C-style cast for non-enumerated values.

*GCC enum: I submitted a bug report with a patch and tests:
(comments, subscribers or reviews will be much appreciated).
With this fix GCC now prints enum id for enumerators and
continues to print non-enumerated values by C-style cast.
I plan to further investigate the Integral output for GCC.


How best to print the actual integral values?
  MSVC currently prints as '0xhhh' where h are hex digits,
  GCC and Clang print as 'ddd' where d are decimal digits.
Decimal is less verbose, until the millions, but can be slower.
Hex simplifies round-trip as it is easy to generate and parse.
Hex is already best for reporting addresses (pointer/lval-ref).

It'd be good to get some consensus for Integral/enum output.
Ideally we can see a coordinated change in Clang, GCC & MSVC.

If the suggestion here looks acceptable then I can attempt
to update Clang printing of Integral and non-enumerated enums.




On Tue, Sep 11, 2018 at 7:56 PM Richard Smith <[hidden email]> wrote:
On Mon, 10 Sep 2018 at 09:47, will wray via cfe-dev <[hidden email]> wrote:
Thanks Richard.

Ok; ignore Question 2 - you are absolutely right (what I was thinking).
Printing different labels for the same type defeats the point of this request.

OTOH, Question 1 is a request for Clang to print different output for different types.

I think you might be under the impression that I said that we shouldn't address this, which was very much not my intent. What I tried to say was that we *should* emit a cast in the case of a deduced template parameter type, when the type of the argument expression we'd otherwise print is different from that of the parameter. That should be sufficient to ensure that Clang prints different outputs for different types. Note that special-casing enums is the wrong way to fix that problem, as the problem is not specific to enums. Consider, for example:

template<auto> struct X {};
X<(short)0> xa;
X<(long)0> xb;

The types of xa and xb both pretty-print as X<0> currently.

There are (at least) two natural ways to fix this: by always printing a cast when the type of the argument isn't the same as that of the parameter (which is the unnecessary verbosity that I was concerned with, and is what we'd get if we generalized the proposed approach for enumerations to address the whole problem), or by doing something smarter that depends on whether the template parameter was deduced (which is what I'm suggesting is the right way to address this).

The linked bug report is an example of where it is important to distinguish types.
Here's a summary of suggested output vs current output for Clang, GCC, MSVC.

Plain, unscoped enum  
   enum e { a, b, c=b };
   auto_name<a, e(1), c, e(3)>

  Suggested: a, b, b, (e)3
      Clang: a, b, b, 3
        GCC: (e)0, (e)1, (e)1, (e)3
       MSCV: 0, 1, 1, 3

Scoped / opaque enum: 
   enum class E { a, b, c=b };
   auto_name<E::a, E{1}, E::c, E{3}>

  Suggested:  E::a, E::b, E::b, (E)3  
      Clang: 
E::a, E::b, E::b, 3  
        GCC: 
(E)0, (E)1, (E)1, (E)3  
       MSCV: 0, 1, 1, 3

None of these are ideal -
- Clang loses type info for non-enumerated values.
- GCC does not distinguish enumerated / non-enumerated.
- MSVC loses type all info, does not distinguish.

Interestingly, it turns out that GCC's behaviour is a bug!
(The *intent* of gcc code is to print the suggested output.
 I intend to submit a bug report and a patch.)

You are concerned that this suggested change would
make the template argument string more verbose without (necessarily) providing any more information.
 
Yet the information is necessary in order to distinguish different types.
This change is needed for correctness.

For what it's worth, I don't see a correctness concern here. Anything assuming our pretty-printed type names are unique is wrong. The bug you referenced is a bug caused by a tool getting this wrong and using the pretty type name string inappropriately, and the contributors on that bug report seem happy to acknowledge that.

That said, as a quality of implementation matter, we should try to ensure our pretty-printed type names are unique anyway wherever possible.

Verbosity is a secondary concern.
Clang's current behaviour here could be treated as a bug
(it has been described as "seriously suboptimal" by a gcc dev).
 
For Clang, the increased verbosity is only for non-enumerated values.
Enum-templates would be expected to mostly use enumerated values.
Also, verbosity becomes more similar for enumerated / non-enumerated
the number of characters in the identifier label vs the number of digits.

I'm now preferring C-style cast notation over C++ function-style
as it simplifies output and parsing code.

Please reflect on this and consider - is this suggested change acceptable?

As noted above, the problem you described, while valid, does not justify the solution you're suggesting. However, for enums specifically, I think there's another argument to consider (one that I think you've not made yet): passing a plain integer as a template argument to an enumeration template parameter won't compile, so our pretty-printed type is not valid C++. It's reasonable to consider how valuable it is to fix that, compared with the value of having a shorter pretty-printed type name, but as you point out, we're already paying the cost of including the enum type name in the template argument in the case where an scoped enumeration is named, and an integer cast to enum type would be of comparable length. So I think the change you suggested is a good idea for that reason.

On Tue, Sep 4, 2018 at 4:30 PM Richard Smith <[hidden email]> wrote:
On Tue, 4 Sep 2018 at 09:09, will wray via cfe-dev <[hidden email]> wrote:


Pretty function output, for auto... args
(variadic just so that outputs display together):
  scoped   E: 3 -> (E)3 or E(3) or E{3}
Will it be acceptable to change the output for non-enumerated values as so?

In the case where the template parameter involves a deduced type (`auto` or `decltype(auto)` or eventually a deduced template specialization type), it would make a lot of sense to include the actual type in the printed template argument if it differs from the type of the formatted argument string. But I think we should leave out the type in all other cases, as it would make the template argument string more verbose without (necessarily) providing any more information.

Either C-style cast (e)3  or  function-cast / type-constructor e(3) - seem ok
(gcc uses C-style casts and Clang uses this same style in source labels).

For the duplicate-valued labels case there is no loss of type info,
but there is a loss of information about the label used for the auto arg.
Presumably, the code works from the supplied underlying-type value
and picks the first label with the given value. I'd guess that Clang knows
the label used in a direct instantiation and could return it in that case.
(c.f. named member-pointer args have to retain the id for reporting).

In the case of duplicated enumerators, there simply is no distinction between 'b' and 'c' -- they are both merely names for the value 1 of type 'e'. auto_name<b>() and auto_name<c>() are the same function and therefore must return the same string.
 

No, per the above, we cannot and must not do that.


This is printIntegral, at the top of lib/AST/TemplateBase.cpp. The "else" case at the bottom would need to print out additional text representing the cast when needed.
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev