Optimizing returning a struct instance larger than the quadword

Denis Sukhonin via cfe-dev
Hi list,

I am doing some research on my own, trying to understand the best way
to report an error from a function.
My environment:
* exceptions are disabled with -fno-exceptions,
* x86_64 Clang 5.0.1,
* C++17, and
* macOS 10.12.

Suppose there is a function that can fail: `FailingFn`. As a simple
and quite common solution, I could give it this signature: `bool
FailingFn(Args…, Error &err)`. This way the bool is returned in a
register, and the err object is allocated on the caller's stack. So, I
can avoid accessing memory if the returned bool is true (meaning
success).

However, since C++17 gives us copy elision and structured bindings, I
want to simplify the signature to `std::tuple<bool, Error>
FailingFn(Args…)`, or even `RetValue FailingFn(Args…)`. Then I can
handle errors this way:

if (auto [ok, err] = FailingFn(args…); !ok)
  // Handle error or, perhaps, just return it.
  return err;

Looks more expressive. With a few changes, we can make the `err`
object a variant that also carries the result of a successful
evaluation. For the sake of simplicity, though, let's assume it carries
only error information, e.g., a `std::string`, which is obviously
larger than a 64-bit register.
I am hoping the compiler recognizes the case of copy elision,
constructs the error object in the caller, and passes it by reference.
Also, having the returned tuple broken into two independent variables
would permit the compiler to use registers for them, at least for the
first one, which fits in a register (I do not care about the second one
until it carries a payload).
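
As a rough illustration of the variant idea mentioned above, here is a
minimal sketch. `Result`, `ParseAnswer`, and `UseAnswer` are made-up
names for this example, not part of any library, and the sketch assumes
the value type differs from `std::string` so the variant alternatives
stay unambiguous:

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical result type: holds either a value of type T or an
// error message. Assumes T is not std::string.
template <typename T>
using Result = std::variant<T, std::string>;

// Hypothetical failing function, mirroring the toFail flag used in
// the samples of this post.
Result<int> ParseAnswer(bool const toFail) {
    if (toFail) {
        return std::string{"parse failed"};
    }
    return 42;
}

// Returns the value on success and -1 on failure.
int UseAnswer(bool const flag) {
    auto const result = ParseAnswer(flag);
    if (auto const* value = std::get_if<int>(&result)) {
        return *value;
    }
    return -1;
}
```

Note that a variant holding a `std::string` is still larger than a
register, so it raises the same question about how it is returned.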

Here is a sample. Assume we have a structure and some functions
that may fail:

#include <cstdint>
#include <string>

template <typename T>
struct Pair {
    bool ok;
    T    value;
};

Pair<std::int64_t> FuncInt(bool const toFail) {
    return {!toFail, 42};
}

Pair<std::string> FuncString(bool const toFail) {
    return {!toFail, "DEADBEEF"};
}

auto UseInt(bool const flag) {
    if (auto const [ok, value] = FuncInt(flag); ok) {
        return value;
    }
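    // Same type as value, so both returns deduce the same type even
    // where std::int64_t is not long (e.g., on macOS).
    return std::int64_t{-1};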
}

auto UseString(bool const flag) {
    if (auto const [ok, value] = FuncString(flag); ok) {
        return value;
    }
    return std::string{"DEADFA11"};
}

The optimization works with `FuncInt`: `ok` and `value` are returned
in `rax` (its low byte, `al`) and `rdx`. The check compiles into:
  call FuncInt(bool)
  and al, 1

But in the case of `FuncString`, it does not happen. Here is what I get:
  call FuncString[abi:cxx11](bool)
  cmp byte ptr [rsp], 0

which obviously compares against memory. My intention is to avoid this
redundant read from memory and use a register instead, as with
`FuncInt`.

Is there a way to tell Clang that instances of Pair should (or at
least can) be broken into two separate variables and returned in the
most efficient way?

I believe this optimization won't work perfectly with the current ABI,
though the compiler should not have to limit itself to the spec if the
call is to a non-exposed function or `-flto` is used.

Here is a sample with assembly: https://godbolt.org/g/EKUqaF

In other words, I want `Pair<std::string> FuncString(bool const)` to
behave like `bool FuncString(bool const, std::string &value)` while
keeping the expressiveness of C++17, including structured bindings.
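
For comparison, the out-parameter form named above might look like the
sketch below. It reuses the `FuncString` and `UseString` names and the
string literals from the sample; the bodies are my assumption of what
an equivalent would be, not code from the Godbolt link:

```cpp
#include <cassert>
#include <string>

// Out-parameter form: the success flag is the return value (a bool
// fits in a register), and the payload is written through the
// reference only on success.
bool FuncString(bool const toFail, std::string& value) {
    if (toFail) {
        return false;
    }
    value = "DEADBEEF";
    return true;
}

std::string UseString(bool const flag) {
    std::string value;  // lives in the caller's frame
    if (FuncString(flag, value)) {
        return value;
    }
    return std::string{"DEADFA11"};
}
```

The caller can test the returned flag directly without loading it from
the result object, which is the behavior I would like to get from the
`Pair<std::string>` version.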

As I understand it, this problem is very similar to "scalar replacement
of aggregates," but playing around with the optimizer's options didn't
give me any positive outcome or insight.

I don't have any specific requirements regarding the target OS and
architecture besides it being x86. If it is possible to get it done in
a generic way, I'm happy to know; if it works only in a very specific
environment, I'm still glad to know. Perhaps I can try to implement
this optimization with your help if it looks interesting.

--
Best regards,
Denis Sukhonin
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Re: Optimizing returning a struct instance larger than the quadword

Richard Smith via cfe-dev


> On Jan 23, 2018, at 6:10 AM, Denis Sukhonin via cfe-dev <[hidden email]> wrote:
> [...]
> if (auto [ok, err] = FailingFn(args…); !ok)
>   // Handle error or, perhaps, just return it.
>   return err;
>
> [...]
> I am hoping the compiler recognizes the case of copy elision,
> constructs the error object in the caller, and passes it by reference.

Returning the components of a tuple separately is a good idea in the abstract;
I know for a fact that Swift does it with its native tuple types.  In C++, it would
have to be done by special-casing std::pair / std::tuple in the ABI, which would
technically be an ABI break for those types, but in principle it could be done,
perhaps as an opt-in operation.

Unfortunately, C++'s rules around object lifetime would limit this optimization
so much as to make it completely ineffective.  The return of 'err' in your example
is not a legal opportunity for copy elision under the standard, and so a temporary
must be introduced.  On the flip side, the standard requires temporary materialization
to be delayed so as to minimize copies, which means that it would not be legal
to break apart an existing tuple temporary in order to return its components separately.
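
The copy-elision point can be observed with a small counting type. This
is only a sketch under my reading of the rules: `Tracker`,
`TrackedPair`, and the function names are hypothetical, and it assumes
C++17, where returning a prvalue is guaranteed to be elided while a
structured binding names a member of a hidden object rather than a
local variable:

```cpp
#include <cassert>

// Hypothetical payload type that counts copy and move constructions.
struct Tracker {
    static inline int copies = 0;
    static inline int moves = 0;
    Tracker() = default;
    Tracker(Tracker const&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};

// Same shape as the Pair<T> in the quoted sample.
struct TrackedPair {
    bool ok;
    Tracker value;
};

// The member is initialized directly from the prvalue, so no copy or
// move is expected here in C++17.
TrackedPair MakePair() { return {true, Tracker{}}; }

// Returning a prvalue: guaranteed elision, no constructor call beyond
// the default one.
Tracker ReturnPrvalue() { return Tracker{}; }

// Returning a structured binding: `value` refers to a member of the
// hidden pair object, so the return object has to be constructed
// from it.
Tracker ReturnBinding() {
    auto [ok, value] = MakePair();
    (void)ok;
    return value;
}
```

Calling `ReturnPrvalue()` should leave both counters at zero, while
`ReturnBinding()` should bump one of them.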

Furthermore, it is likely that this optimization would allow the user to observe
an inconsistent ordering between pointers to similar components of different tuples,
which I believe would also violate the standard.

Also, this:

> I am hoping the compiler recognizes the case of copy elision,
> constructs the error object in the caller, and passes it by reference.

This is not how objects are returned in C++; instead, the caller passes the callee
a pointer to uninitialized memory, and the callee constructs the return value into
that memory.
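
Written out by hand, that convention looks roughly like the sketch
below. `StringPair`, `FuncStringLowered`, and `CallLowered` are
hypothetical names for illustration; the real lowering is done by the
compiler according to the platform ABI, not in source code:

```cpp
#include <cassert>
#include <new>
#include <string>

// Non-template stand-in for Pair<std::string> from the quoted sample.
struct StringPair {
    bool ok;
    std::string value;
};

// Hand-written approximation of the lowered callee: it receives a
// pointer to uninitialized storage and constructs the result there.
StringPair* FuncStringLowered(void* result, bool const toFail) {
    return ::new (result) StringPair{!toFail, "DEADBEEF"};
}

// Hand-written approximation of the caller: it reserves raw storage
// in its own frame, passes its address, and then reads the flag back
// out of that storage.
bool CallLowered(bool const toFail) {
    alignas(StringPair) unsigned char storage[sizeof(StringPair)];
    StringPair* pair = FuncStringLowered(storage, toFail);
    bool const ok = pair->ok;  // a load from memory, not a register
    pair->~StringPair();       // the caller owns the object's lifetime
    return ok;
}
```

The read of `pair->ok` corresponds to the `cmp byte ptr [rsp], 0` in
the quoted assembly: the whole object, flag included, lives in the
caller-provided memory.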

Anyway, if you wanted to pursue this as a non-conforming optimization, I think that
would be an interesting project, but I think you will find that Clang is not currently
well engineered for this kind of high-level value-propagation optimization.

John.