I haven't looked at what happens with clang when compiling for other targets that don't have native support for half-precision arithmetic, but I would imagine that similar problems exist.
As a comparison point, our out-of-tree target uses __fp16 (and will move to _Float16 once I find time to work out what semantic changes that causes) but doesn't have native support for it.
To avoid spuriously converting to and from f32 across function calls, the half type is special-cased in the calling convention: essentially, the value is passed in the low half of an f32 register without changing the bit pattern. I'm not aware of any drawbacks to this.
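
To make the idea concrete, here is a rough sketch in plain C that models the register as raw bits. This is not our backend code. The names (f32_reg_bits, pass_half_in_low_bits, receive_half_from_low_bits, check_roundtrip) are made up for illustration, and zeroing the upper 16 bits on the caller side is an assumption of the sketch, not something the convention necessarily requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the raw contents of one 32-bit FP register. */
typedef uint32_t f32_reg_bits;

/* Caller side: put the 16-bit half pattern into the low half of the
   register image, unchanged. No half -> f32 conversion takes place.
   Assumption of this sketch: the upper 16 bits are set to zero. */
static f32_reg_bits pass_half_in_low_bits(uint16_t half_bits) {
    return (f32_reg_bits)half_bits;
}

/* Callee side: read the low 16 bits back as the half pattern and
   ignore whatever is in the upper half. No f32 -> half conversion. */
static uint16_t receive_half_from_low_bits(f32_reg_bits reg) {
    return (uint16_t)(reg & 0xFFFFu);
}

/* The bit pattern, including NaN payloads, survives the call boundary. */
static void check_roundtrip(uint16_t half_bits) {
    assert(receive_half_from_low_bits(pass_half_in_low_bits(half_bits))
           == half_bits);
}
```

Calling check_roundtrip on 0x3C00 (1.0 in binary16) or on a NaN pattern such as 0x7E01 passes because nothing ever reinterprets the value as an f32; the bits are only moved.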