Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?


Yonggang Luo
Two solutions:
1. Change the width of wchar_t to 32 bits. I guess that would break a
lot of things that exist in the Win32 world.
2. Or preserve wchar_t as 16 bits on Win32, and add char16_t and
char32_t variants of every API that has both a narrow and a wide
version?


I support the second one, even if the second option is not
applicable. The first option would cause a lot of problems. First,
all Windows APIs use wchar_t and depend on wchar_t being 2 bytes
wide. Second, there are open source libraries that depend on the
de facto 16-bit wchar_t, such as Qt and (maybe) Git.
Almost all existing open source libraries that have already been
ported to Win32 depend on wchar_t being 16 bits. Cygwin has also
discussed whether to make wchar_t 32 bits on Win32:

https://www.cygwin.com/ml/cygwin/2011-02/msg00037.html


>>>>> think there is no one would use
>>>>> wchar_t for cross text processing, cause, on some system, wchar_t is
>>>>> just 8bit  width!
>>>>
>>>> anybody would use wchar_t who cares about standard conformant
>>>> implementations.
>>>>
>>>> non-standard broken platforms may get an unmaintained #ifdef
>>>> as usual..
>>>
>>> I think we (and midipix) have a different perspective from Yonggang
>>> Luo on portable development. Our view is that you write to a POSIX (or
>>> nearly-POSIX) target with fully working Unicode support and fix the
>>> small number of targets (i.e. just Windows) that don't already provide
>> Small is relative, if counting the distribution count, well, Unix wins.
>>> these things. Yonggang Luo's perspective seems to be more of a
>>> traditional Windows approach with #ifdef and lots of OS-specific code,
>>> but just making the Windows branch of the #ifdefs less hideous than it
>>> was before. :)
>> If wchar_t becomes 32 bits on Win32, then there truly will be a lot
>> of #ifdefs. I am not sure whether you have experience with Win32 API
>> development; I hope we can discuss the problems in a more objective
>> way.
>>
>
> One primary objective of code portability and posix-compatibility layer
> for win32 is to _remove_ the need for OS-specific code-paths. A wchar_t
> that is anything short (no pun intended) of a 32-bit integer will render
> it impossible to build out of the box many pieces of commonly-used
> software, including, but not limited to musl libc, the curses library,
> and anything that expects wchar_t to cover the entire unicode range.
>
> As for your suggested framework: there are currently at least three
> compilers that can produce optimized code for the target platform (gcc,
> clang, and cparser), and which work very well with most open-source
> software out there. As an aside, if you are interested in an 8-byte long
> on 64-bit windows then an open-source compiler is probably your only
> option. To compile musl with msvc, on the other hand, you'd have to make
> so many changes to the source code that you might as well write your own
> libc from scratch. To see why, please attempt to compile some ten or
> fifteen core libc headers (stdio.h, unistd.h, etc.) with msvc. If that
> goes well (spoiler: it won't), then the next step would be to compile a
> subset of the source files (src/pthread or src/stdio, for instance) and
> remove any remaining obstacles.
>
> m.
>
>
>>>
>>> Rich
>>
>>
>>
>
>

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Re: [musl] Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Rich Felker
On Sat, May 09, 2015 at 11:16:37AM +0800, 罗勇刚(Yonggang Luo)  wrote:

> Two solutions:
> 1. Change the width of wchar_t to 32 bits. I guess that would break a
> lot of things that exist in the Win32 world.
> 2. Or preserve wchar_t as 16 bits on Win32, and add char16_t and
> char32_t variants of every API that has both a narrow and a wide
> version?
>
>
> I support the second one, even if the second option is not
> applicable. The first option would cause a lot of problems. First,
> all Windows APIs use wchar_t and depend on wchar_t being 2 bytes
> wide. Second, there are open source libraries that depend on the
> de facto 16-bit wchar_t, such as Qt and (maybe) Git.
> Almost all existing open source libraries that have already been
> ported to Win32 depend on wchar_t being 16 bits. Cygwin has also
> discussed whether to make wchar_t 32 bits on Win32:
>
> https://www.cygwin.com/ml/cygwin/2011-02/msg00037.html

Well, which option is an easier path forward depends on your main
usage case. If you're most concerned about building existing
Windows-targeted code unmodified, obviously doing the same thing MSVC
does, even if it's a bad design, achieves that.

On the other hand, if your goal is building software that was written
for POSIX or POSIX-like systems on Windows with little or no
modification, it's more complicated. Code that currently has no
Windows support certainly will work best (full Unicode support) with
32-bit wchar_t. Code that already has Windows-specific workarounds
(assuming wchar_t is 16-bit on Windows) needs those undone to make it
work. But such code _should_ be checking WCHAR_MAX instead of assuming
Windows is 16-bit. I believe midipix is dealing with this issue simply
by not predefining _WIN32 or whatever, so that none of the Windows
workarounds will get activated.
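
As a rough illustration of the WCHAR_MAX check mentioned above, code could branch on the actual range of wchar_t rather than on the platform name. This is only a sketch; the function name wchar_holds_all_unicode is made up for the example and does not come from any of the projects discussed.

```c
#include <stdint.h>   /* WCHAR_MAX */
#include <wchar.h>

/* Hypothetical helper: returns 1 if a single wchar_t can hold every
 * Unicode code point (up to U+10FFFF), 0 if wide strings would have
 * to be treated as UTF-16 code units instead. */
int wchar_holds_all_unicode(void)
{
#if WCHAR_MAX >= 0x10FFFF
    return 1;   /* e.g. a 32-bit wchar_t */
#else
    return 0;   /* e.g. a 16-bit wchar_t */
#endif
}
```

The point of testing WCHAR_MAX is that the same source keeps working if a Windows toolchain ever provides a 32-bit wchar_t, whereas an #ifdef _WIN32 branch would silently pick the wrong path.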

I really suspect most Windows code interfacing with WINAPI is using
WCHAR, not wchar_t, for its UTF-16 strings. So fixing wchar_t to be
32-bit and leaving WCHAR alone is the best solution in my opinion.
Note that you're still left with the issue that L"xxx" strings will
not work with WCHAR, but this really only matters if you're trying to
use existing Windows-targeted code unmodified, and it's easily fixed
by s/L"/u"/g across the source (making them char16_t[] literals rather
than wchar_t[] literals).
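
To make the literal change concrete, here is a small C11 sketch. It assumes a libc that ships <uchar.h>; the names u16len and greeting are invented for the example.

```c
#include <stddef.h>
#include <uchar.h>   /* char16_t (C11) */

/* Hypothetical helper: length of a NUL-terminated char16_t string,
 * counted in 16-bit code units. */
size_t u16len(const char16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* u"..." is a char16_t[] literal whatever the width of wchar_t is;
 * L"..." would follow wchar_t and so change size with it. */
static const char16_t greeting[] = u"hello";
```

With this form the element size of the literal no longer depends on how the toolchain defines wchar_t.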

I don't think adding lots of functions for char16_t and char32_t is
useful. The format you want programs to be using is UTF-8. With
midipix all of the standard C functions, just like in straight musl,
always work in UTF-8, and there are also wrappers for the WINAPI that
convert UTF-8 to UTF-16 transparently. This allows you to just work in
char[] strings and pass them to WINAPI functions like you would if you
were working in "ANSI codepage" mode, except that you actually have
full Unicode available. I strongly support this approach and hope
you'll adopt it.
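
For illustration only, the conversion step that such a wrapper needs could look roughly like the sketch below. It is not taken from midipix or musl; utf8_to_utf16 is a hypothetical name, and the error convention of returning (size_t)-1 is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert NUL-terminated UTF-8 to UTF-16.
 * Writes at most cap code units, including the terminating 0.
 * Returns the number of code units written (excluding the
 * terminator), or (size_t)-1 on malformed input or lack of space. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    /* Smallest code point allowed for each sequence length. */
    static const uint32_t min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;       /* invalid lead byte */
        s++;

        for (int i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)  /* also catches a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (*s & 0x3F);
        }

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (n + 1 >= cap) return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        } else {
            /* Outside the BMP: encode as a surrogate pair. */
            if (n + 2 >= cap) return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= cap) return (size_t)-1;
    dst[n] = 0;
    return n;
}
```

For example, the 4-byte UTF-8 sequence for U+1F600 comes out as the surrogate pair 0xD83D 0xDE00, which is the form the wide WINAPI entry points expect.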

Rich


Re: [musl] Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Yonggang Luo
2015-05-09 11:32 GMT+08:00 Rich Felker <[hidden email]>:

> Well, which option is an easier path forward depends on your main
> usage case. If you're most concerned about building existing
> Windows-targeted code unmodified, obviously doing the same thing MSVC
> does, even if it's a bad design, achieves that.
>
> On the other hand, if your goal is building software that was written
> for POSIX or POSIX-like systems on Windows with little or no
> modification, it's more complicated. Code that currently has no
> Windows support certainly will work best (full Unicode support) with
> 32-bit wchar_t. Code that already has Windows-specific workarounds
> (assuming wchar_t is 16-bit on Windows) needs those undone to make it
> work. But such code _should_ be checking WCHAR_MAX instead of assuming
> Windows is 16-bit. I believe midipix is dealing with this issue simply
> by not predefining _WIN32 or whatever, so that none of the Windows
> workarounds will get activated.
>
> I really suspect most Windows code interfacing with WINAPI is using
> WCHAR, not wchar_t, for its UTF-16 strings. So fixing wchar_t to be
This is a misunderstanding.
WCHAR is actually defined in winnt.h, as follows:

#ifndef _MAC
typedef wchar_t WCHAR;    // wc,   16-bit UNICODE character
#else
// some Macintosh compilers don't define wchar_t in a convenient location, or define it as a char
typedef unsigned short WCHAR;    // wc,   16-bit UNICODE character
#endif



> 32-bit and leaving WCHAR alone is the best solution in my opinion.
> Note that you're still left with the issue that L"xxx" strings will
> not work with WCHAR, but this really only matters if you're trying to
> use existing Windows-targeted code unmodified, and it's easily fixed
> by s/L"/u"/g across the source (making them char16_t[] literals rather
> than wchar_t[] literals).
>
> I don't think adding lots of functions for char16_t and char32_t is
> useful. The format you want programs to be using is UTF-8. With
> midipix all of the standard C functions, just like in straight musl,
> always work in UTF-8, and there are also wrappers for the WINAPI that
> convert UTF-8 to UTF-16 transparently. This allows you to just work in
> char[] strings and pass them to WINAPI functions like you would if you
> were working in "ANSI codepage" mode, except that you actually have
> full Unicode available. I strongly support this approach and hope
> you'll adopt it.
>
> Rich



--
Yours sincerely,
罗勇刚 (Yonggang Luo)


Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

John Sully
In reply to this post by Yonggang Luo
wchar_t is also pretty common in the win32 world.  You shouldn't assume people use the windows macros.  Regardless of what you choose someone is going to lose, so it might make more sense to think about what is more useful long term.

In my opinion you almost never want 32-bit wide characters once you learn of their limitations.  Most people assume that if they use them they can return to the one character -> one glyph idiom like ASCII.  But Unicode is vastly more complex than that and while you avoid surrogates you don't avoid things like combining characters and diacritics so the idiom does not hold.

Given that almost every character in frequent use around the world is in the BMP, 16-bit wide chars make the most sense for most applications.
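
The point about combining characters can be seen with a small sketch. The helper count_code_points is a made-up name, and it assumes well-formed UTF-8 input.

```c
#include <stddef.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
 * UTF-8 string by skipping continuation bytes (10xxxxxx).
 * Assumes the input is well-formed UTF-8. */
size_t count_code_points(const char *utf8)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)utf8; *p; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}
```

Called on "e\xCC\x81" (the letter e followed by U+0301 COMBINING ACUTE ACCENT) it returns 2, while the precomposed "\xC3\xA9" (U+00E9) gives 1, although both render as a single glyph. So even with 32-bit characters, one array element is not one glyph.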



Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Szabolcs Nagy
* John Sully <[hidden email]> [2015-05-09 00:55:12 -0700]:
> In my opinion you almost never want 32-bit wide characters once you learn
> of their limitations.  Most people assume that if they use them they can
> return to the one character -> one glyph idiom like ASCII.  But Unicode is

wchar_t must be at least 21 bits on a system that supports unicode
in any locale: it has to be able to represent all code points of the
supported character set.

in practice this means that the only conforming definition to iso c
(and thus posix, c++ and other standards based on c) is a 32bit wchar_t
(the signedness can be chosen freely).

so the definition is not based on what "you almost never want" or what
"most people assume".

if the goal is to provide a posix implementation then 16bit wchar_t
is not an option (assuming the system wants to be able to communicate
with the external world that uses unicode text).
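
The 21-bit figure follows from the largest Unicode code point, U+10FFFF. A tiny sketch of the arithmetic (bits_needed is a hypothetical helper name):

```c
#include <stdint.h>

/* Hypothetical helper: number of value bits needed to represent v
 * (0 needs 0 bits). */
unsigned bits_needed(uint32_t v)
{
    unsigned bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}
```

bits_needed(0x10FFFF) is 21, since 0x100000 <= 0x10FFFF < 0x200000, whereas a 16-bit type tops out at 0xFFFF.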

Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Yonggang Luo
2015-05-09 18:36 GMT+08:00 Szabolcs Nagy <[hidden email]>:

> * John Sully <[hidden email]> [2015-05-09 00:55:12 -0700]:
>> In my opinion you almost never want 32-bit wide characters once you learn
>> of their limitations.  Most people assume that if they use them they can
>> return to the one character -> one glyph idiom like ASCII.  But Unicode is
>
> wchar_t must be at least 21 bits on a system that supports unicode
> in any locale: it has to be able to represent all code points of the
> supported character set.
>
> in practice this means that the only conforming definition to iso c
> (and thus posix, c++ and other standards based on c) is a 32bit wchar_t
> (the signedness can be chosen freely).
>
> so the definition is not based on what "you almost never want" or what
> "most people assume".
>
> if the goal is to provide a posix implementation then 16bit wchar_t
> is not an option (assuming the system wants to be able to communicate
> with the external world that uses unicode text).
wchar_t is not the only way to communicate with the external world,
and it's also not suited to communicating with the external world.
The C11 standard never restricts wchar_t's width, and in POSIX most
APIs are implemented in terms of UTF-8. Indeed, Windows needs the
POSIX layer mainly because of those APIs that use UTF-8, not the
wchar_t APIs. So communication is not a good reason to make wchar_t
32 bits on Win32.

And portable text-processing apps or libs (including Win32 ones)
would not, and should never, depend on wchar_t being 32 bits wide.

And C11/C++11 already provide uchar.h with the cross-platform
char16_t and char32_t, so there is no reason to make wchar_t 32 bits
on Win32 in order to support POSIX on Win32.


We intend to create a usable POSIX layer on Win32, not a theoretical
POSIX layer that would be useless. On Win32, we should take the
de facto conventions of Win32 into account.



Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Rich Felker
On Sat, May 09, 2015 at 07:19:14PM +0800, 罗勇刚(Yonggang Luo)  wrote:

> wchar_t is not the only way to communicate with the external world,
> and it's also not suited to communicating with the external world.

Of course it's not. UTF-8 is. But per both ISO C and POSIX, any
character the locale supports has a representation as wchar_t. If
wchar_t is only 16-bit, then you fundamentally can't support all of
Unicode in the locale's encoding. mbrtowc has to fail with EILSEQ for
4-byte characters, regex functions cannot process 4-byte characters,
etc. Such a system is conforming to the requirements for C and
POSIX but does not support Unicode (in full) at the locale level.

> The C11 standard never restricts wchar_t's width, and in POSIX most
> APIs are implemented in terms of UTF-8. Indeed, Windows needs the
> POSIX layer mainly because of those APIs that use UTF-8, not the
> wchar_t APIs. So communication is not a good reason to make wchar_t
> 32 bits on Win32.
>
> And portable text-processing apps or libs (including Win32 ones)
> would not, and should never, depend on wchar_t being 32 bits wide.

If __STDC_ISO_10646__ is defined, wchar_t must have at least 21 value
bits. Applications which are portable only to systems where this macro
is defined, or which have some fallback (like dropping multilingual
text support) for systems where it's not defined, CAN make such
assumptions.
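
A sketch of how an application might gate that assumption on the macro. The function name wchar_is_iso10646 is invented for this example.

```c
#include <wchar.h>

/* Hypothetical helper: returns 1 when the implementation promises
 * that wchar_t values are ISO 10646 (Unicode) code points, so code
 * may store any code point in one wchar_t; returns 0 otherwise, in
 * which case the caller needs a fallback. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

A program could call this once at startup and switch off its multilingual text features, or refuse to build via #error, on systems where the macro is absent.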

> And C11/C++11 already provide uchar.h with the cross-platform
> char16_t and char32_t, so there is no reason to make wchar_t 32 bits
> on Win32 in order to support POSIX on Win32.

If wchar_t is 16-bit, you can't represent non-BMP characters in
char32_t because they can't be part of the locale's character set. All
char32_t buys you then is 16 wasted zero bits.

> We intend to create a usable POSIX layer on Win32, not a theoretical
> POSIX layer that would be useless. On Win32, we should take the
> de facto conventions of Win32 into account.

Uselessness is a big assumption you're making that's not supported by
data. If you actually provide a working POSIX layer, you'll have
pretty much any application that's currently working on Linux, BSDs,
etc. (with actual portable code, not system-specific #ifdefs) working
on Windows with few or no changes. If you do that with 32-bit wchar_t,
they'll support Unicode fully. If you do it with 16-bit wchar_t, then
the ones that are using the locale system for character handling will
have to be refitted with extra layers to support more than the BMP,
and those patches probably (hopefully) won't be accepted upstream.

The only applications that would benefit from having 16-bit wchar_t
are existing Windows applications that are not going to have much use
for a POSIX layer anyway, and they can be fixed very easily with
search-and-replace (no new code layers).

Rich


Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Yonggang Luo
2015-05-10 4:05 GMT+08:00 Rich Felker <[hidden email]>:

> The only applications that would benefit from having 16-bit wchar_t
> are existing Windows applications that are not going to have much use
> for a POSIX layer anyway, and they can be fixed very easily with
> search-and-replace (no new code layers).
Search-and-replace is not as easy as you say.

There are a lot of incompatibilities between Windows and POSIX, and
that won't change, unless we just implement a virtual machine running
on Win32. That would be compatible with everything in POSIX on Win32,
but it would be useless.

The intention in providing a POSIX layer is to reduce the burden on
developers who intend to write cross-platform code (including
Windows), not on developers who only intend to develop apps for
Linux/POSIX.

So such a layer should preserve the usable parts of POSIX and drop
the parts that just create inconvenience.
A 32-bit wchar_t is obviously not suitable for Win32.

My intention is not to develop a virtual-machine-like layer such as
Cygwin, but a native Win32 layer that provides most POSIX functions
with UTF-8 support. That would solve most portability issues, and the
result would work on Win32 just like a Win32 app rather than a
Unix/Linux app.
>
> Rich




Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Yonggang Luo
In reply to this post by Rich Felker
For example, the open function exists in both msvcrt and POSIX:
int open(const char *path, int oflag, ...);

But in msvcrt the path is in the ANSI encoding, while in POSIX the path is in UTF-8.

So if we are developing a cross-platform application, the open
function should not be used on Win32.
But in fact there is no openw(const wchar_t *path) API in POSIX or Win32,
so we need to re-implement open on Win32 with the same signature and
have it convert to the wchar_t version on Win32, _wopen.
That would be chaos for developers who want to use the open function
on both POSIX and Win32.

And if we make wchar_t 32 bits on Win32:
first, POSIX still has no wide version of the open function;
second, to implement open on Win32 we have to take into account that
wchar_t is now 32 bits, and re-use the existing _wopen, and every
other existing wide Win32 API, in a different way.



Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Joerg Sonnenberger
On Sun, May 10, 2015 at 08:31:54PM +0800, 罗勇刚(Yonggang Luo)  wrote:
> But in msvcrt the path is in the ANSI encoding, while in POSIX the path is in UTF-8.

Huh? POSIX makes no assumptions about string encoding. Whether it is
UTF-8 or a legacy encoding is completely up to the application and/or
end user environment.

Joerg


Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Rich Felker
In reply to this post by Yonggang Luo
On Sun, May 10, 2015 at 08:31:54PM +0800, 罗勇刚(Yonggang Luo)  wrote:

> For example, the open function exists in both msvcrt and POSIX:
> int open(const char *path, int oflag, ...);
>
> But in msvcrt the path is in the ANSI encoding, while in POSIX the path is in UTF-8.
>
> So if we are developing a cross-platform application, the open
> function should not be used on Win32.
> But in fact there is no openw(const wchar_t *path) API in POSIX or Win32,
> so we need to re-implement open on Win32 with the same signature and
> have it convert to the wchar_t version on Win32, _wopen.
> That would be chaos for developers who want to use the open function
> on both POSIX and Win32.

I assumed you were already planning for your POSIX layer an open
function which takes a UTF-8 string and converts it transparently to
whatever encoding (i.e. UTF-16) the underlying Windows operations
need. If you don't have that, then you're a step behind even Cygwin
and significantly behind midipix in terms of the ability to provide a
POSIX+Unicode environment that can run existing POSIX-targeted
applications unmodified. Anyone wanting Unicode filename support would
have to fill their codebase with Windows-specific openw() calls, which
is basically the same situation you have now on Windows.
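
To make the approach concrete, here is a minimal sketch of the conversion step such an open() wrapper would need. The name utf8_to_utf16 and its signature are hypothetical, not taken from Cygwin, midipix, or any libc; the sketch assumes NUL-terminated input and rejects malformed UTF-8 instead of substituting a replacement character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to UTF-16.
 * Returns the number of 16-bit code units written (terminator excluded),
 * or (size_t)-1 if the input is malformed or dst is too small. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_len)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    while (*s) {
        uint32_t cp;
        int extra, i;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        s++;

        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)  /* also catches a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (*s & 0x3F);
        }

        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (n + 1 >= dst_len) return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        } else {
            /* Code points beyond the BMP become a surrogate pair. */
            if (n + 2 >= dst_len) return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }

    if (n >= dst_len) return (size_t)-1;
    dst[n] = 0;
    return n;
}
```

On Windows, where wchar_t is 16 bits, a wrapper could convert the path this way and hand the result to _wopen. Buffer sizing and mapping failures to errno are left out of the sketch.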

> And if we turn the wchar_t to be 32 bit on win32,
> first, posix still have no wide version of open function
> second, to implement open function on win32, we need to consider the
> fact wchar_t is 32bit now, and should re-use the exist _wopen in
> a different way and all other exist wide version of Win32 API.

I don't follow what you're saying here.

Rich


Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Yonggang Luo
>
> I assumed you were already planning for your POSIX layer an open
> function which takes a UTF-8 string and converts it transparently to
> whatever encoding (i.e. UTF-16) the underlying Windows operations
> need. If you don't have that, then you're a step behind even Cygwin
> and significantly behind midipix in terms of the ability to provide a
> POSIX+Unicode environment that can run existing POSIX-targeted
> applications unmodified. Anyone wanting Unicode filename support would
> have to fill their codebase with Windows-specific openw() calls, which
> is basically the same situation you have now on Windows.
>
>> And if we turn the wchar_t to be 32 bit on win32,
>> first, posix still have no wide version of open function
>> second, to implement open function on win32, we need to consider the
>> fact wchar_t is 32bit now, and should re-use the exist _wopen in
>> a different way and all other exist wide version of Win32 API.
>
> I don't follow what you're saying here.
>
My point is that making wchar_t 32-bit on win32 would create chaos
for developers who want to write truly cross-platform apps and libs.
Your thinking is too restricted to Unix-like systems and does not look
at things objectively.
> Rich



--
Yours sincerely,
罗勇刚 (Yonggang Luo)


Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Rich Felker
On Sun, May 10, 2015 at 10:15:52PM +0800, 罗勇刚(Yonggang Luo)  wrote:

> >
> > I assumed you were already planning for your POSIX layer an open
> > function which takes a UTF-8 string and converts it transparently to
> > whatever encoding (i.e. UTF-16) the underlying Windows operations
> > need. If you don't have that, then you're a step behind even Cygwin
> > and significantly behind midipix in terms of the ability to provide a
> > POSIX+Unicode environment that can run existing POSIX-targeted
> > applications unmodified. Anyone wanting Unicode filename support would
> > have to fill their codebase with Windows-specific openw() calls, which
> > is basically the same situation you have now on Windows.
> >
> >> And if we turn the wchar_t to be 32 bit on win32,
> >> first, posix still have no wide version of open function
> >> second, to implement open function on win32, we need to consider the
> >> fact wchar_t is 32bit now, and should re-use the exist _wopen in
> >> a different way and all other exist wide version of Win32 API.
> >
> > I don't follow what you're saying here.
> >
> My point is that getting wchar_t  to be 32bit on win32 would making
> chaos for those developers.
> who want to making true cross-platform apps and libs, your though is
> too restricted to
> unix-like system, but not thinking things in a objective manner.

I understand that you've made this claim multiple times, but with no
evidence/data to back it up. From my perspective, what you're saying
is that having more uniformity between platforms, and being able to
use the same interface on them all rather than having to use
platform-specific APIs, "makes chaos for developers...making true
cross-platform apps and libs". In the absence of specific reasons to
believe that, one would tend to believe the opposite.

Rich


Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Karsten Blees
In reply to this post by Yonggang Luo
On 09.05.2015 at 05:17, 罗勇刚(Yonggang Luo) wrote:

> Is that getting wchar_t to be 32bit on win32 a good idea

wchar_t must match what the compiler generates for L"string" literals. Thus you cannot "change" wchar_t in a library or compatibility layer, as it is a compiler property.
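
A small sketch of why this is a compiler property: the element type of a wide string literal is wchar_t by definition, so the check below holds at compile time whatever width the compiler picked. The function name wchar_bits is made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <wchar.h>

/* The compiler decides the element type of L"..." literals; a library
 * cannot redefine wchar_t without disagreeing with these literals. */
static_assert(sizeof(L"x"[0]) == sizeof(wchar_t),
              "wide literal elements are wchar_t");

/* Width of wchar_t in bits for the compiler building this file. */
unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}
```

Built with a typical Linux toolchain this would be expected to report 32, and with a Windows toolchain 16, but the sketch deliberately asserts neither.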

Furthermore, using a 32-bit wchar_t on Windows would break binary compatibility with existing libraries. And programs that cannot use e.g. kernel32.dll are completely useless: they cannot do anything.

> One primary objective of code portability and posix-compatibility layer for win32 is to _remove_ the need for OS-specific code-paths. A wchar_t that is anything short (no pun intended) of a 32-bit integer will render it impossible to build out of the box many pieces of commonly-used software, including, but not limited to musl libc, the curses library, and anything that expects wchar_t to cover the entire unicode range.

Any software that uses wchar_t to represent Unicode is inherently platform specific / not portable.

For example: POSIX requires that wide characters can be processed in isolation, e.g. each wide character has a specific width (see wcwidth() API and format of character set description files). This doesn't fly with Unicode's combining characters. E.g. a triple of any two Unicode characters followed by tie/breve \u0361 has a width of two. A POSIX-compliant wchar_t would need distinct wide character codes for all such combinations (i.e. requiring at least 3 * 21 = 63 bits).
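
The arithmetic behind the 63-bit figure can be checked with a short sketch. It assumes, as the paragraph above does, that three code points are packed side by side into one wide character; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent v (0 needs 0 bits). */
unsigned bits_needed(uint64_t v)
{
    unsigned n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Bits required to pack `count` full-range Unicode code points
 * side by side into a single integer. */
unsigned packed_bits(unsigned count)
{
    unsigned per_cp = bits_needed(0x10FFFF); /* highest code point */
    assert(per_cp == 21);                    /* sanity check */
    return count * per_cp;
}
```

With count set to 3 this gives the 63 bits quoted above, which fits in a 64-bit integer but not in a 32-bit one.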

Therefore, libc implementations that use wchar_t for Unicode cannot be strictly POSIX compliant (regardless of whether wchar_t holds UTF-32, UTF-16 or UTF-8).

The Unicode specification, chapter 5.2, recommends using char16_t / char32_t for Unicode, not wchar_t.
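
As a hedged illustration of what fixed-width code-unit types buy, the sketch below decodes UTF-16 without mentioning wchar_t at all, so it behaves the same on every platform. The typedefs u16 and u32 and the function utf16_decode are hypothetical names; C11 defines char16_t and char32_t as the same types as uint_least16_t and uint_least32_t, which the sketch uses directly so that it does not depend on <uchar.h> being available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint_least16_t u16; /* same underlying type as C11 char16_t */
typedef uint_least32_t u32; /* same underlying type as C11 char32_t */

/* Decode one code point from the UTF-16 sequence s[0..len).
 * Stores it in *out and returns the number of code units consumed
 * (1 or 2), or 0 for empty input or an unpaired surrogate. */
size_t utf16_decode(const u16 *s, size_t len, u32 *out)
{
    u32 hi, lo;

    assert(s != NULL && out != NULL);
    if (len == 0)
        return 0;

    hi = s[0];
    if (hi < 0xD800 || hi > 0xDFFF) { /* ordinary BMP code unit */
        *out = hi;
        return 1;
    }
    if (hi > 0xDBFF || len < 2)       /* lone low surrogate or truncated pair */
        return 0;

    lo = s[1];
    if (lo < 0xDC00 || lo > 0xDFFF)   /* high surrogate without a low one */
        return 0;

    *out = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    return 2;
}
```

The same source works whether the platform's wchar_t is 16 or 32 bits wide, because the code never depends on it.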

Just my 2c
Karsten



Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Mike Frysinger
In reply to this post by Yonggang Luo
On 10 May 2015 20:31, 罗勇刚(Yonggang Luo)  wrote:
> For example, the open function exist both in msvcrt and posix,
> int open(const char *path, int oflag, ...);
>
> But in msvcrt, the path is ANSI encoding, and in posix, path is utf8 encoding,

POSIX has no such encoding requirement on the |path| argument:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html

on Linux, that buffer is a standard NUL-terminated C string which is passed
directly to the kernel which more or less passes it directly to the fs driver.  
how some FS drivers interpret that string depends on the FS.
-mike


Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Yonggang Luo
2015-05-11 9:47 GMT+08:00 Mike Frysinger <[hidden email]>:

> On 10 May 2015 20:31, 罗勇刚(Yonggang Luo)  wrote:
>> For example, the open function exist both in msvcrt and posix,
>> int open(const char *path, int oflag, ...);
>>
>> But in msvcrt, the path is ANSI encoding, and in posix, path is utf8 encoding,
>
> POSIX has no such encoding requirement on the |path| argument:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
>
> on Linux, that buffer is a standard NUL-terminated C string which is passed
> directly to the kernel which more or less passes it directly to the fs driver.
> how some FS drivers interpret that string depends on the FS.
> -mike
In the Linux world, the encoding of a path depends on the FS, that's true :)
Even so, nowadays the major filesystems use UTF-8 as the default encoding.
But in the Win32 world things are different: NTFS stores names in
UTF-16, but under different system locales (GBK or CP1252) open() uses
a different encoding (GBK or CP1252) for the path to the same file.
That makes the open function useless on win32.




--
Yours sincerely,
罗勇刚 (Yonggang Luo)


Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Joerg Schilling
In reply to this post by Mike Frysinger
Mike Frysinger <[hidden email]> wrote:

> On 10 May 2015 20:31, 罗勇刚(Yonggang Luo)  wrote:
> > For example, the open function exist both in msvcrt and posix,
> > int open(const char *path, int oflag, ...);
> >
> > But in msvcrt, the path is ANSI encoding, and in posix, path is utf8 encoding,
>
> POSIX has no such encoding requirement on the |path| argument:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
>
> on Linux, that buffer is a standard NUL-terminated C string which is passed
> directly to the kernel which more or less passes it directly to the fs driver.  
> how some FS drivers interpret that string depends on the FS.
> -mike

I remember that a while ago (probably around 2001), Microsoft tried to reword
POSIX to permit 16-bit characters by default to make their interface POSIX
compliant. This caused a long discussion that ended with the conclusion that
we cannot do that.

Jörg

--
 EMail:[hidden email]                    (home) Jörg Schilling D-13353 Berlin
       [hidden email] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/


Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Yonggang Luo
>
> I remember that a while ago (probably around 2001), Microsoft tried to reword
> POSIX to permit 16 bit characters by default to make their interface POSIX
> compliant. This caused a long discussion that ended with the conclusion, that
> we cannot do that.
That was really a long time ago, and things change over time.
Suppose we drop wchar_t support from POSIX; then there is still
a cross-platform subset we could use, and that is what we truly want.
In the real world there are many compromises. My intent is to develop
a cross-platform subset of the C runtime API to make some app
development easier, git for example. A large set of applications and
libraries suffer because there is no cross-platform subset of the
POSIX C runtime to use, so to support multiple platforms they have to
sacrifice code elegance and use all kinds of tricks to work around
those APIs.
--
Yours sincerely,
罗勇刚 (Yonggang Luo)


Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Shware Systems

(received via austin-group-l)

My 2 cents:

Yes, having wchar_t be 32 bits is a good idea. For some simple transforms of a few current 8-bit oriented multi-byte code sets, a wchar_t of 64 bits is actually desirable to simplify and speed up some operations, from a conceptual standpoint. Conversions to and from the char32_t type would be slower. A 32-bit wchar_t, based on the uint32_t type, also has similar applications in processing UTF-16 strings, whether based on the char16_t type of C11 or on byte or short arrays. If anything, POSIX needs to expand wchar_t support, not drop it, so that systems like Windows are more subsets of POSIX in function than incompatible. Such integration is one of the tentative goals identified as desirable for Issue 8; how well it gets accomplished remains to be seen :-). In some respects it's a more massive undertaking than the efforts made for Issue 6 integrating Unix and POSIX, because of various legacy compatibility issues, as far as I can see.







On Monday, May 11, 2015 罗勇刚(Yonggang Luo) <[hidden email]> wrote:

> > I remember that a while ago (probably around 2001), Microsoft tried to reword
> > POSIX to permit 16 bit characters by default to make their interface POSIX
> > compliant. This caused a long discussion that ended with the conclusion, that
> > we cannot do that.
> That's really a long time ago, things are changed time to time.
> Suppose we drop the support for wchar_t for POSIX, then there is still
> a cross-platform subset we could use. And that's truly we want.
> In real world, there is so much Compromise, I was intent to developing
> a cross-platform subset C runtime API to makes some app development
> ease.
> Such as git. There is a large set of application and libraries that
> suffering there is a cross-platfrom subset POSIX C runtime to use, so
> for the cross-platform support, they have to sacrifice the code
> elegance and using all kinds of tricks to work around
> for those APIs.
> --
> 此致 礼
> 罗勇刚
> Yours sincerely,
> Yonggang Luo

Re: [musl] Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Yonggang Luo
2015-05-13 5:57 GMT+08:00 Shware Systems <[hidden email]>:

> (received via austin-group-l)
>
> My 2 cents:
>
> Yes, it is a good idea, having wchar_t be 32-bits. For some simple
> transforms of a few current 8-bit oriented multi-byte code sets, a wchar_t
> of 64 bits is actually desirable to simplify and speed up some operations,
> from a conceptual standpoint. Conversions to and from char32_t type would be
> slower. A 32-bit wchar_t, based on uint32_t type, also has similar
> applications in processing UTF16 strings, whether based on char16_t type of
> C11, or byte or short arrays. If anything, POSIX needs to expand wchar_t
> support, not drop it, so systems like Windows are more subsets of POSIX in
> function than incompatible. Such integration is one of the tentative goals
> identified as desirable for Issue 8; how well it gets accomplished remains
> to be seen :-). In some respects it's a more massive undertaking than the
> efforts made for Issue 6 integrating Unix and POSIX because of various
> legacy compatibility issues, that I can see.
So are you suggesting that wchar_t be 64 bits? From my point of view,
to avoid breaking existing things, it would be better to leave the
size of wchar_t unchanged, letting it differ between platforms, and
to introduce new types such as char32_t and even char64_t for
platform-neutral work.
The size of wchar_t is decided by the compiler, and our discussion
will not change that.
--
Yours sincerely,
罗勇刚 (Yonggang Luo)
