[Python-Dev] please consider changing --enable-unicode default to ucs4
mal at egenix.com
Wed Oct 7 23:21:09 CEST 2009
Ronald Oussoren wrote:
> On 7 Oct, 2009, at 22:13, M.-A. Lemburg wrote:
>> Ronald Oussoren wrote:
>>> On 7 Oct, 2009, at 20:05, M.-A. Lemburg wrote:
>>>> If we do go for a change, we should use sizeof(wchar_t)
>>>> as basis for the new default - on all platforms that
>>>> provide a wchar_t type.
>>> I'd be -1 on that. sizeof(wchar_t) is 4 on OSX, but all non-Unix APIs
>>> that deal with Unicode text use UTF-16.
>> Is that true for non-Carbon APIs as well ?
>> This is what I found on the web (in summary):
>> Apple chose to go with UTF-16 at about the same time as Microsoft did
>> and used sizeof(wchar_t) == 2 for Mac OS. When they moved to Mac OS X,
>> they switched wchar_t to sizeof(wchar_t) == 4.
> Both Carbon and the modern APIs use UTF-16.
Thanks for that data point. So UTF-16 would be the more
natural choice on Mac OS X, despite the choice of sizeof(wchar_t).
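
For anyone who wants to compare the two values on their own platform, here
is a minimal sketch. The helper name unicode_widths is made up for this
example; ctypes.c_wchar and sys.maxunicode are the only interpreter
features it relies on. A UCS2 build reports 0xFFFF for sys.maxunicode, a
UCS4 build 0x10FFFF.

```python
import ctypes
import sys


def unicode_widths():
    """Return (size of the C wchar_t in bytes, largest code point in a str)."""
    # ctypes.c_wchar mirrors the platform's wchar_t, so its size is
    # sizeof(wchar_t) as seen by the C compiler that built Python.
    return ctypes.sizeof(ctypes.c_wchar), sys.maxunicode


wchar_size, max_code_point = unicode_widths()
print("sizeof(wchar_t) = %d" % wchar_size)
print("sys.maxunicode  = %#x" % max_code_point)
```

The two numbers are independent: the first is a property of the C
toolchain, the second of how the interpreter was configured.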
> What I don't quite get in the UTF-16 vs. UTF-32 discussion is why UTF-32
> would be useful, because if you want to do generic Unicode processing
> you have to look at sequences of composed characters (base characters +
> composing marks) anyway instead of separate code points. Not that I'm a
> unicode expert in any way...
It's one of the reasons why I'm not much of a UCS4-fan - it only
helps with surrogates and that's about it.
Combining characters, various types of control code points
(e.g. joiners, bidirectional marks, breaks, non-breaks, annotations),
context-sensitive casing and other such features found in scripts
cause very similar problems - often much harder to solve, since
they are not as easily identifiable as high and low surrogate
code points.
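
To illustrate the point, here is a small sketch. It assumes an interpreter
where len() counts code points (Python 3.3 or later; a UCS4 build behaves
the same way for these strings). The helper names utf16_units and
is_surrogate are invented for the example.

```python
import unicodedata


def utf16_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def is_surrogate(code_unit):
    """Surrogates occupy one fixed range, so a range check finds them."""
    return 0xD800 <= code_unit <= 0xDFFF


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP: it is one code
# point, but needs a surrogate pair (two code units) in UTF-16.
clef = "\U0001D11E"
clef_code_points = len(clef)
clef_code_units = utf16_units(clef)

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one character to the
# reader, but two code points in UCS2 and UCS4 alike.
e_acute = "e\u0301"
e_acute_code_points = len(e_acute)
e_acute_code_units = utf16_units(e_acute)

# Spotting the combining mark takes a database lookup, not a range check.
accent_class = unicodedata.combining("\u0301")

# NFC happens to have a precomposed form for this pair; many sequences
# have none, so normalization is not a general fix.
composed = unicodedata.normalize("NFC", e_acute)

print(clef_code_points, clef_code_units)
print(e_acute_code_points, e_acute_code_units, accent_class, len(composed))
```

UCS4 makes the first case disappear, but the second one stays exactly the
same.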