[Python-Dev] please consider changing --enable-unicode default to ucs4

M.-A. Lemburg mal at egenix.com
Wed Oct 7 23:21:09 CEST 2009


Ronald Oussoren wrote:
> 
> On 7 Oct, 2009, at 22:13, M.-A. Lemburg wrote:
> 
>> Ronald Oussoren wrote:
>>>
>>> On 7 Oct, 2009, at 20:05, M.-A. Lemburg wrote:
>>>>
>>>>
>>>> If we do go for a change, we should use sizeof(wchar_t)
>>>> as basis for the new default - on all platforms that
>>>> provide a wchar_t type.
>>>
>>> I'd be -1 on that. Sizeof(wchar_t) is 4 on OSX, but all non-Unix API's
>>> that deal with Unicode text use ucs16.
>>
>> Is that true for non-Carbon APIs as well ?
>>
>> This is what I found on the web (in summary):
>>
>> Apple chose to go with UTF-16 at about the same time as Microsoft did
>> and used sizeof(wchar_t) == 2 for Mac OS. When they moved to Mac OS X,
>> they switched wchar_t to sizeof(wchar_t) == 4.
>>
> 
> Both Carbon and the modern APIs use UTF-16.

Thanks for that data point. So UTF-16 would be the more
natural choice on Mac OS X, even though sizeof(wchar_t) is 4 there.

> What I don't quite get in the UTF-16 vs. UTF-32 discussion is why UTF-32
> would be useful, because if you want to do generic Unicode processing
> you have to look at sequences of composed characters (base characters +
> composing marks) anyway instead of separate code points.  Not that I'm a
> unicode expert in any way...

Very true.

It's one of the reasons why I'm not much of a UCS4-fan - it only
helps with surrogates and that's about it.

Combining characters, various types of control code points
(e.g. joiners, bidirectional marks, breaks, non-breaks, annotations),
context-sensitive casing and other such features found in scripts
cause very similar problems - often much harder to solve, since
they are not as easily identifiable as high and low surrogate
code points.
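
To illustrate the point, here is a minimal sketch. It assumes a Python
where len() counts code points (a UCS4 build, or Python 3.3 and later);
the helper names utf16_units and combining_marks are made up for this
example. The surrogate problem disappears with 4-byte storage, while the
combining mark still has to be handled by the application:

```python
import unicodedata

def utf16_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

def combining_marks(text):
    """Code points with a non-zero canonical combining class."""
    return [ch for ch in text if unicodedata.combining(ch)]

astral = "\U0001D11E"    # MUSICAL SYMBOL G CLEF, outside the BMP
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# One code point, but a surrogate pair (two units) in UTF-16.
print(len(astral), utf16_units(astral))

# Two code points in either storage format, yet one user-visible character.
print(len(decomposed), utf16_units(decomposed))
print(combining_marks(decomposed))

# A joiner such as ZERO WIDTH JOINER has combining class 0, so the
# simple check above does not find it; it is a format control (Cf).
print(unicodedata.combining("\u200d"), unicodedata.category("\u200d"))
```

Surrogates can be spotted with a simple range check (0xD800-0xDFFF);
the other cases need lookups in the Unicode database and, for joiners
and similar controls, knowledge of the surrounding context.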

-- 
Marc-Andre Lemburg
eGenix.com
