[Python-Dev] New Py_UNICODE doc
Shane Hathaway
shane at hathawaymix.org
Sat May 7 20:00:43 CEST 2005
Martin v. Löwis wrote:
> Shane Hathaway wrote:
>
>>I agree that UCS4 is needed. There is a balancing act here; UTF-16 is
>>widely used and takes less space, while UCS4 is easier to treat as an
>>array of characters. Maybe we can have both: unicode objects start with
>>an internal representation in UTF-16, but get promoted automatically to
>>UCS4 when you index or slice them. The difference will not be visible
>>to Python code. A compile-time switch will not be necessary. What do
>>you think?
>
>
> This breaks backwards compatibility with existing extension modules.
> Applications that do PyUnicode_AsUnicode get a Py_UNICODE*, and
> can use that to directly access the characters.
Under this scheme, Py_UNICODE would always be 32 bits wide, and
PyUnicode_AsUnicode would promote the unicode object automatically
before returning the pointer. Extensions that break as a result are
technically broken already, aren't they? They're not supposed to
depend on the size of Py_UNICODE.
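
To make the promotion idea concrete, here is a rough sketch in Python. It
is only an illustration, not CPython code: the class name LazyUnicode, the
"promoted" property and the internal attribute names are all made up for
this example, and a real implementation would live in C inside the unicode
object. The string starts out as UTF-16 and is converted once to a
fixed-width array of code points the first time it is indexed or sliced.

```python
class LazyUnicode:
    """Hypothetical string type: stored as UTF-16, promoted to UCS4 on demand."""

    def __init__(self, text):
        # Compact initial representation: UTF-16 code units (little-endian, no BOM).
        self._utf16 = text.encode("utf-16-le")
        # Fixed-width representation, filled in on the first index or slice.
        self._ucs4 = None

    @property
    def promoted(self):
        # True once the object has switched to the fixed-width form.
        return self._ucs4 is not None

    def _promote(self):
        # One-time conversion: decode UTF-16 (joining surrogate pairs) into
        # a list with exactly one entry per code point.
        if self._ucs4 is None:
            self._ucs4 = [ord(ch) for ch in self._utf16.decode("utf-16-le")]
            self._utf16 = None  # the compact form is no longer needed
        return self._ucs4

    def __getitem__(self, index):
        # Indexing and slicing both trigger promotion, then work on code
        # points, so a character outside the BMP counts as one item.
        code_points = self._promote()
        if isinstance(index, slice):
            return "".join(chr(cp) for cp in code_points[index])
        return chr(code_points[index])


# U+10400 needs a surrogate pair in UTF-16 but is a single character here.
s = LazyUnicode("a\U00010400b")
before = s.promoted
middle = s[1]
after = s.promoted
tail = s[1:3]
```

Python code using such an object never sees the switch; only the
internal storage changes, which is why no compile-time option would be
needed in this model.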
Shane