[Python-Dev] New Py_UNICODE doc

Shane Hathaway shane at hathawaymix.org
Sat May 7 20:00:43 CEST 2005


Martin v. Löwis wrote:
> Shane Hathaway wrote:
> 
>>I agree that UCS4 is needed.  There is a balancing act here; UTF-16 is
>>widely used and takes less space, while UCS4 is easier to treat as an
>>array of characters.  Maybe we can have both: unicode objects start with
>>an internal representation in UTF-16, but get promoted automatically to
>>UCS4 when you index or slice them.  The difference will not be visible
>>to Python code.  A compile-time switch will not be necessary.  What do
>>you think?
> 
> 
> This breaks backwards compatibility with existing extension modules.
> Applications that do PyUnicode_AsUnicode get a Py_UNICODE*, and
> can use that to directly access the characters.

Under this proposal, Py_UNICODE would always be 32 bits wide, and
PyUnicode_AsUnicode would promote the unicode object automatically.
Extensions that break as a result are technically broken already, aren't
they?  They're not supposed to depend on the size of Py_UNICODE.
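
To make the promotion idea concrete, here is a rough sketch in plain C.
It is only an illustration under my own assumptions: the ustr struct and
the ustr_as_ucs4 and ustr_index functions are hypothetical names, not
CPython's real object layout or API.  The object keeps its UTF-16 data
and builds a cached 32-bit buffer the first time anything asks for
indexed access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a unicode object; not CPython's real layout. */
typedef struct {
    const uint16_t *utf16;  /* compact storage, always present */
    size_t n16;             /* number of UTF-16 code units */
    uint32_t *ucs4;         /* NULL until the first indexed access */
    size_t n32;             /* number of code points, valid once promoted */
} ustr;

/* Decode UTF-16 into a fresh UCS4 buffer on first use, then cache it.
   Returns NULL if allocation fails.  Unpaired surrogates are copied
   through unchanged. */
static const uint32_t *ustr_as_ucs4(ustr *s)
{
    assert(s != NULL);
    if (s->ucs4 != NULL)
        return s->ucs4;             /* already promoted */

    /* n16 is an upper bound on the number of code points. */
    uint32_t *out = malloc((s->n16 ? s->n16 : 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < s->n16; i++) {
        uint32_t c = s->utf16[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s->n16) {
            uint32_t lo = s->utf16[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                /* Combine a surrogate pair into one code point. */
                c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
                i++;
            }
        }
        out[j++] = c;
    }
    s->ucs4 = out;
    s->n32 = j;
    return out;
}

/* Indexing triggers promotion; returns 0 on success, -1 on error. */
static int ustr_index(ustr *s, size_t i, uint32_t *cp)
{
    const uint32_t *p = ustr_as_ucs4(s);
    if (p == NULL || i >= s->n32)
        return -1;
    *cp = p[i];
    return 0;
}
```

With the four code units 0x0041 0xD834 0xDD1E 0x0042, index 1 yields
U+1D11E and the promoted length is 3.  A caller that only ever sees the
32-bit buffer has no way to tell that the string started out as UTF-16,
which is the property the proposal depends on.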

Shane

