[Cython] Py_UNICODE* string support

Nikita Nemkin nikita at nemkin.ru
Sun Mar 3 09:25:33 CET 2013


On Sun, 03 Mar 2013 13:52:49 +0600, Stefan Behnel <stefan_ml at behnel.de>  
wrote:

> Are you aware that Py_UNICODE is deprecated as of Py3.3?
>
> http://docs.python.org/3.4/c-api/unicode.html
>
> Your changes look a bit excessive for supporting something that's
> inefficient in recent Python versions and basically "dead".

Yes, I'm well aware of Py3.3 changes, but consider this:

1. _All_ system APIs on Windows, old, new and in-between, use UTF-16 in
    the form of zero-terminated 2-byte wchar_t* strings (on Windows,
    Py_UNICODE is _always_ aliased to wchar_t specifically for this reason).
    Whatever happens to Python internals, the need to interoperate with
    UTF-16 based platforms won't go away.

2. The Py_UNICODE family of APIs remains the recommended way to interoperate
    with Windows. (So said the author of PEP 393 himself; I could find the
    relevant discussion on python-dev.)

3. It is not _that_ inefficient. Actually, it has the same efficiency as
    the UTF-8 related APIs (which have to be used on UTF-8 platforms like
    most *nix systems).

    UTF-8 allows sharing of the ASCII buffer and has to convert UCS2/UCS4;
    Py_UNICODE shares the UCS2 buffer (assuming a narrow build) and has to
    convert ASCII.
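
To make the buffer-sharing comparison in point 3 concrete, here is a minimal
Python sketch. It only models the PEP 393 storage rule (1, 2 or 4 bytes per
character, chosen by the highest code point in the string); it does not
inspect CPython internals. The function names and the wchar_size parameter
are hypothetical, and treating "same width" as "no conversion needed" is an
assumption taken from the argument above.

```python
def storage_width(s):
    """Bytes per character a PEP 393 string would use (model only)."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1      # ASCII or Latin-1
    if top < 0x10000:
        return 2      # UCS2
    return 4          # UCS4


def utf8_shares_buffer(s):
    # Assumption: UTF-8 output can reuse the internal buffer only for pure ASCII.
    return all(ord(c) < 0x80 for c in s)


def wide_shares_buffer(s, wchar_size=2):
    # Assumption: wchar_t output can reuse the buffer only when the widths match.
    # wchar_size=2 corresponds to Windows.
    return storage_width(s) == wchar_size


for text in ("abc", "\u0416\u0443\u043a", "\U0001F600"):
    print(ascii(text), storage_width(text),
          utf8_shares_buffer(text), wide_shares_buffer(text))
```

Under this model each API family gets exactly one "free" case and has to
convert in the others, which is the symmetry the comparison relies on.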


One alternative to Py_UNICODE that I have rejected is using Python's
wchar_t support.
It's practically useless for these reasons:
1) wchar_t APIs do not exist in Py2 and would have to be implemented for
    compatibility.
2) Implementing them brings in all the pain of the nonportable wchar_t type
    (on *nix systems in general), whereas the primary users would target
    Windows, where the (pretty horrible) wchar_t portability workarounds
    would be dead code.
3) wchar_t APIs do not offer a zero-copy option and do not manage the
    memory for us.
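
The portability and copying points can be observed from Python with the
standard ctypes module. This is only a small sketch: to_wide_buffer is a
hypothetical helper name, and the sizes mentioned in the comments are the
typical ones rather than guarantees.

```python
import ctypes

# wchar_t is 2 bytes on Windows and typically 4 bytes on Linux and macOS.
WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)


def to_wide_buffer(s):
    """Copy s into a new zero-terminated wchar_t array (never zero-copy)."""
    return ctypes.create_unicode_buffer(s)


buf = to_wide_buffer("abc")
# The array holds the three characters plus the terminating zero.
print(WCHAR_SIZE, len(buf), ctypes.sizeof(buf))
```

The same three-character string occupies a different number of bytes
depending on the platform's wchar_t, and the conversion always allocates a
fresh buffer that the caller has to keep alive.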


The changes are some 50 lines of code, not counting the tests. I wouldn't
call that excessive.
And they mostly mirror existing code, no trickery of any kind.

Built-in Py_UNICODE* support also means that users would be shielded
from the 3.3 changes, and Cython is free to optimize string handling in
the future.
Believe me, nobody calls Py_UNICODE APIs because they want to; they just
have to.


Best regards,
Nikita Nemkin


More information about the cython-devel mailing list