[Cython] Py_UNICODE* string support
Nikita Nemkin
nikita at nemkin.ru
Sun Mar 3 09:25:33 CET 2013
On Sun, 03 Mar 2013 13:52:49 +0600, Stefan Behnel <stefan_ml at behnel.de>
wrote:
> Are you aware that Py_UNICODE is deprecated as of Py3.3?
>
> http://docs.python.org/3.4/c-api/unicode.html
>
> Your changes look a bit excessive for supporting something that's
> inefficient in recent Python versions and basically "dead".
Yes, I'm well aware of Py3.3 changes, but consider this:
1. _All_ system APIs on Windows, old, new and in-between, use UTF-16 in
   the form of zero-terminated 2-byte wchar_t* strings (on Windows
   Py_UNICODE is _always_ aliased to wchar_t specifically for this reason).
   Whatever happens to Python internals, the need to interoperate with
   UTF-16 based platforms won't go away.
2. The Py_UNICODE family of APIs remains the recommended way to
   interoperate with Windows.
   (So said the author of PEP 393 himself; I could find the relevant
   discussion on python-dev.)
3. It is not _that_ inefficient. Actually, it has the same efficiency as
   the UTF8-related APIs (which have to be used on UTF-8 platforms like
   most *nix systems).
   UTF8 allows sharing of the ASCII buffer and has to convert UCS2/UCS4;
   Py_UNICODE shares the UCS2 buffer (assuming a narrow build) and has to
   convert ASCII.
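
To make points 1 and 3 concrete, here is a rough sketch in plain Python (not Cython, and not the code from the patch). The helper names to_utf16z and to_utf8z are made up for illustration; they only show what the two zero-terminated layouts look like and which input needs real conversion in each case.

```python
import struct

def to_utf16z(text):
    """Zero-terminated UTF-16-LE bytes, the layout Windows wchar_t* APIs expect."""
    return text.encode("utf-16-le") + b"\x00\x00"

def to_utf8z(text):
    """Zero-terminated UTF-8 bytes, the layout char* APIs on most *nix systems expect."""
    return text.encode("utf-8") + b"\x00"

ascii_text = "abc"
bmp_text = "\u0436\u0437"  # two Cyrillic letters, both inside the UCS2 range

# ASCII input: UTF-8 is byte-identical to the ASCII data,
# while UTF-16 has to widen every character to two bytes.
utf8_ascii = to_utf8z(ascii_text)
utf16_ascii = to_utf16z(ascii_text)

# UCS2 input: the UTF-16 code units are the code points themselves,
# while UTF-8 has to re-encode each one as a multi-byte sequence.
utf16_units = struct.unpack("<3H", to_utf16z(bmp_text))  # last unit is the terminator
utf8_bmp = to_utf8z(bmp_text)
```

So each family gets one representation for free and pays a conversion for the other; neither is uniformly cheaper.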
One alternative to Py_UNICODE that I have rejected is using Python's
wchar_t support.
It's practically useless for these reasons:
1) wchar_t APIs do not exist in Py2 and have to be implemented for
   compatibility.
2) Implementing them brings in all the pain of the nonportable wchar_t
   type (on *nix systems in general), whereas its primary users would
   target Windows, where the (pretty horrible) wchar_t portability
   workarounds would be dead code.
3) wchar_t APIs do not offer a zero-copy option and do not manage the
   memory for us.
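
The wchar_t portability issue in 2) is easy to observe from Python via ctypes. This is only an illustrative sketch: wchar_layout is a hypothetical helper, and it assumes wchar_t is either 2 bytes (Windows) or 4 bytes (typical on Linux and OS X).

```python
import ctypes
import sys

def wchar_layout():
    """Return (sizeof(wchar_t), name of the codec matching that width)."""
    size = ctypes.sizeof(ctypes.c_wchar)
    byteorder = "le" if sys.byteorder == "little" else "be"
    # Assumption: only 2-byte and 4-byte wchar_t are handled here.
    encoding = {2: "utf-16-", 4: "utf-32-"}[size] + byteorder
    return size, encoding

size, encoding = wchar_layout()

# A zero-terminated wchar_t[4] array holding "abc".
buf = ctypes.create_unicode_buffer("abc")

# Copy out the raw memory; its length depends on the platform's wchar_t.
raw = ctypes.string_at(ctypes.addressof(buf), ctypes.sizeof(buf))
```

The same three-character string occupies 8 bytes where wchar_t is 2 bytes wide and 16 bytes where it is 4 bytes wide, so any code handing wchar_t* to a UTF-16 API has to special-case the width.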
The changes are some 50 lines of code, not counting the tests. I wouldn't
call that excessive. And they mostly mirror existing code, no trickery of
any kind.
Built-in Py_UNICODE* support also means that users would be shielded
from the 3.3 changes, and Cython is free to optimize string handling in
the future.
Believe me, nobody calls Py_UNICODE APIs because they want to; they just
have to.
Best regards,
Nikita Nemkin