On 07.09.2018 10:22, Victor Stinner wrote:
I'm in discussion with PyPy developers, and they reported various APIs which cause them trouble: (...)
On Fri, 7 Sep 2018 at 10:33, M.-A. Lemburg firstname.lastname@example.org wrote:
I'm -1 on removing the PyUnicode APIs. We deliberately created a useful and very complete C API for Unicode.
The fact that PyPy chose to use a different internal representation is not a good reason to remove APIs and have CPython extensions take the hit as a result. It would be better for PyPy to rethink its internal representation or to create a shim API which translates between the two worlds.
Note that UTF-8 is not a good internal representation for Unicode if you want fast indexing and slicing. This is why we use fixed-size code units to represent Unicode strings.
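To illustrate the indexing point: in UTF-8 a code point takes 1 to 4 bytes, so locating the n-th code point means scanning from the start of the buffer, whereas with fixed-size code units it is a single array access. The C sketch below uses two hypothetical helpers, `utf8_offset_of` and `ucs4_at`, which are not part of any CPython or PyPy API, and it assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of the n-th code point in a valid
   UTF-8 buffer of `len` bytes, or -1 if n is out of range.
   It has to visit every byte before the target, so it is O(n). */
static long utf8_offset_of(const unsigned char *s, size_t len, size_t n)
{
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        /* Continuation bytes have the form 10xxxxxx; any other byte
           starts a new code point. */
        if ((s[i] & 0xC0) != 0x80) {
            if (count == n)
                return (long)i;
            count++;
        }
    }
    return -1;
}

/* Hypothetical helper: with fixed-size code units (here 4 bytes per
   code point, as in UCS-4) the n-th code point is one array access: O(1). */
static uint32_t ucs4_at(const uint32_t *s, size_t n)
{
    assert(s != NULL);
    return s[n];
}
```

For example, in the 6-byte UTF-8 encoding of "héllo", `utf8_offset_of(s, 6, 2)` has to step over the two-byte "é" and returns 3, while `ucs4_at` reaches any index directly.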
The PyUnicode C API is not only an issue for PyPy; it's also an issue for CPython. When PEP 393 was implemented, most of the PyUnicode API was suddenly deprecated: all functions using the now-legacy Py_UNICODE* type...
Python 3.7 still has to support both the legacy Py_UNICODE* API and the new "compact string" API. This makes the CPython code base way more complex than it should be: any function accepting a string is supposed to call PyUnicode_Ready() and handle errors properly. I would prefer to be able to remove the legacy PyUnicodeObject type and use only compact strings everywhere.
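As a rough illustration of the "compact string" layout introduced by PEP 393: a string stores all of its code points with the same width, chosen from the largest code point it contains (1 byte up to U+00FF, 2 bytes up to U+FFFF, otherwise 4). The C sketch below only computes that width. `compact_kind` is a hypothetical name; its return values mirror PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND and PyUnicode_4BYTE_KIND, but CPython's real code is organized differently and also special-cases pure ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes per code point that a PEP 393-style compact
   string would use for these code points (1, 2 or 4).
   Not CPython's implementation; ASCII special-casing is omitted. */
static int compact_kind(const uint32_t *cps, size_t n)
{
    assert(cps != NULL || n == 0);
    /* Find the largest code point in the string. */
    uint32_t max_char = 0;
    for (size_t i = 0; i < n; i++)
        if (cps[i] > max_char)
            max_char = cps[i];
    if (max_char <= 0xFF)
        return 1;   /* fits in Latin-1 */
    if (max_char <= 0xFFFF)
        return 2;   /* fits in the Basic Multilingual Plane */
    return 4;       /* needs full UCS-4 */
}
```

So a string containing "€" (U+20AC) would be stored with 2 bytes per code point, and one containing an emoji such as U+1F600 with 4, while indexing stays a constant-time array access in every case.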
Let me elaborate on which PyUnicode functions are good and which are bad.
Examples of bad APIs: