Hi INADA-san,

First of all, thanks for writing down a PEP!

On Thu, 18 Jun 2020 at 11:42, Inada Naoki <songofacandy@gmail.com> wrote:
To support legacy Unicode objects created by ``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs have a ``PyUnicode_READY()`` check.
I don't see PyUnicode_READY() removal in the specification section. When can we remove these calls and the function itself?
Supporting legacy Unicode objects makes the Unicode implementation complex. Until we drop legacy Unicode objects, it is very hard to try other Unicode implementations, such as the UTF-8 based implementation in PyPy.
I'm not sure whether it belongs in the scope of this PEP, but other C API functions are also too close to the concrete PEP 393 implementation. For example, I'm not sure that PyUnicode_MAX_CHAR_VALUE(str) would be relevant or efficient if the Python str type is reimplemented to use UTF-8 internally. Should we deprecate it as well? Do you think that it should be addressed in a separate PEP?

In fact, a large part of the Unicode C API is based on the current implementation of the Python str type. For example, I'm not sure that PyUnicode_New(size, max_char) would still make sense if we change the code to store strings as UTF-8 internally.

In an ideal world, I would prefer to have a "string builder" API, like the current _PyUnicodeWriter C API, to create a string, and never allow a string to be modified in place. CPython's "almost" immutable str (mutable "if the reference count is equal to 1") has corner cases and can be misused.

But again, I don't think that it should be part of this PEP :-) Sorry for being off-topic ;-)
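The concern about PyUnicode_MAX_CHAR_VALUE() and PyUnicode_New(size, max_char) can be illustrated with a small standalone C sketch. It does not use the CPython C API: utf8_max_char() and kind_for_max_char() are hypothetical helpers written only for this example, and the UTF-8 input is assumed to be well formed. Under PEP 393, the maximum character determines the storage width (1, 2 or 4 bytes per character), so it is known when the string is created. With a plain UTF-8 buffer, nothing requires that value to be stored, and recovering it takes a scan over the whole buffer unless the implementation caches it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CPython function): bytes per character that a
   PEP 393-style layout picks for a given maximum code point. */
static int
kind_for_max_char(uint32_t max_char)
{
    if (max_char < 0x100) {
        return 1;   /* ASCII and Latin-1 */
    }
    if (max_char < 0x10000) {
        return 2;   /* Basic Multilingual Plane */
    }
    return 4;       /* everything else */
}

/* Hypothetical helper (not a CPython function): largest code point in a
   well-formed UTF-8 buffer. Malformed input is not handled; the point is
   only that every byte has to be visited, so the cost is O(len). */
static uint32_t
utf8_max_char(const unsigned char *s, size_t len)
{
    uint32_t max_char = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char lead = s[i];
        uint32_t ch;
        size_t ncont;   /* number of continuation bytes that follow */

        if (lead < 0x80)      { ch = lead;        ncont = 0; }
        else if (lead < 0xE0) { ch = lead & 0x1F; ncont = 1; }
        else if (lead < 0xF0) { ch = lead & 0x0F; ncont = 2; }
        else                  { ch = lead & 0x07; ncont = 3; }

        /* Well-formed input is assumed: the sequence must fit. */
        assert(i + ncont < len);
        i++;
        for (size_t k = 0; k < ncont; k++) {
            ch = (ch << 6) | (uint32_t)(s[i++] & 0x3F);
        }
        if (ch > max_char) {
            max_char = ch;
        }
    }
    return max_char;
}
```

For example, calling utf8_max_char() on the three bytes "h\xc3\xa9" (U+0068, U+00E9) yields 0xE9, which kind_for_max_char() maps to 1 byte per character. A UTF-8 based str could cache this value, at the cost of extra memory per string.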
Specification
=============

Affected APIs
--------------
From the Unicode implementation, ``wstr`` and ``wstr_length`` members are removed.
Macros and functions to be removed:
* PyUnicode_GET_SIZE
* PyUnicode_GET_DATA_SIZE
* Py_UNICODE_WSTR_LENGTH
* PyUnicode_AS_UNICODE
* PyUnicode_AS_DATA
* PyUnicode_AsUnicode
* PyUnicode_AsUnicodeAndSize
Which ones are already deprecated?
Behaviors to be removed:
* PyUnicode_FromUnicode -- ``PyUnicode_FromUnicode(NULL, size)`` where ``size > 0`` causes RuntimeError instead of creating a legacy Unicode object. Although this API is deprecated by PEP 393, it will be kept when ``wstr`` is removed and will be removed later.
I'm not sure that it's relevant to keep PyUnicode_FromUnicode() when PyUnicode_FromWideChar() has a clean API (it uses wchar_t*, not Py_UNICODE*). I also suggest disallowing PyUnicode_FromUnicode(NULL, 0).

By the way, when can we finally remove the Py_UNICODE type? I would prefer to remove both Py_UNICODE and PyUnicode_FromUnicode().
* PyUnicode_FromStringAndSize -- Like PyUnicode_FromUnicode, ``PyUnicode_FromStringAndSize(NULL, size)`` causes RuntimeError instead of creating a legacy Unicode object.
All APIs to be changed should raise DeprecationWarning for the behavior to be removed. Note that ``PyUnicode_FromUnicode`` has both a compiler deprecation warning and a runtime DeprecationWarning [3]_, [4]_.
Every function scheduled for removal? Even PyUnicode_GET_SIZE()? I'm not sure that C extensions are prepared for PyUnicode_GET_SIZE() raising an exception when using -Werror.
All deprecations will be implemented in Python 3.10. Some deprecations will be backported to Python 3.9.
Actual removal will happen in Python 3.12.
Many functions have already been declared with Py_DEPRECATED() for a long time. Would it make sense to remove these functions earlier?

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.