PEP: 9999
Title: Remove wstr from Unicode
Author: Inada Naoki <songofacandy@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Jun-2020
Python-Version: TBD


Abstract
========

PEP 393 deprecated some Unicode APIs and introduced ``wchar_t *wstr`` and
``Py_ssize_t wstr_length`` in the Unicode implementation for backward
compatibility with these deprecated APIs. [1]_

This PEP plans the removal of ``wstr`` and ``wstr_length``, along with the
deprecated APIs that use these members.


Motivation
==========

Memory usage
------------

``str`` is one of the most used types in Python. Even the simplest ASCII
strings have a ``wstr`` member. It consumes 8 bytes on 64-bit systems.


Runtime overhead
----------------

To support legacy Unicode objects created by
``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs have a
``PyUnicode_READY()`` check.

When we drop support for legacy Unicode objects, we can reduce this overhead
too.


Simplicity
----------

Support for legacy Unicode objects makes the Unicode implementation complex.
Until we drop the legacy Unicode object, it is very hard to try other Unicode
implementations, such as the UTF-8 based implementation in PyPy.


Specification
=============

Affected APIs
-------------

The ``wstr`` and ``wstr_length`` members are removed from the Unicode
implementation.

Macros and functions to be removed:

* PyUnicode_GET_SIZE
* PyUnicode_GET_DATA_SIZE
* Py_UNICODE_WSTR_LENGTH
* PyUnicode_AS_UNICODE
* PyUnicode_AS_DATA
* PyUnicode_AsUnicode
* PyUnicode_AsUnicodeAndSize

Behaviors to be removed:

* PyUnicode_FromUnicode -- ``PyUnicode_FromUnicode(NULL, size)`` where
  ``size > 0`` raises RuntimeError instead of creating a legacy Unicode
  object. Although this API is deprecated by PEP 393, it will be kept when
  ``wstr`` is removed. It will be removed later.
* PyUnicode_FromStringAndSize -- Like PyUnicode_FromUnicode,
  ``PyUnicode_FromStringAndSize(NULL, size)`` raises RuntimeError instead of
  creating a legacy Unicode object.
* PyArg_ParseTuple, PyArg_ParseTupleAndKeywords -- The 'u', 'u#', 'Z', and
  'Z#' formats will be removed.


Deprecation
-----------

All APIs to be removed should have a compiler deprecation warning
(e.g. ``Py_DEPRECATED(3.3)``) from Python 3.9. [2]_

All APIs to be changed should raise DeprecationWarning for the behavior to be
removed. Note that ``PyUnicode_FromUnicode`` has both a compiler deprecation
warning and a runtime DeprecationWarning. [3]_, [4]_


Plan
----

All deprecations will be implemented in Python 3.10.
Some deprecations will be backported to Python 3.9.

Actual removal will happen in Python 3.12.


Alternative Ideas
=================

Advanced Schedule
-----------------

Backport the warnings to 3.9 and do the removal in the early development
phase of Python 3.11. If many third-party packages are broken by this change,
we will revert it and go back to the regular schedule.

Pros: There is a chance to remove ``wstr`` in Python 3.11. Even if we need to
revert the removal, third-party maintainers have more time to prepare for it,
and we can get feedback from the community early.

Cons: Adding warnings during the beta period may cause confusion. Note that
we need to avoid triggering the warnings from the CPython core and stdlib.


Use hashtable to store wstr
---------------------------

Store the ``wstr`` in a hashtable instead of in the Unicode structure.

Pros: We can reduce memory usage as early as Python 3.10. We can have a longer
timeline for removing ``wstr``.

Cons: This approach will increase the complexity of the Unicode
implementation.


References
==========

A collection of URLs used as references through the PEP.

.. [1] PEP 393 -- Flexible String Representation
   (https://www.python.org/dev/peps/pep-0393/)

.. [2] GH-20878 -- Add Py_DEPRECATED to deprecated unicode APIs
   (https://github.com/python/cpython/pull/20878)

.. [3] GH-20933 -- Raise DeprecationWarning when creating legacy Unicode
   (https://github.com/python/cpython/pull/20933)

.. [4] GH-20927 -- Raise DeprecationWarning for getargs with 'u', 'Z'
   (https://github.com/python/cpython/pull/20927)


Copyright
=========

This document has been placed in the public domain.

-- Inada Naoki <songofacandy@gmail.com>