[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

4 Feb 2021

On Tue, Feb 2, 2021 at 8:40 PM Inada Naoki wrote:
...
On Tue, Feb 2, 2021 at 7:37 PM M.-A. Lemburg wrote:
...
BTW: I don't understand this comment:
"They are inefficient on platforms wchar_t* is UTF-16. It is because
built-in codecs supports only UCS-1, UCS-2, and UCS-4 input."
Windows is one such platform. Java (indirectly) is another. They both
store UTF-16LE in those arrays and Python's codecs handle this just
fine.
I'm sorry that the section is not clear.
For example, if wchar_t* is UCS4, ucs4_utf8_encoder() can encode
wchar_t* into UTF-8.
But when wchar_t* is UTF-16, ucs2_utf8_encoder() cannot handle
surrogate pairs.
We need to use a temporary Unicode object. That is what "inefficient" means.
I will update the section to explain this in more detail.
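
To illustrate the point above, here is a rough Python sketch (not the CPython C code). The helper names are hypothetical; `encode_per_unit()` only imitates what a UCS-2-only encoder would do if it were fed UTF-16 code units, and `encode_via_temporary_str()` imitates the detour through a temporary Unicode object.

```python
import struct

def to_utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def encode_per_unit(units):
    # Hypothetical stand-in for a UCS-2-only encoder: every 16-bit unit
    # is treated as a full code point, so the two halves of a surrogate
    # pair are encoded separately instead of being combined.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)

def encode_via_temporary_str(units):
    # Decode the UTF-16 data into a temporary str first (this is the
    # extra object), then encode that str to UTF-8.
    data = struct.pack("<%dH" % len(units), *units)
    return data.decode("utf-16-le").encode("utf-8")

units = to_utf16_units("a\U0001F600")   # one BMP char, one non-BMP char
naive = encode_per_unit(units)
correct = encode_via_temporary_str(units)

print([hex(u) for u in units])
print(naive != correct)
```

The per-unit result is not valid UTF-8, because each surrogate half is emitted as its own three-byte sequence. The second function gives the right bytes, at the cost of the temporary object.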
I updated the "Alternative Ideas" section of the PEP.
https://www.python.org/dev/peps/pep-0624/#alternative-ideas

The alternatives replace `Py_UNICODE*` with `PyObject*`, `Py_UCS4*`, and `wchar_t*`.
I explicitly noted that some codecs can bypass temporary Unicode objects:

"""
UTF-8, UTF-16, UTF-32 encoders support Py_UCS4 internally. So
PyUnicode_EncodeUTF8(), PyUnicode_EncodeUTF16(), and
PyUnicode_EncodeUTF32() can avoid to create a temporary Unicode
object.
"""
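
As a rough illustration of why UCS-4 input needs no temporary object, here is a minimal Python sketch of a UTF-8 encoder that works directly on a sequence of code points. The function `encode_ucs4_to_utf8()` is a hypothetical name for this example and is not the CPython implementation.

```python
def encode_ucs4_to_utf8(code_points):
    """Encode an iterable of Unicode code points (ints) directly to UTF-8."""
    out = bytearray()
    for cp in code_points:
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            if 0xD800 <= cp <= 0xDFFF:
                # Lone surrogates are not valid in strict UTF-8.
                raise ValueError("surrogate code point: %#x" % cp)
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        elif cp <= 0x10FFFF:
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            raise ValueError("code point out of range: %#x" % cp)
    return bytes(out)

text = "a\u00e9\u20ac\U0001F600"
sample = [ord(ch) for ch in text]   # stands in for a Py_UCS4 buffer
encoded = encode_ucs4_to_utf8(sample)
print(encoded == text.encode("utf-8"))
```

Each element of the input is already a complete code point, so the encoder can emit bytes in a single pass with no pairing step.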

-- 
Inada Naoki