
On Tue, Feb 2, 2021 at 7:37 PM M.-A. Lemburg <mal@egenix.com> wrote:
That would keep extensions working after a recompile, since Py_UNICODE is already a typedef to wchar_t.
That idea is written in the PEP already. https://www.python.org/dev/peps/pep-0624/#replace-py-unicode-with-wchar-t
Right, and I think this is a more workable approach than removing APIs.
BTW: I don't understand this comment: "They are inefficient on platforms wchar_t* is UTF-16. It is because built-in codecs supports only UCS-1, UCS-2, and UCS-4 input."
Windows is one such platform. Java (indirectly) is another. They both store UTF-16LE in those arrays and Python's codecs handle this just fine.
I'm sorry that the section is not clear. For example, if wchar_t* is UCS4, ucs4_utf8_encoder() can encode wchar_t* into UTF-8. But when wchar_t* is UTF-16, ucs2_utf8_encoder() cannot handle surrogate escape. We need to use a temporary Unicode object. That is what "inefficient" means. I will update the section to be more elaborate.

Regards,
-- 
Inada Naoki <songofacandy@gmail.com>
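
A minimal sketch of the difference described above, written in Python rather than against the C codecs. It only models the behavior: the sample string and all variable names are illustrative assumptions, and it does not call ucs2_utf8_encoder() or ucs4_utf8_encoder(). It shows that a non-BMP character occupies two 16-bit units in UTF-16, that encoding those units one by one as if they were UCS-2 code points fails, and that going through a temporary string object succeeds.

```python
import struct

# Illustrative sample: one BMP character plus one non-BMP character.
text = "a\U0001F600"

# UTF-16LE code units, as a 16-bit wchar_t* buffer would hold them.
raw = text.encode("utf-16-le")
units = list(struct.unpack("<%dH" % (len(raw) // 2), raw))

# UCS-2 view: treat every 16-bit unit as a complete code point.
# The two halves of the non-BMP character become lone surrogates,
# which the strict UTF-8 encoder rejects.
ucs2_view = "".join(chr(u) for u in units)
try:
    ucs2_view.encode("utf-8")
    direct_encode_ok = True
except UnicodeEncodeError:
    direct_encode_ok = False

# UTF-16 view: decode into a temporary str first, then encode.
# This extra object is the cost the reply calls "inefficient".
temporary = raw.decode("utf-16-le")
utf8 = temporary.encode("utf-8")
```

When the buffer is 32 bits wide, each element is already a full code point, so no intermediate object is needed for this step.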