
On Tue, Feb 2, 2021 at 11:47 PM Inada Naoki <songofacandy@gmail.com> wrote:
> So if we add UTF-16 support to ucs2_utf8_encoder(), it means we need to add and maintain code only for PyUnicode_EncodeUTF8 (encoding from wchar_t* into char*).
> I don't think it is a good deal. As described in the PEP, encoder APIs are used very rarely. We must not add any maintenance costs for them.

I fixed tons of bugs in the Python 2.7 and Python 3 codecs before PEP 393 (compact strings) so that they properly handle 16-bit wchar_t, that is, surrogate characters. The implementation was complex and slow. I would prefer not to move backwards to that :-(

If you are curious, look at the PyUnicode_FromWideChar() implementation and search for find_maxchar_surrogates() to get an idea of the cost of handling UTF-16 surrogate pairs. For a full codec, it's way more complex, and painful to write and to maintain. I'm happy that we were able to remove that thanks to PEP 393!

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
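
P.S. To give a rough idea of the extra work that surrogate pairs impose, here is a minimal Python sketch of the kind of scan that a 16-bit wchar_t buffer needs before it can be converted. It is not the CPython code: the function name scan_utf16_units() and its return shape are made up for illustration, and it only mimics the general idea of finding the maximum code point and counting surrogate pairs.

```python
import struct


def scan_utf16_units(units):
    """Return (max_code_point, surrogate_pair_count) for 16-bit code units.

    Hypothetical helper for illustration only, not CPython's implementation.
    """
    max_cp = 0
    pairs = 0
    i = 0
    n = len(units)
    while i < n:
        cp = units[i]
        # A high surrogate followed by a low surrogate encodes one
        # code point above U+FFFF; both units must be consumed together.
        if (0xD800 <= cp <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            pairs += 1
            i += 1
        # A lone surrogate is kept as-is; a real codec must decide
        # how to report or escape it.
        max_cp = max(max_cp, cp)
        i += 1
    return max_cp, pairs


def utf16_units(text):
    """Split a str into 16-bit code units, as a 16-bit wchar_t buffer would hold it."""
    data = text.encode("utf-16-le", "surrogatepass")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


if __name__ == "__main__":
    max_cp, pairs = scan_utf16_units(utf16_units("a\U0001F600"))
    print(hex(max_cp), pairs)
```

Every consumer of such a buffer has to repeat the pairing check and pick a policy for lone surrogates, and a full encoder or decoder has to do so on every code path, including error handlers.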