
UCS-2 means 16-bit units, so it is limited to the Unicode BMP: U+0000-U+FFFF. UCS-4 means 32-bit units, and so gives access to the whole (current) Unicode character set.

Do you mean UTF-16 and UTF-32? UTF-16 supports the whole Unicode character set, but uses the annoying surrogate pairs for characters outside the BMP. UTF-32 is UCS-4 in practice.

Victor

On Thu, 2 Jul 2020 at 15:08, Barry Scott <barry@barrys-emacs.org> wrote:
On 30 Jun 2020, at 13:43, Emily Bowman <silverbacknet@gmail.com> wrote:
I completely agree with this, that UTF-8 has become the One True Encoding(tm), and UCS-2 and UTF-16 are hardly found anywhere outside of the Win32 API. Nearly all basic emoji can't be represented in UCS-2 wchar_t, let alone composite emoji.
I use UCS-32 in my extensions, but never persist it; for persistence I use UTF-8.
If you are calling WIN32 "unicode" APIs then you need UCS-16.
My plan with PyCXX is to replace Py_UNICODE with UCS-32. I think all the UCS-32 APIs will still be present.
Once I add that support to PyCXX all my users should easily port to a non-Py_UNICODE world.
Barry
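
To make the distinction between the encodings in this thread concrete, here is a minimal Python sketch. The helper names (`fits_in_ucs2`, `utf16_code_units`, `utf32_code_units`) are made up for illustration and are not part of CPython or PyCXX; only the standard `str.encode` codecs are assumed.

```python
from typing import List


def fits_in_ucs2(ch: str) -> bool:
    """True if the single code point lies in the BMP (U+0000-U+FFFF)."""
    return ord(ch) <= 0xFFFF


def utf16_code_units(text: str) -> List[int]:
    """Return the 16-bit code units of text encoded as UTF-16."""
    data = text.encode("utf-16-le")  # little-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def utf32_code_units(text: str) -> List[int]:
    """Return the 32-bit code units of text encoded as UTF-32."""
    data = text.encode("utf-32-le")  # little-endian, no BOM
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


if __name__ == "__main__":
    for ch in ("A", "\u20ac", "\U0001F600"):
        print(
            f"U+{ord(ch):04X}",
            "BMP" if fits_in_ucs2(ch) else "non-BMP",
            [hex(u) for u in utf16_code_units(ch)],
            [hex(u) for u in utf32_code_units(ch)],
        )
```

A BMP character such as U+20AC takes one 16-bit unit, while U+1F600 (a basic emoji) needs a surrogate pair in UTF-16 and a single unit in UTF-32, which is why it cannot be stored in a single 16-bit UCS-2 unit.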
-- Night gathers, and now my watch begins. It shall not end until my death.