[Python-Dev] Re: Plan to remove Py_UNICODE APIs except PEP 623.

July 2, 2020

UCS-2 uses 16-bit code units, so it's limited to the Unicode BMP: U+0000-U+FFFF.

UCS-4 uses 32-bit code units and so gives access to the whole
(current) Unicode character set.
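The capacity difference can be checked with a minimal Python 3 sketch. It assumes CPython's `str`, where each element is one code point. `fits_ucs2` and `fits_ucs4` are hypothetical helpers for illustration, not CPython C API functions.

```python
# Sketch: which strings fit in fixed-width UCS-2 vs UCS-4 code units.
# fits_ucs2 / fits_ucs4 are hypothetical helpers, not CPython APIs.

BMP_MAX = 0xFFFF        # last code point of the Basic Multilingual Plane
UNICODE_MAX = 0x10FFFF  # last code point defined by Unicode


def fits_ucs2(text: str) -> bool:
    """True if every code point fits in a single 16-bit UCS-2 unit."""
    return all(ord(ch) <= BMP_MAX for ch in text)


def fits_ucs4(text: str) -> bool:
    """True if every code point fits in a single 32-bit UCS-4 unit.

    Always True for a Python str, since ord() never exceeds U+10FFFF.
    """
    return all(ord(ch) <= UNICODE_MAX for ch in text)


print(fits_ucs2("caf\u00e9"))    # True: U+00E9 is inside the BMP
print(fits_ucs2("\U0001F600"))   # False: U+1F600 (an emoji) is outside the BMP
print(fits_ucs4("\U0001F600"))   # True: 32 bits cover U+0000..U+10FFFF
```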

Do you mean UTF-16 and UTF-32? UTF-16 supports the whole Unicode
character set but uses the annoying surrogate pairs for characters
outside the BMP.
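A minimal Python sketch of how UTF-16 stores a character outside the BMP as a surrogate pair. The helper names are hypothetical, and the result is cross-checked against Python's own `utf-16-be` codec.

```python
# Sketch: UTF-16 surrogate pairs for code points above U+FFFF.
# to_surrogate_pair / from_surrogate_pair are hypothetical helper names.
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into two UTF-16 units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points outside the BMP use a surrogate pair")
    v = cp - 0x10000               # leaves a 20-bit value
    high = 0xD800 + (v >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))                      # 0xd83d 0xde00
print("\U0001F600".encode("utf-16-be").hex())   # d83dde00
```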

UTF-32 is UCS-4 in practice.
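A quick Python check of the fixed-width property. It assumes Python's `utf-32-le` codec (little-endian, no BOM): every code point becomes exactly one 32-bit unit whose value is the code point itself, while UTF-16 needs two units for the emoji. `utf32_units` is a hypothetical helper name. The last lines show the formal difference from UCS-4: UTF-32 excludes surrogate code points, and Python 3's codec rejects a lone surrogate.

```python
# Sketch: UTF-32 stores one 32-bit unit per code point, like UCS-4.
# utf32_units is a hypothetical helper name.
import struct


def utf32_units(text: str) -> list:
    """Return the 32-bit code units of text encoded as UTF-32 (little-endian, no BOM)."""
    data = text.encode("utf-32-le")
    return list(struct.unpack("<%dI" % (len(data) // 4), data))


s = "a\u00e9\U0001F600"
print(utf32_units(s) == [ord(ch) for ch in s])   # True: unit value == code point
print(len(s.encode("utf-32-le")) // 4)           # 3 units for 3 code points
print(len(s.encode("utf-16-le")) // 2)           # 4 units: the emoji needs a surrogate pair

# UTF-32 excludes surrogate code points, so the codec refuses a lone
# surrogate that a raw UCS-4 buffer could still hold.
try:
    "\ud800".encode("utf-32-le")
except UnicodeEncodeError:
    print("lone surrogate rejected")
```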

Victor

On Thu, 2 Jul 2020 at 15:08, Barry Scott <barry@barrys-emacs.org> wrote:
...
On 30 Jun 2020, at 13:43, Emily Bowman <silverbacknet@gmail.com> wrote:
I completely agree with this, that UTF-8 has become the One True Encoding(tm), and UCS-2 and UTF-16 are hardly found anywhere outside of the Win32 API. Nearly all basic emoji can't be represented in UCS-2 wchar_t, let alone composite emoji.
I use UCS-32 in my extensions, but never persist UCS-32 for which I use UTF-8.
If you are calling WIN32 "unicode" APIs then you need UCS-16.
My plan with PyCXX is to replace Py_UNICODE with UCS-32.
I think all the UCS-32 APIs will still be present.
Once I add that support to PyCXX all my users should easily port to a non-Py_UNICODE world.
Barry
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YIKT5XGP...
Code of Conduct: http://python.org/psf/codeofconduct/
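The composite-emoji point and the "persist as UTF-8" approach can be illustrated with a minimal Python sketch. It counts code units for a zero-width-joiner (ZWJ) family emoji, U+1F468 U+200D U+1F469 U+200D U+1F467. `unit_counts` is a hypothetical helper, and the counts come from Python's standard codecs.

```python
# Sketch: code-unit counts for a composite (ZWJ sequence) emoji.
# unit_counts is a hypothetical helper name.

# MAN, ZERO WIDTH JOINER, WOMAN, ZERO WIDTH JOINER, GIRL
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"


def unit_counts(text: str) -> dict:
    """Count the code units text needs in each encoding."""
    return {
        "code points / UTF-32 units": len(text),
        "UTF-16 units": len(text.encode("utf-16-le")) // 2,
        "UTF-8 bytes": len(text.encode("utf-8")),
    }


print(unit_counts(family))
# {'code points / UTF-32 units': 5, 'UTF-16 units': 8, 'UTF-8 bytes': 18}

# Three of the five code points lie outside the BMP, so UCS-2 cannot hold them.
print(sum(ord(ch) > 0xFFFF for ch in family))   # 3

# Persisting as UTF-8 and reading back is lossless.
stored = family.encode("utf-8")
print(stored.decode("utf-8") == family)          # True
```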
-- 
Night gathers, and now my watch begins. It shall not end until my death.

Victor Stinner