[Python-Dev] Re: Plan to remove Py_UNICODE APIs except PEP 623.

July 2, 2020

      On 7/2/20 10:19 AM, Victor Stinner wrote:
...
Do you mean UTF-16 and UTF-32? UTF-16 supports the whole Unicode
character set but uses the annoying surrogate pairs for characters
outside the BMP.
Minor quibble: UTF-16 handles all of the CURRENTLY defined Unicode set,
and there is currently a promise not to extend Unicode past that, but
at some point that promise may need to be broken.

UTF-8, as previously defined (and as it could be defined again), easily
handles U+00000000 to U+7FFFFFFF.
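
To make the 31-bit range concrete, here is a minimal sketch of the older
UTF-8 layout, which allowed sequences of up to six bytes. The function
name encode_utf8_original is made up for illustration; it is not a
CPython API, and it does not reject surrogates or validate anything
beyond the range check.

```python
def encode_utf8_original(cp: int) -> bytes:
    """Encode one code point with the older, up-to-6-byte UTF-8 layout.

    Hypothetical helper for illustration only; not part of any library.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])  # ASCII is a single byte, unchanged

    # (payload bits, total bytes, lead-byte prefix) for each sequence length
    layouts = (
        (11, 2, 0xC0),
        (16, 3, 0xE0),
        (21, 4, 0xF0),
        (26, 5, 0xF8),
        (31, 6, 0xFC),
    )
    for bits, nbytes, prefix in layouts:
        if cp < (1 << bits):
            break

    # Peel off 6 bits at a time for the continuation bytes (10xxxxxx).
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Whatever is left goes into the lead byte after its prefix bits.
    return bytes([prefix | cp] + tail[::-1])


print(encode_utf8_original(0x10FFFF).hex())    # 4 bytes, today's upper limit
print(encode_utf8_original(0x7FFFFFFF).hex())  # 6 bytes, the old upper limit
```

For values up to U+0010FFFF (surrogates aside) this should agree with
Python's own str.encode("utf-8"); the 5- and 6-byte forms are the part
that current UTF-8 no longer permits.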

UTF-16 can handle U+00000000 to U+0010FFFF via surrogate pairs, and it
stops there. Extending past that would require some form of heroics,
which is why U+0010FFFF is currently defined as the highest possible
code point: allowing a higher value would break UTF-16, and there
currently isn't a desire to do so. At some point in the distant future,
we may run out of 'valid' code points, and this promise will need to be
broken.
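
The U+0010FFFF ceiling falls straight out of the surrogate arithmetic.
A rough sketch follows; the helper names to_surrogates and
from_surrogates are mine, chosen for illustration, and are not from any
library.

```python
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # 1024 high (lead) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF    # 1024 low (trail) surrogates


def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point above the BMP into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = cp - 0x10000  # 20-bit value
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def from_surrogates(high: int, low: int) -> int:
    """Combine a surrogate pair back into a single code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# The largest pair gives the largest code point UTF-16 can express.
max_cp = from_surrogates(HIGH_END, LOW_END)
print(f"U+{max_cp:08X}")
```

Each half of the pair carries 10 bits, so a pair covers 2**20 code
points on top of the 0x10000 in the BMP, and the last one is U+0010FFFF.
Going further would need more surrogates than the BMP has set aside.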

UTF-16 grew out of a need to fix what is now called UCS-2, the encoding
used by earlier Unicode standards, before the need for code points above
U+0000FFFF (the range up to there is now the BMP) was seen.

-- 
Richard Damon