Is there really a default source encoding?
"Martin v. Löwis"
martin at v.loewis.de
Sat Jan 25 03:09:03 CET 2003
Brian Quinlan wrote:
> What if, in the future, there are close to 2^32 Unicode characters.
> UTF-32 might require only 4 bytes to store a character while UTF-16
> would require 6. Or is that impossible?
That's impossible. ISO and the Unicode Consortium have restricted
Unicode to 17 planes (1,114,112 code points, all of which fit in 21
bits). Formally, all the other UCS-4 code points are reserved, and ISO
has unassigned the previously-assigned private-use group.
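
To check that arithmetic, here is a minimal Python sketch. The constant
names are my own for illustration, not taken from any library:

```python
# Illustrative constants; the names are assumptions made for this sketch.
PLANES = 17                       # planes 0 through 16
CODE_POINTS_PER_PLANE = 0x10000   # 65,536 code points in each plane

total = PLANES * CODE_POINTS_PER_PLANE
max_code_point = total - 1

print(total)                        # 1114112
print(hex(max_code_point))          # 0x10ffff
print(max_code_point.bit_length())  # 21
print(2**20 < total < 2**21)        # True
```

So the total lies between 2^20 and 2^21, and every code point fits in
21 bits.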
Even if those reserved code points were ever assigned, UTF-16 could
not encode them. The way surrogate pairs work, there is simply no
representation for anything beyond the 17th plane, that is, for code
points above U+10FFFF.
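
The limit follows directly from the surrogate ranges. A rough sketch in
Python of the pair arithmetic (the function names max_utf16_code_point
and to_surrogates are hypothetical helpers written for this post, not
part of the standard library):

```python
def max_utf16_code_point():
    """Largest code point reachable with one surrogate pair."""
    # High surrogates 0xD800..0xDBFF and low surrogates 0xDC00..0xDFFF
    # each span 1024 values, so a pair carries 20 bits of payload.
    high_count = 0xDBFF - 0xD800 + 1
    low_count = 0xDFFF - 0xDC00 + 1
    # Pairs address the code points starting at 0x10000.
    return 0x10000 + high_count * low_count - 1


def to_surrogates(cp):
    """Split a supplementary code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("no surrogate pair exists for %#x" % cp)
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print(hex(max_utf16_code_point()))               # 0x10ffff
print([hex(u) for u in to_surrogates(0x10FFFF)])  # ['0xdbff', '0xdfff']
```

The last code point of the 17 planes already uses the largest high and
low surrogate values, so calling to_surrogates(0x110000) raises
ValueError: there is no pair left to assign.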