python-unicode doesn't support >65535 symbols?
rainerd at eldwood.com
Thu Nov 27 19:36:00 CET 2003
Andrew Clover wrote:
> gabor <gabor at z10n.net> wrote:
>> so 𐌰 (which should be \U00010330),
>> was split to 2 16bit values (\ud800 and \udf30).
> The default encoding for native Unicode strings in Python is UTF-16,
> which cannot hold the extended planes beyond 0xFFFF in a single
That's not quite right. UTF-16 encodes Unicode characters as either single
16-bit values or pairs of 16-bit values (surrogate pairs). However, a
character encoded as a surrogate pair is still one character.
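To make the "single value or pair" rule concrete, here is a minimal sketch of the standard UTF-16 encoding step for one code point, written in modern Python. The function name `to_utf16_units` is hypothetical (not a stdlib API). It reproduces the split from the quoted message: U+10330 (GOTHIC LETTER AHSA) becomes the pair 0xD800, 0xDF30. The last line cross-checks the result against the built-in `utf-16-be` codec.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Encode one Unicode code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]                   # Basic Multilingual Plane: one code unit
    v = cp - 0x10000                  # leaves a 20-bit value
    high = 0xD800 + (v >> 10)         # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)        # bottom 10 bits -> low (trail) surrogate
    return [high, low]

print([hex(u) for u in to_utf16_units(0x10330)])   # ['0xd800', '0xdf30']
print("\U00010330".encode("utf-16-be").hex())      # d800df30
```

Either way, the pair encodes exactly one code point; the split is an artifact of the encoding, not of the character.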
Python makes the mistake of exposing the internal representation instead of
the logical value of unicode objects. This means that, aside from space
optimization, unicode objects have no advantage over UTF-8 encoded plain
strings for storing unicode text.
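The point about exposing the internal representation can be illustrated by counting characters. The sketch below assumes the string is available as a plain list of UTF-16 code units, standing in for what a narrow (UTF-16) Python build exposed, where `len(u'\U00010330')` was 2. The helper `code_points` is hypothetical, not a stdlib function. (Modern CPython 3.3+ stores `str` differently, per PEP 393, so the explicit list is used to mimic the old behaviour.) Recovering logical characters takes a decoding pass, just as it does for UTF-8 bytes.

```python
def code_points(units: list[int]) -> list[int]:
    """Combine UTF-16 code units into code points (logical characters)."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        # A high surrogate followed by a low surrogate forms one code point.
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            out.append(u)  # BMP character (unpaired surrogates are passed through)
            i += 1
    return out

units = [0x61, 0xD800, 0xDF30, 0x62]      # "a", U+10330, "b" as UTF-16 code units
print(len(units))                          # 4: length in code units
print([hex(c) for c in code_points(units)])  # ['0x61', '0x10330', '0x62']

utf8 = "a\U00010330b".encode("utf-8")
print(len(utf8))                           # 6: length in UTF-8 bytes
print(len(utf8.decode("utf-8")))           # 3: logical characters after decoding
```

In both cases the raw length counts storage units, not characters.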
Rainer Deyke - rainerd at eldwood.com - http://eldwood.com