python-unicode doesn't support >65535 symbols?
gabor at z10n.net
Sun Nov 30 23:30:40 CET 2003
On Thu, 2003-11-27 at 18:46, Andrew Clover wrote:
> gabor <gabor at z10n.net> wrote:
> > so text (which should be \U00010330),
> > was split to 2 16bit values (text and text).
> The default encoding for native Unicode strings in Python is UTF-16, which
> cannot hold the extended planes beyond 0xFFFF in a single character. Instead,
> it uses two 'surrogate' characters. Bit of a nasty hack, but that's what
> Unicode does and there's nothing can be done about it now.
does that mean that python, when compiled in utf-16 mode, uses surrogates?
then it should also correctly tell me that the length is 9, not 10,
don't you think?
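
to make the 9-versus-10 point concrete, here is a rough sketch. it assumes a current python 3, where len() counts code points and narrow builds no longer exist, so the utf-16 code-unit count is computed by hand to stand in for what a narrow build would report. the sample string and the helper names (utf16_code_units, surrogate_pair) are made up for illustration; they are not from my original example.

```python
# Hypothetical sample: 8 plane-0 characters followed by U+10330.
sample = "abcdefgh\U00010330"


def utf16_code_units(s):
    """Count the 16-bit code units needed to store s in UTF-16."""
    # utf-16-le writes no byte order mark, so every 2 bytes is one code unit.
    return len(s.encode("utf-16-le")) // 2


def surrogate_pair(ch):
    """Return the (high, low) surrogate values for a character above U+FFFF."""
    offset = ord(ch) - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


code_points = len(sample)               # length in characters (code points)
code_units = utf16_code_units(sample)   # length in UTF-16 code units
high, low = surrogate_pair("\U00010330")

print(code_points, code_units, hex(high), hex(low))
```

the two counts differ by one because the last character needs two code units, which is the discrepancy i am asking about.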
as i see there are 2 possibilities:
1. python, when compiled for narrow-unicode, uses surrogates to encode
non-plane0 characters in utf16. if that's true, python has a bug,
because in my example text should be what i wrote, and length should
also work correctly.
2. python, when compiled for narrow-unicode, doesn't work with
characters outside plane0. if that's true, i would expect python to at
least tell me, by throwing an exception for example, if i try to decode
a utf8 string with non-plane-0 characters.
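
the behaviour i would expect under possibility 2 could look roughly like this. it is only a sketch of the idea, not anything python actually does: decode_bmp_only is a hypothetical helper name, and the choice of ValueError is my assumption.

```python
def decode_bmp_only(data):
    """Decode UTF-8 bytes, refusing any character outside plane 0.

    Hypothetical helper: raises ValueError instead of silently
    splitting the character into surrogates.
    """
    text = data.decode("utf-8")
    for index, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            raise ValueError(
                "character U+%08X at index %d is outside plane 0"
                % (ord(ch), index)
            )
    return text


# Plane-0 input decodes normally.
print(decode_bmp_only("abc".encode("utf-8")))

# Input containing U+10330 is rejected with an explicit error.
try:
    decode_bmp_only("\U00010330".encode("utf-8"))
except ValueError as exc:
    print("rejected:", exc)
```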
what do you think?