python-unicode doesn't support >65535 symbols?
gabor at z10n.net
Thu Nov 27 12:25:07 CET 2003
today i ran some tests...
i tested some unicode symbols that are above the 16-bit limit.
i played around with iconv and so on,
so in the end i created a utf8-encoded text file
with the text "Marrakesh",
where the second 'a' was replaced with \U00010330
(i simply wrote "Marrakesh" to a text file, used iconv to convert it to
utf-32 big-endian, replaced the character in hexedit, and then converted
it back to utf8 with iconv).
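Assuming Python 3 (the session in this message is Python 2), the same test file can probably be produced without iconv or a hex editor. This is only a sketch: the filename utf8.txt matches the one opened in the session, and \U00010330 is the code point named in the message.

```python
# Build "Marrakesh" with its second 'a' replaced by U+10330 and save it as UTF-8,
# which is what the iconv/hexedit round trip produced.
original = "Marrakesh"
# Position of the second 'a': search again just after the first one.
second_a = original.index("a", original.index("a") + 1)
text = original[:second_a] + "\U00010330" + original[second_a + 1:]

with open("utf8.txt", "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back, decoding it as UTF-8.
with open("utf8.txt", encoding="utf-8") as f:
    decoded = f.read()

print(second_a, decoded == text)  # 4 True
```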
now i started python:
>>> data = open("utf8.txt").read()
>>> text = data.decode("utf8")
so far it seemed ok.
then i checked len(text).
the result is wrong: the length should be 9.
so that character (which should be \U00010330)
was split into two 16-bit values (two consecutive items of text).
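The split described above matches UTF-16 surrogate pairs: an interpreter that stores unicode strings as 16-bit code units (a "narrow" build, as this one evidently was) has to represent any code point above U+FFFF as two units. A minimal sketch of the arithmetic, assuming Python 3; the helper name to_surrogates is hypothetical.

```python
def to_surrogates(cp):
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = cp - 0x10000              # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low

high, low = to_surrogates(0x10330)
print(hex(high), hex(low))  # 0xd800 0xdf30

# Nine characters become ten UTF-16 code units, the count a narrow build's len() reports.
units = len("Marr\U00010330kesh".encode("utf-16-le")) // 2
print(units)  # 10
```

The built-in "utf-16-be" codec should yield the same pair for this character, bytes d8 00 df 30.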
i don't understand.
if the representation of 'text' is correct, why is the length wrong?
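The likely answer is the build type: on a narrow build, len() counts 16-bit code units, while a "wide" (UCS-4) build counts code points. A sketch, assuming Python 3, that checks sys.maxunicode to tell the two apart, and a hypothetical helper, count_code_points, that merges surrogate pairs as a narrow-build workaround would need to. Python 3.3 and later (PEP 393) no longer have narrow builds, so the narrow-build string is simulated with explicit surrogate escapes.

```python
import sys

def count_code_points(s):
    """Count code points, treating a high+low surrogate pair as one character."""
    count = 0
    i = 0
    while i < len(s):
        is_pair = ("\ud800" <= s[i] <= "\udbff" and i + 1 < len(s)
                   and "\udc00" <= s[i + 1] <= "\udfff")
        i += 2 if is_pair else 1  # a surrogate pair is a single code point
        count += 1
    return count

# 0xffff on a narrow build; 0x10ffff on a wide build and on Python 3.3+.
print(hex(sys.maxunicode))

# Simulate narrow-build storage: U+10330 held as two separate code units.
narrow_style = "Marr\ud800\udf30kesh"
print(len(narrow_style), count_code_points(narrow_style))  # 10 9
```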
btw. i understand that it's a very exotic character, but i tried, for
example, kwrite and gedit, and although neither of them was able to display
the symbol, both correctly identified it as ONE unknown symbol.
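An editor reading the UTF-8 file can see one symbol because UTF-8 encodes U+10330 as a single 4-byte sequence, and every byte after the first is a continuation byte of the form 10xxxxxx. A sketch, assuming Python 3, of counting characters straight from UTF-8 bytes; the helper utf8_char_count is hypothetical and assumes well-formed input.

```python
def utf8_char_count(data):
    """Count characters in well-formed UTF-8 by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if b & 0xC0 != 0x80)

data = "Marr\U00010330kesh".encode("utf-8")
print(len(data), utf8_char_count(data))  # 12 9
print(data[4:8].hex())  # f0908cb0, one 4-byte sequence for U+10330
```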