[Python-3000] Handling of wide Unicode characters
Alexandre Vassalotti
alexandre at peadrop.com
Sat Jun 2 00:57:41 CEST 2007
Hi,
I was doing some testing on the new _string_io module, since I was
slightly skeptical about my handling of wide Unicode characters (32 bits
long, instead of the usual 16 bits in UTF-16). So, I ran this
little test:
>>> s = _string_io.StringIO()
>>> s.write(u'\U0002f8cd')
>>> s.tell()
2
As I expected, wide Unicode characters count for two. However, I was
surprised that Python treats them as two characters as well:
>>> len(u'\U0002f8cd')
2
>>> u'\U0002f8cd'
u'\ud87e\udccd'
Is it a bug, or only an implementation choice?
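For reference, here is a minimal sketch of the arithmetic that would produce the two code units shown in the repr above, assuming the interpreter stores strings as UTF-16. The helper name utf16_code_units is hypothetical (it is not part of _string_io or the standard library), and U+2F8CD is the code point implied by the surrogate pair \ud87e\udccd.

```python
def utf16_code_units(code_point):
    """Return the list of UTF-16 code units for one Unicode code point.

    Hypothetical helper for illustration only.
    """
    if code_point < 0x10000:
        # Characters in the Basic Multilingual Plane fit in one 16-bit unit.
        return [code_point]
    # Anything above U+FFFF is split into a high and a low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return [high, low]


units = utf16_code_units(0x2F8CD)
print([hex(u) for u in units])  # ['0xd87e', '0xdccd']
```

If strings are indexed by 16-bit code unit rather than by code point, both len() and tell() would report 2 for this character, which matches what I am seeing.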
Cheers,
-- Alexandre