[Python-Dev] utf8 issue
Guido van Rossum
guido@python.org
Fri, 23 Aug 2002 17:05:27 -0400
This might belong on SF, except it's already been solved in Python
2.3, and I need guidance about what to do for Python 2.2.2.
In 2.2.1, a lone surrogate encoded into UTF-8 gives a UTF-8 string that
cannot be decoded back. In 2.3, this is fixed. Should this be fixed
in 2.2.2 as well?
I'm asking because it caused problems with reading .pyc files: if
there's a Unicode literal containing a lone surrogate, reading the
.pyc file causes an exception:
UnicodeError: UTF-8 decoding error: unexpected code byte
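To illustrate the round-trip problem, here is a minimal sketch. It is written for a modern Python 3 interpreter, not for 2.2 or 2.3, so it is an analogy and not a reproduction of the bug. The names LONE_SURROGATE and strict_decode_fails are made up for this example, and the "surrogatepass" error handler is a Python 3 feature that did not exist in the 2.x codecs. It shows that the three bytes produced for a lone surrogate are rejected by a strict UTF-8 decoder and only decode back if the decoder is told to let surrogates through.

```python
# Illustrative sketch for a modern Python 3 interpreter; this is NOT the
# Python 2.2/2.3 codec code discussed in this message.

# Hypothetical name: a high surrogate with no low surrogate following it.
LONE_SURROGATE = "\ud800"

# Python 3 refuses to encode a lone surrogate by default; the "surrogatepass"
# error handler (an assumption of this sketch) lets it through as three bytes.
encoded = LONE_SURROGATE.encode("utf-8", "surrogatepass")


def strict_decode_fails(data):
    """Return True if a strict UTF-8 decode rejects the given bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# Decoding with the same permissive handler recovers the original string.
round_tripped = encoded.decode("utf-8", "surrogatepass")
```

An encoder and decoder that disagree about whether such a sequence is acceptable is the kind of mismatch that would make stored data, such as a Unicode literal in a .pyc file, unreadable.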
It looks like revision 2.128 fixed this for 2.3, but that patch
doesn't cleanly apply to the 2.2 maintenance branch. Can someone
help?
--Guido van Rossum (home page: http://www.python.org/~guido/)