[Python-Dev] utf8 issue

Guido van Rossum guido@python.org
Fri, 23 Aug 2002 17:05:27 -0400

This might belong on SF, except it's already been solved in Python
2.3, and I need guidance about what to do for Python 2.2.2.

In 2.2.1, a lone surrogate encoded into utf8 gives a utf8 string that
cannot be decoded back.  In 2.3, this is fixed.  Should this be fixed
in 2.2.2 as well?
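
To make the round-trip question concrete, here is a minimal sketch.  It
assumes a modern Python 3 interpreter, not the 2.2.x or 2.3 codecs
discussed here, and the helper name roundtrips is made up for
illustration.  In Python 3 the strict utf8 codec refuses to encode a
lone surrogate at all, while the "surrogatepass" error handler lets it
through in both directions:

```python
# Sketch for modern Python 3 only; the 2.2.x/2.3 codecs in this thread behave differently.
LONE_SURROGATE = "\ud800"  # a high surrogate with no low surrogate after it


def roundtrips(text, errors="strict"):
    """Return True if text survives a UTF-8 encode/decode cycle (hypothetical helper)."""
    try:
        return text.encode("utf-8", errors).decode("utf-8", errors) == text
    except UnicodeError:
        # Covers both UnicodeEncodeError and UnicodeDecodeError.
        return False


print(roundtrips("abc"))
print(roundtrips(LONE_SURROGATE))                   # strict codec rejects the surrogate
print(roundtrips(LONE_SURROGATE, "surrogatepass"))  # handler passes it through
```

The property at stake is the same one asked about above: whatever the
encoder emits, the decoder should be able to read back.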

I'm asking because it caused problems with reading .pyc files: if
there's a Unicode literal containing a lone surrogate, reading the
.pyc file causes an exception:

UnicodeError: UTF-8 decoding error: unexpected code byte
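
The .pyc scenario can be approximated with the marshal module, which
serializes the code objects stored in .pyc files.  This is only a
sketch for a modern Python 3 interpreter, where the round trip
succeeds; it does not reproduce the 2.2.1 failure, and the file name
"<example>" and variable names are placeholders:

```python
import marshal

# Source text containing a string literal with a lone surrogate escape.
source = "s = '\\ud800'\n"
code = compile(source, "<example>", "exec")

# Serialize and reload the code object, as happens when a .pyc is written and read.
restored = marshal.loads(marshal.dumps(code))

# Run the reloaded code and check that the literal came back unchanged.
namespace = {}
exec(restored, namespace)
print(namespace["s"] == "\ud800")
```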

It looks like revision 2.128 fixed this for 2.3, but that patch
doesn't cleanly apply to the 2.2 maintenance branch.  Can someone

--Guido van Rossum (home page: http://www.python.org/~guido/)