[I18n-sig] How does Python Unicode treat surrogates?
Martin v. Loewis
Sat, 23 Jun 2001 14:20:38 +0200
> About surrogate support in Python: the UTF-8 codec has full
> surrogate support for encodings and decoding
I think there are a number of bugs lying around here. For example,
>>> u" \ud800 ".encode("utf-8")
' \xa0\x80 '
give an error, since this is a lone low surrogate word?
Likewise, but somewhat more troubling, surrogates that straddle write
invocations are not processed properly.
whereas the correct answer would have been