PEP 383: Non-decodable Bytes in System Character Interfaces

Wed Apr 22 08:17:31 EDT 2009

Martin v. Löwis wrote:
[snip]
> To convert non-decodable bytes, a new error handler "python-escape" is
> introduced, which decodes non-decodable bytes into a private-use
> character U+F01xx, which is believed to not conflict with private-use
> characters that currently exist in Python codecs.
> 
> The error handler interface is extended to allow the encode error
> handler to return byte strings immediately, in addition to returning
> Unicode strings which then get encoded again.
> 
> If the locale's encoding is UTF-8, the file system encoding is set to
> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
> 
If the byte stream happens to include a sequence which decodes to
U+F01xx, shouldn't that raise an exception?
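
To make the concern concrete, here is a minimal sketch in Python 3. It
assumes the draft's handler maps an undecodable byte 0xNN to U+F01NN; the
handler name "pua-escape-sketch" and the function pua_escape are
hypothetical stand-ins, not the proposed "python-escape" implementation.

```python
import codecs

# Hypothetical stand-in for the draft's handler (an assumption, not the
# real "python-escape"): map each undecodable byte 0xNN to U+F01NN.
def pua_escape(exc):
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position at which to resume.
    return "".join(chr(0xF0100 + b) for b in bad), exc.end

codecs.register_error("pua-escape-sketch", pua_escape)

# A lone 0x80 byte is not valid UTF-8, so the handler escapes it.
raw = b"\x80"
escaped = raw.decode("utf-8", "pua-escape-sketch")

# A well-formed UTF-8 sequence that genuinely encodes U+F0180.
genuine = "\U000F0180".encode("utf-8")
decoded_genuine = genuine.decode("utf-8", "pua-escape-sketch")

# Two different byte strings now decode to the same text, so the
# original bytes can no longer be recovered unambiguously.
same_text = (escaped == decoded_genuine)
different_bytes = (raw != genuine)
```

Under those assumptions both same_text and different_bytes hold, which is
why I would expect the decoder to reject (or otherwise escape) input that
already decodes into the reserved range.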