[Python-Dev] Bytes path related questions for Guido
Walter Dörwald
walter at livinglogic.de
Fri Aug 29 12:09:54 CEST 2014
On 28 Aug 2014, at 19:54, Glenn Linderman wrote:
> On 8/28/2014 10:41 AM, R. David Murray wrote:
>> On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman
>> <v+python at g.nevcal.com> wrote:
>> [...]
>> Also for cases where the data stream is *supposed* to be in a given
>> encoding, but contains undecodable bytes. Showing the stuff that
>> incorrectly decodes as whatever it decodes to is generally what you
>> want in that case.
> Sure, people can learn to recognize mojibake for what it is, and maybe
> even learn to recognize it for what it was intended to be, in limited
> domains. But suppressing/replacing the surrogates doesn't help with
> that... would it not be better to replace the surrogates with an
> escape sequence that shows the original, undecodable, byte value?
> Like \xNN ?
For that we could extend the "backslashreplace" codec error callback, so
that it can be used for decoding too, not just for encoding. I.e.

    b"a\xffb".decode("utf-8", "backslashreplace")

would return

    "a\\xffb"
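On an interpreter where "backslashreplace" only supports encoding, the
same behaviour can be approximated with a custom error handler. What
follows is only a rough sketch, not a proposed patch: the handler name
"backslashreplace_decode" and the function of the same name are made up
for illustration, while codecs.register_error() and the attributes of
UnicodeDecodeError (object, start, end) are the existing API.

```python
import codecs


def backslashreplace_decode(exc):
    """Replace each undecodable byte with a literal \\xNN escape.

    Hypothetical helper for illustration; it only handles decoding errors.
    """
    if isinstance(exc, UnicodeDecodeError):
        # exc.object is the input bytes; [start:end] is the offending slice.
        bad = exc.object[exc.start:exc.end]
        replacement = "".join("\\x%02x" % byte for byte in bad)
        # Return the replacement text and the position to resume decoding at.
        return replacement, exc.end
    # Anything else (e.g. an encoding error) is passed on unchanged.
    raise exc


# "backslashreplace_decode" is an assumed name, chosen so that it does not
# shadow the built-in "backslashreplace" handler.
codecs.register_error("backslashreplace_decode", backslashreplace_decode)

result = b"a\xffb".decode("utf-8", "backslashreplace_decode")
print(result)  # a\xffb
```

Registering it under a separate name leaves the existing handler alone, so
code that relies on "backslashreplace" for encoding is not affected.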
Servus,
Walter