[Python-Dev] PEP 383 update: utf8b is now the error handler

Michael Urman murman at gmail.com
Thu May 7 16:31:11 CEST 2009


On Thu, May 7, 2009 at 01:16, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I'm still at a loss what name to give it, though. I understand that
> I have to rename both error handlers, but I'm uncertain what I should
> rename them to. So proposals that rename only one of them aren't
> that helpful. It would be helpful if people would indicate support
> for Antoine's proposal.

Part of the problem is that both handlers allow byte sequences to
decode to invalid Unicode strings, and in particular both affect the
same byte subsequences, which is what brought us to the crossroads
where we wanted to name both of them "surrogates". So I'll offer a few
more colors, and try to stay out of the way of choosing between them
and the other proposed ones. :)

I haven't come up with anything I like better than errors="lenient"
for the handler that keeps the old utf8 behavior; would
errors="nonvalidating" be correct? It still seems to me that a new
codec, perhaps "utf8-lenient", reads better.
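
To make the "old utf8 behavior" concrete, here is a minimal sketch of
what a lenient handler does with a lone surrogate. It assumes Python
3.1 or later, where this handler is spelled "surrogatepass"; that name
is not one of the candidates above and is used here only because it is
what a current interpreter accepts.

```python
# Sketch only: "surrogatepass" is the name used by Python 3.1+,
# not a name proposed in this message.
lone = "\ud800"  # a lone high surrogate, not a valid Unicode scalar value

# The strict utf-8 codec refuses to encode it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The lenient handler emits the generic three-byte form instead.
lenient = lone.encode("utf-8", errors="surrogatepass")

# Decoding with the same handler gives the lone surrogate back.
back = lenient.decode("utf-8", errors="surrogatepass")
```

Here strict_ok ends up False, while the lenient encode and decode
succeed and return the original string.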

For the utf8b error handler, I could see any of errors="roundtrip",
errors="roundtripreplace", errors="tosurrogate",
errors="surrogatereplace", errors="surrogateescape",
errors="binaryreplace", errors="binaryescape". This includes Antoine's
proposal (sans hyphen).
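
For reference, a minimal sketch of the round-trip behavior these names
are trying to describe: undecodable bytes become lone low surrogates on
decode and turn back into the same bytes on encode. It assumes Python
3.1 or later, where the handler is available as "surrogateescape"; the
file name is a made-up example.

```python
# Sketch only: assumes the handler name "surrogateescape" (Python 3.1+).
raw = b"caf\xe9.txt"  # hypothetical Latin-1 file name, not valid UTF-8

# The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original byte.
restored = text.encode("utf-8", errors="surrogateescape")

# Without the handler, the escaped string cannot be encoded.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The decoded string is not valid Unicode, which is why the strict encode
fails, but the bytes survive the round trip unchanged.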

-- 
Michael Urman

