[Python-Dev] PEP 383 update: utf8b is now the error handler

Wed May 6 09:53:33 CEST 2009

>  > > Second, I suggest "surrogate-replace" as the name of the error handler
>  > > rather than "utf8b".
>  > 
>  > I think this is bike-shedding.
> 
> I don't personally care (I already was aware of UTF-8B), but there are
> plenty of others who do. 

I think it is a fairly bad name, because it is easily confused with
the "surrogates" error handler (unless you also suggest renaming that one).

> You have to fix the existing uses of
> the obsolete "python-escape", anyway.

Indeed - but only in the PEP. In the implementation, it's already utf8b
throughout. Now it is also in the PEP; thanks for pointing that out.
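For readers trying this today: a minimal sketch of the round trip the handler provides, assuming Python 3.1 or later, where the handler discussed in this thread as "utf8b" ships under the name "surrogateescape". The sample byte string is made up for illustration.

```python
# Bytes that are not valid UTF-8 (0xE9 is Latin-1 "e acute").
raw = b"caf\xe9.txt"

# Decoding maps each undecodable byte 0xXX to the lone surrogate U+DCXX.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'caf\udce9.txt'

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw
```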

>  > It's a security risk. If U+DCXX would map to \xXX, then somebody could
>  > embed U+DC2E U+DC2E U+DC2F into a character string; even if this gets
>  > sanitized, nobody would expect that this will actually access ../
> 
> The odds that anybody will actually take notice of U+002E U+002E
> U+002F in a string are sufficiently small that any number of exploits
> have already been based on it.  I agree that there is some additional
> risk from this if people make the check for "../" before they prepend
> "\udc2e\udc2e\udc2f", but I think that risk is very small compared to
> the pain of having an error handler whose raison d'etre is to not raise
> exceptions go ahead and raise them anyway.

The problem is that functions like normpath recognize "../", and
applications rely on them for file name sanitization. If an application
could be tricked into writing outside its target folder, that would
be a huge security risk.
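A minimal sketch of the attack described above, assuming Python 3. The "lax-escape" handler and the looks_safe() check are hypothetical, written only for illustration: the handler maps *any* U+DCXX back to byte 0xXX, which is exactly the behaviour being rejected here. The sanitizer sees no "../" in the string, yet the encoded bytes contain one. The shipped "surrogateescape" handler only maps U+DC80..U+DCFF, so it raises instead.

```python
import codecs
import posixpath


def lax_escape(exc):
    """HYPOTHETICAL handler: maps any U+DCxx to byte 0xxx (not Python's behaviour)."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    out = bytearray()
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        if not 0xDC00 <= cp <= 0xDCFF:
            raise exc
        out.append(cp - 0xDC00)  # U+DC2E -> b".", U+DC2F -> b"/"
    return bytes(out), exc.end


codecs.register_error("lax-escape", lax_escape)  # hypothetical handler name


def looks_safe(name: str) -> bool:
    """HYPOTHETICAL naive sanitizer: reject paths that escape the target folder."""
    norm = posixpath.normpath(name)
    return not (norm == ".." or norm.startswith("../") or posixpath.isabs(norm))


attack = "\udc2e\udc2e\udc2fetc/passwd"
print(looks_safe(attack))                     # True: the str contains no "../"
print(attack.encode("utf-8", "lax-escape"))   # b'../etc/passwd'

# The real handler refuses surrogates below U+DC80, so the attack fails loudly.
try:
    attack.encode("utf-8", "surrogateescape")
except UnicodeEncodeError:
    print("rejected by surrogateescape")
```

Raising here is the trade-off defended in this message: an exception on odd input is preferable to silently producing a path the sanitizer never saw.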

OTOH, I don't care about breaking applications on misconfigured systems.
People using SJIS as their locale encodings have bigger problems
than Python raising exceptions.

> See also my reply to Lino Mastrodomenico.

URL?

> But you're writing the PEP, so this battle will have to be deferred.
> Eventually Python will have to take a stand on Unicode conformance,
> but it's not urgent yet.

I think it's always applications that are conforming or not, rather
than libraries. Libraries should make it possible to write conforming
applications. They may refuse to support certain non-conforming
applications (although users will then replace the library with one
that does let them do what they want). Libraries can never enforce
that applications conform to some standard.

> Sorry!  I suggest substituting the paragraph above for the paragraph
> which begins "The encode error handler interface presently requires..."
> at line 129.

Ah, ok. This was Glenn Linderman's text before - now it's yours :-)

> I think I forgot to do this before:  "I hereby dedicate all text
> I suggest for inclusion in the PEP to the public domain."

:-)

Martin