[Python-Dev] PEP 383 update: utf8b is now the error handler
Stephen J. Turnbull
stephen at xemacs.org
Wed May 6 07:35:30 CEST 2009
Lino Mastrodomenico writes:
> 2009/5/5 Stephen J. Turnbull <stephen at xemacs.org>:
> > Third, it is not clear to me why non-decodable ASCII should be an
> > error.
>
> The PEP originally allowed the conversion to U+DCxx of bytes below 128
> that cannot be decoded by the encoding used, but this creates
> potential security problems.
>
> See: <http://mail.python.org/pipermail/python-dev/2009-April/089102.html>

Yeah, yeah, this is the same old same old from PEP 3131. Anything
that handles the various attacks based on ASCII-alike characters
should at least rule out invalid Unicode, too!

And where is this U+DC2F supposed to be coming from, anyway? The
user's *local* environment or the user's *local* filesystem! Codecs
not using 'utf8b' can't produce it, so the only other cases are chr()
and \u literals in the *local* process, or an already broken module in
your code. I really can't imagine that any sane programmer these days
would be using 'utf8b' on bytes received from the Internet!
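
To make the U+DC2F concern concrete, here is a minimal Python 3 sketch.
It assumes the handler discussed here as 'utf8b' behaves like the one
that later shipped in Python 3 under the name 'surrogateescape'; the
sample bytes and variable names are made up for illustration. It shows
an undecodable high byte round-tripping through U+DCxx, and a locally
built U+DC2F (via chr()) being refused on the way back to bytes.

```python
# Assumption: 'surrogateescape' stands in for the 'utf8b' handler named
# in this thread. The sample data below is purely illustrative.

# A byte >= 128 that is not valid UTF-8 is escaped to U+DC00 + byte.
raw = b"caf\xe9"  # Latin-1 encoded, so not decodable as UTF-8
text = raw.decode("utf-8", "surrogateescape")
assert text == "caf\udce9"

# Encoding with the same handler restores the original bytes.
round_tripped = text.encode("utf-8", "surrogateescape")
assert round_tripped == raw

# U+DC2F can only come from local code such as chr() or a \u literal.
# The handler accepts only U+DC80..U+DCFF when encoding, so this lone
# surrogate cannot be turned back into the byte 0x2F ('/').
smuggled = chr(0xDC2F)
try:
    smuggled.encode("utf-8", "surrogateescape")
except UnicodeEncodeError:
    rejected = True
else:
    rejected = False
assert rejected
```

Under those assumptions, escaping bytes below 128 is exactly the part
that is left out: the encoder never maps a surrogate back to an ASCII
byte, which is the restriction being argued about above.
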
Of course I can't prove that there's no vector for an exploit here (in
fact, I'm sure there is one with sufficiently careless handling of
input), but I think "consenting adults" covers the Shift JIS use case.
Make it an option, but it should be explicitly part of the PEP.