[Python-Dev] PEP 383 update: utf8b is now the error handler

Stephen J. Turnbull stephen at xemacs.org
Wed May 6 07:35:30 CEST 2009


Lino Mastrodomenico writes:
 > 2009/5/5 Stephen J. Turnbull <stephen at xemacs.org>:
 > > Third, it is not clear to me why non-decodable ASCII should be an
 > > error.
 > 
 > The PEP originally allowed the conversion to U+DCxx of bytes below 128
 > that cannot be decoded by the encoding used, but this creates
 > potential security problems.
 > 
 > See: <http://mail.python.org/pipermail/python-dev/2009-April/089102.html>

Yeah, yeah, this is the same old same old from PEP 3131.  Anything
that handles the various attacks based on ASCII-alike characters
should at least rule out invalid Unicode, too!

And where is this U+DC2F supposed to be coming from, anyway?  The
user's *local* environment or the user's *local* filesystem!  Codecs
not using 'utf8b' can't produce it, so the only other cases are chr()
and \u literals in the *local* process, or an already broken module in
your code.  I really can't imagine that any sane programmer these days
would be using 'utf8b' on bytes received from the Internet!
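
To make the U+DCxx discussion concrete, here is a minimal sketch. It
assumes the 'utf8b' handler discussed here behaves like the handler
that shipped in Python 3.1 under the name 'surrogateescape'; the
variable names are illustrative only. It shows a non-decodable high
byte round-tripping through U+DCE9, and a locally built U+DC2F being
refused on encode rather than turned back into "/".

```python
# Assumption: 'surrogateescape' (Python 3.1+) stands in for 'utf8b'.
raw = b"caf\xe9"  # Latin-1 bytes; the trailing \xe9 is not valid UTF-8

# The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes.
roundtrip = text.encode("utf-8", errors="surrogateescape")

# U+DC2F cannot come out of the decoder; it has to be built locally,
# for example with chr() or a \u literal.
fake_slash = chr(0xDC2F)
try:
    fake_slash.encode("utf-8", errors="surrogateescape")
    rejected = False
except UnicodeEncodeError:
    # Only U+DC80..U+DCFF are mapped back to bytes; U+DC2F is refused.
    rejected = True
```

This is the restrictive behavior Lino describes, not the optional
"consenting adults" mode argued for in this message.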

Of course I can't prove that there's no vector for an exploit here (in
fact, I'm sure there is one with sufficiently careless handling of
input), but I think "consenting adults" covers the Shift JIS use case.
Make it an option, but it should be explicitly part of the PEP.

