Lino Mastrodomenico writes:
2009/5/5 Stephen J. Turnbull:
Third, it is not clear to me why non-decodable ASCII should be an error.
The PEP originally allowed the conversion to U+DCxx of bytes below 128 that cannot be decoded by the encoding used, but this creates potential security problems.
See: http://mail.python.org/pipermail/python-dev/2009-April/089102.html
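For readers following along, here is a minimal sketch of the restricted behaviour described above. It assumes that the handler called 'utf8b' in this thread behaves like the 'surrogateescape' error handler in Python 3; the sample bytes and the helper name encodes_cleanly are made up for illustration only.

```python
# Sketch only: assumes 'utf8b' in this thread corresponds to the
# 'surrogateescape' error handler available in Python 3.

raw = b"caf\xe9/x"  # 0xE9 is not valid UTF-8 here (hypothetical file name)

# The undecodable byte >= 128 is mapped to a lone surrogate, U+DC00 + 0xE9.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
round_trip = text.encode("utf-8", "surrogateescape")


def encodes_cleanly(s):
    """Return True if s can be encoded back to bytes with the handler."""
    try:
        s.encode("utf-8", "surrogateescape")
    except UnicodeEncodeError:
        return False
    return True


# U+DC2F would stand for the ASCII byte 0x2F ('/'); the handler only
# accepts U+DC80..U+DCFF, so this lone surrogate is rejected on encode.
smuggled_slash = chr(0xDC2F)
print(repr(text), round_trip == raw, encodes_cleanly(smuggled_slash))
```

Under that assumption, an escaped byte can never turn back into a path separator or any other ASCII character on output, which is the concern behind the restriction.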
Yeah, yeah, this is the same old same old from PEP 3131. Anything that handles the various attacks based on ASCII-alike characters should at least rule out invalid Unicode, too!

And where is this U+DC2F supposed to be coming from, anyway? The user's *local* environment or the user's *local* filesystem! Codecs not using 'utf8b' can't produce it, so the only other cases are chr() and \u literals in the *local* process, or an already broken module in your code. I really can't imagine that any sane programmer these days would be using 'utf8b' on bytes received from the Internet!

Of course I can't prove that there's no vector for an exploit here (in fact, I'm sure there is one with sufficiently careless handling of input), but I think "consenting adults" covers the Shift JIS use case. Make it an option, but it should be explicitly part of the PEP.