[Python-Dev] Security implications of pep 383

Toshio Kuratomi a.badger at gmail.com
Tue Mar 29 21:10:58 CEST 2011


On Tue, Mar 29, 2011 at 07:23:25PM +0100, Michael Foord wrote:
> Hey all,
> 
> Not sure how real the security risk is here:
> 
>     http://blog.omega-prime.co.uk/?p=107
> 
> Basically he is saying that if you store a list of blacklisted files
> with names encoded in big-5 (or some other non-utf8 compatible
> encoding) if those names are passed at the command line, or otherwise
> read in and decoded from an assumed-utf8 source with surrogate
> escaping, the surrogate escape decoded names will not match the
> properly decoded blacklisted names.
> 
The example is correct.  The security risk is real.  However, there's a flaw
in the program, and whether there's also a flaw in python is not so
certain.

Here's the line I'd say is contentious::
  blacklist = open("blacklist.big5", encoding='big5').read().split()

The blacklist file contains a list of filenames.  However, this code treats
it as a list of strings.  This is a logic error in the program, and he should
really be doing this::
  blacklist = open("blacklist.big5", 'rb').read().split()

Then, when comparing it against the values of sys.argv, either sys.argv gets
converted into bytes (using the system locale, since that's what was used to
decode it to unicode) or the items in blacklist get converted to unicode
using the system locale with surrogateescape.
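
A rough sketch of both directions, in case it helps.  The helper names
(load_blacklist, arg_to_bytes, name_to_str, is_blacklisted) are made up for
illustration and are not from the blog post, and I'm assuming the encoding
that python used for sys.argv is the one reported by
sys.getfilesystemencoding():

```python
import sys


def load_blacklist(path):
    """Read the blacklist as raw bytes: one filename per whitespace-separated token."""
    with open(path, "rb") as f:
        return set(f.read().split())


def arg_to_bytes(arg, encoding=None):
    """Option 1: turn a sys.argv value back into the bytes the OS handed us."""
    # Assumption: argv was decoded with the filesystem encoding + surrogateescape.
    encoding = encoding or sys.getfilesystemencoding()
    return arg.encode(encoding, "surrogateescape")


def name_to_str(raw, encoding=None):
    """Option 2: decode a blacklist entry the same way sys.argv was decoded."""
    encoding = encoding or sys.getfilesystemencoding()
    return raw.decode(encoding, "surrogateescape")


def is_blacklisted(arg, blacklist_bytes, encoding=None):
    """Compare in the bytes domain, where filenames actually live."""
    return arg_to_bytes(arg, encoding) in blacklist_bytes
```

With that, something like is_blacklisted(sys.argv[1],
load_blacklist("blacklist.big5")) compares bytes to bytes, so the encoding of
the names in the blacklist file never has to be guessed.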

The possible flaw in python is this:  Code like the blog poster wrote runs
under python3 without an error or a warning.  This gives the programmer no
feedback that they're doing something wrong until it actually bites them in
deployed code.
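
To make the silent failure concrete, here is a small simulation.  It doesn't
touch the real sys.argv; I'm assuming a UTF-8 locale and faking the decoding
step by hand, and the variable names are just for illustration:

```python
# The bytes of a Big5-encoded filename, as they would sit on disk.
raw_name = "你好".encode("big5")

# What the blog's code ends up with: the name properly decoded as Big5.
blacklist = raw_name.decode("big5").split()

# What sys.argv would hold under a UTF-8 locale (assumed): the same bytes
# decoded as UTF-8, with undecodable bytes escaped to lone surrogates.
argv_value = raw_name.decode("utf-8", "surrogateescape")

# No exception and no warning; the comparison simply never matches.
matched = argv_value in blacklist

# Escaped bytes show up as code points in the range U+DC80..U+DCFF.
has_escapes = any(0xDC80 <= ord(c) <= 0xDCFF for c in argv_value)

print(matched, has_escapes)
```

The membership test quietly comes out false even though both strings came
from the same bytes, which is exactly the lack of feedback I mean.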

-Toshio