[Python-Dev] Security implications of pep 383

Tue Mar 29 20:53:39 CEST 2011

On Tue, 29 Mar 2011 19:23:25 +0100
Michael Foord <michael at voidspace.org.uk> wrote:
> Hey all,
> 
> Not sure how real the security risk is here:
> 
>      http://blog.omega-prime.co.uk/?p=107
> 
> Basically  he is saying that if you store a list of blacklisted files 
> with names encoded in big-5 (or some other non-utf8 compatible encoding) 
> if those names are passed at the command line, or otherwise read in and 
> decoded from an assumed-utf8 source with surrogate escaping, the 
> surrogate escape decoded names will not match the properly decoded 
> blacklisted names.

This has nothing to do specifically with PEP 383. The same issues can
arise without PEP 383 if you replace utf-8 with, say, latin-1 in the
above example.

Basically, what this says is that if you decode the same bytestring
using two different encodings, you get two different unicode strings
(which therefore compare unequal).
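
To make that concrete, here is a minimal sketch. The sample name is
made up for illustration; the three decodings stand for the blacklist
reader (Big-5), a utf-8 filesystem encoding with surrogate escaping,
and the latin-1 variant mentioned above.

```python
# Hypothetical filename, as the raw bytes a Big-5 system would produce.
raw = "\u4e2d\u6587".encode("big5")

# Same bytes, three different decodings.
as_big5 = raw.decode("big5")
as_utf8_escaped = raw.decode("utf-8", "surrogateescape")
as_latin1 = raw.decode("latin-1")

# All three are distinct unicode strings, so equality checks fail.
assert as_big5 != as_utf8_escaped
assert as_big5 != as_latin1
assert as_utf8_escaped != as_latin1

# Each decoding still round-trips to the original bytes with its own codec.
assert as_utf8_escaped.encode("utf-8", "surrogateescape") == raw
assert as_latin1.encode("latin-1") == raw
```

The latin-1 case involves no surrogate escaping at all, yet it
mismatches in exactly the same way.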

Another observation is that, in the script which is presented, if the
user were to extract a filename from the blacklist and call open() on
it, they wouldn't actually open one of the blacklisted files, since the
encoded representation using the filesystem encoding (e.g. utf-8 or
latin-1) would be different from the Big-5 representation.
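
A rough sketch of that second point, using the utf-8 codec explicitly
as a stand-in for the filesystem encoding (the filename is again
hypothetical):

```python
# Bytes of the real file's name on disk (Big-5), hypothetical example.
on_disk = "\u4e2d\u6587.txt".encode("big5")

# What the script holds after reading the blacklist as Big-5 text.
listed = on_disk.decode("big5")

# What open() would hand to the OS if the filesystem encoding is utf-8.
passed_to_os = listed.encode("utf-8")

# The OS is asked for a different byte sequence than the one on disk.
assert passed_to_os != on_disk
```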

A solution would be to open the blacklist file in binary mode and call
os.fsdecode() on each filename read from it.
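
Something along these lines, as a sketch only. It assumes a POSIX
system and a blacklist file with one filename per line; the helper
name load_blacklist is invented for the example.

```python
import os


def load_blacklist(path):
    """Return the blacklisted names, decoded the same way the os module
    decodes filenames and command-line arguments."""
    names = set()
    with open(path, "rb") as f:  # binary mode: no text decoding yet
        for line in f:
            entry = line.rstrip(b"\r\n")
            if entry:
                # Filesystem encoding plus surrogateescape on POSIX.
                names.add(os.fsdecode(entry))
    return names
```

Names obtained from sys.argv or os.listdir() then compare equal to the
blacklist entries, because both sides went through the same decoding.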

Regards

Antoine.