[Python-Dev] Security implications of pep 383

Toshio Kuratomi a.badger at gmail.com
Wed Mar 30 07:54:39 CEST 2011


On Tue, Mar 29, 2011 at 10:55:47PM +0200, Victor Stinner wrote:
> On Tuesday, 29 March 2011 at 22:40 +0200, Lennart Regebro wrote:
> > The lesson here seems to be "if you have to use blacklists, and you
> > use unicode strings for those blacklists, also make sure the string
> > you compare with doesn't have surrogates".
> 
> No. '\u4f60\u597d'.encode('big5').decode('latin1') gives '§A¦n' which
> doesn't contain any surrogate character.
> 
> The lesson is: if you compare Unicode filenames on UNIX, make sure that
> your system is correctly configured (the locale encoding must be the
> filesystem encoding).
>
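
Victor's counterexample can be checked directly. The sketch below is a minimal illustration, assuming the on-disk name was written as Big5 and read back by code that wrongly assumes Latin-1. The helper `has_surrogates` and the one-entry `blacklist` are hypothetical, added only for the demonstration. Latin-1 decodes every byte without error, so no surrogates appear, yet the result still does not match the Unicode name the blacklist holds.

```python
def has_surrogates(text: str) -> bool:
    """Hypothetical helper: True if text has any surrogate code point (U+D800..U+DFFF)."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


# The bytes a Big5-configured program would write for the name U+4F60 U+597D.
raw_name = "\u4f60\u597d".encode("big5")

# A reader that wrongly assumes Latin-1 decodes every byte without error...
misdecoded = raw_name.decode("latin1")

# ...so the result contains no surrogates, yet it still fails to match the
# Unicode string a (hypothetical) blacklist would hold for the same name.
blacklist = {"\u4f60\u597d"}

print(ascii(misdecoded))           # '\xa7A\xa6n'
print(has_surrogates(misdecoded))  # False
print(misdecoded in blacklist)     # False
```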
You're both wrong :-)

Lennart is missing that you just need to use the same encoding, together
with the surrogateescape error handler (or stick with bytes), when decoding
the byte strings that you are comparing.
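
A minimal sketch of that advice, assuming both the blacklist entry and the on-disk name are available as raw bytes and that UTF-8 is the codec chosen inside the program. The helper `decode_name` is hypothetical. Because both sides go through the same codec and error handler, the bytes that are not valid UTF-8 turn into the same lone surrogates on each side, the comparison succeeds, and the original bytes can be recovered exactly.

```python
def decode_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode a filename with the surrogateescape handler.

    Undecodable bytes become lone surrogates (U+DC80..U+DCFF) instead of
    raising, so the original bytes can always be recovered.
    """
    return raw.decode(encoding, "surrogateescape")


# Blacklist entry and on-disk name, both kept as the raw bytes they arrived as.
blacklisted_raw = "\u4f60\u597d".encode("big5")  # b'\xa7A\xa6n', not valid UTF-8
on_disk_raw = b"\xa7A\xa6n"

# Decode BOTH sides with the same codec and error handler before comparing.
blacklist = {decode_name(blacklisted_raw)}
name = decode_name(on_disk_raw)

print(ascii(name))        # '\udca7A\udca6n'
print(name in blacklist)  # True

# The decoded name round-trips to the exact bytes on disk.
print(name.encode("utf-8", "surrogateescape") == on_disk_raw)  # True

# Sticking with bytes avoids the decoding question entirely.
print(on_disk_raw in {blacklisted_raw})  # True
```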

You're missing that on UNIX there is no filesystem encoding, so the idea of
the locale and filesystem encodings matching is false (and unnecessary: the
encodings that you use within Python just need to be the same.  They don't
even need to match up to the reality of what's used on the filesystem or the
user's locale.)
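
To illustrate that the codec only has to be consistent, here is a small sketch. It assumes a handful of made-up filename byte strings and, as an assumption, restricts itself to UTF-8, ASCII and Latin-1 with surrogateescape, whose decoding round-trips losslessly; I have not checked the same property for every codec Python ships. The helper `same_name` is hypothetical. For each of these codecs, comparing decoded names gives the same answer as comparing raw bytes, even though none of them is "the" encoding the names were written in. Only mixing codecs between the two sides breaks the comparison.

```python
from itertools import product

# Assorted filename bytes from different "worlds": Big5, UTF-8, Latin-1, ASCII.
samples = [
    "\u4f60\u597d".encode("big5"),
    "\u4f60\u597d".encode("utf-8"),
    "caf\u00e9".encode("latin-1"),
    b"plain.txt",
]


def same_name(a: bytes, b: bytes, encoding: str) -> bool:
    """Hypothetical helper: compare raw names after decoding BOTH with one codec."""
    return a.decode(encoding, "surrogateescape") == b.decode(encoding, "surrogateescape")


# None of these codecs matches how every sample was written, yet each one
# agrees with a plain byte comparison for every pair.
for encoding in ("utf-8", "ascii", "latin-1"):
    for a, b in product(samples, repeat=2):
        assert same_name(a, b, encoding) == (a == b)

# What breaks comparisons is decoding the two sides with different codecs.
raw = "caf\u00e9".encode("latin-1")
print(raw.decode("latin-1", "surrogateescape") == raw.decode("utf-8", "surrogateescape"))  # False
```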

-Toshio

