[Python-Dev] Security implications of pep 383

Lennart Regebro regebro at gmail.com
Tue Mar 29 22:45:43 CEST 2011


On Tue, Mar 29, 2011 at 22:40, Lennart Regebro <regebro at gmail.com> wrote:
> The lesson here seems to be "if you have to use blacklists, and you
> use unicode strings for those blacklists, also make sure the string
> you compare with doesn't have surrogates".
>

For that matter, what happens with combining characters?

'\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' !=
    '\N{LATIN SMALL LETTER O WITH DIAERESIS}'

I guess the filesystem shouldn't treat these as the same (even though
they are canonically equivalent), but what if some web service does? I
suspect you should normalize both strings to the same form before
comparing them against any blacklist. And what happens with surrogates
when you normalize?
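
To make the comparison concrete, here is a minimal sketch of a
normalizing blacklist check. The helper names has_surrogates and
is_blacklisted are hypothetical, and so is the choice of NFC and the
policy of rejecting any string that contains surrogates before
normalizing; it is one possible approach, not an established
recommendation.

```python
import unicodedata


def has_surrogates(s):
    """Return True if s contains any code point in the surrogate range."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def is_blacklisted(name, blacklist, form="NFC"):
    """Hypothetical helper: compare name with blacklist entries after
    normalizing both sides to the same Unicode normalization form.

    Assumption: a name containing surrogates (for example one decoded
    with the surrogateescape error handler) is rejected outright rather
    than normalized and compared.
    """
    if has_surrogates(name):
        raise ValueError("name contains surrogate code points")
    normalized_entries = {unicodedata.normalize(form, e) for e in blacklist}
    return unicodedata.normalize(form, name) in normalized_entries


decomposed = "\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}"
precomposed = "\N{LATIN SMALL LETTER O WITH DIAERESIS}"

# The raw strings differ, but they match once both are normalized.
print(decomposed == precomposed)
print(is_blacklisted(decomposed, [precomposed]))

# An undecodable byte becomes a lone surrogate under surrogateescape.
escaped = b"\xff".decode("utf-8", "surrogateescape")
print(has_surrogates(escaped))
```

Rejecting surrogates up front sidesteps the question of how they
interact with normalization, at the cost of refusing names that are
not valid in the expected encoding.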

//Lennart

