[Python-Dev] Security implications of PEP 383
"Martin v. Löwis"
martin at v.loewis.de
Tue Mar 29 23:17:32 CEST 2011
> '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' != '\N{LATIN SMALL
> LETTER O WITH DIAERESIS}'
>
> I guess the filesystem shouldn't treat these as the same (even though
> they are), but what if some webservice does? I suspect you should
> normalize both strings before comparing them in any blacklist, and
> what happens with surrogates when you normalize?
I think the whole blacklist example is artificial. The string in the
blacklist is actually a Chinese "hello" greeting, so it surely isn't
the string being blacklisted. For proper blacklisting, you would likely
use substring searches, case-insensitivity, transliterations, and
perhaps even regular expressions and word stemming. If you consider all
these things, proper or alternative encodings of the same text are just
another issue to consider.
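
To make the normalization question above concrete, here is a minimal sketch, assuming Python 3 and the standard unicodedata module. The helper name same_text and the sample byte string are hypothetical, chosen only for illustration. The casefold() call is a rough approximation of case-insensitive matching, not a full caseless comparison. The sketch compares the two spellings of "o with diaeresis" from the quoted text, then checks what NFC normalization does to a PEP 383 surrogate escape.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization
    and case folding (an approximation of caseless matching)."""
    def norm(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()
    return norm(a) == norm(b)


decomposed = "o\u0308"   # LATIN SMALL LETTER O + COMBINING DIAERESIS
precomposed = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS

# Raw comparison sees two different code point sequences.
raw_equal = decomposed == precomposed

# After normalization the two spellings compare equal.
normalized_equal = same_text(decomposed, precomposed)

# PEP 383: an undecodable byte (0xFF here) is decoded to a lone
# surrogate (U+DCFF) by the surrogateescape error handler.
escaped = b"caf\xff".decode("utf-8", "surrogateescape")

# The lone surrogate has no decomposition, so NFC leaves it alone
# and the original bytes can still be recovered afterwards.
normalized = unicodedata.normalize("NFC", escaped)
round_trip = normalized.encode("utf-8", "surrogateescape")
```

In this sketch the escaped surrogate passes through normalization unchanged, so it simply never matches a blacklist entry made of ordinary text; it is one more alternative spelling that a real filter would have to account for.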
Regards,
Martin