[Python-Dev] Python-3.0, unicode, and os.environ
v+python at g.nevcal.com
Mon Dec 8 03:17:04 CET 2008
On approximately 12/7/2008 10:56 AM, came the following characters from
the keyboard of Adam Olsen:
> You might receive a UTF-8 encoded file name from a malicious user,
> check if it contains something dangerous (like
> "../../../../../etc/password"), then decode it. If your decoder isn't
> compliant (ie doesn't check for overly long sequences) then a
> b'\xC0\xAF' gets translated into u'/', bypassing your previous check.
You might indeed.
But if you are interested in checking for security issues, shouldn't you
_first_ decode into some canonical form, specifying what sorts of
Unicode strictness (such as overlong sequences) to check for during the
decode process, and once the string is in canonical form, _then_ do
checks for various attacks, such as the ../ sequence you mention?
And with that order of operations, even if you don't reject overlong
sequences, you have canonicalized them, and can recognize the resulting
characters as good or bad.
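
A minimal sketch of the two orderings, for illustration only. The names
lenient_decode, check_then_decode and decode_then_check are hypothetical,
not part of any library; lenient_decode stands in for a non-compliant
decoder that accepts overlong two-byte sequences. Python's own UTF-8
codec is strict and rejects b'\xC0\xAF' outright.

```python
def lenient_decode(raw: bytes) -> str:
    """Hypothetical NON-compliant UTF-8 decoder (illustration only).

    Handles 1- and 2-byte sequences and does not reject overlong
    forms, so b'\\xC0\\xAF' decodes to '/'.
    """
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(raw) and 0x80 <= raw[i + 1] <= 0xBF:
            # No check that the code point really needs two bytes.
            out.append(chr(((b & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("sketch only handles 1- and 2-byte sequences")
    return "".join(out)


def check_then_decode(raw: bytes) -> str:
    """The risky order: inspect the raw bytes, then decode."""
    if b".." in raw.split(b"/"):
        raise ValueError("path traversal")
    return lenient_decode(raw)


def decode_then_check(raw: bytes, decoder=lenient_decode) -> str:
    """The order suggested above: decode to text first, then inspect."""
    path = decoder(raw)
    if ".." in path.split("/"):
        raise ValueError("path traversal")
    return path


# Overlong encoding of '/' hides the separators from a byte-level check.
sneaky = b"..\xc0\xaf..\xc0\xafetc"

leaked = check_then_decode(sneaky)  # passes the check, then becomes a traversal path

try:
    decode_then_check(sneaky)
    caught = False
except ValueError:
    caught = True  # the check sees the decoded '/' characters

try:
    # A strict decoder refuses the overlong bytes before any check runs.
    decode_then_check(sneaky, decoder=lambda b: b.decode("utf-8"))
    strict_rejected = False
except UnicodeDecodeError:
    strict_rejected = True
```

With the check applied to the decoded text, the lenient decoder's
behaviour no longer matters for this particular attack: the overlong
bytes have already become ordinary '/' characters by the time the
check runs.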
Glenn -- http://nevcal.com/
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking