[Python-Dev] Python-3.0, unicode, and os.environ

Glenn Linderman v+python at g.nevcal.com
Mon Dec 8 03:17:04 CET 2008

On approximately 12/7/2008 10:56 AM, came the following characters from 
the keyboard of Adam Olsen:

> You might receive a UTF-8 encoded file name from a malicious user,
> check if it contains something dangerous (like
> "../../../../../etc/passwd"), then decode it.  If your decoder isn't
> compliant (i.e. doesn't check for overly long sequences) then a
> b'\xC0\xAF' gets translated into u'/', bypassing your previous check.

You might indeed.

But if you are interested in checking for security issues, shouldn't you
_first_ decode into some canonical form, specifying what sorts of
Unicode strictness (such as overlong sequences) to check for during the 
decode process, and once the string is in canonical form, _then_ do 
checks for various attacks, such as the ../ sequence you mention?

And with that order of operations, even if you don't reject overlong
sequences, you have canonicalized them, and can recognize the resulting
characters as good or bad.
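
A minimal sketch of that ordering in Python 3, offered as an illustration rather than a complete path validator. The function name is_safe_relative_path is hypothetical, and the sketch assumes POSIX-style "/" separators. It decodes strictly first, then inspects the decoded text; Python 3's built-in UTF-8 decoder happens to reject overlong sequences such as b'\xC0\xAF', so here the decode step also handles that strictness check.

```python
def is_safe_relative_path(raw: bytes) -> bool:
    """Decode first, then check the canonical text (hypothetical helper)."""
    # Step 1: decode into a canonical form. errors="strict" makes the
    # decoder raise on malformed input, including overlong sequences.
    try:
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False

    # Step 2: only now look for dangerous content in the decoded string.
    if text.startswith("/"):
        return False  # absolute paths are rejected in this sketch
    return ".." not in text.split("/")


print(is_safe_relative_path(b"docs/readme.txt"))
print(is_safe_relative_path(b"../../etc/passwd"))
print(is_safe_relative_path(b"..\xc0\xafetc"))
```

Because the ".." check runs on the decoded string, it sees whatever characters the decoder actually produced, so a lenient decoder could not slip a "/" past it the way it could past a check made on the raw bytes.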

Glenn -- http://nevcal.com/
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
