[Web-SIG] WSGI for Python 3

Sat Jul 17 06:45:04 CEST 2010

On Fri, Jul 16, 2010 at 11:28 PM, Graham Dumpleton <graham.dumpleton at gmail.com> wrote:

> > Nah, not nearly that hard:
> >
> > path_info = urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
> >
> > I don't see the problem?  If you want to distinguish %2f from /, then
> > you'll do it slightly differently, like:
> >
> > path_parts = [
> >     urllib.parse.unquote_to_bytes(p).decode('UTF-8')
> >     for p in environ['wsgi.raw_path_info'].split('/')]
> >
> > This second recipe is impossible to do currently with WSGI.
> > So... before jumping to conclusions, what's the hard part with using
>
> Sorry, it is not that simple. The thing that everyone is ignoring is
> that SCRIPT_NAME and PATH_INFO are also normalized by the web server
> normally. That is, .. instances are removed. By passing the raw URL
> through to the application, you are now forcing every application to
> have to deal with that as well with the possibility of directory
> traversal attacks when people get it wrong and the URL is mapping
> somehow to file system resources. It is a huge can of worms which at
> the moment the web server deals with.
>

Well... at least to me "raw" only means "not URL decoded", so it doesn't
necessarily mean the server can't clean up the request path.  I guess an
attacker could percent-encode "." (as %2e) to make things harder.
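
To illustrate the encoded-dot concern, here is a minimal sketch, assuming
Python 3's urllib.parse. The helper name has_dot_segment is hypothetical and
not part of any WSGI proposal. It decodes each segment of a raw path before
checking for "." or "..", since a check on the undecoded text would miss
"%2e%2e":

```python
from urllib.parse import unquote_to_bytes


def has_dot_segment(raw_path):
    """Return True if any '/'-separated segment decodes to '.' or '..'.

    raw_path is assumed to be the still percent-encoded request path.
    """
    for segment in raw_path.split('/'):
        # Decode first, so that '%2e%2e' is recognised as '..'.
        decoded = unquote_to_bytes(segment).decode('UTF-8', 'replace')
        if decoded in ('.', '..'):
            return True
    return False
```

Note that this only looks at segments split on a literal "/"; a segment such
as "..%2f" decodes to "../" and would need separate handling.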

Nevertheless, WSGI servers don't currently guarantee this cleaning.  I added
it to paste.httpserver, but I don't know one way or the other about any
other servers.  A quick test shows wsgiref does not clean paths.  So apps
shouldn't rely on a clean path.
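
For an application that does not want to rely on the server, something like
the following could be used. This is only a sketch under assumptions: the
names clean_path and path_cleaning_middleware are made up for illustration,
it is not how paste.httpserver implements its cleaning, and it treats an
escape above the root as an error rather than silently clamping it:

```python
def clean_path(path):
    """Collapse '.' and '..' segments in an already-decoded path.

    Empty segments (from '//') are dropped as well. Raises ValueError
    if a '..' would climb above the root.
    """
    cleaned = []
    for segment in path.split('/'):
        if segment in ('', '.'):
            continue
        if segment == '..':
            if not cleaned:
                raise ValueError('path escapes the root: %r' % (path,))
            cleaned.pop()
        else:
            cleaned.append(segment)
    result = '/' + '/'.join(cleaned)
    # Preserve a trailing slash when something is left to attach it to.
    if cleaned and path.endswith('/'):
        result += '/'
    return result


def path_cleaning_middleware(app):
    """Wrap a WSGI app so it only ever sees a cleaned PATH_INFO."""
    def wrapper(environ, start_response):
        try:
            environ['PATH_INFO'] = clean_path(environ.get('PATH_INFO', ''))
        except ValueError:
            start_response('400 Bad Request',
                           [('Content-Type', 'text/plain')])
            return [b'Bad request path\n']
        return app(environ, start_response)
    return wrapper
```

Wrapping an application as path_cleaning_middleware(app) means a request for
"/a/./b/../c" reaches it as "/a/c", while "/../etc/passwd" is answered with
a 400 before the application runs.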

-- 
Ian Bicking  |  http://blog.ianbicking.org