[Web-SIG] WSGI for Python 3
Ian Bicking
ianb at colorstudy.com
Sat Jul 17 06:45:04 CEST 2010
On Fri, Jul 16, 2010 at 11:28 PM, Graham Dumpleton <graham.dumpleton at gmail.com> wrote:
> > Nah, not nearly that hard:
> >
> > path_info = urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
> >
> > I don't see the problem? If you want to distinguish %2f from /, then
> > you'll do it slightly differently, like:
> >
> > path_parts = [
> > urllib.parse.unquote_to_bytes(p).decode('UTF-8')
> > for p in environ['wsgi.raw_path_info'].split('/')]
> >
> > This second recipe is impossible to do currently with WSGI.
> > So... before jumping to conclusions, what's the hard part with using
>
> Sorry, it is not that simple. The thing that everyone is ignoring is
> that SCRIPT_NAME and PATH_INFO are also normalized by the web server
> normally. That is, .. instances are removed. By passing the raw URL
> through to the application, you are now forcing every application to
> have to deal with that as well with the possibility of directory
> traversal attacks when people get it wrong and the URL is mapping
> somehow to file system resources. It is a huge can of worms which at
> the moment the web server deals with.
>
Well... at least to me "raw" only means "not URL decoded", so it doesn't
necessarily mean the server can't clean up the request path. I guess an
attacker could percent-encode "." (as %2e) to make that cleaning harder.
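To make the encoded-dot case concrete, here is a minimal sketch. The path is a made-up example, and posixpath.normpath only stands in for whatever cleaning a server might do. Cleaning before decoding leaves %2e%2e untouched, and the ".." segments only show up after decoding:

```python
import posixpath
from urllib.parse import unquote_to_bytes

# Hypothetical raw (still URL-encoded) request path.
raw = "/static/%2e%2e/%2e%2e/etc/passwd"

# Normalizing the raw path finds no literal ".." segments, so nothing changes.
cleaned_raw = posixpath.normpath(raw)

# Decoding afterwards brings the ".." segments back.
decoded = unquote_to_bytes(cleaned_raw).decode("UTF-8")

print(cleaned_raw)  # /static/%2e%2e/%2e%2e/etc/passwd
print(decoded)      # /static/../../etc/passwd
```

So any cleaning of a "raw" path would have to account for encoded dots, or be repeated after decoding.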
Nevertheless, WSGI servers don't currently guarantee this cleaning. I added
it to paste.httpserver, but I don't know one way or the other about any
other servers. A quick test shows wsgiref does not clean paths. So apps
shouldn't rely on a clean path.
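For an app that doesn't want to rely on the server, something along these lines could work. This is only a sketch: clean_path is a hypothetical helper, not part of WSGI or of any server, and it assumes the path has already been URL-decoded to a str:

```python
def clean_path(path):
    """Collapse '.', '..' and empty segments in an already-decoded path.

    Hypothetical helper for illustration. Raises ValueError if the path
    tries to climb above the root instead of silently clamping it.
    """
    segments = []
    for segment in path.split("/"):
        if segment in ("", "."):
            # Skip empty segments (from '//') and no-op '.' segments.
            continue
        if segment == "..":
            if not segments:
                raise ValueError("path escapes root: %r" % path)
            segments.pop()
        else:
            segments.append(segment)
    return "/" + "/".join(segments)


print(clean_path("/a/b/../c"))   # /a/c
print(clean_path("/a/./b//c"))   # /a/b/c
```

Whether to raise or to clamp at the root is a design choice; raising makes a traversal attempt visible to the app rather than hiding it.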
--
Ian Bicking | http://blog.ianbicking.org