[Web-SIG] Converting REQUEST_URI to wsgi.script_name/wsgi.path_info

Graham Dumpleton graham.dumpleton at gmail.com
Mon Sep 28 09:34:35 CEST 2009


2009/9/28 Ian Bicking <ianb at colorstudy.com>:
> I tried implementing some code to convert REQUEST_URI (the raw request URL)
> and CGI-style SCRIPT_NAME/PATH_INFO into a raw script_name/path_info.
>   http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri.py (python 2)
>   http://bitbucket.org/ianb/wsgi-peps/src/tip/request_uri3.py (python 3)
> Admittedly the tests are not very complete, I just wasn't feeling creative
> about test cases.  In terms of performance this avoids being entirely brute
> force, but feels kind of complex.  I'm betting there's an entirely different
> approach which is faster.  But whatever.

Got an error:

 mod_wsgi (pid=4301): Exception occurred processing WSGI script '/Users/grahamd/Testing/tests/wsgi20.wsgi'.
 Traceback (most recent call last):
   File "/Users/grahamd/Testing/tests/wsgi20.wsgi", line 80, in application
     environ['PATH_INFO'])
   File "/Users/grahamd/Testing/tests/wsgi20.wsgi", line 64, in request_uri_to_path
     remove_segments = remove_segments - 1 - qscript_name_parts[-1].lower().count('%2f')
 IndexError: list index out of range

This was an extreme corner case, where Apache mod_rewrite was being
used to route every request that doesn't map to an existing file
through the WSGI script:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /wsgi20.wsgi/$1 [QSA,PT,L]

and Apache was configured to allow encoded slashes. The input would have been:

REQUEST_URI: '/a%2fb/c/d'
SCRIPT_NAME: '/wsgi20.wsgi'
PATH_INFO: '/a/b/c/d'

That style of rewrite rule is quite often used with Apache, although
allowing encoded slashes isn't.
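For comparison, here is a minimal sketch of one different approach (not the algorithm in request_uri.py). Rather than counting segments derived from SCRIPT_NAME, it searches the raw path for the '/'-aligned suffix whose percent-decoded form equals PATH_INFO, and treats whatever precedes it as the raw script name. The name split_request_uri is hypothetical; the sketch assumes Python 3, that REQUEST_URI is a path rather than an absolute URI, and that the server decoded PATH_INFO as latin-1 native strings per PEP 3333. For the input above it yields an empty raw script name, because the rewritten /wsgi20.wsgi never appears in REQUEST_URI at all.

```python
from urllib.parse import unquote


def split_request_uri(request_uri, path_info):
    """Split a raw REQUEST_URI into (raw_script_name, raw_path_info).

    Hypothetical helper: finds the '/'-aligned suffix of the raw path whose
    percent-decoded form equals the (already decoded) PATH_INFO. Returns
    None when no suffix matches, so the caller can fall back.
    """
    # REQUEST_URI carries the query string; PATH_INFO does not.
    raw_path = request_uri.split('?', 1)[0]

    # Candidate split points: every '/' in the raw path, plus the very end
    # of the path (which corresponds to an empty PATH_INFO). Splitting only
    # at a literal '/' never cuts through a %XX escape.
    candidates = [i for i, ch in enumerate(raw_path) if ch == '/']
    candidates.append(len(raw_path))

    for i in candidates:
        suffix = raw_path[i:]
        # Assumption: latin-1 mirrors how a PEP 3333 server turns the
        # request bytes into native strings.
        if unquote(suffix, encoding='latin-1') == path_info:
            return raw_path[:i], suffix
    return None
```

At most one candidate can match, since a longer suffix always decodes to a longer string than any shorter suffix it contains. If Apache normalises the path further (for example collapsing '//' or resolving '..'), no suffix matches and the function returns None rather than raising.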

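It is a known consideration with this rewrite rule that SCRIPT_NAME
needs to be adjusted, since it ends up naming the script file
('/wsgi20.wsgi') rather than the mount point the user actually sees.
Usually you would use a wrapper around the WSGI application which does: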

import posixpath

def _application(environ, start_response):
    # The original application.
    ...

def application(environ, start_response):
    # Wrapper to set SCRIPT_NAME to the actual mount point by stripping
    # the script filename, e.g. '/wsgi20.wsgi' -> '/'.
    environ['SCRIPT_NAME'] = posixpath.dirname(environ['SCRIPT_NAME'])
    # A mount point at the site root is an empty SCRIPT_NAME, not '/'.
    if environ['SCRIPT_NAME'] == '/':
        environ['SCRIPT_NAME'] = ''
    return _application(environ, start_response)

If that algorithm were used in the WSGI adapter, however, the wrapper
would never get the opportunity to do that, as the conversion would
already have failed before the application was called.
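One way an adapter could avoid pre-empting such a wrapper is to treat a failed conversion as non-fatal. The sketch below is hypothetical: safe_raw_split and its convert argument are illustrative names, with convert standing in for an algorithm like the one in request_uri.py. On failure it falls back to re-quoting the decoded CGI values (assuming latin-1 native strings as in PEP 3333), which loses the distinction between '%2f' and '/' but still lets the application, and any wrapper that fixes SCRIPT_NAME, be called.

```python
from urllib.parse import quote


def safe_raw_split(environ, convert):
    """Run a REQUEST_URI-based converter, falling back instead of raising.

    `convert` is a hypothetical callable taking the WSGI environ and
    returning (raw_script_name, raw_path_info).
    """
    try:
        return convert(environ)
    except (IndexError, ValueError):
        # E.g. the IndexError in the traceback above. Lossy fallback:
        # re-quote the already-decoded values, keeping '/' unescaped, so an
        # encoded slash in the original URL comes back as a plain '/'.
        script_name = quote(environ.get('SCRIPT_NAME', ''), safe='/',
                            encoding='latin-1')
        path_info = quote(environ.get('PATH_INFO', ''), safe='/',
                          encoding='latin-1')
        return script_name, path_info
```

Catching only specific exceptions keeps genuine bugs in the converter visible for other error types; whether an adapter should log the fallback is left open here.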

Graham

