[Web-SIG] Repeating slashes in REQUEST_URI, SCRIPT_NAME and PATH_INFO.
Graham Dumpleton
grahamd at dscpl.com.au
Mon Jan 29 01:10:03 CET 2007
Another question on SCRIPT_NAME, PATH_INFO etc.
This time I am after information on what responsibilities an adapter for a
specific web server has in respect of removal and/or preservation of repeating
slashes in a request URI.
Take for example that a WSGI application is mounted at:
/wsgi/a
and that the request URI is:
REQUEST_URI: '/////wsgi//////a///b//c/d'
What should SCRIPT_NAME and PATH_INFO be set to? Should repeating slashes
be removed from SCRIPT_NAME so that it matches the normalised mount point,
or should the repeating slashes be preserved?
Thus should the above REQUEST_URI yield:
SCRIPT_NAME: '/wsgi/a'
PATH_INFO: '///b//c/d'
or perhaps:
SCRIPT_NAME: '/////wsgi//////a'
PATH_INFO: '///b//c/d'
Similarly, should repeating slashes be left as is in PATH_INFO?
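To make the two options concrete, here is a minimal sketch of how an adapter might split the request path for a known mount point. The helper name split_request_path() and its normalise_script_name flag are hypothetical, invented for illustration. They are not part of WSGI, paste or wsgiref. The sketch assumes repeating slashes count as a single slash when matching the mount point.

```python
import re


def split_request_path(request_path, mount_point, normalise_script_name=True):
    """Split request_path into (SCRIPT_NAME, PATH_INFO).

    Hypothetical helper: repeating slashes in request_path are treated as
    one slash when matching against mount_point.
    """
    # Non-empty segments of the mount point, e.g. ['wsgi', 'a'].
    segments = [s for s in mount_point.split('/') if s]
    # Each segment may be preceded by one or more slashes in the request.
    pattern = ''.join('/+' + re.escape(s) for s in segments)
    # The match must stop at a segment boundary (a slash or end of string).
    match = re.match('(' + pattern + ')(?=/|$)', request_path)
    if match is None:
        raise ValueError('request path is not under the mount point')
    path_info = request_path[match.end():]
    if normalise_script_name:
        # Option 1: SCRIPT_NAME matches the normalised mount point.
        script_name = ''.join('/' + s for s in segments)
    else:
        # Option 2: SCRIPT_NAME keeps the slashes exactly as requested.
        script_name = match.group(1)
    return script_name, path_info


uri = '/////wsgi//////a///b//c/d'
print(split_request_path(uri, '/wsgi/a'))
print(split_request_path(uri, '/wsgi/a', normalise_script_name=False))
```

In both cases PATH_INFO is left untouched here. Whether it should also be normalised is a separate question.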
I note that path_info_pop() in paste says:
>>> def call_it(script_name, path_info):
...     env = {'SCRIPT_NAME': script_name, 'PATH_INFO': path_info}
...     result = path_info_pop(env)
...     print 'SCRIPT_NAME=%r; PATH_INFO=%r; returns=%r' % (
...         env['SCRIPT_NAME'], env['PATH_INFO'], result)
>>> call_it('/foo', '/bar')
SCRIPT_NAME='/foo/bar'; PATH_INFO=''; returns='bar'
>>> call_it('/foo/bar', '')
SCRIPT_NAME='/foo/bar'; PATH_INFO=''; returns=None
>>> call_it('/foo/bar', '/')
SCRIPT_NAME='/foo/bar/'; PATH_INFO=''; returns=''
>>> call_it('', '/1/2/3')
SCRIPT_NAME='/1'; PATH_INFO='/2/3'; returns='1'
>>> call_it('', '//1/2')
SCRIPT_NAME='//1'; PATH_INFO='/2'; returns='1'
The last example demonstrates the need to treat repeating slashes
as a single slash, but it also seems to indicate that SCRIPT_NAME can have
repeating slashes in it. Running path_info_pop() on the request above yields:
BEFORE: {'PATH_INFO': '///b//c/d', 'SCRIPT_NAME': '/////wsgi//////a'}
RESULT: 'b'
AFTER: {'PATH_INFO': '//c/d', 'SCRIPT_NAME': '/////wsgi//////a///b'}
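For anyone without paste installed, the slash-preserving behaviour shown above can be approximated with a small stand-in. This is a sketch written to match the doctest output quoted earlier, not paste's actual source. The name pop_preserving_slashes() is made up for illustration.

```python
def pop_preserving_slashes(environ):
    """Move one segment from PATH_INFO to SCRIPT_NAME, keeping extra slashes.

    Hypothetical stand-in that mimics the documented path_info_pop() results.
    """
    path = environ.get('PATH_INFO', '')
    script_name = environ.get('SCRIPT_NAME', '')
    if not path:
        return None
    # Leading slashes are skipped for matching, but copied to SCRIPT_NAME.
    while path.startswith('/'):
        script_name += '/'
        path = path[1:]
    if '/' in path:
        segment, rest = path.split('/', 1)
        environ['PATH_INFO'] = '/' + rest
    else:
        segment = path
        environ['PATH_INFO'] = ''
    environ['SCRIPT_NAME'] = script_name + segment
    return segment


env = {'SCRIPT_NAME': '/////wsgi//////a', 'PATH_INFO': '///b//c/d'}
print('RESULT: %r' % pop_preserving_slashes(env))
print('AFTER: %r' % env)
```

The point of the sketch is that nothing is ever removed: every slash consumed from PATH_INFO ends up in SCRIPT_NAME.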
wsgiref.util.shift_path_info() also treats repeating slashes as one, but
it strips all the repeating slashes out of both SCRIPT_NAME and PATH_INFO:
BEFORE: {'PATH_INFO': '///b//c/d', 'SCRIPT_NAME': '/////wsgi//////a'}
RESULT: 'b'
AFTER: {'PATH_INFO': '/c/d', 'SCRIPT_NAME': '/wsgi/a/b'}
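The wsgiref result can be reproduced with a few lines. The wrapper shift_once() is a hypothetical convenience function added for this sketch. Only shift_path_info() itself comes from the standard library.

```python
from wsgiref.util import shift_path_info


def shift_once(script_name, path_info):
    """Run shift_path_info() on a minimal environ; return (result, environ)."""
    environ = {'SCRIPT_NAME': script_name, 'PATH_INFO': path_info}
    result = shift_path_info(environ)
    return result, environ


result, after = shift_once('/////wsgi//////a', '///b//c/d')
print('RESULT: %r' % result)
print('AFTER: %r' % after)
```

Unlike the paste behaviour, SCRIPT_NAME here is normalised on every shift, so any repeating slashes the adapter preserved are lost as soon as the application shifts a segment.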
What is the accepted convention for dealing with repeating slashes? Should
a web server adapter leave repeating slashes in both SCRIPT_NAME and
PATH_INFO, or should it at least normalise SCRIPT_NAME so that it matches
the designated mount point?
Thanks in advance.
Graham