[Web-SIG] Repeating slashes in REQUEST_URI, SCRIPT_NAME and PATH_INFO.

Graham Dumpleton grahamd at dscpl.com.au
Mon Jan 29 01:10:03 CET 2007


Another question on SCRIPT_NAME, PATH_INFO etc.

This time I am after information on what responsibilities an adapter for a
specific web server has with respect to removing or preserving repeating
slashes in a request URI.

Take for example that a WSGI application is mounted at:

  /wsgi/a

and that the request URI is:

  REQUEST_URI: '/////wsgi//////a///b//c/d'

What should SCRIPT_NAME and PATH_INFO be set to? Should repeating slashes
be removed from SCRIPT_NAME so that it matches the normalised mount point,
or should the repeating slashes be preserved?

Thus should the above REQUEST_URI yield:

  SCRIPT_NAME: '/wsgi/a'
  PATH_INFO: '///b//c/d'

or perhaps:

  SCRIPT_NAME: '/////wsgi//////a'
  PATH_INFO: '///b//c/d'

Similarly, should repeating slashes be left as is in PATH_INFO?
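
To make the two candidate results concrete, here is a minimal sketch of how an
adapter might derive both of them. The function split_request_uri() and its
normalise_script_name flag are hypothetical names made up for illustration;
they are not part of WSGI, paste or wsgiref. The sketch assumes that runs of
slashes count as a single slash when matching the mount point and that
PATH_INFO is always passed through untouched.

```python
import re


def split_request_uri(request_uri, mount_point, normalise_script_name):
    """Split a request path into (SCRIPT_NAME, PATH_INFO).

    Hypothetical helper: runs of slashes are treated as one slash when
    matching the mount point, and PATH_INFO is returned unmodified.
    """
    segments = [s for s in mount_point.split('/') if s]
    # Allow one or more slashes in front of each mount point segment.
    pattern = ''.join('/+' + re.escape(s) for s in segments)
    # The mount point must end at a slash or at the end of the path.
    match = re.match(pattern + r'(?=/|$)', request_uri)
    if match is None:
        raise ValueError('request is not under the mount point')
    path_info = request_uri[match.end():]
    if normalise_script_name:
        # Option 1: SCRIPT_NAME matches the normalised mount point.
        script_name = ''.join('/' + s for s in segments)
    else:
        # Option 2: SCRIPT_NAME keeps the slashes exactly as requested.
        script_name = match.group(0)
    return script_name, path_info


uri = '/////wsgi//////a///b//c/d'
print(split_request_uri(uri, '/wsgi/a', True))
print(split_request_uri(uri, '/wsgi/a', False))
```

Either way, SCRIPT_NAME + PATH_INFO only reproduces the original request path
when the slashes in SCRIPT_NAME are preserved.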

I note that the docstring for path_info_pop() in paste says:

        >>> def call_it(script_name, path_info):
        ...     env = {'SCRIPT_NAME': script_name, 'PATH_INFO': path_info}
        ...     result = path_info_pop(env)
        ...     print 'SCRIPT_NAME=%r; PATH_INFO=%r; returns=%r' % (
        ...         env['SCRIPT_NAME'], env['PATH_INFO'], result)
        >>> call_it('/foo', '/bar')
        SCRIPT_NAME='/foo/bar'; PATH_INFO=''; returns='bar'
        >>> call_it('/foo/bar', '')
        SCRIPT_NAME='/foo/bar'; PATH_INFO=''; returns=None
        >>> call_it('/foo/bar', '/')
        SCRIPT_NAME='/foo/bar/'; PATH_INFO=''; returns=''
        >>> call_it('', '/1/2/3')
        SCRIPT_NAME='/1'; PATH_INFO='/2/3'; returns='1'
        >>> call_it('', '//1/2')
        SCRIPT_NAME='//1'; PATH_INFO='/2'; returns='1'

The last example demonstrates the need to treat repeating slashes
as a single slash, but also seems to indicate that SCRIPT_NAME can have
repeating slashes in it. Running path_info_pop() on the second candidate
result above yields:

BEFORE: {'PATH_INFO': '///b//c/d', 'SCRIPT_NAME': '/////wsgi//////a'}
RESULT: 'b'
AFTER: {'PATH_INFO': '//c/d', 'SCRIPT_NAME': '/////wsgi//////a///b'}
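
The slash-preserving behaviour shown above can be reproduced with a short
sketch. This is not paste's source; pop_preserving_slashes() is a hypothetical
stand-in, written only so that it agrees with the docstring examples and the
BEFORE/AFTER values quoted in this message. It assumes SCRIPT_NAME is present
in the environ.

```python
def pop_preserving_slashes(environ):
    """Move one path segment from PATH_INFO to SCRIPT_NAME.

    Hypothetical sketch: leading slashes are carried over to SCRIPT_NAME
    unchanged rather than being collapsed.
    """
    path = environ.get('PATH_INFO', '')
    if not path:
        return None
    # Carry every leading slash across to SCRIPT_NAME as is.
    while path.startswith('/'):
        environ['SCRIPT_NAME'] += '/'
        path = path[1:]
    if '/' not in path:
        # Last segment: nothing is left in PATH_INFO.
        environ['SCRIPT_NAME'] += path
        environ['PATH_INFO'] = ''
        return path
    segment, rest = path.split('/', 1)
    environ['SCRIPT_NAME'] += segment
    # Put back the one slash consumed by split(); any extras remain.
    environ['PATH_INFO'] = '/' + rest
    return segment


environ = {'SCRIPT_NAME': '/////wsgi//////a', 'PATH_INFO': '///b//c/d'}
print('BEFORE: %r' % environ)
print('RESULT: %r' % pop_preserving_slashes(environ))
print('AFTER: %r' % environ)
```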

wsgiref.util.shift_path_info() also treats repeating slashes as one, but it
strips all the repeating slashes out of both SCRIPT_NAME and PATH_INFO:

BEFORE: {'PATH_INFO': '///b//c/d', 'SCRIPT_NAME': '/////wsgi//////a'}
RESULT: 'b'
AFTER: {'PATH_INFO': '/c/d', 'SCRIPT_NAME': '/wsgi/a/b'}
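
For anyone wanting to repeat the wsgiref comparison, a minimal script along
these lines should do it. It uses only shift_path_info() from the standard
library's wsgiref.util module; the environ values are the ones from this
message, and the print formatting is merely an approximation of the output
shown above.

```python
from wsgiref.util import shift_path_info

# Same starting values as in the BEFORE line above.
environ = {'SCRIPT_NAME': '/////wsgi//////a', 'PATH_INFO': '///b//c/d'}

print('BEFORE: %r' % environ)
result = shift_path_info(environ)  # modifies environ in place
print('RESULT: %r' % result)
print('AFTER: %r' % environ)
```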

What is the accepted convention for dealing with repeating slashes? Should
a web server adapter leave repeating slashes in both SCRIPT_NAME and
PATH_INFO, or should it at least normalise SCRIPT_NAME so that it matches
the designated mount point?

Thanks in advance.

Graham

