Bug in parsing request paths, and patch

Jim Fulton jim@d...
Sat, 19 May 2001 13:00:23 -0400


Medusa's HTTP server loosely follows RFC 1808 when parsing request paths.
RFC 1808 specifies the general form of a URL as:

<scheme>://<net_loc>/<path>;<params>?<query>#<fragment>

The HTTP request line contains the part after the net_loc,
which the Medusa HTTP server parses into four parts:
path, params, query, and fragment. It then discards the params
and fragment.

RFC 2396 supersedes RFC 1808 and incorporates 'params' into the
path. Loosely, a path is a sequence of path segments separated by
slashes. Each path segment has a name followed by zero or more parameters,
where each parameter is set off by a semicolon, as in:

/namex/namey;p1=v2;p2=v2/namez;p4/....

I've never seen this syntax actually used, but I'm considering
using it in Zope to provide more explicit control over name lookup.

Medusa's current request parsing breaks this syntax and, technically,
violates RFC 2396. It lops off everything from the first semicolon up to
the query string. For example, the above path becomes:

/namex/namey
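
As a minimal sketch of this behaviour, the pre-patch pattern can be run on
its own. The function name split_uri_old and the '?a=1' query string are
made up for the illustration; the regular expression is the one from
http_server.py, used here as a plain function in current Python rather than
as a method of the request class:

```python
import re

# Pre-patch pattern from http_server.py: path, params, query, fragment.
OLD_PATH_REGEX = re.compile(r'([^;?#]*)(;[^?#]*)?(\?[^#]*)?(#.*)?')

def split_uri_old(uri):
    """Split a request URI the way the unpatched pattern does (illustrative)."""
    m = OLD_PATH_REGEX.match(uri)
    if m.end() != len(uri):
        raise ValueError("Broken URI")
    return m.groups()

# Hypothetical request URI built from the example path above.
path, params, query, fragment = split_uri_old(
    '/namex/namey;p1=v2;p2=v2/namez;p4?a=1')

print(path)    # /namex/namey
print(params)  # ;p1=v2;p2=v2/namez;p4
print(query)   # ?a=1
```

Everything from the first semicolon onward, including the later segment
"namez", ends up in params, which the server then throws away.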

I suggest that, especially given RFC 2396, the parsing and interpretation
of parameters should be left to individual handlers.
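
For instance, a handler that wants the parameters could split them out
itself once it receives the whole path. This is only a sketch:
split_segments is a hypothetical helper, not part of Medusa or Zope, and
it drops empty segments for simplicity, which RFC 2396 does not require:

```python
def split_segments(path):
    """Return [(name, [param, ...]), ...] for a slash-separated path.

    Hypothetical handler-side helper; parameters are returned as raw
    strings and are not interpreted any further.
    """
    result = []
    for segment in path.split('/'):
        if not segment:
            continue  # skip empty pieces from leading or doubled slashes
        name, *params = segment.split(';')
        result.append((name, params))
    return result

for name, params in split_segments('/namex/namey;p1=v2;p2=v2/namez;p4'):
    print(name, params)
# namex []
# namey ['p1=v2', 'p2=v2']
# namez ['p4']
```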

The simplest fix would be to change the parser to stop treating semicolons
in paths specially and to return an empty string for the "params" part,
so that callers still get four parts back.
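
A standalone sketch of that behaviour, assuming the same pattern and
tuple layout as the patch, looks like this. The name split_uri_new and
the '?a=1' query string are invented for the example, and the code is
written as a plain function in current Python rather than as the actual
method:

```python
import re

# Pattern without a separate params group: path, query, fragment.
NEW_PATH_REGEX = re.compile(r'([^?#]*)(\?[^#]*)?(#.*)?')

def split_uri_new(uri):
    """Split a request URI, leaving semicolons in the path (illustrative)."""
    m = NEW_PATH_REGEX.match(uri)
    if m.end() != len(uri):
        raise ValueError("Broken URI")
    path, query, fragment = m.groups()
    # Keep the four-part shape; params is always the empty string.
    return path, '', query, fragment

print(split_uri_new('/namex/namey;p1=v2;p2=v2/namez;p4?a=1'))
# ('/namex/namey;p1=v2;p2=v2/namez;p4', '', '?a=1', None)
```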

The patch is included below.

Jim


diff -c -r1.25 http_server.py
*** http_server.py	2001/05/01 12:49:04	1.25
--- http_server.py	2001/05/19 16:53:10
***************
*** 95,105 ****
      # split a uri
      # --------------------------------------------------

!     # <path>;<params>?<query>#<fragment>
      path_regex = re.compile (
!     #      path      params    query   fragment
!         r'([^;?#]*)(;[^?#]*)?(\?[^#]*)?(#.*)?'
!         )

      def split_uri (self):
          if self._split_uri is None:
--- 95,105 ----
      # split a uri
      # --------------------------------------------------

!     # <path>?<query>#<fragment>
      path_regex = re.compile (
!     #      path     query   fragment
!         r'([^?#]*)(\?[^#]*)?(#.*)?'
!         )

      def split_uri (self):
          if self._split_uri is None:
***************
*** 107,113 ****
              if m.end() != len(self.uri):
                  raise ValueError, "Broken URI"
              else:
!                 self._split_uri = m.groups()
          return self._split_uri

      def get_header_with_regex (self, head_reg, group):
--- 107,114 ----
              if m.end() != len(self.uri):
                  raise ValueError, "Broken URI"
              else:
!                 m=m.groups()
!                 self._split_uri = m[0], '', m[1], m[2]
          return self._split_uri

      def get_header_with_regex (self, head_reg, group):


--
Jim Fulton mailto:jim@d... Python Powered! 
Technical Director (888) 344-4332 http://www.python.org 
Digital Creations http://www.digicool.com http://www.zope.org