[Python-bugs-list] [ python-Bugs-478038 ] urlparse.urlparse semicolon bug

Thu, 15 Nov 2001 19:23:52 -0800

Bugs item #478038, was opened at 2001-11-04 09:19
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=478038&group_id=5470

Category: Python Library
Group: Python 2.1.1
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: urlparse.urlparse semicolon bug

Initial Comment:
urlparse.urlparse uses obsolete parsing rules. It expects there
to be no more than one semicolon in a URL, as in:

http://127.0.0.1:8880/semitest/foo;presentation=edit?x=y

It splits the URL into parts, one of which is the part between
the semicolon and the question mark.  This behavior is based
on an obsolete URL spec.

Recent specs, including the RFCs referenced in the urlparse
documentation, allow semicolons in each path segment, as in:

http://127.0.0.1:8880/semitest/foo;presentation=edit/form/spam;eggs=1/splat

urlparse.urlparse parses as follows:

[jim@c ZServer]$ python2.2
Python 2.2b1 (#1, Oct 22 2001, 17:42:33) 
[GCC 2.95.3 19991030 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Py$ from urlparse import urlparse
Py$ urlparse("http://127.0.0.1:8880/semitest/foo%3Bbar;presentation=edit/form/spam;eggs=1/splat")
('http', '127.0.0.1:8880', '/semitest/foo%3Bbar',
'presentation=edit/form/spam;eggs=1/splat', '', '')
Py$ 

which is incorrect because much of the path is wrongly
included in the obsolete "params" part.

----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-11-15 19:23

Message:
Logged In: YES 
user_id=3066

Fixed in Lib/urlparse.py 1.31 and 1.30.10.1.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-11-05 14:47

Message:
Logged In: YES 
user_id=3066

I've attached a patch that makes the change described in my
previous comment and cleans up the code a little.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-11-05 13:35

Message:
Logged In: YES 
user_id=3066

Here's my proposal for a fix:

For the existing urlparse() function, return something in
the params field of the result tuple only if it appears on
the last path segment.  This makes it an empty string for
your example, but URLs which conform to the simpler
version of the specifications the API was designed for
continue to get the expected behavior.
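
A minimal sketch of the last-segment rule described above, not the
attached patch itself. The helper name split_params is hypothetical,
and it takes only the path portion of the URL:

```python
def split_params(path):
    """Split ';params' off the last path segment only (hypothetical helper)."""
    last_slash = path.rfind("/")
    # Look for ';' only after the final '/', so semicolons in earlier
    # segments stay part of the path.
    semi = path.find(";", last_slash + 1)
    if semi < 0:
        return path, ""
    return path[:semi], path[semi + 1:]


# Old-style URL path: params sit on the last segment and are split off.
simple = split_params("/semitest/foo;presentation=edit")

# Path from the report: the last segment ("splat") has no ';',
# so params comes back empty and the path is left whole.
reported = split_params(
    "/semitest/foo%3Bbar;presentation=edit/form/spam;eggs=1/splat")
```

With this rule, simple is ('/semitest/foo', 'presentation=edit') and
reported keeps the full path with an empty params string.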

To support the current RFC 2396 syntax, a new function is
needed which returns a 5-tuple (the current 6-tuple less the
params field).  A second new function can be provided which
splits the path component into a sequence of pairs, where
each pair is (namepart, params).
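
As a rough illustration of the two functions proposed above, the
sketch below uses urlsplit from present-day Python 3 (urllib.parse,
not the urlparse module discussed in this report) for the 5-tuple,
and a hypothetical helper split_path_segments for the per-segment
pairs. It assumes everything after the first ';' in a segment counts
as that segment's params.

```python
from urllib.parse import urlsplit  # Python 3 location, an assumption here


def split_path_segments(path):
    """Return a list of (namepart, params) pairs, one per '/'-separated
    segment (hypothetical helper, not part of the standard library)."""
    pairs = []
    for segment in path.split("/"):
        # partition() splits on the first ';' only; params is '' if absent.
        name, _, params = segment.partition(";")
        pairs.append((name, params))
    return pairs


parts = urlsplit(
    "http://127.0.0.1:8880/semitest/foo;presentation=edit/form/spam;eggs=1/splat")
# parts is a 5-tuple: scheme, netloc, path, query, fragment (no params field).
segments = split_path_segments(parts.path)
```

The leading '/' of the path yields an initial ('', '') pair; a caller
that does not want it can skip the first element.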

Does this seem acceptable?

----------------------------------------------------------------------
