[Python-bugs-list] [ python-Bugs-478038 ] urlparse.urlparse semicolon bug

noreply@sourceforge.net noreply@sourceforge.net
Mon, 05 Nov 2001 09:58:50 -0800


Bugs item #478038, was opened at 2001-11-04 09:19
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=478038&group_id=5470

Category: Python Library
Group: Python 2.1.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: urlparse.urlparse semicolon bug

Initial Comment:
urlparse.urlparse uses obsolete parsing rules. It expects
there to be no more than one semicolon in a URL, as in:

 
http://127.0.0.1:8880/semitest/foo;presentation=edit?x=y

It splits the URL into parts, one of which is the part
between the semicolon and the question mark.  This behavior
is based on an obsolete URL spec.
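
For illustration, the single-semicolon split can be reproduced with a
short sketch. It assumes a current Python 3, where the function lives in
urllib.parse rather than the urlparse module used in this report:

```python
from urllib.parse import urlparse  # Python 3 location; an assumption, the report uses the old urlparse module

# One semicolon, in the last path segment, followed by a query string.
parts = urlparse("http://127.0.0.1:8880/semitest/foo;presentation=edit?x=y")

print(parts.path)    # /semitest/foo
print(parts.params)  # presentation=edit
print(parts.query)   # x=y
```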

Recent specs, including the RFCs referenced in the urlparse
documentation, allow semicolons in each path segment, as in:

http://127.0.0.1:8880/semitest/foo;presentation=edit/form/spam;eggs=1/splat

urlparse.urlparse parses such a URL as follows:

[jim@c ZServer]$ python2.2
Python 2.2b1 (#1, Oct 22 2001, 17:42:33) 
[GCC 2.95.3 19991030 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Py$ from urlparse import urlparse
Py$ urlparse("http://127.0.0.1:8880/semitest/foo%3Bbar;presentation=edit/form/spam;eggs=1/splat")
('http', '127.0.0.1:8880', '/semitest/foo%3Bbar', 'presentation=edit/form/spam;eggs=1/splat', '', '')
Py$ 

This is incorrect: everything after the first semicolon,
including the rest of the path, ends up in the obsolete
"params" part.
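
As a rough sketch of the per-segment reading described above (not a
proposed patch), the path can be split on "/" first and each segment
then split on ";". The helper name split_segment_params is hypothetical,
and the code assumes a current Python 3, where urlsplit lives in
urllib.parse and leaves semicolons in the path untouched:

```python
from urllib.parse import urlsplit  # Python 3 location; an assumption, not the module in this report


def split_segment_params(url):
    """Return a list of (segment, [params]) pairs, one per path segment.

    Hypothetical helper for illustration only: parameters are taken to be
    the ';'-separated pieces that follow the name within one segment.
    """
    path = urlsplit(url).path  # urlsplit does not split off params
    segments = path.split("/")
    if path.startswith("/"):
        segments = segments[1:]  # drop the empty string before the leading "/"
    result = []
    for segment in segments:
        name, *params = segment.split(";")
        result.append((name, params))
    return result


url = ("http://127.0.0.1:8880/semitest/foo%3Bbar;presentation=edit"
       "/form/spam;eggs=1/splat")
for name, params in split_segment_params(url):
    print(name, params)
```

The escaped semicolon (%3B) stays part of the segment name because no
unquoting is done before the split.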


----------------------------------------------------------------------
