[ python-Bugs-1396543 ] urlparse is confused by /

Fri Jan 13 19:19:28 CET 2006

Bugs item #1396543, was opened at 2006-01-04 04:57
Message generated for change (Comment added) made by johnhansen
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1396543&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: John Hansen (johnhansen)
Assigned to: Nobody/Anonymous (nobody)
Summary: urlparse is confused by /

Initial Comment:
If the parameter field of a URL contains a '/', urlparse does not enter date 
in the parameter field, but leaves it attached to the path.

The simplified example is:
>>> urlparse.urlparse("http://f/adi;s=a;c=b/")
('http', 'f', '/adi;s=a;c=b/', '', '', '')

>>> urlparse.urlparse("http://f/adi;s=a;c=b")
('http', 'f', '/adi', 's=a;c=b', '', '')

The realworld case was:

>>> urlparse.urlparse("http://ad.doubleclick.net/adi/
N3691.VibrantMedia/B1733031.2;sz=160x600;click=http%3A/
adforce.adtech.de/adlink%7C82%7C59111%7C1%7C168%7CAdId%
3D1023327%3BBnId%3D4%3Bitime%3D335264036%3Bku%3D12900%
3Bkey%3Dcomputing%2Bbetanews%5Fgeneral%3Blink%3D")
('http', 'ad.doubleclick.net/adi/N3691.VibrantMedia/
B1733031.2;sz=160x600;click=http%3A/adforce.adtech.de/adlink%
7C82%7C59111%7C1%7C168%7CAdId%3D1023327%3BBnId%3D4%3Bitime
%3D335264036%3Bku%3D12900%3Bkey%3Dcomputing%2Bbetanews%
5Fgeneral%3Blink%3D', '', '', '')

What's odd is that the code specifically says to do this:
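def _splitparams(url):
    if '/' in url:
        # Only look for ';' from the last '/' on, i.e. in the final path segment.
        i = url.find(';', url.rfind('/'))
        if i < 0:
            return url, ''
    else:
        i = url.find(';')
    return url[:i], url[i+1:]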

Is there a reason for the rfind?
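
To see what the rfind changes, here is a minimal sketch comparing the two strategies on the simplified example. Both function names are made up for illustration and are not part of urlparse: the first mimics the quoted logic, the second splits at the first ';'. One possible reading is that the rfind follows RFC 2396, section 3.3, where each path segment may carry its own parameters, so only the last segment's parameters are split off.

```python
def split_params_last_segment(path):
    """Mimic the quoted logic: look for ';' only in the last path segment."""
    start = path.rfind('/') + 1      # 0 when the path contains no '/'
    i = path.find(';', start)
    if i < 0:
        return path, ''              # no ';' in the last segment: no params
    return path[:i], path[i + 1:]


def split_params_first_semicolon(path):
    """Hypothetical alternative: split at the first ';' anywhere in the path."""
    i = path.find(';')
    if i < 0:
        return path, ''
    return path[:i], path[i + 1:]


for p in ('/adi;s=a;c=b', '/adi;s=a;c=b/'):
    print(p, split_params_last_segment(p), split_params_first_semicolon(p))
```

With the trailing '/', the last segment is empty, so the first function finds no parameters at all, while the second would report 's=a;c=b/' as the parameters.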

----------------------------------------------------------------------

>Comment By: John Hansen (johnhansen)
Date: 2006-01-13 18:19

Message:
Logged In: YES 
user_id=1418831

Well, RFC 2396, section 3.4, says "/" is reserved within a query. However, the real 
world doesn't seem to follow RFC 2396, so I still think it's a bug: the class 
should be useful rather than try to enforce an RFC. A warning would be fine.
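
The "be useful, but warn" behaviour suggested here could look roughly like the sketch below. It is only an illustration under stated assumptions: urlparse_lenient is a hypothetical wrapper, not an existing API, and the import uses urllib.parse, the Python 3 location of the urlparse module discussed in this report.

```python
import warnings
from urllib.parse import urlparse  # Python 3 home of the old urlparse module


def urlparse_lenient(url):
    """Hypothetical wrapper: split params at the first ';' and warn on '/'."""
    parts = urlparse(url)
    if ';' not in parts.path:
        return parts                 # the standard result is already usable
    # Rejoin what urlparse split off, then split again at the first ';'.
    full = parts.path + (';' + parts.params if parts.params else '')
    path, _, params = full.partition(';')
    if '/' in params:
        warnings.warn("'/' found in URL parameters; split at the first ';'")
    return parts._replace(path=path, params=params)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = urlparse_lenient('http://f/adi;s=a;c=b/')
print(result.path, result.params, len(caught))
```

Whether such leniency belongs in the library or in caller code is a design choice this sketch does not settle.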

----------------------------------------------------------------------

Comment By: Peter van Kampen (pterk)
Date: 2006-01-13 00:25

Message:
Logged In: YES 
user_id=174455

Looking at the test cases, it appears the answers must be in
RFCs 1808 or 2396: http://www.ietf.org/rfc/rfc1808.txt and
http://www.ietf.org/rfc/rfc2396.txt. See, for example, section
5.3 of 1808. I don't see why _splitparams does what it does,
but I didn't exactly close-read the text either. Also be
sure to look at Lib/test/test_urlparse.py.

----------------------------------------------------------------------

Comment By: John Hansen (johnhansen)
Date: 2006-01-04 16:31

Message:
Logged In: YES 
user_id=1418831

The first line should have read:

If the parameter field of a URL contains a '/', urlparse does not enter it 
into the parameter field, but leaves it attached to the path.

----------------------------------------------------------------------

Comment By: John Hansen (johnhansen)
Date: 2006-01-04 05:00

Message:
Logged In: YES 
user_id=1418831

The first line should have read:

If the parameter field of a URL contains a '/', urlparse does not enter it 
into the parameter field, but leaves it attached to the path.

----------------------------------------------------------------------
