[ python-Bugs-1396543 ] urlparse is confused by /

Mon Feb 6 02:09:36 CET 2006

Bugs item #1396543, was opened at 2006-01-04 04:57
Message generated for change (Comment added) made by jjlee
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1396543&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: John Hansen (johnhansen)
Assigned to: Nobody/Anonymous (nobody)
Summary: urlparse is confused by /

Initial Comment:
If the parameter field of a URL contains a '/', urlparse does not enter date 
in the parameter field, but leaves it attached to the path.

The simplified example is:
>>> urlparse.urlparse("http://f/adi;s=a;c=b/")
('http', 'f', '/adi;s=a;c=b/', '', '', '')

>>> urlparse.urlparse("http://f/adi;s=a;c=b")
('http', 'f', '/adi', 's=a;c=b', '', '')

The realworld case was:

>>> urlparse.urlparse("http://ad.doubleclick.net/adi/N3691.VibrantMedia/B1733031.2;sz=160x600;click=http%3A/adforce.adtech.de/adlink%7C82%7C59111%7C1%7C168%7CAdId%3D1023327%3BBnId%3D4%3Bitime%3D335264036%3Bku%3D12900%3Bkey%3Dcomputing%2Bbetanews%5Fgeneral%3Blink%3D")
('http', 'ad.doubleclick.net', '/adi/N3691.VibrantMedia/B1733031.2;sz=160x600;click=http%3A/adforce.adtech.de/adlink%7C82%7C59111%7C1%7C168%7CAdId%3D1023327%3BBnId%3D4%3Bitime%3D335264036%3Bku%3D12900%3Bkey%3Dcomputing%2Bbetanews%5Fgeneral%3Blink%3D', '', '', '')

What's odd is that the code specifically says to do this:
def _splitparams(url):
    if '/'  in url:
        i = url.find(';', url.rfind('/'))
        if i < 0:
            return url, ''

Is there a reason for the rfind?
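
To make the effect of the rfind concrete, here is a minimal sketch that
mirrors the quoted logic in a standalone function. The name
split_last_segment_params is hypothetical (it is not part of urlparse),
and the guard in the else branch is an addition so the function is safe
to call on its own:

```python
def split_last_segment_params(path):
    """Hypothetical standalone version of the logic quoted above."""
    if '/' in path:
        # Search for ';' only from the last '/' onwards, so that only
        # the final path segment can contribute parameters.
        i = path.find(';', path.rfind('/'))
        if i < 0:
            return path, ''
    else:
        i = path.find(';')
        if i < 0:
            # Added guard (assumption): no ';' at all, so no parameters.
            return path, ''
    return path[:i], path[i + 1:]


# Trailing '/': the last segment is empty, so nothing is split off.
with_slash = split_last_segment_params('/adi;s=a;c=b/')
# No trailing '/': the ';' is in the last segment, so it is split.
without_slash = split_last_segment_params('/adi;s=a;c=b')
```

With these inputs, with_slash is ('/adi;s=a;c=b/', '') and without_slash
is ('/adi', 's=a;c=b'), which matches the two simplified examples above.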

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-02-06 01:09

Message:
Logged In: YES 
user_id=261020

The urlparse.urlparse() code should not be changed, for
backwards compatibility reasons.

As the docs for module urlparse explain, you should instead
use urlparse.urlsplit(), then another function to parse
parameters (that other function is not supplied by the
stdlib, IIRC).
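
A minimal sketch of that approach, assuming you want the parameters of
every path segment rather than only the last one. The helper name
split_segment_params is hypothetical and not part of the stdlib; the
import fallback is only there so the sketch also runs on newer Pythons:

```python
try:
    from urllib.parse import urlsplit   # Python 3
except ImportError:
    from urlparse import urlsplit       # Python 2, as used in this thread


def split_segment_params(url):
    """Hypothetical helper: return a list of (segment, [params]) pairs.

    urlsplit() leaves any ';' untouched in the path, so each '/'-separated
    segment is split on ';' here instead.
    """
    path = urlsplit(url)[2]  # index 2 is the path component
    segments = []
    for segment in path.split('/'):
        pieces = segment.split(';')
        segments.append((pieces[0], pieces[1:]))
    return segments


segments = split_segment_params("http://f/adi;s=a;c=b/")
```

For the simplified URL from the report, segments is
[('', []), ('adi', ['s=a', 'c=b']), ('', [])]: the empty first and last
entries come from the leading and trailing '/'.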

Also, note that RFC 3986 obsoletes RFC 2396 (see also RFC
3987).

----------------------------------------------------------------------

Comment By: Peter van Kampen (pterk)
Date: 2006-01-14 21:19

Message:
Logged In: YES 
user_id=174455

Actually section 3.3 of RFC2396 is relevant here and it
seems that it is indeed correctly implemented as is.

I'm not sure what the 'python policy' is on RFC vs The Real
World. My guess would be that RFCs carry some weight.
Following the 'real world' is too vague a reference. Your
world might be different than mine and tomorrow's world a
different world than today's.

You can always monkey-patch:

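>>> import urlparse
>>> def my_splitparams(url):
...     # Split at the first ';' anywhere in the path, not only
...     # in the last path segment as the stdlib version does.
...     i = url.find(';')
...     if i < 0:
...         return url, ''  # no parameters present
...     return url[:i], url[i+1:]
...
>>> urlparse._splitparams = my_splitparams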
>>> urlparse.urlparse("http://f/adi;s=a;c=b/")
('http', 'f', '/adi', 's=a;c=b/', '', '')

----------------------------------------------------------------------

Comment By: John Hansen (johnhansen)
Date: 2006-01-13 18:19

Message:
Logged In: YES 
user_id=1418831

Well RFC2396, section 3.4 says "/" is reserved within a query. However, the real 
world doesn't seem to follow RFC2396... so I still think it's a bug: the class 
should be useful, rather than try to enforce an RFC. A warning would be fine.

----------------------------------------------------------------------

Comment By: Peter van Kampen (pterk)
Date: 2006-01-13 00:25

Message:
Logged In: YES 
user_id=174455

Looking at the test cases, it appears the answers must be in
RFC 1808 or RFC 2396: http://www.ietf.org/rfc/rfc1808.txt and
http://www.ietf.org/rfc/rfc2396.txt. See, for example, section
5.3 of RFC 1808. I don't see why _splitparams does what it does,
but I didn't exactly close-read the text either. Also be
sure to look at Lib/test/test_urlparse.py.

----------------------------------------------------------------------

Comment By: John Hansen (johnhansen)
Date: 2006-01-04 16:31

Message:
Logged In: YES 
user_id=1418831

The first line should have read:

If the parameter field of a URL contains a '/', urlparse does not enter it 
into the parameter field, but leaves it attached to the path.

----------------------------------------------------------------------

Comment By: John Hansen (johnhansen)
Date: 2006-01-04 05:00

Message:
Logged In: YES 
user_id=1418831

The first line should have read:

If the parameter field of a URL contains a '/', urlparse does not enter it 
into the parameter field, but leaves it attached to the path.

----------------------------------------------------------------------
