[ python-Bugs-1396543 ] urlparse is confused by /
SourceForge.net
noreply at sourceforge.net
Mon Feb 6 02:09:36 CET 2006
Bugs item #1396543, was opened at 2006-01-04 04:57
Message generated for change (Comment added) made by jjlee
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1396543&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: John Hansen (johnhansen)
Assigned to: Nobody/Anonymous (nobody)
Summary: urlparse is confused by /
Initial Comment:
If the parameter field of a URL contains a '/', urlparse does not enter date
in the parameter field, but leaves it attached to the path.
The simplified example is:
>>> urlparse.urlparse("http://f/adi;s=a;c=b/")
('http', 'f', '/adi;s=a;c=b/', '', '', '')
>>> urlparse.urlparse("http://f/adi;s=a;c=b")
('http', 'f', '/adi', 's=a;c=b', '', '')
The realworld case was:
>>> urlparse.urlparse("http://ad.doubleclick.net/adi/N3691.VibrantMedia/B1733031.2;sz=160x600;click=http%3A/adforce.adtech.de/adlink%7C82%7C59111%7C1%7C168%7CAdId%3D1023327%3BBnId%3D4%3Bitime%3D335264036%3Bku%3D12900%3Bkey%3Dcomputing%2Bbetanews%5Fgeneral%3Blink%3D")
('http', 'ad.doubleclick.net', '/adi/N3691.VibrantMedia/B1733031.2;sz=160x600;click=http%3A/adforce.adtech.de/adlink%7C82%7C59111%7C1%7C168%7CAdId%3D1023327%3BBnId%3D4%3Bitime%3D335264036%3Bku%3D12900%3Bkey%3Dcomputing%2Bbetanews%5Fgeneral%3Blink%3D', '', '', '')
What's odd is that the code specifically says to do this:
def _splitparams(url):
    if '/' in url:
        i = url.find(';', url.rfind('/'))
        if i < 0:
            return url, ''
Is there a reason for the rfind?
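For reference, here is a minimal sketch of what the rfind amounts to. The function name split_last_segment_params is hypothetical and this is a stand-in for the quoted logic, not the stdlib source. Starting the search for ';' at the last '/' means only the final path segment can contribute parameters, which is why a '/' anywhere after the ';' leaves everything in the path.

```python
def split_last_segment_params(path):
    """Hypothetical stand-in for the logic quoted above, not the stdlib code."""
    # Search for ';' only from the last '/' onwards, so that only the
    # final path segment can contribute parameters.
    start = path.rfind('/') if '/' in path else 0
    i = path.find(';', start)
    if i < 0:
        # No ';' in the last segment: the whole string stays in the path.
        return path, ''
    return path[:i], path[i + 1:]


print(split_last_segment_params('/adi;s=a;c=b/'))  # ('/adi;s=a;c=b/', '')
print(split_last_segment_params('/adi;s=a;c=b'))   # ('/adi', 's=a;c=b')
```

With the trailing '/', the last segment is empty, so no ';' is found and the parameter field comes back empty.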
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2006-02-06 01:09
Message:
Logged In: YES
user_id=261020
The urlparse.urlparse() code should not be changed, for
backwards compatibility reasons.
As the docs for module urlparse explain, you should instead
use urlparse.urlsplit(), then another function to parse
parameters (that other function is not supplied by the
stdlib, IIRC).
Also, note that RFC 3986 obsoletes RFC 2396 (see also RFC
3987).
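A rough sketch of the urlsplit() approach suggested above, written for Python 3, where the urlparse module is available as urllib.parse. The helper segment_params is hypothetical (nothing like it ships in the stdlib); it assumes parameters are attached per path segment and separated by ';'.

```python
from urllib.parse import urlsplit


def segment_params(path):
    """Hypothetical helper: return a (name, [params]) pair per path segment."""
    result = []
    for segment in path.split('/'):
        # Everything before the first ';' is the segment name; the rest
        # are that segment's parameters.
        name, *params = segment.split(';')
        result.append((name, params))
    return result


parts = urlsplit("http://f/adi;s=a;c=b/")
# urlsplit() leaves ';' untouched in the path component.
segments = segment_params(parts.path)
print(parts.path)
print(segments)
```

Because urlsplit() never looks at ';', the caller decides how parameters are interpreted, and urlparse() itself can stay as it is.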
----------------------------------------------------------------------
Comment By: Peter van Kampen (pterk)
Date: 2006-01-14 21:19
Message:
Logged In: YES
user_id=174455
Actually section 3.3 of RFC2396 is relevant here and it
seems that it is indeed correctly implemented as is.
I'm not sure what the 'python policy' is on RFC vs The Real
World. My guess would be that RFCs carry some weight.
Following the 'real world' is too vague a reference. Your
world might be different than mine and tomorrow's world a
different world than today's.
You can always monkey-patch:
>>> def my_splitparams(url):
...     i = url.find(';')
...     if i < 0:
...         # No ';' at all: everything is path, no params.
...         return url, ''
...     return url[:i], url[i+1:]
...
>>> import urlparse
>>> urlparse._splitparams = my_splitparams
>>> urlparse.urlparse("http://f/adi;s=a;c=b/")
('http', 'f', '/adi', 's=a;c=b/', '', '')
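If patching a private function for the whole process is a concern, the same first-';' split can probably be had with a small wrapper instead. This is only a sketch for Python 3 (where the module is urllib.parse); the name urlparse_first_semicolon is made up for illustration.

```python
from urllib.parse import urlsplit


def urlparse_first_semicolon(url):
    """Hypothetical wrapper: 6-tuple like urlparse(), split at the first ';'."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # partition() returns (path, '', '') when there is no ';' in the path.
    path, _, params = path.partition(';')
    return (scheme, netloc, path, params, query, fragment)


print(urlparse_first_semicolon("http://f/adi;s=a;c=b/"))
# ('http', 'f', '/adi', 's=a;c=b/', '', '')
```

Other callers of urlparse() in the same process keep the stock behaviour this way.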
----------------------------------------------------------------------
Comment By: John Hansen (johnhansen)
Date: 2006-01-13 18:19
Message:
Logged In: YES
user_id=1418831
Well RFC2396, section 3.4 says "/" is reserved within a query. However, the real
world doesn't seem to follow RFC2396... so I still think it's a bug: the class
should be useful, rather than try to enforce an RFC. A warning would be fine.
----------------------------------------------------------------------
Comment By: Peter van Kampen (pterk)
Date: 2006-01-13 00:25
Message:
Logged In: YES
user_id=174455
Looking at the testcases, it appears the answers must be in
RFCs 1808 or 2396: http://www.ietf.org/rfc/rfc1808.txt and
http://www.ietf.org/rfc/rfc2396.txt. See for example section
5.3 of 1808. I don't see why _splitparams does what it does,
but I didn't exactly close-read the text either. Also be
sure to look at Lib/test/test_urlparse.py.
----------------------------------------------------------------------
Comment By: John Hansen (johnhansen)
Date: 2006-01-04 16:31
Message:
Logged In: YES
user_id=1418831
The first line should have read:
If the parameter field of a URL contains a '/', urlparse does not enter it
into the parameter field, but leaves it attached to the path.
----------------------------------------------------------------------
Comment By: John Hansen (johnhansen)
Date: 2006-01-04 05:00
Message:
Logged In: YES
user_id=1418831
The first line should have read:
If the parameter field of a URL contains a '/', urlparse does not enter it
into the parameter field, but leaves it attached to the path.
----------------------------------------------------------------------