[Python-bugs-list] [ python-Bugs-450225 ] urljoin fails RFC tests
Thu, 12 Jun 2003 00:24:59 -0700
Bugs item #450225, was opened at 2001-08-11 22:10
Message generated for change (Comment added) made by bcannon
You can respond by visiting:
Category: Python Library
>Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Aaron Swartz (aaronsw)
Assigned to: Brett Cannon (bcannon)
Summary: urljoin fails RFC tests
Initial Comment:
I've put together a test suite for Python's urlparse
module, based on the tests in Appendix C of
RFC2396 (the URI RFC). They're available at:
The major problem seems to be that it treats
queries and parameters as special components
(not just normal parts of the path), making this
related to:
>Comment By: Brett Cannon (bcannon)
Date: 2003-06-12 00:24
Logged In: YES
Since there is a chance that this might break code that depends on
urljoin acting like RFC 1808 instead of RFC 2396, and 2.3 has hit beta,
I am going to wait for 2.4 before I deal with this.
Comment By: Brett Cannon (bcannon)
Date: 2003-05-11 17:35
Logged In: YES
mbrierst is right. From C.1 of RFC 2396 (with http://a/b/c/d;p?q as the base):
?y = http://a/b/c/?y
;x = http://a/b/c/;x
And notice how this contradicts RFC 1808 (with <URL:http://a/b/c/d;p?q#f>
as the base):
?y = <URL:http://a/b/c/d;p?y>
;x = <URL:http://a/b/c/d;x>
So obviously there is a conflict here. And since RFC 2396 says "it revises and
replaces the generic definitions in RFC 1738 and RFC 1808" (where "generic"
just means the actual syntax), RFC 2396's solution should override.
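To make the conflict concrete, here is a minimal sketch that tabulates the two expected results quoted above and reports which RFC a given urljoin result agrees with. The names EXPECTED and matching_rfcs are hypothetical helpers, not part of urlparse, and the fragment on the RFC 1808 base is dropped so that both RFCs can share one base URL. What the loop prints depends on the Python version it runs under.

```python
# Expected joins for the two contested references, copied from the
# comment above. The fragment on the RFC 1808 base is dropped here
# (an assumption made so both RFCs share one base URL).
BASE = 'http://a/b/c/d;p?q'
EXPECTED = {
    '?y': {'RFC 1808': 'http://a/b/c/d;p?y', 'RFC 2396': 'http://a/b/c/?y'},
    ';x': {'RFC 1808': 'http://a/b/c/d;x', 'RFC 2396': 'http://a/b/c/;x'},
}


def matching_rfcs(relative, joined):
    """Return the RFCs whose expected result equals `joined` (hypothetical helper)."""
    return sorted(name for name, url in EXPECTED[relative].items()
                  if url == joined)


try:
    from urlparse import urljoin  # Python 2, as used in this report
except ImportError:
    from urllib.parse import urljoin  # Python 3 location of the same function

# Report which RFC (if any) the installed urljoin agrees with.
for relative in sorted(EXPECTED):
    joined = urljoin(BASE, relative)
    print(relative, joined, matching_rfcs(relative, joined) or 'neither')
```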
Now the issue is whether the patch for this is the right thing to do (I am
ignoring if the patch is correct; have not tested it yet). This shouldn't break
anything since the whole point of urlparse.urljoin is to have an abstracted
way to create URIs without the user having to worry about all of these rules.
So I say that it should be changed.
Fred, do you mind if I reassign this patch to myself and deal with it?
Comment By: Michael Stone (mbrierst)
Date: 2003-02-03 13:02
Logged In: YES
The two failing tests could not pass because RFC 1808 and RFC 2396 seem to conflict when a relative URI is given as just ;y or just ?y.
RFC 2396 claims to update RFC 1808, so presumably it describes the correct behavior. The patch in this message (I can't upload it on SourceForge here for some reason) brings urljoin's behavior in line with RFC 2396 and changes the appropriate test cases. I think if you apply this patch, this bug can be closed. Let me know what you think.
Index: python/dist/src/Lib/urlparse.py
RCS file: /cvsroot/python/python/dist/src/Lib/urlparse.py,v
retrieving revision 1.39
diff -c -r1.39 urlparse.py
*** python/dist/src/Lib/urlparse.py 7 Jan 2003 02:09:16 -0000 1.39
--- python/dist/src/Lib/urlparse.py 3 Feb 2003 20:51:08 -0000
*** 157,169 ****
      if path[:1] == '/':
          return urlunparse((scheme, netloc, path,
                             params, query, fragment))
!     if not path:
!         if not params:
!             params = bparams
!             if not query:
!                 query = bquery
          return urlunparse((scheme, netloc, bpath,
!                            params, query, fragment))
      segments = bpath.split('/')[:-1] + path.split('/')
      # XXX The stuff below is bogus in various ways...
      if segments[-1] == '.':
--- 157,165 ----
      if path[:1] == '/':
          return urlunparse((scheme, netloc, path,
                             params, query, fragment))
!     if not (path or params or query):
          return urlunparse((scheme, netloc, bpath,
!                            bparams, bquery, fragment))
      segments = bpath.split('/')[:-1] + path.split('/')
      # XXX The stuff below is bogus in various ways...
      if segments[-1] == '.':
Index: python/dist/src/Lib/test/test_urlparse.py
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_urlparse.py,v
retrieving revision 1.11
diff -c -r1.11 test_urlparse.py
*** python/dist/src/Lib/test/test_urlparse.py 6 Jan 2003 20:27:03 -0000 1.11
--- python/dist/src/Lib/test/test_urlparse.py 3 Feb 2003 20:51:12 -0000
*** 54,59 ****
--- 54,63 ----
self.assertEqual(urlparse.urlunparse(urlparse.urlparse(u)), u)
      def test_RFC1808(self):
+         # updated by RFC 2396
+         # self.checkJoin(RFC1808_BASE, '?y', 'http://a/b/c/d;p?y')
+         # self.checkJoin(RFC1808_BASE, ';x', 'http://a/b/c/d;x')
          # "normal" cases from RFC 1808:
          self.checkJoin(RFC1808_BASE, 'g:h', 'g:h')
          self.checkJoin(RFC1808_BASE, 'g', 'http://a/b/c/g')
*** 61,74 ****
          self.checkJoin(RFC1808_BASE, 'g/', 'http://a/b/c/g/')
          self.checkJoin(RFC1808_BASE, '/g', 'http://a/g')
          self.checkJoin(RFC1808_BASE, '//g', 'http://g')
-         self.checkJoin(RFC1808_BASE, '?y', 'http://a/b/c/d;p?y')
          self.checkJoin(RFC1808_BASE, 'g?y', 'http://a/b/c/g?y')
          self.checkJoin(RFC1808_BASE, 'g?y/./x', 'http://a/b/c/g?y/./x')
          self.checkJoin(RFC1808_BASE, '#s', 'http://a/b/c/d;p?q#s')
          self.checkJoin(RFC1808_BASE, 'g#s', 'http://a/b/c/g#s')
          self.checkJoin(RFC1808_BASE, 'g#s/./x', 'http://a/b/c/g#s/./x')
          self.checkJoin(RFC1808_BASE, 'g?y#s', 'http://a/b/c/g?y#s')
-         self.checkJoin(RFC1808_BASE, ';x', 'http://a/b/c/d;x')
          self.checkJoin(RFC1808_BASE, 'g;x', 'http://a/b/c/g;x')
          self.checkJoin(RFC1808_BASE, 'g;x?y#s', 'http://a/b/c/g;x?y#s')
          self.checkJoin(RFC1808_BASE, '.', 'http://a/b/c/')
--- 65,76 ----
*** 103,111 ****
      def test_RFC2396(self):
          # cases from RFC 2396
!         ### urlparse.py as of v 1.32 fails on these two
!         #self.checkJoin(RFC2396_BASE, '?y', 'http://a/b/c/?y')
!         #self.checkJoin(RFC2396_BASE, ';x', 'http://a/b/c/;x')
          self.checkJoin(RFC2396_BASE, 'g:h', 'g:h')
          self.checkJoin(RFC2396_BASE, 'g', 'http://a/b/c/g')
--- 105,113 ----
      def test_RFC2396(self):
          # cases from RFC 2396
!         # conflict with RFC 1808, tests commented out there
!         self.checkJoin(RFC2396_BASE, '?y', 'http://a/b/c/?y')
!         self.checkJoin(RFC2396_BASE, ';x', 'http://a/b/c/;x')
          self.checkJoin(RFC2396_BASE, 'g:h', 'g:h')
          self.checkJoin(RFC2396_BASE, 'g', 'http://a/b/c/g')
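The effect of the urlparse.py hunk can be sketched in isolation. This is a simplified model, not the library code: it works on (path, params, query) tuples rather than full URLs, the function names are hypothetical, and merge() stands in for the segment merging that follows in urljoin without any '.' or '..' handling. It also assumes the pre-patch query fallback is nested under the params fallback, which is what the RFC 1808 expectation ';x' = http://a/b/c/d;x requires.

```python
def merge(bpath, path):
    # Simplified stand-in for the segment merging later in urljoin;
    # it deliberately omits the '.' and '..' handling.
    return '/'.join(bpath.split('/')[:-1] + path.split('/'))


def join_pre_patch(base, rel):
    """Model of the old branch: an empty path keeps the base path and
    inherits params, then query, only while each is empty (RFC 1808)."""
    bpath, bparams, bquery = base
    path, params, query = rel
    if not path:
        if not params:
            params = bparams
            if not query:
                query = bquery
        return (bpath, params, query)
    return (merge(bpath, path), params, query)


def join_post_patch(base, rel):
    """Model of the patched branch: the base path is kept only when the
    reference has no path, params, or query at all (RFC 2396)."""
    bpath, bparams, bquery = base
    path, params, query = rel
    if not (path or params or query):
        return (bpath, bparams, bquery)
    return (merge(bpath, path), params, query)


# Components of http://a/b/c/d;p?q
BASE = ('/b/c/d', 'p', 'q')
for rel in [('', '', 'y'), ('', 'x', ''), ('', '', '')]:
    print(rel, join_pre_patch(BASE, rel), join_post_patch(BASE, rel))
```

In this model '?y' and ';x' keep the final segment 'd' before the patch and drop it afterwards, while an empty reference resolves to the base in both versions.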
Comment By: Skip Montanaro (montanaro)
Date: 2002-03-22 21:34
Logged In: YES
added Aaron's RFC 2396 tests to test_urlparse.py
version 1.4 - the two failing tests are commented out
Comment By: Jon Ribbens (jribbens)
Date: 2002-03-18 06:22
Logged In: YES
I think it would be better btw if '..' components taking
you 'off the top' were stripped. RFC 2396 says this is
valid behaviour, and it's what 'real' browsers do.
http://a/b/ + ../../../d == http://a/d
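A minimal sketch of the stripping suggested here, assuming it is applied to an already merged absolute path. The function name collapse_dots is hypothetical and this is not the code in urlparse.py; it only illustrates discarding '..' segments that would climb above the root.

```python
def collapse_dots(path):
    """Resolve '.' and '..' in an absolute path, discarding any '..'
    that would climb above the root (hypothetical helper)."""
    segments = path.split('/')[1:]  # drop the empty string before the leading '/'
    out = []
    for seg in segments:
        if seg == '..':
            if out:
                out.pop()        # step up one level when possible
            # otherwise we are already at the root: strip the '..'
        elif seg != '.':
            out.append(seg)
    # A trailing '.' or '..' names a directory, so keep the trailing slash.
    if segments and segments[-1] in ('.', '..'):
        out.append('')
    return '/' + '/'.join(out)


# Merging base path '/b/' with '../../../d' gives '/b/../../../d'.
print(collapse_dots('/b/../../../d'))
```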
Comment By: Aaron Swartz (aaronsw)
Date: 2001-11-05 10:34
Logged In: YES
Oops, meant to attach it...
Comment By: Aaron Swartz (aaronsw)
Date: 2001-11-05 10:30
Logged In: YES
Sure, here they are:
import urlparse
base = 'http://a/b/c/d;p?q'
assert urlparse.urljoin(base, 'g:h') == 'g:h'
assert urlparse.urljoin(base, 'g') == 'http://a/b/c/g'
assert urlparse.urljoin(base, './g') == 'http://a/b/c/g'
assert urlparse.urljoin(base, 'g/') == 'http://a/b/c/g/'
assert urlparse.urljoin(base, '/g') == 'http://a/g'
assert urlparse.urljoin(base, '//g') == 'http://g'
assert urlparse.urljoin(base, '?y') == 'http://a/b/c/?y'
assert urlparse.urljoin(base, 'g?y') == 'http://a/b/c/g?y'
assert urlparse.urljoin(base, '#s') == 'http://a/b/c/
assert urlparse.urljoin(base, 'g#s') == 'http://a/b/c/g#s'
assert urlparse.urljoin(base, 'g?y#s') == 'http://a/b/c/
assert urlparse.urljoin(base, ';x') == 'http://a/b/c/;x'
assert urlparse.urljoin(base, 'g;x') == 'http://a/b/c/g;x'
assert urlparse.urljoin(base, 'g;x?y#s') == 'http://a/b/c/
assert urlparse.urljoin(base, '.') == 'http://a/b/c/'
assert urlparse.urljoin(base, './') == 'http://a/b/c/'
assert urlparse.urljoin(base, '..') == 'http://a/b/'
assert urlparse.urljoin(base, '../') == 'http://a/b/'
assert urlparse.urljoin(base, '../g') == 'http://a/b/g'
assert urlparse.urljoin(base, '../..') == 'http://a/'
assert urlparse.urljoin(base, '../../') == 'http://a/'
assert urlparse.urljoin(base, '../../g') == 'http://a/g'
assert urlparse.urljoin(base, '') == base
assert urlparse.urljoin(base, '../../../g') == 'http://a/../g'
assert urlparse.urljoin(base, '../../../../g') == 'http://a/../../g'
assert urlparse.urljoin(base, '/./g') == 'http://a/./g'
assert urlparse.urljoin(base, '/../g') == 'http://a/../g'
assert urlparse.urljoin(base, 'g.') == 'http://a/b/c/
assert urlparse.urljoin(base, '.g') == 'http://a/b/c/
assert urlparse.urljoin(base, 'g..') == 'http://a/b/c/
assert urlparse.urljoin(base, '..g') == 'http://a/b/c/
assert urlparse.urljoin(base, './../g') == 'http://a/b/g'
assert urlparse.urljoin(base, './g/.') == 'http://a/b/c/
assert urlparse.urljoin(base, 'g/./h') == 'http://a/b/c/
assert urlparse.urljoin(base, 'g/../h') == 'http://a/b/c/
assert urlparse.urljoin(base, 'g;x=1/./y') ==
assert urlparse.urljoin(base, 'g;x=1/../y') == 'http://a/b/
assert urlparse.urljoin(base, 'g?y/./x') ==
assert urlparse.urljoin(base, 'g?y/../x') ==
assert urlparse.urljoin(base, 'g#s/./x') == 'http://a/b/
assert urlparse.urljoin(base, 'g#s/../x') == 'http://a/b/
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-11-05 10:05
Logged In: YES
This looks like it's probably related to #478038; I'll try to
tackle them together. Can you attach your tests to the bug
report on SF? Thanks!