[Python-bugs-list] [ python-Bugs-450225 ] urljoin fails RFC tests

noreply@sourceforge.net noreply@sourceforge.net
Mon, 18 Mar 2002 06:22:19 -0800


Bugs item #450225, was opened at 2001-08-12 06:10
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=450225&group_id=5470

Category: Python Library
Group: Python 2.1.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Aaron Swartz (aaronsw)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: urljoin fails RFC tests

Initial Comment:
I've put together a test suite for Python's urlparse 
module, based on the tests in Appendix C of 
RFC2396 (the URI RFC). They're available at:

http://lists.w3.org/Archives/Public/uri/2001Aug/0013.html

The major problem seems to be that it treats 
queries and parameters as special components 
(not just normal parts of the path), making this 
related to:

http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=210834
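
A minimal sketch of the splitting described above, using the base URL from the RFC 2396 examples. It only shows how the parser separates the parameters and query from the path; the try/except import is an assumption added so the snippet runs on either Python 2 or Python 3.

```python
try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse      # Python 2 module named in this report

# Base URL from the RFC 2396 Appendix C examples.
parts = urlparse('http://a/b/c/d;p?q')

# The result is a 6-tuple:
# (scheme, netloc, path, params, query, fragment).
# ';p' and '?q' are split off into their own fields rather than
# staying attached to the last path segment.
path, params, query = parts[2], parts[3], parts[4]
print(path, params, query)
```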

----------------------------------------------------------------------

Comment By: Jon Ribbens (jribbens)
Date: 2002-03-18 14:22

Message:
Logged In: YES 
user_id=76089

I think it would be better btw if '..' components taking 
you 'off the top' were stripped. RFC 2396 says this is 
valid behaviour, and it's what 'real' browsers do.

i.e.
  http://a/b/ + ../../../d == http://a/d
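
A rough sketch of that stripping rule, assuming it is applied to the already-merged absolute path (here '/b/' + '../../../d'). The helper name strip_excess_dotdot is hypothetical and not part of urlparse.

```python
def strip_excess_dotdot(path):
    """Resolve '.' and '..' in an absolute path (hypothetical helper).

    Any '..' that would climb above the root is dropped instead of
    being kept in the result.
    """
    segments = path.split('/')
    out = []
    # segments[0] is '' because the path starts with '/'.
    for seg in segments[1:]:
        if seg == '.':
            continue
        if seg == '..':
            if out:
                out.pop()   # go up one level
            continue        # already at the root: strip the '..'
        out.append(seg)
    # A trailing '.' or '..' names a directory, so keep the final slash.
    if segments[-1] in ('.', '..'):
        out.append('')
    return '/' + '/'.join(out)


print('http://a' + strip_excess_dotdot('/b/../../../d'))
```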


----------------------------------------------------------------------

Comment By: Aaron Swartz (aaronsw)
Date: 2001-11-05 18:34

Message:
Logged In: YES 
user_id=122141

Oops, meant to attach it...

----------------------------------------------------------------------

Comment By: Aaron Swartz (aaronsw)
Date: 2001-11-05 18:30

Message:
Logged In: YES 
user_id=122141

Sure, here they are:



import urlparse

base = 'http://a/b/c/d;p?q'

assert urlparse.urljoin(base, 'g:h') == 'g:h'
assert urlparse.urljoin(base, 'g') ==   'http://a/b/c/g'
assert urlparse.urljoin(base, './g') == 'http://a/b/c/g'
assert urlparse.urljoin(base, 'g/') ==  'http://a/b/c/g/'
assert urlparse.urljoin(base, '/g') ==  'http://a/g'
assert urlparse.urljoin(base, '//g') == 'http://g'
assert urlparse.urljoin(base, '?y') ==  'http://a/b/c/?y'
assert urlparse.urljoin(base, 'g?y') == 'http://a/b/c/g?y'
assert urlparse.urljoin(base, '#s') ==  'http://a/b/c/d;p?q#s'
assert urlparse.urljoin(base, 'g#s') == 'http://a/b/c/g#s'
assert urlparse.urljoin(base, 'g?y#s') == 'http://a/b/c/g?y#s'
assert urlparse.urljoin(base, ';x') == 'http://a/b/c/;x'
assert urlparse.urljoin(base, 'g;x') ==  'http://a/b/c/g;x'
assert urlparse.urljoin(base, 'g;x?y#s') == 'http://a/b/c/g;x?y#s'
assert urlparse.urljoin(base, '.') ==  'http://a/b/c/'
assert urlparse.urljoin(base, './') ==  'http://a/b/c/'
assert urlparse.urljoin(base, '..') ==  'http://a/b/'
assert urlparse.urljoin(base, '../') ==  'http://a/b/'
assert urlparse.urljoin(base, '../g') ==  'http://a/b/g'
assert urlparse.urljoin(base, '../..') ==  'http://a/'
assert urlparse.urljoin(base, '../../') ==  'http://a/'
assert urlparse.urljoin(base, '../../g') ==  'http://a/g'

assert urlparse.urljoin(base, '') == base

assert urlparse.urljoin(base, '../../../g')    ==  'http://a/../g'
assert urlparse.urljoin(base, '../../../../g') ==  'http://a/../../g'

assert urlparse.urljoin(base, '/./g') ==  'http://a/./g'
assert urlparse.urljoin(base, '/../g')         ==  'http://a/../g'
assert urlparse.urljoin(base, 'g.')            ==  'http://a/b/c/g.'
assert urlparse.urljoin(base, '.g')            ==  'http://a/b/c/.g'
assert urlparse.urljoin(base, 'g..')           ==  'http://a/b/c/g..'
assert urlparse.urljoin(base, '..g')           ==  'http://a/b/c/..g'

assert urlparse.urljoin(base, './../g')        ==  'http://a/b/g'
assert urlparse.urljoin(base, './g/.')         ==  'http://a/b/c/g/'
assert urlparse.urljoin(base, 'g/./h')         ==  'http://a/b/c/g/h'
assert urlparse.urljoin(base, 'g/../h')        ==  'http://a/b/c/h'
assert urlparse.urljoin(base, 'g;x=1/./y')     ==  'http://a/b/c/g;x=1/y'
assert urlparse.urljoin(base, 'g;x=1/../y')    ==  'http://a/b/c/y'

assert urlparse.urljoin(base, 'g?y/./x')       ==  'http://a/b/c/g?y/./x'
assert urlparse.urljoin(base, 'g?y/../x')      ==  'http://a/b/c/g?y/../x'
assert urlparse.urljoin(base, 'g#s/./x')       ==  'http://a/b/c/g#s/./x'
assert urlparse.urljoin(base, 'g#s/../x')      ==  'http://a/b/c/g#s/../x'
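
Because a bare assert stops at the first mismatch, a table-driven variant can be handy for seeing every failing case at once. This is only a sketch: failed_cases is a hypothetical helper, the case list is a small subset of the tests above, and the try/except import is an assumption so it runs on Python 2 or 3.

```python
try:
    from urllib.parse import urljoin  # Python 3 location
except ImportError:
    from urlparse import urljoin      # Python 2 module named in this report


def failed_cases(join, base, cases):
    """Return (relative, expected, actual) for each case where join disagrees."""
    failures = []
    for relative, expected in cases:
        actual = join(base, relative)
        if actual != expected:
            failures.append((relative, expected, actual))
    return failures


base = 'http://a/b/c/d;p?q'

# A few (relative, expected) pairs taken from the list above.
cases = [
    ('g:h', 'g:h'),
    ('g', 'http://a/b/c/g'),
    ('../g', 'http://a/b/g'),
]

for relative, expected, actual in failed_cases(urljoin, base, cases):
    print('%r: expected %r, got %r' % (relative, expected, actual))
```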



----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-11-05 18:05

Message:
Logged In: YES 
user_id=3066

This looks like it's probably related to #478038; I'll try to
tackle them together.  Can you attach your tests to the bug
report on SF?  Thanks!

----------------------------------------------------------------------
