[Python-bugs-list] [Bug #110832] urljoin() bug with odd no of '..' (PR#194)
noreply@sourceforge.net
Tue, 19 Dec 2000 08:38:23 -0800
Bug #110832, was updated on 2000-Aug-01 14:13
Here is a current snapshot of the bug.
Project: Python
Category: None
Status: Open
Resolution: None
Bug Group: None
Priority: 1
Submitted by: nobody
Assigned to: gvanrossum
Summary: urljoin() bug with odd no of '..' (PR#194)
Details: Jitterbug-Id: 194
Submitted-By: DrMalte@ddd.de
Date: Sun, 30 Jan 2000 19:40:45 -0500 (EST)
Version: 1.5.2 and 1.4
OS: Linux
While playing with linbot, I noticed some failed requests to
'http://xxx.xxx.xx/../img/xxx.gif'
for a document in the root directory containing
<IMG SRC="../img/xxx.gif">.
The cause lies in urlparse.urljoin(): it fails to remove the '../'
segments from the path when their number is odd.
Demonstration:
from urlparse import urljoin
print urljoin( 'http://127.0.0.1/', '../imgs/logo.gif' )
# gives 'http://127.0.0.1/../imgs/logo.gif'
# should give 'http://127.0.0.1/imgs/logo.gif'
print urljoin( 'http://127.0.0.1/', '../../imgs/logo.gif' )
# gives 'http://127.0.0.1/imgs/logo.gif'
# works
# '../../../imgs/logo.gif' gives 'http://127.0.0.1/../imgs/logo.gif', and so on
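The odd/even pattern above can be reproduced with a small hypothetical model. This is a sketch under stated assumptions, not the actual 1.5.2 urlparse source: assume the joined path has been split on '/' (so an absolute path starts with an empty segment) and that a loop repeatedly deletes the leftmost '<x>', '..' pair whenever x is non-empty, without excluding x == '..'. Two leading '..' then cancel each other, while a lone one survives next to the empty root segment. The name cancel_pairs_buggy is illustrative only.

```python
def cancel_pairs_buggy(segments):
    """HYPOTHETICAL model of the reported behaviour (not the real source).

    Repeatedly delete the leftmost pair ('<x>', '..') where x is any
    non-empty segment, *including* '..' itself.  A trailing '..' is left
    alone, as if it were handled by a separate step.
    """
    segments = list(segments)  # work on a copy
    while True:
        for i in range(1, len(segments) - 1):
            # segments[i - 1] is truthy for '..' too, so '../..' cancels out
            if segments[i] == '..' and segments[i - 1]:
                del segments[i - 1:i + 1]
                break
        else:
            # no pair found in a full pass: done
            return segments
```

With this model, '/'.join(cancel_pairs_buggy(['', '..', '..', 'imgs', 'logo.gif'])) gives '/imgs/logo.gif', whereas one or three leading '..' leave a single '..' behind, matching the outputs listed above.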
The patch for 1.5.2 follows
(I'm not sure whether it works in general, but tests with linbot looked good):
*** /usr/local/lib/python1.5/urlparse.py Sat Jun 26 19:11:59 1999
--- urlparse.py Mon Jan 31 01:31:45 2000
***************
*** 170,175 ****
--- 170,180 ----
segments[-1] = ''
elif len(segments) >= 2 and segments[-1] == '..':
segments[-2:] = ['']
+
+ if segments[0] == '':
+ while segments[1] == '..': # remove all leading '..'
+ del segments[1]
+
return urlunparse((scheme, netloc, joinfields(segments, '/'),
params, query, fragment))
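The idea behind the patch, restated as a standalone Python 3 sketch: once the path has been split on '/', an absolute path starts with an empty segment, and every '..' directly after that root is dropped. The helper name strip_leading_dotdot is hypothetical, and the length check in the loop is a defensive addition that the original patch does not have.

```python
def strip_leading_dotdot(segments):
    """Hypothetical helper mirroring the PR#194 patch.

    `segments` is a path split on '/', so an absolute path such as
    '/../imgs/logo.gif' arrives as ['', '..', 'imgs', 'logo.gif'].
    """
    segments = list(segments)  # work on a copy; leave the caller's list alone
    if segments and segments[0] == '':
        # Drop every '..' that immediately follows the root.  The length
        # check is an addition: the original loop indexes segments[1]
        # unconditionally.
        while len(segments) > 1 and segments[1] == '..':
            del segments[1]
    return segments
```

For example, '/'.join(strip_leading_dotdot(['', '..', 'imgs', 'logo.gif'])) gives '/imgs/logo.gif'. Note that this normalizes away exactly the leading '..' segments that RFC 1808, section 5.2, says should be preserved.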
====================================================================
Audit trail:
Mon Feb 07 12:35:35 2000 guido changed notes
Mon Feb 07 12:35:35 2000 guido moved from incoming to request
Follow-Ups:
Date: 2000-Dec-19 08:38
By: gvanrossum
Comment:
OK, reopened.
-------------------------------------------------------
Date: 2000-Dec-19 08:30
By: doerwalter
Comment:
Section 5.2 of RFC 1808 states that, in the context of the base URL
<URL:http://a/b/c/d;p?q#f> (to which the empty reference <> resolves),
relative URLs containing more '..' segments than the base has directory
names should be resolved as follows:
../../../g = <URL:http://a/../g>
../../../../g = <URL:http://a/../../g>
i.e. the surplus '..' segments should be preserved. urljoin does this for the
first example given in the bug report:
print urljoin( 'http://127.0.0.1/', '../imgs/logo.gif' )
http://127.0.0.1/../imgs/logo.gif
but not in the second example:
print urljoin( 'http://127.0.0.1/', '../../imgs/logo.gif' )
http://127.0.0.1/imgs/logo.gif
where the result should have been
http://127.0.0.1/../../imgs/logo.gif
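These results follow from step 6 of the resolution algorithm in RFC 1808, section 4, where '<segment>/../' is removed only when <segment> is not itself '..'. Below is a minimal Python 3 sketch of that step on the path alone, assuming (as in RFC 1808, section 2.4.6) that paths are passed without their leading '/'. Scheme, netloc, params, query and fragment handling are omitted, and the name merge_path is hypothetical.

```python
def merge_path(base_path, rel_path):
    """Sketch of RFC 1808, section 4, step 6 (path part only).

    Both paths are given WITHOUT the leading '/', e.g. 'b/c/d' for
    <URL:http://a/b/c/d;p?q#f>.  Returns the merged path, also without it.
    """
    # Drop the last segment of the base path (everything after the
    # rightmost '/') and append the relative path in its place.
    prefix = base_path[:base_path.rfind('/') + 1]
    segments = (prefix + rel_path).split('/')

    # a) remove every "./" where "." is a complete segment
    segments = [s for i, s in enumerate(segments)
                if not (s == '.' and i < len(segments) - 1)]
    # b) a trailing "." is removed, leaving the path ending in "/"
    if segments[-1] == '.':
        segments[-1] = ''

    # c) repeatedly remove the leftmost "<segment>/../" with <segment> != ".."
    while True:
        for i in range(len(segments) - 2):
            if segments[i] != '..' and segments[i + 1] == '..':
                del segments[i:i + 2]
                break
        else:
            break

    # d) a trailing "<segment>/.." with <segment> != ".." is removed
    if len(segments) >= 2 and segments[-1] == '..' and segments[-2] != '..':
        segments[-2:] = ['']

    return '/'.join(segments)
```

Because rule c) never cancels '..' against '..', merge_path('b/c/d', '../../../g') returns '../g', and 'http://127.0.0.1/' + merge_path('', '../../imgs/logo.gif') gives 'http://127.0.0.1/../../imgs/logo.gif', the result asked for above.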
-------------------------------------------------------
Date: 2000-Aug-23 21:22
By: fdrake
Comment:
RFC 1808 gives examples of this form in section 5.2, "Abnormal Examples," and presents the current behavior as the desired treatment, stating that all parsers (urljoin() counts as one in the RFC's terminology) should treat the abnormal examples consistently.
-------------------------------------------------------
Date: 2000-Aug-13 01:36
By: moshez
Comment:
OK, Jeremy -- this one is yours. Either notabug it, or check in the relevant patch (101064 -- assigned to you)
-------------------------------------------------------
Date: 2000-Aug-01 14:13
By: nobody
Comment:
Patch being considered.
-------------------------------------------------------
Date: 2000-Aug-01 14:13
By: nobody
Comment:
From: Guido van Rossum <guido@CNRI.Reston.VA.US>
Subject: Re: [Python-bugs-list] urljoin() bug with odd no of '..' (PR#194)
Date: Mon, 31 Jan 2000 12:28:55 -0500
Thanks for your bug report and fix. I agree with your diagnosis.
Would you please be so kind as to resend your patch with the
legal disclaimer from
http://www.python.org/1.5/bugrelease.html
--Guido van Rossum (home page: http://www.python.org/~guido/)
-------------------------------------------------------
For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=110832&group_id=5470