[ python-Bugs-210832 ] urljoin() bug with odd no of '..' (PR#194)

SourceForge.net noreply at sourceforge.net
Wed Jun 2 11:49:47 EDT 2004


Bugs item #210832, was opened at 2000-08-01 23:13
Message generated for change (Comment added) made by doerwalter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=210832&group_id=5470

Category: Python Library
Group: None
Status: Closed
Resolution: Fixed
Priority: 6
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: urljoin() bug with odd no of '..' (PR#194)

Initial Comment:
Jitterbug-Id: 194
Submitted-By: DrMalte at ddd.de
Date: Sun, 30 Jan 2000 19:40:45 -0500 (EST)
Version: 1.5.2 and 1.4
OS: Linux


While playing with linbot I noticed some failed requests to 
'http://xxx.xxx.xx/../img/xxx.gif'
for a document in the root directory containing
<IMG SRC="../img/xxx.gif">.

The reason lies in urlparse.urljoin():
urljoin() fails to remove an odd number of '../' segments from the path.

Demonstration:

from urlparse import urljoin

print urljoin( 'http://127.0.0.1/', '../imgs/logo.gif' )
# gives       'http://127.0.0.1/../imgs/logo.gif'
# should give 'http://127.0.0.1/imgs/logo.gif'

print urljoin( 'http://127.0.0.1/', '../../imgs/logo.gif' )
# gives 'http://127.0.0.1/imgs/logo.gif'
# works

# '../../../imgs/logo.gif' gives 'http://127.0.0.1/../imgs/logo.gif' and so on
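
The odd/even pattern above can be reproduced with a toy model. This is only a sketch under an assumption: that the join logic lets each '..' cancel whatever non-empty segment precedes it, even when that segment is itself '..'. The function name collapse_pairs is hypothetical, and the code is not taken from urlparse.py.

```python
def collapse_pairs(path):
    """Toy model (assumed, not the real urlparse source): a '..' cancels
    the non-empty segment before it, even if that segment is also '..'."""
    segments = path.split('/')
    while True:
        # Look for the leftmost '..' preceded by a non-empty segment.
        for i in range(1, len(segments) - 1):
            if segments[i] == '..' and segments[i - 1]:
                del segments[i - 1:i + 1]
                break
        else:
            break  # nothing left to collapse
    return '/'.join(segments)


for count in (1, 2, 3, 4):
    print(count, collapse_pairs('/' + '../' * count + 'imgs/logo.gif'))
# 1 /../imgs/logo.gif
# 2 /imgs/logo.gif
# 3 /../imgs/logo.gif
# 4 /imgs/logo.gif
```

Under this assumption, leading '..' segments cancel each other in pairs, so only an odd count leaves one behind.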

The patch for 1.5.2
(I'm not sure whether it works in general, but tests with linbot looked good):

*** /usr/local/lib/python1.5/urlparse.py        Sat Jun 26 19:11:59 1999
--- urlparse.py Mon Jan 31 01:31:45 2000
***************
*** 170,175 ****
--- 170,180 ----
                segments[-1] = ''
        elif len(segments) >= 2 and segments[-1] == '..':
                segments[-2:] = ['']
+
+       if segments[0] == '':
+               while segments[1] == '..':      # remove all leading '..'
+                       del segments[1]
+
        return urlunparse((scheme, netloc, joinfields(segments, '/'),
                           params, query, fragment))
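
The effect of the added lines can be tried in isolation. The following is a minimal standalone sketch, assuming the path has already been split on '/' into a list like the patch's `segments`. The function name strip_leading_dotdot is hypothetical, and the length guard is an addition of this sketch, not part of the patch.

```python
def strip_leading_dotdot(segments):
    """Drop every '..' that directly follows the empty root segment."""
    segments = list(segments)  # work on a copy
    if segments and segments[0] == '':
        # Guard on the length so a path made only of '..' cannot
        # run off the end of the list (not present in the patch).
        while len(segments) > 1 and segments[1] == '..':
            del segments[1]
    return segments


print('/'.join(strip_leading_dotdot('/../imgs/logo.gif'.split('/'))))
# /imgs/logo.gif
print('/'.join(strip_leading_dotdot('/../../../imgs/logo.gif'.split('/'))))
# /imgs/logo.gif
```

Relative paths (those whose first segment is not empty) are left untouched.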




====================================================================
Audit trail:
Mon Feb 07 12:35:35 2000	guido	changed notes
Mon Feb 07 12:35:35 2000	guido	moved from incoming to request

----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2004-06-02 17:49

Message:
Logged In: YES 
user_id=89016

This is the same behaviour as in Python 2.3.4 and exactly 
what RFC 1808 specifies (see 
http://www.ietf.org/rfc/rfc1808.txt and scroll down to 
section 5.2, "Abnormal Examples"). Why do you think this is a 
problem?

----------------------------------------------------------------------

Comment By: Jon Nelson (jnelson)
Date: 2004-06-02 17:35

Message:
Logged In: YES 
user_id=8446

I'm not 100% sure, but as of Python 2.2.2 (#1, Feb 24 2003,
19:13:11) for RedHat, this is still a problem:


>>> import urlparse
>>> print urlparse.urljoin( 'http://127.0.0.1/', '../imgs/logo.gif' )
http://127.0.0.1/../imgs/logo.gif
>>> 

The patch above obviously no longer applies.


----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-01-05 06:59

Message:
Lib/urlparse.py revision 1.27 conforms to all recommended practices from RFC 1808 which don't conflict with RFC 1630.  Test cases have been added to ensure we don't lose this attribute.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2000-12-19 17:41

Message:
Ok, confirmed.  Reopening the bug until I get a chance to look at the proposed patch and can update the test suite.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2000-12-19 17:38

Message:
OK, reopened.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2000-12-19 17:30

Message:
Section 5.2 of RFC 1808 states that, in the context of the base URL
      <>            = <URL:http://a/b/c/d;p?q#f>
relative URLs that have more '..' segments than the base has directory names
should be resolved in the following way:
      ../../../g    = <URL:http://a/../g>
      ../../../../g = <URL:http://a/../../g>
i.e. the excess '..' segments should be preserved. urljoin does this in the
first example given in the bug report:
   print urljoin( 'http://127.0.0.1/', '../imgs/logo.gif' )
   http://127.0.0.1/../imgs/logo.gif
but not in the second example:
   print urljoin( 'http://127.0.0.1/', '../../imgs/logo.gif' )
   http://127.0.0.1/imgs/logo.gif
where the result should have been
   http://127.0.0.1/../../imgs/logo.gif
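
The resolution rules referred to here can be sketched as follows. This is a rough rendering of step 6 of the algorithm in RFC 1808 section 4, restricted to the path component (no scheme, host, params, query or fragment). The function name resolve_path_rfc1808 is hypothetical, and this is not the urlparse implementation.

```python
def resolve_path_rfc1808(base_path, rel_path):
    """Merge a relative path into a base path, keeping excess '..'."""
    # Drop the last segment of the base, then append the relative path.
    segments = base_path.split('/')[:-1] + rel_path.split('/')
    # 6a: remove '.' wherever it is a complete, non-final segment.
    segments = [s for s in segments[:-1] if s != '.'] + segments[-1:]
    # 6b: a trailing '.' becomes an empty final segment.
    if segments[-1] == '.':
        segments[-1] = ''
    # 6c: repeatedly remove the leftmost '<segment>/../', where
    # <segment> is neither empty (the root) nor '..' itself.
    while True:
        for i in range(1, len(segments) - 1):
            if segments[i] == '..' and segments[i - 1] not in ('', '..'):
                del segments[i - 1:i + 1]
                break
        else:
            break
    # 6d: a trailing '<segment>/..' collapses to an empty final segment.
    if (len(segments) >= 2 and segments[-1] == '..'
            and segments[-2] not in ('', '..')):
        segments[-2:] = ['']
    return '/'.join(segments)


print(resolve_path_rfc1808('/b/c/d', '../../../g'))       # /../g
print(resolve_path_rfc1808('/b/c/d', '../../../../g'))    # /../../g
print(resolve_path_rfc1808('/', '../../imgs/logo.gif'))   # /../../imgs/logo.gif
```

The check against '..' in step 6c is what keeps two leading '..' segments from cancelling each other.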


----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2000-08-24 06:22

Message:
RFC 1808 gives examples of this form in section 5.2, "Abnormal Examples," and gives the current behavior as the desired treatment, stating that all parsers (urljoin() counts as a parser in the RFC's terminology) should treat the abnormal examples consistently.

----------------------------------------------------------------------

Comment By: Moshe Zadka (moshez)
Date: 2000-08-13 10:36

Message:
OK, Jeremy -- this one is yours. Either notabug it, or check in the relevant patch (101064 -- assigned to you)



----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2000-08-01 23:13

Message:
Patch being considered.


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2000-08-01 23:13

Message:
From: Guido van Rossum <guido at CNRI.Reston.VA.US>
Subject: Re: [Python-bugs-list] urljoin() bug with odd no of '..' (PR#194)
Date: Mon, 31 Jan 2000 12:28:55 -0500

Thanks for your bug report and fix.  I agree with your diagnosis.

Would you please be so kind as to resend your patch with the
legal disclaimer from 
http://www.python.org/1.5/bugrelease.html

--Guido van Rossum (home page: http://www.python.org/~guido/)


----------------------------------------------------------------------
