[ python-Bugs-1546628 ] urlparse.urljoin odd behaviour
SourceForge.net
noreply at sourceforge.net
Tue Aug 29 13:29:43 CEST 2006
Bugs item #1546628, was opened at 2006-08-25 23:04
Message generated for change (Comment added) made by the_j10
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1546628&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Andres Riancho (andresriancho)
Assigned to: Nobody/Anonymous (nobody)
Summary: urlparse.urljoin odd behaviour
Initial Comment:
Hi!
I think I have found a bug in the urljoin function of the urlparse
module. I'm using Python 2.4.3 (#2, Apr 27 2006, 14:43:58),
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2. Here is a demo of
the bug:
>>> import urlparse
>>> urlparse.urljoin('http://www.f00.com/','//a')
'http://a'
>>> urlparse.urljoin('http://www.f00.com/','https://0000/somethingIsWrong')
'https://0000/somethingIsWrong'
>>> urlparse.urljoin('http://www.f00.com/','https://0000/somethingIsWrong')
'https://0000/somethingIsWrong'
>>> urlparse.urljoin('http://www.f00.com/','file:///etc/passwd')
'file:///etc/passwd'
The result of the first call to urljoin should be either
'http://www.f00.com/a' or 'http://www.f00.com//a'. The result of the
second and third calls to urljoin should be 'http://www.f00.com/', or
maybe an exception?
Please correct me if I'm wrong and this is some kind of feature, or if
the bug was already reported. This bug can result in a security
vulnerability. Take this code as an example:
// viewImage.py //
import htmlTools    # Some fake module, just for the example
import urlparse     # module with bug.
htmlTools.startHtml()              # print <html>
params = htmlTools.getParams()     # get the query string parameters
htmlTools.printToHtml( '<img src=' + urlparse.urljoin( 'http://myWebsite/' , params['image'] ) + '>' )
htmlTools.endHtml()                # print </html>
// viewImage.py //
The code should generate HTML that shows an image from the site
http://myWebsite/, but with the urljoin bug the image source can be
manipulated, resulting in completely different HTML.
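One way to guard against this in calling code, as a rough sketch only:
join the URL, then check that the result still has the base's scheme and
host. The helper name join_same_origin is made up for this example and is
not part of urlparse. The import fallback is there so the snippet also
runs where the module is named urllib.parse. It does not deal with HTML
escaping of the attribute value.

```python
try:
    from urllib.parse import urljoin, urlsplit  # Python 3 module name
except ImportError:
    from urlparse import urljoin, urlsplit      # Python 2 module name


def join_same_origin(base, ref):
    """Hypothetical helper: join ref onto base, but refuse any result
    whose scheme or network location differs from the base's."""
    joined = urljoin(base, ref)
    # The first two fields of the split result are (scheme, netloc).
    if tuple(urlsplit(joined)[:2]) != tuple(urlsplit(base)[:2]):
        raise ValueError("reference leaves the base site: %r" % (ref,))
    return joined


print(join_same_origin('http://myWebsite/', 'images/logo.png'))
for ref in ('//a', 'file:///etc/passwd'):
    try:
        join_same_origin('http://myWebsite/', ref)
    except ValueError as exc:
        print("rejected: %s" % (exc,))
```

With a check like this, the '//a' and 'file:///etc/passwd' inputs from the
demo are rejected instead of being written into the img tag.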
Cheers,
Andres Riancho
----------------------------------------------------------------------
Comment By: Andrew Jones (the_j10)
Date: 2006-08-29 21:29
Message:
Logged In: YES
user_id=332575
The second argument to urljoin can be either an absolute URL or a
relative URL, as specified by RFC 1808. Your first example, '//a', is
a relative URL that begins with a network location: it keeps the
base's scheme and replaces the host, resulting in 'http://a'. This is
similar to how `cd /boot` takes you to a path relative to the
filesystem's root '/'.
In the rest of your examples the URL passed as the second argument
carries a scheme name ('https' or 'file'). urljoin follows RFC 1808:
if the second argument has a scheme name, it is accepted as an
absolute URL and returned as is.
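To make the cases above concrete, here is a small sketch that runs the
kinds of reference through urljoin side by side. The base URL with the
extra /dir/page.html part is an assumption added for illustration, so that
the difference between 'a' and '/a' is visible.

```python
try:
    from urllib.parse import urljoin  # Python 3 module name
except ImportError:
    from urlparse import urljoin      # Python 2 module name

# Illustrative base; the path part is not from the original report.
base = 'http://www.f00.com/dir/page.html'

references = [
    'a',                              # relative path: replaces last segment
    '/a',                             # absolute path: keeps scheme and host
    '//a',                            # network location: keeps only the scheme
    'https://0000/somethingIsWrong',  # has a scheme: returned unchanged
]

for ref in references:
    print("%-34r -> %r" % (ref, urljoin(base, ref)))
```

Only the first two forms stay on the base host. The last two are the ones
from the bug report.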
This behavior is not very intuitive. Perhaps urlparse could be
extended with a urlappend function that behaves the way you expected.
Hmmm...
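A urlappend along those lines does not exist in urlparse. Purely as a
sketch of what it might look like, the function below always treats its
second argument as a path under the base, whatever it looks like. The
name and behaviour are assumptions, and it makes no attempt to handle
'..' segments, query strings, or quoting.

```python
def urlappend(base, path):
    """Hypothetical: append path under base with exactly one '/' between
    them, never interpreting path as a scheme or network location."""
    return base.rstrip('/') + '/' + path.lstrip('/')


print(urlappend('http://www.f00.com/', '//a'))
print(urlappend('http://www.f00.com/', 'https://0000/somethingIsWrong'))
```

For '//a' this gives 'http://www.f00.com/a', one of the two results the
original report expected.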
-- Andrew
----------------------------------------------------------------------