[Python-bugs-list] [ python-Bugs-639311 ] urllib.basejoin() mishandles ''
SourceForge.net
noreply@sourceforge.net
Wed, 21 May 2003 20:14:34 -0700
Bugs item #639311, was opened at 2002-11-16 04:34
Message generated for change (Comment added) made by bcannon
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=639311&group_id=5470
Category: Python Library
Group: Python 2.2.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Mike Brown (mike_j_brown)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.basejoin() mishandles ''
Initial Comment:
It's not entirely clear whether urllib.basejoin() intends to
implement RFC 2396's "resolution of relative URI
references to absolute form" faithfully, but it seems to
behave improperly when given an empty string as the
relative URI to make absolute.
>>> from urllib import basejoin
>>> basejoin('http://host/foo/bar.xml','')
'http://host/foo/'
I believe it should return the base as-is, because the
empty string is a reference to the document that
contains that reference... and presumably the
document's URI is what you're passing in as the base.
----------------------------------------------------------------------
>Comment By: Brett Cannon (bcannon)
Date: 2003-05-21 20:14
Message:
Perhaps urllib.basejoin (which is not documented) should just become a
wrapper for urlparse.urljoin? It won't solve this bug, but it would cut back
on unneeded code.
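
A minimal sketch of the wrapper idea, written against Python 3, where
urljoin lives in urllib.parse (in the Python 2.2 line discussed here it is
urlparse.urljoin). The basejoin defined below is a local illustration only,
not the stdlib function.

```python
from urllib.parse import urljoin


def basejoin(base, url):
    """Illustrative wrapper: delegate all resolution to urljoin."""
    return urljoin(base, url)


# An empty reference yields the base unchanged.
print(basejoin('http://host/foo/bar.xml', ''))         # http://host/foo/bar.xml
# An ordinary relative reference replaces the last path segment.
print(basejoin('http://host/foo/bar.xml', 'baz.xml'))  # http://host/foo/baz.xml
```

With such a wrapper, the empty-string case simply follows whatever urljoin
does; it still has no way to say "current document" when that differs from
the base.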
----------------------------------------------------------------------
Comment By: Mike Brown (mike_j_brown)
Date: 2002-11-26 02:41
Message:
I was partly mistaken; the document's URI is not necessarily
the base. A reference with an empty path (e.g., an empty
string or just a fragment identifier) is a reference to the current
document, regardless of the base URI you are resolving
against. A base URI is only for resolving relative URIs that are
not referencing the current document. See some discussion
at http://lists.w3.org/Archives/Public/uri/2002Jan/0015.html
So neither urllib.basejoin() nor urlparse.urljoin() fully
implements the RFC 2396 "resolution to absolute form", since
there would need to be a way to indicate "current document"
other than returning the base.
Nevertheless, basejoin()'s behavior differs from urlparse.urljoin()'s
when presented with the empty string, and it's not clear
whether that is intentional.
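
To make the "current document" distinction concrete, here is a rough sketch
in Python 3. It assumes the RFC 2396 rule that a reference with an empty
path and no scheme, authority, or query refers to the current document.
The helper names is_same_document_ref and resolve, and the current_document
parameter, are hypothetical; nothing like them exists in urllib or urlparse.

```python
from urllib.parse import urljoin, urlsplit


def is_same_document_ref(ref):
    """True if ref has no scheme, authority, path, or query (hypothetical helper)."""
    parts = urlsplit(ref)
    return not (parts.scheme or parts.netloc or parts.path or parts.query)


def resolve(base, ref, current_document):
    """Resolve ref, treating same-document references separately from the base.

    Assumes current_document is an absolute URI without a fragment.
    """
    if is_same_document_ref(ref):
        fragment = urlsplit(ref).fragment
        # Keep the fragment, if any, but point at the current document.
        return current_document + ('#' + fragment if fragment else '')
    # Everything else is resolved against the base as usual.
    return urljoin(base, ref)


print(resolve('http://host/base/', '', 'http://host/doc.xml'))
print(resolve('http://host/base/', '#top', 'http://host/doc.xml'))
print(resolve('http://host/base/', 'x.xml', 'http://host/doc.xml'))
```

The caller has to supply the current document's URI separately, which is
exactly the extra piece of information a two-argument join cannot carry.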
----------------------------------------------------------------------