[Python-bugs-list] [ python-Bugs-639311 ] urllib.basejoin() mishandles ''

noreply@sourceforge.net noreply@sourceforge.net
Tue, 26 Nov 2002 02:41:21 -0800

Bugs item #639311, was opened at 2002-11-16 05:34
You can respond by visiting: 

Category: Python Library
Group: Python 2.2.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Mike Brown (mike_j_brown)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.basejoin() mishandles ''

Initial Comment:
It's not entirely clear whether urllib.basejoin() intends to 
implement RFC 2396's "resolution of relative URI 
references to absolute form" faithfully, but it seems to 
behave improperly when given an empty string as the 
relative URI to make absolute.

>>> from urllib import basejoin
>>> basejoin('http://host/foo/bar.xml','')

I believe it should return the base as-is, because the 
empty string is a reference to the document that 
contains that reference... and presumably the 
document's URI is what you're passing in as the base.


>Comment By: Mike Brown (mike_j_brown)
Date: 2002-11-26 03:41

Logged In: YES 

I was partly mistaken; the document's URI is not necessarily 
the base. A reference with an empty path (e.g., an empty 
string or just a fragment identifier) is a reference to the current 
document, regardless of the base URI you are resolving 
against. A base URI is only for resolving relative URIs that are 
not referencing the current document. See some discussion 
at http://lists.w3.org/Archives/Public/uri/2002Jan/0015.html

So neither urllib.basejoin() nor urlparse.urljoin() fully 
implements the RFC 2396 "resolution to absolute form", since 
there would need to be a way to indicate "current document" 
other than returning the base.
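
One possible shape for such an indicator is sketched below. It assumes 
a modern Python 3, where urlparse.urljoin() is available as 
urllib.parse.urljoin(). The helper name resolve_reference() and its 
return convention are hypothetical, not part of any library. It treats 
a reference as pointing at the current document when the reference has 
no scheme, authority, path or query, i.e. it is empty or only a 
fragment identifier.

```python
from urllib.parse import urljoin, urlsplit


def resolve_reference(base, ref):
    """Resolve ref against base (hypothetical helper, not a stdlib API).

    Returns a tuple (absolute_uri, same_document). same_document is True
    when ref is empty or consists only of a fragment identifier, so the
    caller can substitute the current document's URI if it differs from
    the base.
    """
    parts = urlsplit(ref)
    # Only the fragment may be non-empty for a same-document reference.
    same_document = not (parts.scheme or parts.netloc
                         or parts.path or parts.query)
    return urljoin(base, ref), same_document


base = 'http://host/foo/bar.xml'
for ref in ('', '#sec', 'baz.xml'):
    print(repr(ref), resolve_reference(base, ref))
```

The flag is returned alongside the joined URI so that existing callers 
who only want the string can ignore it.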

Nevertheless, basejoin()'s behavior differs from 
urlparse.urljoin()'s when presented with the empty string, and 
it's not clear whether that is intentional.
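
For comparison, the urljoin() side can be checked as follows. This is 
a small sketch that assumes a modern Python 3, where the function is 
urllib.parse.urljoin(); urllib.basejoin() is not available there, so 
only the urljoin() behavior is shown.

```python
from urllib.parse import urljoin

base = 'http://host/foo/bar.xml'

# An empty reference resolves to the base unchanged.
empty_result = urljoin(base, '')

# A fragment-only reference keeps the base and appends the fragment.
fragment_result = urljoin(base, '#top')

print(empty_result)     # http://host/foo/bar.xml
print(fragment_result)  # http://host/foo/bar.xml#top
```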

