[Python-Dev] Path object design

Fri Nov 3 19:38:21 CET 2006

Andrew Dalke wrote:
> glyph:
> 
>>Path manipulation:
>>
>> * This is confusing as heck:
>>   >>> os.path.join("hello", "/world")
>>   '/world'
>>   >>> os.path.join("hello", "slash/world")
>>   'hello/slash/world'
>>   >>> os.path.join("hello", "slash//world")
>>   'hello/slash//world'
>>   Trying to formulate a general rule for what the arguments to os.path.join
>>are supposed to be is really hard.  I can't really figure out what it would
>>be like on a non-POSIX/non-win32 platform.
> 
> 
> Made trickier by the similar yet different behaviour of urlparse.urljoin.
> 
>  >>> import urlparse
>  >>> urlparse.urljoin("hello", "/world")
>  '/world'
>  >>> urlparse.urljoin("hello", "slash/world")
>  'slash/world'
>  >>> urlparse.urljoin("hello", "slash//world")
>  'slash//world'
>  >>>
> 
> It does not make sense to me that these should be different.
> 
Although the last two smell like bugs, the point of urljoin is to make 
an absolute URL from an absolute ("current page") URL and a relative 
(link) one. As we see:

  >>> from urlparse import urljoin
  >>> urljoin("/hello", "slash/world")
  '/slash/world'

and

  >>> urljoin("http://localhost/hello", "slash/world")
  'http://localhost/slash/world'

but

  >>> urljoin("http://localhost/hello/", "slash/world")
  'http://localhost/hello/slash/world'
  >>> urljoin("http://localhost/hello/index.html", "slash/world")
  'http://localhost/hello/slash/world'
  >>>

I think we can conclude that this is the intended behaviour. urljoin
interprets its first argument as referencing an existing resource and
the second as a link such as might appear in that resource, so the link
replaces the last path segment of the base unless the base ends in a
slash.
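
For anyone who wants to try this on a current interpreter, here is a
minimal sketch of the same comparison. It assumes Python 3, where
urljoin lives in urllib.parse rather than the urlparse module used
above. The helper name resolve_link is hypothetical, and posixpath.join
stands in for os.path.join so the result does not depend on the
platform.

```python
import posixpath
from urllib.parse import urljoin  # Python 3 location of urlparse.urljoin


def resolve_link(page_url, link):
    """Resolve a link as if it appeared in the resource at page_url.

    Hypothetical helper: a thin wrapper that only names the two roles
    urljoin assigns to its arguments.
    """
    return urljoin(page_url, link)


# The base names a resource, so its last path segment is replaced.
print(resolve_link("http://localhost/hello", "slash/world"))

# A trailing slash makes the base a "directory", so the link is appended.
print(resolve_link("http://localhost/hello/", "slash/world"))

# The file name is dropped and the link resolves next to it.
print(resolve_link("http://localhost/hello/index.html", "slash/world"))

# Path joining treats the first argument as a directory in every case.
print(posixpath.join("hello", "slash/world"))
```

The difference from os.path.join is then a matter of what the first
argument means: a directory to descend into for paths, a document to
resolve against for URLs.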

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Skype: holdenweb       http://holdenweb.blogspot.com
Recent Ramblings     http://del.icio.us/steve.holden