[Python-Dev] Path object design

Phillip J. Eby pje at telecommunity.com
Sat Nov 4 03:09:47 CET 2006


At 01:56 AM 11/4/2006 +0100, Andrew Dalke wrote:
>os.join assumes the base is a directory
>name when used in a join: "inserting '/' as needed" while RFC
>1808 says
>
>            The last segment of the base URL's path (anything
>            following the rightmost slash "/", or the entire path if no
>            slash is present) is removed
>
>Is my intuition wrong in thinking those should be the same?

Yes.  :)

Path combining and URL absolutization(?) are inherently different 
operations with only superficial similarities.  One reason for this is that 
a trailing / on a URL has an actual meaning, whereas in filesystem paths a 
trailing / is an aberration and likely an actual error.

The path combining operation says, "treat the following as a subpath of the 
base path, unless it is absolute".  The URL joining operation says, 
"treat the following as a subpath of the location the base URL is 
*contained in*".

Because of this, os.path.join assumes a path with a trailing separator is 
equivalent to a path without one, since that is the only reasonable way to 
interpret treating the joined path as a subpath of the base path.
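
To make that concrete, here is a minimal sketch.  It uses posixpath (the 
POSIX flavor of os.path) so the results do not depend on the platform, and 
the paths themselves are made up for illustration:

```python
import posixpath

# A trailing separator on the base makes no difference to the result.
with_slash = posixpath.join("/foo/", "bar.html")
without_slash = posixpath.join("/foo", "bar.html")

# An absolute second argument replaces the base entirely.
absolute = posixpath.join("/foo", "/bar.html")

print(with_slash)     # /foo/bar.html
print(without_slash)  # /foo/bar.html
print(absolute)       # /bar.html
```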

But for a URL join, the path /foo and the path /foo/ are not only 
*different paths* referring to distinct objects, but the operation wants to 
refer to the *container* of the referenced object.  /foo might refer to a 
directory, while /foo/ refers to some default content (e.g. 
index.html).  This is actually why Apache normally redirects you from /foo 
to /foo/ before it serves up the index.html; relative URLs based on a base 
URL of /foo won't work right.
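
The same contrast for URLs, again only as a sketch.  This assumes a 
current Python, where urljoin lives in urllib.parse (at the time of this 
thread it was urlparse.urljoin), and example.com is just a placeholder host:

```python
from urllib.parse import urljoin

# Base ends in "/": the last segment is empty, so nothing is dropped.
from_dir = urljoin("http://example.com/foo/", "bar.html")

# Base has no trailing "/": the last segment ("foo") is removed first.
from_file = urljoin("http://example.com/foo", "bar.html")

print(from_dir)   # http://example.com/foo/bar.html
print(from_file)  # http://example.com/bar.html
```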

The URL approach is designed to make peer-to-peer linking in a given 
directory convenient.  Instead of referring to './foo.html' (as one would 
have to do with filenames), you can simply refer to 'foo.html'.  But the 
cost of saving those characters in every link is that joining always takes 
place on the parent, never the tail-end.  Thus directory URLs normally end 
in a trailing /, and most tools tend to automatically redirect when 
somebody leaves it off.  (Because otherwise the links would be wrong.)
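
A small sketch of the peer-linking case, under the same assumptions as 
before (urllib.parse.urljoin from a current Python, a hypothetical 
example.com site with a /docs/ directory):

```python
from urllib.parse import urljoin

page = "http://example.com/docs/index.html"

# A bare relative reference resolves against the page's container,
# so a sibling document needs no "./" prefix.
plain = urljoin(page, "foo.html")
dotted = urljoin(page, "./foo.html")

# The directory URL with its trailing "/" gives the same answer.
directory = urljoin("http://example.com/docs/", "foo.html")

print(plain)      # http://example.com/docs/foo.html
print(dotted)     # http://example.com/docs/foo.html
print(directory)  # http://example.com/docs/foo.html
```

Drop the trailing / from the directory URL and the join happens one level 
up instead, which is the breakage the redirect exists to prevent.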


