At 01:56 AM 11/4/2006 +0100, Andrew Dalke wrote:
> os.path.join assumes the base is a directory name when used in a join: "inserting '/' as needed" while RFC 1808 says
>    The last segment of the base URL's path (anything following the rightmost slash "/", or the entire path if no slash is present) is removed
> Is my intuition wrong in thinking those should be the same?
Yes. :) Path combining and URL absolutization(?) are inherently different operations with only superficial similarities. One reason for this is that a trailing / on a URL has an actual meaning, whereas in filesystem paths a trailing / is an aberration and likely an actual error.

The path combining operation says, "treat the following as a subpath of the base path, unless it is absolute". The URL absolutization operation says, "treat the following as a subpath of the location the base URL is *contained in*".

Because of this, os.path.join assumes a path with a trailing separator is equivalent to a path without one, since that is the only reasonable way to interpret treating the joined path as a subpath of the base path. But for a URL join, the path /foo and the path /foo/ are not only *different paths* referring to distinct objects, but the operation wants to refer to the *container* of the referenced object. /foo might refer to a directory, while /foo/ refers to some default content (e.g. index.html).

This is actually why Apache normally redirects you from /foo to /foo/ before it serves up the index.html; relative URLs based on a base URL of /foo would resolve against / rather than /foo/, so they won't work right.

The URL approach is designed to make peer-to-peer linking in a given directory convenient. Instead of referring to './foo.html' (as one would have to do with filenames), you can simply refer to 'foo.html'. But the cost of saving those characters in every link is that joining always takes place on the parent, never the tail-end. Thus directory URLs normally end in a trailing /, and most tools tend to automatically redirect when somebody leaves it off. (Because otherwise the links would be wrong.)
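
To make the difference concrete, here is a minimal sketch of the two operations side by side. It assumes a current Python 3 (where urljoin lives in urllib.parse; in older versions it was in the urlparse module), uses posixpath directly so the separator is '/' on every platform, and uses example.com purely as a placeholder host:

```python
import posixpath
from urllib.parse import urljoin

# Path combining: the joined name is a subpath of the base,
# so a trailing separator on the base makes no difference.
path_no_slash = posixpath.join("/foo", "bar.html")
path_slash = posixpath.join("/foo/", "bar.html")

# URL joining: the last segment of the base path is removed first,
# so the trailing slash decides which container the link lands in.
# (example.com is a placeholder, not a host from the discussion.)
url_no_slash = urljoin("http://example.com/foo", "bar.html")
url_slash = urljoin("http://example.com/foo/", "bar.html")

print(path_no_slash)  # /foo/bar.html
print(path_slash)     # /foo/bar.html
print(url_no_slash)   # http://example.com/bar.html
print(url_slash)      # http://example.com/foo/bar.html
```

Both path joins give the same result, while the two URL joins differ only because of the trailing / on the base.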