[Python-Dev] Path object design
Steve Holden
steve at holdenweb.com
Sat Nov 4 05:34:12 CET 2006
Phillip J. Eby wrote:
> At 01:56 AM 11/4/2006 +0100, Andrew Dalke wrote:
>
>>os.path.join assumes the base is a directory
>>name when used in a join ("inserting '/' as needed"), while RFC
>>1808 says
>>
>> The last segment of the base URL's path (anything
>> following the rightmost slash "/", or the entire path if no
>> slash is present) is removed
>>
>>Is my intuition wrong in thinking those should be the same?
>
>
> Yes. :)
>
> Path combining and URL absolutization(?) are inherently different
> operations with only superficial similarities. One reason for this is that
> a trailing / on a URL has an actual meaning, whereas in filesystem paths a
> trailing / is an aberration and likely an actual error.
>
> The path combining operation says, "treat the following as a subpath of the
> base path, unless it is absolute". The URL normalization operation says,
> "treat the following as a subpath of the location the base URL is
> *contained in*".
>
> Because of this, os.path.join assumes a path with a trailing separator is
> equivalent to a path without one, since that is the only reasonable way to
> interpret treating the joined path as a subpath of the base path.
>
> But for a URL join, the path /foo and the path /foo/ are not only
> *different paths* referring to distinct objects, but the operation wants to
> refer to the *container* of the referenced object. /foo might refer to a
> directory, while /foo/ refers to some default content (e.g.
> index.html). This is actually why Apache normally redirects you from /foo
> to /foo/ before it serves up the index.html; relative URLs based on a base
> URL of /foo won't work right.
>
> The URL approach is designed to make peer-to-peer linking in a given
> directory convenient. Instead of referring to './foo.html' (as one would
> have to do with filenames), you can simply refer to 'foo.html'. But the
> cost of saving those characters in every link is that joining always takes
> place on the parent, never the tail-end. Thus directory URLs normally end
> in a trailing /, and most tools tend to automatically redirect when
> somebody leaves it off. (Because otherwise the links would be wrong.)
>
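A minimal sketch of the distinction drawn in the quoted text, assuming Python 3 (where urljoin lives in urllib.parse rather than urlparse) and a placeholder host, example.com, that does not come from the thread. posixpath.join stands in for os.path.join so the result does not depend on the platform's separator:

```python
import posixpath
from urllib.parse import urljoin

# Path combining: a trailing separator on the base makes no difference.
path_without_slash = posixpath.join("/foo", "bar")   # '/foo/bar'
path_with_slash = posixpath.join("/foo/", "bar")     # '/foo/bar'

# An absolute second argument replaces the base entirely.
path_absolute = posixpath.join("/foo", "/bar")       # '/bar'

# URL joining: the last segment of the base is dropped first, so the
# trailing slash decides which "container" the reference resolves in.
url_without_slash = urljoin("http://example.com/foo", "bar")
url_with_slash = urljoin("http://example.com/foo/", "bar")

print(path_without_slash, path_with_slash, path_absolute)
print(url_without_slash)   # http://example.com/bar
print(url_with_slash)      # http://example.com/foo/bar
```

The two path joins agree, while the two URL joins do not, which is the asymmetry the trailing "/" carries for URLs.
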
Having said this, Andrew *did* demonstrate quite convincingly that the
current urljoin has some fairly egregious directory traversal glitches.
Is it really right to punt obvious gotchas like
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../")
'http://blah.com/../../'
>>>
to the server?
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden