[Python-Dev] Path object design

Andrew Dalke dalke at dalkescientific.com
Mon Nov 6 15:57:59 CET 2006


Andrew:
> >>> urlparse.urljoin("http://blah.com/", "..")
> 'http://blah.com/'
> >>> urlparse.urljoin("http://blah.com/", "../")
> 'http://blah.com/../'
> >>> urlparse.urljoin("http://blah.com/", "../..")
> 'http://blah.com/'

/F:
> as I said, today's urljoin doesn't guarantee that the output is
> the *shortest* possible way to represent the resulting URI.

I didn't think anyone was making that claim.  The module claims
RFC 1808 compliance.  From the docstring:

    DESCRIPTION
        See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding,
        UC Irvine, June 1995.

Now quoting from RFC 1808:

   5.2.  Abnormal Examples

   Although the following abnormal examples are unlikely to occur in
   normal practice, all URL parsers should be capable of resolving them
   consistently.  Each example uses the same base as above.

   An empty reference resolves to the complete base URL:

      <>            = <URL:http://a/b/c/d;p?q#f>

   Parsers must be careful in handling the case where there are more
   relative path ".." segments than there are hierarchical levels in the
   base URL's path.
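
For reference, the path-merging rules of RFC 1808 section 4, step 6 can
be sketched roughly as follows.  This is a hedged illustration, not the
urlparse implementation: the name resolve_path_rfc1808 is made up, only
the path component is handled, the code is written for Python 3, and it
assumes that the empty segment in front of the leading "/" does not count
as a "<segment>" for rules 6c and 6d.

```python
def resolve_path_rfc1808(base_path, rel_path):
    """Merge rel_path into base_path following RFC 1808 step 6 (sketch)."""
    # Step 6: drop the last segment of the base path, append the relative path.
    segments = base_path.split("/")[:-1] + rel_path.split("/")

    # 6b: a trailing "." segment is removed but the slash before it is kept.
    if segments[-1] == ".":
        segments[-1] = ""
    # 6a: every remaining "." segment is removed.
    segments = [s for s in segments if s != "."]

    # 6c: repeatedly remove the leftmost "<segment>/../", where <segment>
    # is a real name (assumption: neither ".." nor the empty root segment).
    i = 1
    while i < len(segments) - 1:
        if segments[i] == ".." and segments[i - 1] not in ("", ".."):
            del segments[i - 1:i + 1]
            i = 1  # restart from the left after each removal
        else:
            i += 1

    # 6d: a trailing "<segment>/.." is removed, leaving the trailing slash.
    if (len(segments) >= 2 and segments[-1] == ".."
            and segments[-2] not in ("", "..")):
        segments[-2:] = [""]

    return "/".join(segments)


# Section 5 examples, base path "/b/c/d":
print(resolve_path_rfc1808("/b/c/d", "../.."))       # /
print(resolve_path_rfc1808("/b/c/d", "../../../g"))  # /../g

# Base path "/", as in the blah.com examples:
for rel in ("..", "../", "../.."):
    print(repr(rel), resolve_path_rfc1808("/", rel))
```

Under those assumptions every surplus ".." is kept in the result, which
agrees with the RFC's own abnormal example "../../../g" resolving to
<URL:http://a/../g>.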

My claim is that "consistent" implies "in the spirit of the rest of the RFC"
and "to a human trying to make sense of the results," and does not mean
only "does the same thing each time."  Else

>>> urljoin("http://blah.com/", "../../..")
'http://blah.com/there/were/too/many/dot-dot/path/elements/in/the/relative/url'

would be equally consistent.

>>> for rel in ".. ../ ../.. ../../ ../../.. ../../../ ../../../..".split():
...   print repr(rel), repr(urlparse.urljoin("http://blah.com/", rel))
...
'..' 'http://blah.com/'
'../' 'http://blah.com/../'
'../..' 'http://blah.com/'
'../../' 'http://blah.com/../../'
'../../..' 'http://blah.com/../'
'../../../' 'http://blah.com/../../../'
'../../../..' 'http://blah.com/../../'

I grant there is a consistency there.  It's not one most would have
predicted beforehand.
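
One rule that would produce exactly this table is sketched below.  It is
a hypothetical reconstruction written for Python 3, not a copy of the
urlparse source, and the name join_path_observed is made up.  The
assumption is that a trailing ".." always swallows the segment before it,
even when that segment is itself "..", while a ".." in the middle of the
path is only cancelled by a real name in front of it.

```python
def join_path_observed(base_path, rel_path):
    """Hypothetical segment rule that reproduces the table above."""
    segments = base_path.split("/")[:-1] + rel_path.split("/")

    # Drop "." segments, keeping the trailing slash for a final ".".
    if segments[-1] == ".":
        segments[-1] = ""
    segments = [s for s in segments if s != "."]

    # Interior "..": cancel it only against a real name to its left.
    i = 1
    while i < len(segments) - 1:
        if segments[i] == ".." and segments[i - 1] not in ("", ".."):
            del segments[i - 1:i + 1]
            i = 1
        else:
            i += 1

    # Trailing "..": removed together with whatever precedes it.
    # This is the assumed quirk; the preceding segment is not checked.
    if segments == ["", ".."]:
        segments[-1] = ""
    elif len(segments) >= 2 and segments[-1] == "..":
        segments[-2:] = [""]

    return "/".join(segments)


for rel in ".. ../ ../.. ../../ ../../.. ../../../ ../../../..".split():
    print(repr(rel), repr("http://blah.com" + join_path_observed("/", rel)))
```

With this rule a relative URL ending in ".." loses two segments from the
end, while one ending in "../" loses none, which gives the alternating
pattern shown in the table.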

Then again, "should" is that wishy-washy "unless you've got a good
reason to do it a different way" sort of constraint.

        Andrew
        dalke at dalkescientific.com

