Andrew Dalke wrote:
as I said, today's urljoin doesn't guarantee that the output is the *shortest* possible way to represent the resulting URI.
I didn't think anyone was making that claim. The module claims RFC 1808 compliance. From the docstring:
DESCRIPTION See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding, UC Irvine, June 1995.
Now quoting from RFC 1808:
5.2. Abnormal Examples
Although the following abnormal examples are unlikely to occur in normal practice, all URL parsers should be capable of resolving them consistently.
My claim is that "consistent" implies "in the spirit of the rest of the RFC" and "to a human trying to make sense of the results", and does not only mean "does the same thing each time." Else
urljoin("http://blah.com/", "../../..")
'http://blah.com/there/were/too/many/dot-dot/path/elements/in/the/relative/ur...'
would be equally consistent.
perhaps, but such an urljoin wouldn't pass the minimize(base + relative) == minimize(urljoin(base, relative)) test that today's urljoin passes (where "minimize" is defined as "create the shortest possible URI that identifies the same target, according to the relevant RFC").
isn't the real issue in this subthread whether urljoin should be expected to pass the minimize(base + relative) == urljoin(base, relative) test?
</F>
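The two tests can be sketched roughly as follows. This is only an illustration: minimize() is a hypothetical helper, not part of the standard library, and it assumes that "shortest" means nothing more than collapsing "." and ".." path segments and dropping any ".." that would climb above the root. The import uses the Python 3 location of urljoin, and base + relative is only meaningful when the base ends in "/" and the relative part is a relative path.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def minimize(url):
    """Hypothetical helper: collapse "." and ".." segments in the path.

    Assumption: a ".." that would climb above the root is simply dropped.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    out = []
    for seg in segments:
        if seg == "..":
            # Never pop the leading empty segment that stands for the root.
            if len(out) > 1:
                out.pop()
        elif seg != ".":
            out.append(seg)
    # A path ending in "." or ".." names a directory: keep the trailing slash.
    if segments[-1] in (".", ".."):
        out.append("")
    path = "/".join(out) or "/"
    return urlunsplit(parts._replace(path=path))


def weak_test(base, relative):
    """The test today's urljoin is said to pass."""
    return minimize(base + relative) == minimize(urljoin(base, relative))


def strict_test(base, relative):
    """The stronger test: urljoin itself must return the minimal form."""
    return minimize(base + relative) == urljoin(base, relative)


# Illustrative inputs only; the last one is the abnormal case from the thread.
CASES = [
    ("http://blah.com/a/b/", "../c"),
    ("http://blah.com/a/b/", "./d/../e"),
    ("http://blah.com/", "../../.."),
]

if __name__ == "__main__":
    for base, relative in CASES:
        print(base, relative, weak_test(base, relative), strict_test(base, relative))
```

Whether strict_test holds for the abnormal case depends on how the urljoin in use treats excess ".." segments, which is exactly the point under discussion.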