On Wed, Mar 30, 2016 at 7:06 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote: [...]
The correct syntaxes per [1] and RFC 3986 are
4) Path("file:///http://www.example.com")
5) Path("file://localhost/http://www.example.com")
6) Path("file://[127.0.0.1]/http://www.example.com")
7) Path("file://[::1]/http://www.example.com")
Even if correct, these do not refer to "http:/www.example.com", but to "/http:/www.example.com". A URI with a relative path would not make a lot of sense, because its meaning would depend on the context, which is against. Then again, all file system paths are 'relative' with respect to the file system you are working in.

Also, while RFC 3986 is not super clear about this, I think '//' inside a URI path component may cause problems. IIUC this leads to a zero-length path segment '' in between the two slashes. It might work, though, if it just gets passed forward to the file system in the end. I don't know if that can 'officially' be normalized to a single slash, though.

"URIs that identify in relation to the end-user's local context should only be used when the context itself is a defining aspect of the resource, such as when an on-line help manual refers to a file on the end-user's file system (e.g., "file:///etc/hosts")." - RFC 3986
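To make the zero-length segment concrete, here is a small sketch using Python's urllib.parse (my choice for illustration; nothing above depends on it). It only shows how one parser splits example 4, and says nothing about what a file system would do with the result.

```python
from urllib.parse import urlsplit

# Example 4 from the list above: empty authority, then the path.
parts = urlsplit("file:///http://www.example.com")
authority = parts.netloc      # '' (nothing between "file://" and the next "/")
path = parts.path             # '/http://www.example.com'

# Splitting the path on "/" exposes the zero-length segment that the
# "//" after "http:" produces.
segments = path.split("/")    # ['', 'http:', '', 'www.example.com']

# Example 5 differs only in the authority; the path is the same.
parts_localhost = urlsplit("file://localhost/http://www.example.com")
```

The first empty string in `segments` is just the leading slash; the second one is the zero-length segment in question. Whether it may be collapsed is exactly the open point above.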
As far as I can tell the colon in "http:" is RFC 3986-legal, since it has no URI syntactic meaning in the path component.
That's right; per RFC 3986, colons are allowed in a URI path component, even though they are disallowed in *the first path segment* of a *relative reference*. I assume this is to keep relative references unambiguous as *URI references*, which can be either URIs or relative references. That is, a URI reference "mailto:email@address.com" is a mailto-URL and not a relative reference equivalent to "./mailto:email@address.com". So basically, if you want to express the (ridiculous) path 'http:/www.example.com' as a relative reference, you'd need to write './http:/www.example.com'.
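A quick way to see the difference, again as a sketch with Python's urllib.parse (an assumption on my part that it is a reasonable stand-in for an RFC 3986 parser here): the leading "./" stops the text before the first colon from being read as a scheme.

```python
from urllib.parse import urlsplit

# Without "./", everything before the first colon is taken as the scheme.
as_uri = urlsplit("mailto:email@address.com")
# as_uri.scheme == 'mailto', as_uri.path == 'email@address.com'

# With "./", there is no scheme and the colon stays inside the path.
as_relative = urlsplit("./mailto:email@address.com")
# as_relative.scheme == '', as_relative.path == './mailto:email@address.com'

# The same holds for the (ridiculous) path from the discussion.
bare = urlsplit("http:/www.example.com")
dotted = urlsplit("./http:/www.example.com")
```

Here `bare` comes out with the scheme 'http' and the path '/www.example.com', while `dotted` keeps the whole string as its path.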
This isn't as easy as it looks (which is why people are trying to delegate it to something they think of as "simple").
There's an additional problem with trying to cram URIs and Path together, which is that in a file system, "/a/b/symlink/../c" may refer to any file system object depending on symlink's target, which is unknown, while as a URI path it refers to whatever "/a/b/c" refers to, and nothing else. (This is the semantic glitch I was thinking of earlier.)
This is an interesting issue, because the behavior is not implemented consistently:

k7hoven@pomelo ~ % mkdir -p foo/bar
k7hoven@pomelo ~ % ln -s foo/bar baz
k7hoven@pomelo ~ % cd baz/..
k7hoven@pomelo ~ % cd baz
k7hoven@pomelo ~/baz % cd ..
k7hoven@pomelo ~ % echo "am I in foo/ or in ~/ ?" > baz/../question.txt
k7hoven@pomelo ~ % cat question.txt
cat: question.txt: No such file or directory
k7hoven@pomelo ~ % cat foo/question.txt
am I in foo/ or in ~/ ?
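The same two interpretations can be reproduced from Python. This is only a sketch: it rebuilds the foo/bar and baz layout in a temporary directory (the helper name `compare_resolutions` is made up for illustration) and assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile

def compare_resolutions(root):
    """Recreate foo/bar and the baz symlink under root, then resolve
    baz/../question.txt both syntactically and through the file system."""
    os.makedirs(os.path.join(root, "foo", "bar"))
    os.symlink(os.path.join("foo", "bar"), os.path.join(root, "baz"))
    path = os.path.join(root, "baz", "..", "question.txt")
    syntactic = os.path.normpath(path)   # pure string manipulation
    physical = os.path.realpath(path)    # follows the symlink first
    return syntactic, physical

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)         # the temp dir itself may be a symlink
    syntactic, physical = compare_resolutions(root)
    syntactic_rel = os.path.relpath(syntactic, root)
    physical_rel = os.path.relpath(physical, root)
```

Here `syntactic_rel` is 'question.txt', which is what the `cd` commands in the session suggest, while `physical_rel` is the file under foo/, which is where the redirect actually wrote.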
This means that URIs can be canonicalized syntactically, while doing so with file system paths is risky.
And that URI normalization should not be done automatically, especially if it is not clear whether it's a URI or not. Then sometimes you also want to do scheme-specific normalization.

-Koos