[Python-Dev] urlparse.urlunsplit should be smarter about +
Stephen J. Turnbull
stephen at xemacs.org
Mon May 10 10:11:12 CEST 2010
Senthil Kumaran writes:
> Not all urls have the 'authority' component after the scheme. (sip
> based urls for e.g) urlparse differentiates those by maintaining a
> list of scheme names which will follow the pattern of parsing, and
> joining for the urls which have a netloc (or authority component).
> This is in general according to RFC 3986 itself.
This is actually quite at variance with the RFC.  The grammar in section
3 doesn't make any reference to schemes as being significant in
parsing. Whether an authority component is to be parsed or not is
entirely dependent on the presence or absence of the "//" delimiter
following the scheme and its colon delimiter. AFAICS, if the "//"
delimiter is present, an authority component (possibly empty) *must*
be present in the parse. Presumably an unparse should then include
that empty component in the generated URI (i.e., a "scheme:///..." URI).
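To make the delimiter-driven reading concrete, here is a minimal sketch in Python 3 (deliberately not the urlparse API) that splits with the regular expression given in RFC 3986, Appendix B, and recomposes with the algorithm in section 5.3. The names rfc3986_split, rfc3986_unsplit and Components are made up for illustration. The one design assumption is representing "no authority" as None and "empty authority" as "", so that a "//" delimiter survives a round trip without consulting any list of schemes.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 3986, Appendix B.  Groups 2, 4, 5, 7 and 9 are
# the scheme, authority, path, query and fragment respectively.
_RFC3986_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Components(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]  # None: no "//" at all; "": "//" with an empty authority
    path: str
    query: Optional[str]
    fragment: Optional[str]


def rfc3986_split(uri: str) -> Components:
    """Split a URI purely by its delimiters; the scheme name plays no role."""
    m = _RFC3986_RE.match(uri)
    # Every group is optional, so the pattern matches any string and m is never None.
    return Components(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def rfc3986_unsplit(c: Components) -> str:
    """Recompose components following RFC 3986, section 5.3."""
    result = ""
    if c.scheme is not None:
        result += c.scheme + ":"
    if c.authority is not None:  # an empty authority still emits "//"
        result += "//" + c.authority
    result += c.path
    if c.query is not None:
        result += "?" + c.query
    if c.fragment is not None:
        result += "#" + c.fragment
    return result
```

With this representation 'git+file:///foo/bar' splits into an empty (but present) authority and round-trips unchanged, while 'git+file:/foo/bar' has no authority at all and also round-trips unchanged.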
Thus, it seems that by the RFC, regardless of any registration,
urlparse.urlunsplit(urlparse.urlsplit('git+file:///foo/bar'))
should produce 'git+file:///foo/bar' (or perhaps raise an error in
"validation" mode). The only question is whether registration of
'git+file' as a uses_netloc scheme should force
urlparse.urlunsplit(urlparse.urlsplit('git+file:/foo/bar'))
to return 'git+file:///foo/bar', or whether 'git+file:/foo/bar' would
be acceptable (or better).
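The open question about registration can be framed as a policy layered on top of a delimiter-driven recomposer. The sketch below is hypothetical: AUTHORITY_SCHEMES and unsplit_with_registry are invented names loosely modelled on urlparse's uses_netloc list, not anything urlparse provides. It only inserts an empty authority when the path is empty or begins with "/", because RFC 3986 (section 3.3) requires that whenever an authority is present; otherwise the first path segment would be reparsed as the authority.

```python
from typing import Optional

# Hypothetical registry of schemes assumed to carry an authority component.
# Name and contents are illustrative assumptions only.
AUTHORITY_SCHEMES = {"git+file"}


def unsplit_with_registry(scheme: Optional[str], authority: Optional[str], path: str,
                          query: Optional[str] = None, fragment: Optional[str] = None,
                          force_registered: bool = True) -> str:
    """Recompose a URI, optionally forcing "//" for registered schemes.

    authority=None means no authority was parsed; "" means an empty one.
    """
    if (force_registered and authority is None and scheme in AUTHORITY_SCHEMES
            and (path == "" or path.startswith("/"))):
        # The policy under discussion: a registered scheme implies an (empty)
        # authority, turning "git+file:/foo/bar" into "git+file:///foo/bar".
        authority = ""
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

Setting force_registered=False gives the purely RFC-driven behaviour, in which 'git+file:/foo/bar' is left alone; either way an authority that was actually parsed, even an empty one, is never dropped.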
None of what I wrote here or elsewhere takes account of backward
compatibility, it is true. I'm only talking about the letter of the
RFC.