[Python-Dev] urlparse.urlunsplit should be smarter about +

Stephen J. Turnbull stephen at xemacs.org
Mon May 10 10:11:12 CEST 2010


Senthil Kumaran writes:

 > Not all urls have the 'authority' component after the scheme. (sip
 > based urls for e.g) urlparse differentiates those by maintaining a
 > list of scheme names which will follow the pattern of parsing, and
 > joining for the urls which  have a netloc (or authority component).
 > This is in general according to RFC 3986 itself.

This is actually quite at variance with the RFC.  The grammar in section
3 doesn't make any reference to schemes as being significant in
parsing.  Whether an authority component is to be parsed or not is
entirely dependent on the presence or absence of the "//" delimiter
following the scheme and its colon delimiter.  AFAICS, if the "//"
delimiter is present, an authority component (possibly empty) *must*
be present in the parse.  Presumably an unparse should then include
that empty component in the generated URI (ie, a "scheme:///..." URI).
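
As a rough illustration of that reading (a sketch only, not what urlparse
does), the regular expression from RFC 3986 Appendix B can be used to split
a URI without consulting any scheme list.  The name rfc_split is
hypothetical; an authority of None means "no // present", while '' means
"// present but empty".

```python
import re

# Regular expression from RFC 3986, Appendix B.
_URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)

def rfc_split(uri):
    """Hypothetical helper: split a URI using only the RFC 3986 regex.

    Returns (scheme, authority, path, query, fragment).  A component that
    is absent is None; a component that is present but empty is ''.
    """
    m = _URI_RE.match(uri)
    # Groups 2, 4, 5, 7 and 9 are the five components named in the RFC.
    return (m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))

# "//" present: the authority is parsed, even though it is empty.
print(rfc_split('git+file:///foo/bar'))
# "//" absent: no authority at all, whatever the scheme is.
print(rfc_split('git+file:/foo/bar'))
print(rfc_split('sip:alice@example.com'))
```

Note that the scheme name never influences whether an authority is found;
only the "//" does.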

Thus, it seems that by the RFC, regardless of any registration,

    urlparse.urlunsplit(urlparse.urlsplit('git+file:///foo/bar'))

should produce 'git+file:///foo/bar' (or perhaps raise an error in
"validation" mode).  The only question is whether registration of
'git+file' as a uses_netloc scheme should force

    urlparse.urlunsplit(urlparse.urlsplit('git+file:/foo/bar'))

to return 'git+file:///foo/bar', or whether 'git+file:/foo/bar' would
be acceptable (or better). 
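
For comparison, here is a minimal sketch of the component recomposition
described in RFC 3986 section 5.3, which emits "//" exactly when an
authority is defined, even an empty one.  The function name recompose and
the use of None for "undefined" are assumptions made for illustration;
this is not the urlparse implementation, and it ignores the registration
question entirely.

```python
def recompose(scheme, authority, path, query=None, fragment=None):
    """Hypothetical helper following RFC 3986 section 5.3.

    None means the component is undefined; '' means defined but empty.
    """
    parts = []
    if scheme is not None:
        parts.append(scheme + ':')
    if authority is not None:
        # An empty authority still produces the "//" delimiter.
        parts.append('//' + authority)
    parts.append(path)
    if query is not None:
        parts.append('?' + query)
    if fragment is not None:
        parts.append('#' + fragment)
    return ''.join(parts)

# Empty authority (the URI had "//"): the slashes are kept.
print(recompose('git+file', '', '/foo/bar'))
# Undefined authority (the URI had no "//"): none are added.
print(recompose('git+file', None, '/foo/bar'))
```

Under this rule both forms round-trip unchanged; forcing the second to
'git+file:///foo/bar' would be a policy layered on top of the RFC.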

None of what I wrote here or elsewhere takes account of backward
compatibility, it is true.  I'm only talking about the letter of the
RFC.

