[Python-Dev] urlparse.urlunsplit should be smarter about +

Senthil Kumaran orsenthil at gmail.com
Mon May 10 07:38:14 CEST 2010


On Sun, May 09, 2010 at 03:19:40PM -0600, David Abrahams wrote:
> John Arbash Meinel wrote:
> > Don't you need to register the "git+file:///" url for urlparse to
> > properly split it?
> 
> Yes.  But the question is whether urlparse should really be so fragile
> that every hierarchical scheme needs to be explicitly registered.
> Surely ending with “+file” should be sufficient to have it recognized
> as a file-based scheme

Not all URLs have an 'authority' component after the scheme (sip
URLs, for example). urlparse tells the two kinds apart by maintaining
a list of scheme names whose URLs carry a netloc (the authority
component) and are parsed and joined accordingly. This generally
follows RFC 3986 itself.
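
To illustrate the distinction, here is a minimal sketch. It assumes
Python 3, where the module lives at urllib.parse rather than urlparse,
and the example URLs are made up for illustration.

```python
from urllib.parse import urlsplit

# An http URL has "//" after the scheme, so an authority (netloc) is parsed.
http_parts = urlsplit("http://example.com/index.html")

# A sip URL has no "//", so everything after the colon ends up in the path.
sip_parts = urlsplit("sip:alice@example.com")

print(repr(http_parts.netloc), repr(http_parts.path))
print(repr(sip_parts.netloc), repr(sip_parts.path))
```

The sip URL comes back with an empty netloc, which is why joining it
back together must not insert "//".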

Yes, '+' is a valid character in URL schemes, and svn and svn+ssh
will behave as you expect. But git and git+ssh were missing from that
list, and I attached a patch to issue8657 to add them. That is rightly
a bug in the module. But assuming, for any scheme in general, that a
'+file' suffix means a valid authority component follows is not
something I am sure should be part of urlparse's expected behavior.
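
As a rough way to see which schemes are registered, the sketch below
inspects the module-level list. It assumes Python 3's urllib.parse;
is_registered is a hypothetical helper, not part of the module, and
the exact contents of the list vary between Python versions.

```python
from urllib.parse import uses_netloc

def is_registered(scheme):
    """Return True if the scheme is in the module's netloc-bearing list.

    Hypothetical helper for illustration only.
    """
    return scheme in uses_netloc

# Report registration status for the schemes discussed in this thread.
for scheme in ("svn+ssh", "git+ssh", "git+file"):
    print(scheme, is_registered(scheme))
```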



-- 
Senthil

Do not seek death; death will find you.  But seek the road which makes death
a fulfillment.
		-- Dag Hammarskjold

