[ python-Bugs-754016 ] urlparse goes wrong with IP:port without scheme

Sun Dec 26 22:55:51 CET 2004

Bugs item #754016, was opened at 2003-06-13 12:15
Message generated for change (Comment added) made by facundobatista
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=754016&group_id=5470

Category: Python Library
>Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Thomas Krüger (kruegi)
Assigned to: Nobody/Anonymous (nobody)
Summary: urlparse goes wrong with IP:port without scheme

Initial Comment:
urlparse doesn't work if an IP address and port are given
without a scheme:

>>> urlparse.urlparse('1.2.3.4:80','http')
('1.2.3.4', '', '80', '', '', '')

should be:

>>> urlparse.urlparse('1.2.3.4:80','http')
('http', '1.2.3.4', '80', '', '', '')

----------------------------------------------------------------------

>Comment By: Facundo Batista (facundobatista)
Date: 2004-12-26 18:55

Message:
Logged In: YES 
user_id=752496

The problem is still present in Py2.3.4.

IMO, it should support addresses without the "http://" or
raise an error if it's not present (never fail silently!).
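
A minimal sketch of the "raise an error" option, written against
Python 3, where the module is urllib.parse rather than urlparse. The
name urlparse_strict is hypothetical and is not part of the standard
library; it only wraps the existing urlparse call and rejects results
with an empty netloc.

```python
from urllib.parse import urlparse


def urlparse_strict(url, default_scheme=""):
    """Parse url, but refuse results that have no network location.

    Hypothetical helper: it is not part of the standard library.
    """
    parts = urlparse(url, default_scheme)
    if not parts.netloc:
        # Without a leading '//' the host ends up in the path, so fail loudly.
        raise ValueError(
            "no network location in %r; expected '//host' or 'scheme://host'" % url
        )
    return parts
```

With this wrapper, '//python.org' parses as usual, while 'python.org'
and '1.2.3.4:80' raise ValueError instead of silently landing in the
path.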

----------------------------------------------------------------------

Comment By: Shannon Jones (sjones)
Date: 2003-06-14 01:18

Message:
Logged In: YES 
user_id=589306

Ok, I researched this a bit, and the situation isn't as
simple as it first appears. The RFC that urlparse tries to
follow is at http://www.faqs.org/rfcs/rfc1808.html; note
specifically sections 2.1 and 2.2.

It seems to me that the source code follows RFC 1808
religiously, and in that sense it does the correct thing.
According to the RFC, the netloc should begin with a '//',
and since your example didn't include one, it was technically
an invalid URL. Here is another example where it seems
Python fails to do the right thing:

>>> urlparse.urlparse('python.org')
('', '', 'python.org', '', '', '')
>>> urlparse.urlparse('python.org', 'http')
('http', '', 'python.org', '', '', '')

Note that it is putting 'python.org' as the path and not the
netloc. So the problem isn't limited to just when you use a
scheme parameter and/or a port number. Now if we put '//' at
the beginning, we get:

>>> urlparse.urlparse('//python.org')
('', 'python.org', '', '', '', '')
>>> urlparse.urlparse('//python.org', 'http')
('http', 'python.org', '', '', '', '')

So here it does the correct thing.

There are two problems though. First, it is common for
browsers and other software to just take a URL without a
scheme and '://' and assume it is http for the user. While
the URL is technically not correct, it is still common
usage. Also, urlparse does take a scheme parameter. It seems
as though this parameter should be used with a scheme-less
URL to give it a default one like web browsers do.

So somebody needs to make a decision. Should urlparse follow
the RFC religiously and require '//' in front of netlocs? If
so, I think the documentation should give an example showing
this and showing how to use the 'scheme' parameter. Or
should urlparse use the more commonly used form of a URL
where '//' is omitted when the scheme is omitted? If so,
urlparse.py will need to be changed. Or maybe another
function should be added to cover whichever behaviour
urlparse doesn't cover.

In any case, you can temporarily solve your problem by
making sure that URLs without a scheme have '//' at the
front. So your example becomes:

>>> urlparse.urlparse('//1.2.3.4:80', 'http')
('http', '1.2.3.4:80', '', '', '', '')
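
The workaround above could be wrapped in a small helper. This is only a
sketch under stated assumptions: it targets Python 3 (urllib.parse), the
name parse_with_default_scheme is hypothetical, and it assumes any input
lacking '//' is a bare host[:port][/path], so it would mishandle
non-hierarchical URLs such as 'mailto:' addresses.

```python
from urllib.parse import urlparse


def parse_with_default_scheme(url, default_scheme="http"):
    """Parse url, treating scheme-less input as host[:port][/path].

    Hypothetical helper: it assumes every input without '//' is a
    network location, which is wrong for URLs such as 'mailto:...'.
    """
    if "://" not in url and not url.startswith("//"):
        # Add the '//' that RFC 1808 requires in front of a netloc.
        url = "//" + url
    return urlparse(url, default_scheme)
```

For example, parse_with_default_scheme('1.2.3.4:80') yields scheme
'http' and netloc '1.2.3.4:80', and a URL that already carries a scheme
is passed through unchanged.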

----------------------------------------------------------------------

Comment By: Shannon Jones (sjones)
Date: 2003-06-13 23:39

Message:
Logged In: YES 
user_id=589306

Sorry, previous comment got cut off...

urlparse.urlparse takes a url of the format:
    <scheme>://<netloc>/<path>;<params>?<query>#<fragment>

And returns a 6-tuple of the format:
    (scheme, netloc, path, params, query, fragment).

An example from the library reference takes:
    urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')

And produces:
    ('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '')

--------------------------------

Note that there isn't a field for the port number in the
6-tuple. Instead, it is included in the netloc. urlparse
should handle your example as:

>>> urlparse.urlparse('1.2.3.4:80','http')
('http', '1.2.3.4:80', '', '', '', '')

Instead, it gives the incorrect output as you indicated.
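
As an illustration of the port living inside the netloc, here is a
short sketch for Python 3, where the result of urllib.parse.urlparse
exposes hostname and port attributes derived from the netloc. The
function name split_host_port is hypothetical.

```python
from urllib.parse import urlparse


def split_host_port(url):
    """Return (hostname, port) taken from the netloc of a parsed URL.

    Hypothetical helper; port is None when the netloc has no port.
    """
    parts = urlparse(url)
    return parts.hostname, parts.port


host, port = split_host_port("http://www.cwi.nl:80/%7Eguido/Python.html")
```

The 6-tuple itself still has no port field; the attributes are computed
from the netloc, so they are only useful once the netloc has been
recognized (that is, when the URL has a scheme and '//' or a leading
'//').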

----------------------------------------------------------------------

Comment By: Shannon Jones (sjones)
Date: 2003-06-13 23:26

Message:
Logged In: YES 
user_id=589306

urlparse.urlparse takes a url of the format:
    <scheme>://<netloc>/<path>;<params>?<query>#<fragment>

And returns a 6-tuple of the format:
    (scheme, netloc, path, params, query, fragment).

An example from the library reference takes:

----------------------------------------------------------------------
