[issue754016] urlparse goes wrong with IP:port without scheme

Sat Jun 21 20:45:46 CEST 2008

Anthony Lenton <antoniolenton at gmail.com> added the comment:

I agree with facundobatista that the patch is bad, but for a different
reason: it now breaks with:

>>> import urlparse
>>> urlparse.urlparse ('http:')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anthony/svn/python26/Lib/urlparse.py", line 108, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "/home/anthony/svn/python26/Lib/urlparse.py", line 148, in urlsplit
    if i > 0 and not url[i+1].isdigit():
IndexError: string index out of range
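
The failure comes from indexing url[i+1] when the colon is the last
character of the string. A minimal sketch of a guarded check follows;
digit_follows_colon is a hypothetical helper written for illustration,
not the code from the patch:

```python
def digit_follows_colon(url):
    """Return True if the first ':' in url is followed by a digit.

    Hypothetical helper: it only illustrates the bounds check that the
    patched line is missing.
    """
    i = url.find(':')
    # Check i + 1 < len(url) first, so that a trailing colon (as in
    # 'http:') never reaches the url[i + 1] lookup.
    return i > 0 and i + 1 < len(url) and url[i + 1].isdigit()
```

With this guard, 'http:' simply yields False instead of raising
IndexError.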

I'm afraid the expected behavior isn't evident.

Take for example:

>>> import urlparse
>>> urlparse.urlparse('some.com', 'http')
ParseResult(scheme='http', netloc='', path='some.com', params='',
query='', fragment='')

Is the URL referring to the some.com domain or to a Windows executable file?

If you're using urlparse to parse only absolute URLs then you probably
want the first component to be considered a net_loc, but not if you
also intend to accept relative URLs.

It would probably be better to be explicit and raise an exception if the
URL is invalid, so that the user can prepend a '//' and resubmit if
needed.  Also we'd probably stop seeing bug reports about this issue :)

----------
nosy: +elachuni

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue754016>
_______________________________________