bug in urllib under Windows?

Dave Berkeley dave at rotwang.freeserve.co.uk
Thu Sep 27 07:42:40 EDT 2001


Hello

I have found what appears to be a bug in urllib.

urllib.urlopen() has been failing with the error 'invalid literal'. It appears
to be treating the 'http:' protocol prefix as the host name. I've had this
problem with Python 2.0 and 2.1. I seem to remember using urlopen() without
problems
before. I suspect that the problem is caused by the data held in the Windows
registry being inconsistent. It may well be that urllib is correct and my
service provider's dialer installer is buggy. I noticed that other people
have come across this problem before.

The traceback is as follows:

  File "C:\PROGRAM FILES\PYTHON21\lib\urllib.py", line 71, in urlopen
    return _urlopener.open(url)
  File "C:\PROGRAM FILES\PYTHON21\lib\urllib.py", line 176, in open
    return getattr(self, name)(url)
  File "C:\PROGRAM FILES\PYTHON21\lib\urllib.py", line 277, in open_http
    h = httplib.HTTP(host)
  File "C:\PROGRAM FILES\PYTHON21\lib\httplib.py", line 663, in __init__
    self._conn = self._connection_class(host, port)
  File "C:\PROGRAM FILES\PYTHON21\lib\httplib.py", line 342, in __init__
    self._set_hostport(host, port)
  File "C:\PROGRAM FILES\PYTHON21\lib\httplib.py", line 348, in _set_hostport
    port = int(host[i+1:])
ValueError: invalid literal for int():

The 'host' variable contains the protocol, 'http:', which is clearly wrong.
The code is attempting to strip the port number from it and fails.
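
To illustrate the failure, here is a minimal sketch. It is a simplified
stand-in written for this post, not the actual urllib or httplib source. The
names host_part and split_hostport are made up, and the syntax assumes a
recent Python rather than 2.1.

```python
def host_part(url):
    """Return the text between '//' and the next '/' (simplified)."""
    rest = url.split(':', 1)[1]        # drop the leading scheme, e.g. 'http'
    return rest[2:].split('/', 1)[0]   # skip '//' and cut at the next '/'


def split_hostport(host, default_port=80):
    """Hypothetical, simplified version of the host:port split."""
    i = host.find(':')
    if i >= 0:
        port = int(host[i + 1:])       # int('') raises ValueError
        host = host[:i]
    else:
        port = default_port
    return host, port


# With the doubled prefix, the "host" ends up being just 'http:'.
bad_host = host_part('http://http://www-cache.freeserve.net:8080')
print(repr(bad_host))

try:
    split_hostport(bad_host)
except ValueError as exc:
    print('ValueError:', exc)
```

Nothing follows the colon in 'http:', so the port string is empty and int()
rejects it, which matches the empty literal in the traceback.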

I tried debugging urllib by putting print statements in it.

On Win32, urllib.getproxies() returns (for protocol http):

'http://http://www-cache.freeserve.net:8080'

This is clearly wrong. This data is stored in the registry under:

'''HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet settings\ProxyServer'''

which contains a semicolon-delimited list of protocol=proxy mappings,
in my case:


'http=http://www-cache.freeserve.net:8080;ftp=http://www-cache.freeserve.net:8080'
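
As a rough sketch of how such a value splits into a protocol-to-proxy mapping
(parse_proxy_server is a made-up helper for illustration, not the function
urllib uses, and it assumes every entry has the form protocol=address):

```python
def parse_proxy_server(value):
    """Split a 'proto=addr;proto=addr' string into a dict (illustrative)."""
    proxies = {}
    for entry in value.split(';'):
        if '=' not in entry:
            continue                   # skip anything not in proto=addr form
        protocol, address = entry.split('=', 1)
        proxies[protocol] = address
    return proxies


value = ('http=http://www-cache.freeserve.net:8080;'
         'ftp=http://www-cache.freeserve.net:8080')
for protocol, address in sorted(parse_proxy_server(value).items()):
    print(protocol, '->', address)
```

Each address here already starts with 'http://', so anything that prepends the
protocol again produces the doubled prefix.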

I tracked the problem down to the Windows specific code that reads data from
the registry, urllib.py line 1270:

    proxies[protocol] = '%s://%s' % (protocol, address)

Should be

    proxies[protocol] = address

This is because the address already seems to contain the protocol. It may
instead be a bug caused by incorrect data being entered in the registry: the
protocol information is clearly duplicated in this example, and the code
assumes that it is not present in the URL.
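
Since I don't know which registry format is the correct one, a more defensive
variant might accept both. This is only a sketch of that idea (proxy_url is a
hypothetical helper, not part of urllib): it prepends the protocol only when
the address carries no scheme of its own.

```python
def proxy_url(protocol, address):
    """Build a proxy URL without doubling an existing scheme (illustrative)."""
    if '://' in address:
        return address                 # scheme already present, keep as-is
    return '%s://%s' % (protocol, address)


print(proxy_url('http', 'http://www-cache.freeserve.net:8080'))
print(proxy_url('http', 'www-cache.freeserve.net:8080'))
```

Both calls give 'http://www-cache.freeserve.net:8080', so registries with and
without the prefix would behave the same.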

Has anyone had any similar experiences with urllib?

Dave Berkeley
