[Python-bugs-list] [ python-Bugs-503031 ] urllib.py: open_http() host problem
noreply@sourceforge.net
Wed, 16 Jan 2002 09:31:02 -0800
Bugs item #503031, was opened at 2002-01-13 10:09
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=503031&group_id=5470
Category: Python Library
Group: Python 2.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Cowley (sachmoz)
Assigned to: Mark Hammond (mhammond)
Summary: urllib.py: open_http() host problem
Initial Comment:
While trying to use the urllib.py urlopen() function,
as follows:
doc = urlopen("http://www.python.org").read()
print doc
I was receiving the following trace:
Traceback (most recent call last):
  File "C:/Documents and Settings/Administrator/Desktop/jason/grabpage.py", line 3, in ?
    doc = urlopen("http://www.python.org").read()
  File "C:\Python22\lib\urllib.py", line 73, in urlopen
    return _urlopener.open(url)
  File "C:\Python22\lib\urllib.py", line 178, in open
    return getattr(self, name)(url)
  File "C:\Python22\lib\urllib.py", line 283, in open_http
    h = httplib.HTTP(host)
  File "C:\Python22\lib\httplib.py", line 688, in __init__
    self._setup(self._connection_class(host, port))
  File "C:\Python22\lib\httplib.py", line 343, in __init__
    self._set_hostport(host, port)
  File "C:\Python22\lib\httplib.py", line 349, in _set_hostport
    port = int(host[i+1:])
ValueError: invalid literal for int():
I managed to track the problem down to the function
open_http() in urllib.py. The value of the 'host'
variable contained the string 'http:' rather
than 'www.python.org', when a call is made as follows:
httplib.HTTP(host)
Line 272 of urllib.py should be setting the
variable 'host' to the value of 'realhost' but the
statement is never executed. The function 'proxy_bypass()'
doesn't appear to do anything but return 0.
I fixed it for my own purposes by adding a statement:
host = realhost
----------------------------------------------------------------------
>Comment By: Thomas Heller (theller)
Date: 2002-01-16 09:31
Message:
Logged In: YES
user_id=11105
The (my) conclusion of all this is:
    # Per-protocol settings
    for p in proxyServer.split(';'):
        protocol, address = p.split('=', 1)
        proxies[protocol] = '%s://%s' % (protocol, address)
It should add a "http://" prefix if one isn't already there.
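A minimal sketch of that conclusion, written for current Python 3 rather than the 2.2 code under discussion. The function name parse_proxy_server and the default_scheme parameter are hypothetical, and the sketch assumes the per-protocol "name=address;name=address" form only; it is not the patch that was actually applied to urllib.py.

```python
def parse_proxy_server(proxy_server, default_scheme="http"):
    """Turn an IE-style per-protocol ProxyServer value into a dict.

    Hypothetical helper: an address that already carries a scheme is
    kept unchanged; a bare address gets default_scheme + '://' prepended.
    """
    proxies = {}
    for entry in proxy_server.split(";"):
        if not entry:
            continue  # tolerate a trailing ';'
        protocol, address = entry.split("=", 1)
        if "://" not in address:
            address = "%s://%s" % (default_scheme, address)
        proxies[protocol] = address
    return proxies


bare = parse_proxy_server("ftp=192.168.0.15:3128;http=192.168.0.13:3128")
prefixed = parse_proxy_server(
    "http=http://www-cache.freeserve.com:8080;"
    "ftp=http://www-cache.freeserve.com:8080"
)
```

With this guard, both registry styles quoted in this thread yield single-prefix URLs, so no doubled "http://http://" value is produced.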
----------------------------------------------------------------------
Comment By: Jason Cowley (sachmoz)
Date: 2002-01-16 07:37
Message:
Logged In: YES
user_id=426262
I have found the document that theller referred to online,
the URL is:
http://www.microsoft.com/WINDOWS2000/techinfo/reskit/en/ierk/Ch13_d.htm
or alternatively:
http://www.microsoft.com/windows2000/techinfo/reskit/en-us/default.asp?url=/WINDOWS2000/techinfo/reskit/en-us/ierk/Ch13_d.asp
The actual registry entries that I posted earlier appear to
be set by Windows2000/IE6. If you make the following series
of clicks from IE:
Tools | Internet Options | Connections
Then in the section: Dial-up and Virtual Private Network
settings click the Settings... button, then the Advanced...
button under Proxy server, you will see a list of proxy
servers for different protocols.
If I add a fake proxy server for Gopher, such as:
http://www-cache.sachmoz.com
with port 8080, the registry key data is altered to:
ftp=http://www-cache.freeserve.com:8080;gopher=http://www-cache.sachmoz.com:8080;http=http://www-cache.freeserve.com:8080
----------------------------------------------------------------------
Comment By: Thomas Heller (theller)
Date: 2002-01-16 05:03
Message:
Logged In: YES
user_id=11105
Here's a quote from Microsoft docs (Windows 2000 Server
Resource Kit). I have not found it online, but it's in my
local MSDN library April 2001:
MSDNLibrary ->
Resource Kits ->
Windows 2000 Server Resource Kit ->
Internet Explorer 5 Resource Kit ->
Part 3: Customizing ->
Chapter 13: Setting up Servers ->
Working with Proxy Servers
<quote>
Proxy locations that do not begin with a protocol (such as
http:// or ftp://) are assumed to be a CERN-type HTTP proxy.
For example, when the user types proxy, it's treated the
same as if the user typed http://proxy. For FTP gateways,
such as the TIS FTP gateway, the proxy should be listed with
the ftp:// in front of the proxy name. For example, an FTP
gateway for an FTP proxy would have this format:
ftp://ftpproxy
When you enter proxy settings, use the following syntax,
where <address> is the Web address of the proxy server and
<port> is the port number assigned to the proxy server:
http://<address>:<port>
For example, if the address of the proxy server is
proxy.example.microsoft.com and the port number is 80, the
setting in the Proxy Server box for LAN settings in the
Proxy Settings dialog box or the Proxy Settings screen of
the Customization wizard should read as follows:
http://proxy.example.microsoft.com:80
</quote>
----------------------------------------------------------------------
Comment By: Thomas Heller (theller)
Date: 2002-01-15 12:58
Message:
Logged In: YES
user_id=11105
Isn't the correct setting for an ftp
proxy "http://192.168.0.15:3128" instead
of "ftp://192.168.0.15:3128"?
At least, in Python 2.1, only the former works for me.
In Python 2.2 neither does, but maybe that's a different
issue.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-01-15 12:18
Message:
Logged In: YES
user_id=6380
Looks like this code block in getproxies_registry() is
broken then:
    # Per-protocol settings
    for p in proxyServer.split(';'):
        protocol, address = p.split('=', 1)
        proxies[protocol] = '%s://%s' % (protocol, address)
It should only add the <protocol>:// prefix if one isn't
already there.
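To make the failure concrete, here is a small sketch that runs the loop quoted above, unchanged in logic, on the registry value reported for this bug. It uses Python 3 syntax, and the wrapper name broken_per_protocol is hypothetical; only the loop body comes from the thread.

```python
def broken_per_protocol(proxy_server):
    """Replicates the quoted loop: the prefix is always prepended."""
    proxies = {}
    # Per-protocol settings
    for p in proxy_server.split(";"):
        protocol, address = p.split("=", 1)
        # No check for an existing scheme, so 'http://...' gets a second prefix.
        proxies[protocol] = "%s://%s" % (protocol, address)
    return proxies


registry_value = (
    "http=http://www-cache.freeserve.com:8080;"
    "ftp=http://www-cache.freeserve.com:8080"
)
result = broken_per_protocol(registry_value)
```

The resulting dict holds the same doubled values that getproxies_registry() printed for the submitter.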
----------------------------------------------------------------------
Comment By: Thomas Heller (theller)
Date: 2002-01-15 12:07
Message:
Logged In: YES
user_id=11105
It seems sachmoz's registry settings are valid for IE. I
checked this by changing my own settings from
ftp=192.168.0.15:3128;http=192.168.0.13:3128
to
ftp=http://192.168.0.15:3128;http=http://192.168.0.13:3128
IE works both before and after this change.
Here's the only article I found on MSDN showing an example:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;q164035
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-01-15 11:24
Message:
Logged In: YES
user_id=6380
I'm assigning this to Mark Hammond, who knows more about the
Windows registry. Could there be a bug in the function
getproxies_registry()? See the last two posts from sachmoz;
ignore the original problem description.
----------------------------------------------------------------------
Comment By: Jason Cowley (sachmoz)
Date: 2002-01-15 10:17
Message:
Logged In: YES
user_id=426262
The actual settings in the registry look slightly different:
http=http://www-cache.freeserve.com:8080;ftp=http://www-cache.freeserve.com:8080
Notice the '=' signs.
These settings have been set automatically by Freeserve,
and so there are perhaps millions of people in the UK with
the same registry settings (and therefore the same problem).
I have mailed Freeserve to ask them to confirm if the
settings are correct.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-01-15 09:46
Message:
Logged In: YES
user_id=6380
If that's really what getproxies_registry() prints, then
look again at the URL in the dict for key 'http'. It says
'http://http://www-cache.freeserve.com:8080'
In other words a double http:// prefix!!!
If you fix the registry the problem will go away.
I don't think this is a problem with urllib.py.
----------------------------------------------------------------------
Comment By: Jason Cowley (sachmoz)
Date: 2002-01-14 05:04
Message:
Logged In: YES
user_id=426262
I hope this is what you need:
>>> print getproxies_environment()
{}
>>> print getproxies_registry()
{'ftp': 'ftp://http://www-cache.freeserve.com:8080', 'http': 'http://http://www-cache.freeserve.com:8080'}
>>>
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-01-13 20:46
Message:
Logged In: YES
user_id=6380
Hm, you can only ever end up in that code block if you have
some kind of proxy settings active. On Windows, those are in
the registry, even if you think they are not.
Your fix is clearly not right -- but in order to find out
what is right, I need your proxy settings.
----------------------------------------------------------------------
Comment By: Jason Cowley (sachmoz)
Date: 2002-01-13 14:35
Message:
Logged In: YES
user_id=426262
I am not using a proxy, but I have a dial-up connection to
an ISP and I am using Windows 2000.
The Python version info is:
Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
Here is the modification I made to urllib.py:
272:    if proxy_bypass(realhost):
273:        host = realhost  # this line was not being executed
274:    host = realhost      # I added this to fix urlopen()
Without this line I added, the following statement was
being executed 9-10 lines below, with 'http:' as the value
of host:
h = httplib.HTTP(host)
This later caused the problem when _set_hostport in
httplib.py tried to convert an empty string to an int on
line 349:
port = int(host[i+1:])
I have attached my copy of "urllib.py".
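The chain from a doubled prefix to the ValueError can be sketched in current Python 3. This is an illustration under assumptions, not the 2.2 code: urllib.parse.urlsplit stands in for the splitting helpers that urllib.py used at the time, and parse_port is a hypothetical, simplified stand-in for _set_hostport. Only the int(host[i+1:]) expression is taken from the traceback.

```python
from urllib.parse import urlsplit

# Proxy URL with the doubled prefix, as reported by getproxies_registry().
doubled = "http://http://www-cache.freeserve.com:8080"

# Everything between the first '//' and the next '/' is treated as the host.
host = urlsplit(doubled).netloc


def parse_port(host):
    """Simplified stand-in for the host:port split in httplib."""
    i = host.find(":")
    if i >= 0:
        # For 'http:' the slice after the colon is empty, so int('') fails.
        return host[:i], int(host[i + 1:])
    return host, None


try:
    parse_port(host)
    failed = False
except ValueError:
    failed = True
```

Here host comes out as 'http:', and converting the empty string after the colon raises ValueError, matching the last frame of the traceback.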
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-01-13 13:33
Message:
Logged In: YES
user_id=6380
I cannot reproduce this.
What are your proxy settings?
----------------------------------------------------------------------