[Python-Dev] site triggering a bug in urllib2
John J Lee
jjl at pobox.com
Fri Jan 20 21:28:23 CET 2006
On Tue, 17 Jan 2006, Thomas Mangin wrote:
[...]
> I have hit a bug with Python 2.4.2 (on Mandriva 2006) using urllib2.
> The code which triggers the bug is as follows:
>
> import urllib2
> req = urllib2.Request("http://66.117.37.13/")
>
> # makes no difference ..
> req.add_header('Connection', 'close')
>
> handle = urllib2.urlopen(req)
> data = handle.read()
> print data
>
> Using a timeout on the socket does not work either.
This is a real bug, I think. I filed a report on the SF bug tracker:
http://python.org/sf/1411097
The problem seems to be the (ab)use of socket._fileobject in urllib2 (I
believe this was introduced when urllib2 switched to using
httplib.HTTPConnection). The purpose of the hack (as commented in
AbstractHTTPHandler.do_open()) is to provide .readline() and .readlines()
methods on the response object returned by urllib2.urlopen().
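To illustrate the kind of wrapping involved, here is a minimal sketch written for a current Python rather than 2.4. FakeResponse and RecvFile are hypothetical stand-ins for httplib.HTTPResponse and socket._fileobject, not the real classes; the only point is that the wrapper calls nothing but recv() on the wrapped object, so recv is aliased to read to make the response look enough like a socket.

```python
import io


class FakeResponse(object):
    """Hypothetical stand-in for an HTTP response: read() but no readline()."""

    def __init__(self, body):
        self._buf = io.BytesIO(body)

    def read(self, amt=-1):
        return self._buf.read(amt)


class RecvFile(object):
    """Hypothetical, much simplified analogue of socket._fileobject.

    It only ever calls recv() on the wrapped object and does its own
    buffering in order to offer readline() and readlines().
    """

    def __init__(self, sock, bufsize=8192):
        self._sock = sock
        self._bufsize = bufsize
        self._pending = b""

    def readline(self):
        # Keep pulling data until a newline is buffered or EOF is hit.
        while b"\n" not in self._pending:
            chunk = self._sock.recv(self._bufsize)
            if not chunk:  # EOF: hand back whatever is left
                line, self._pending = self._pending, b""
                return line
            self._pending += chunk
        line, sep, self._pending = self._pending.partition(b"\n")
        return line + sep

    def readlines(self):
        lines = []
        while True:
            line = self.readline()
            if not line:
                return lines
            lines.append(line)


r = FakeResponse(b"first\nsecond\nlast")
r.recv = r.read      # the same delegation trick as in do_open()
fp = RecvFile(r)
lines = fp.readlines()
```

The sketch shows only the shape of the hack; it makes no attempt to reproduce the hang itself, whose cause is not yet pinned down.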
Workaround if you're not using .readline() or .readlines() (against 2.4.2,
but should apply against current SVN):
--- urllib2.py.orig Fri Jan 20 20:10:56 2006
+++ urllib2.py Fri Jan 20 20:12:07 2006
@@ -1006,8 +1006,7 @@
         # XXX It might be better to extract the read buffering code
         # out of socket._fileobject() and into a base class.
 
-        r.recv = r.read
-        fp = socket._fileobject(r)
+        fp = r.fp
 
         resp = addinfourl(fp, r.msg, req.get_full_url())
         resp.code = r.status
Not sure yet what the actual problem/cure is...
John