[Python-Dev] site triggering a bug in urllib2

John J Lee jjl at pobox.com
Fri Jan 20 21:28:23 CET 2006


On Tue, 17 Jan 2006, Thomas Mangin wrote:
[...]
> I have hit a bug with Python 2.4.2 (on Mandriva 2006) using urllib2.
> The code which triggers the bug is as follows:
>
> import urllib2
> req = urllib2.Request("http://66.117.37.13/")
>
> # makes no difference ..
> req.add_header('Connection', 'close')
>
> handle = urllib2.urlopen(req)
> data = handle.read()
> print data
>
> Using a timeout on the socket does not work either.

This is a real bug, I think.  I filed a report on the SF bug tracker:

http://python.org/sf/1411097


The problem seems to be the (ab)use of socket._fileobject in urllib2 (I 
believe this was introduced when urllib2 switched to using 
httplib.HTTPConnection).  The purpose of the hack (as commented in 
AbstractHTTPHandler.do_open()) is to provide .readline() and .readlines() 
methods on the response object returned by urllib2.urlopen().
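
To make the idea concrete, here is a rough sketch of what that kind of 
wrapping does.  This is not the urllib2 or socket code: ReadOnlyResponse, 
LineReader and wrap_with_lines are hypothetical names, the buffering is 
much simpler than the real socket._fileobject, and it assumes a modern 
Python with bytes literals.  It only shows how aliasing .recv to .read 
lets a socket-style buffered wrapper supply .readline() and .readlines() 
for an object that has nothing but .read().

```python
import io


class ReadOnlyResponse(object):
    """Hypothetical stand-in for a response that only offers read()."""

    def __init__(self, body):
        self._stream = io.BytesIO(body)

    def read(self, amt=-1):
        return self._stream.read(amt)


class LineReader(object):
    """Simplified, hypothetical buffered wrapper that pulls data via recv()."""

    def __init__(self, sock, bufsize=8192):
        self._sock = sock
        self._bufsize = bufsize
        self._buf = b""

    def readline(self):
        # Keep calling recv() until a newline is buffered or the stream ends.
        while b"\n" not in self._buf:
            chunk = self._sock.recv(self._bufsize)
            if not chunk:
                break
            self._buf += chunk
        end = self._buf.find(b"\n") + 1
        if end == 0:
            # No newline left: hand back whatever remains (possibly b"").
            end = len(self._buf)
        line, self._buf = self._buf[:end], self._buf[end:]
        return line

    def readlines(self):
        lines = []
        while True:
            line = self.readline()
            if not line:
                return lines
            lines.append(line)


def wrap_with_lines(response):
    # The trick under discussion: make the response look like a socket
    # by pointing recv at read, then wrap it in the buffered reader.
    response.recv = response.read
    return LineReader(response)
```

With this sketch, wrap_with_lines(ReadOnlyResponse(b"first\nsecond\n")) 
returns an object whose .readline() and .readlines() work even though the 
underlying object never defined them.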

Workaround if you're not using .readline() or .readlines() (against 2.4.2, 
but should apply against current SVN):

--- urllib2.py.orig     Fri Jan 20 20:10:56 2006
+++ urllib2.py  Fri Jan 20 20:12:07 2006
@@ -1006,8 +1006,7 @@
          # XXX It might be better to extract the read buffering code
          # out of socket._fileobject() and into a base class.

-        r.recv = r.read
-        fp = socket._fileobject(r)
+        fp = r.fp

          resp = addinfourl(fp, r.msg, req.get_full_url())
          resp.code = r.status


Not sure yet what the actual problem/cure is...


John

