urllib2 spinning CPU on read

kdotsky kdotsky at gmail.com
Sun Nov 26 09:54:43 CET 2006

Hello All,
I've run into this problem on several sites: urllib2 hangs (using all
the CPU) while trying to read a page.  I was able to reproduce it for
one particular site.  I'm using Python 2.4.

import urllib2
url = 'http://www.wautomas.info'
request = urllib2.Request(url)
opener = urllib2.build_opener()
result = opener.open(request)
data = result.read()

It never returns from this read call.
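For context, when a server sends the body with "Transfer-Encoding: chunked", read() has to decode it line by line: a hex chunk size, the chunk data, and, after a zero-size chunk, optional trailer lines ending in a blank line. Below is a minimal decoder sketch in Python 3. The helper name `read_chunked` is hypothetical and this is not httplib's actual code. Its trailer loop stops on EOF as well as on the blank line. Once a connection is closed, `readline()` keeps returning an empty string, so a loop that waits only for "\r\n" would spin on readline()/recv() and never return. That is an assumption about the cause. It is consistent with the profile below but not proven by it.

```python
import io


def read_chunked(fp):
    """Decode an HTTP/1.1 chunked body from a binary file-like object.

    Hypothetical helper for illustration; not httplib's implementation.
    """
    body = []
    while True:
        line = fp.readline()
        if not line:
            # Connection closed before the terminating zero-size chunk.
            raise EOFError("connection closed inside chunked body")
        # Chunk size is hexadecimal; drop any ";extension" after it.
        size = int(line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        data = fp.read(size)
        if len(data) < size:
            raise EOFError("short chunk")
        body.append(data)
        fp.read(2)  # discard the CRLF that follows each chunk's data
    # Trailer section: header lines up to a blank line.
    while True:
        line = fp.readline()
        # Stop on the blank line that ends the message, and also on EOF
        # (b""), so a server that closes early cannot make this loop spin.
        if line in (b"\r\n", b"\n", b""):
            break
    return b"".join(body)
```

For example, `read_chunked(io.BytesIO(b"4\r\nWiki\r\n0\r\n"))` returns `b"Wiki"` even though the final blank line is missing.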

I did some profiling to see what was going on and to make sure it
wasn't my code.  There was a huge number of calls to recv and to
socket.py:315(readline), with a large share of the time spent in
readline.  A large amount of time was also spent in
httplib.py:482(_read_chunked).  Here's the significant part of the
statistics:

         32564841 function calls (32563582 primitive calls) in 545.250 CPU seconds

   Ordered by: internal time
   List reduced from 416 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 10844775  233.920    0.000  447.440    0.000 socket.py:315(readline)
 10846078  152.430    0.000  152.430    0.000 :0(recv)
        3   97.330   32.443  544.730  181.577
 10844812   61.090    0.000   61.090    0.000 :0(join)
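A report like the one above can be produced by running the call under a profiler, sorting by internal time, and restricting the output to 50 rows. Here is a minimal sketch in Python 3 using `cProfile` and `pstats`. `cProfile` arrived in Python 2.5, so on 2.4 the pure-Python `profile` module, whose `Profile` class also has `runcall`, would be the substitute. The helper `profile_report` and the `busy` workload are hypothetical stand-ins for profiling `result.read()`.

```python
import cProfile
import io
import pstats


def profile_report(func, *args, limit=50):
    """Profile func(*args); return the top `limit` rows by internal time.

    Hypothetical helper for illustration.
    """
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "time" sorts by tottime, printed as "Ordered by: internal time".
    stats.sort_stats("time").print_stats(limit)
    return stream.getvalue()


def busy(n):
    # Stand-in workload; in the post this would be result.read().
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(profile_report(busy, 100000))
```

Sorting by internal time surfaces functions that burn CPU in their own bodies, which is how a tight readline()/recv() loop stands out.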

Also, where should I go to see if something like this has already been
reported as a bug?

Thanks for any help you can give me.

