urllib (54, 'Connection reset by peer') error

Chris cwitts at gmail.com
Fri Jun 13 16:38:11 CEST 2008


On Jun 13, 4:21 pm, chrispoliq... at gmail.com wrote:
> Hi,
>
> I have a small Python script to fetch some pages from the internet.
> There are a lot of pages and I am looping through them and then
> downloading the page using urlretrieve() in the urllib module.
>
> The problem is that after 110 pages or so the script sort of hangs and
> then I get the following traceback:
>
> Traceback (most recent call last):
>   File "volume_archiver.py", line 21, in <module>
>     urllib.urlretrieve(remotefile,localfile)
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 89, in urlretrieve
>     return _urlopener.retrieve(url, filename, reporthook, data)
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 222, in retrieve
>     fp = self.open(url, data)
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 190, in open
>     return getattr(self, name)(url)
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 328, in open_http
>     errcode, errmsg, headers = h.getreply()
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 1195, in getreply
>     response = self._conn.getresponse()
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 924, in getresponse
>     response.begin()
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 385, in begin
>     version, status, reason = self._read_status()
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 343, in _read_status
>     line = self.fp.readline()
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/socket.py", line 331, in readline
>     data = recv(1)
> IOError: [Errno socket error] (54, 'Connection reset by peer')
>
> My script code is as follows:
> -----------------------------------------
> import os
> import urllib
>
> volume_number = 149 # Volumes are numbered 150 to 544
>
> while volume_number < 544:
>         volume_number = volume_number + 1
>         localfile = '/Users/Chris/Desktop/Decisions/' + str(volume_number) + '.html'
>         remotefile = 'http://caselaw.lp.findlaw.com/scripts/getcase.pl?court=us&navby=vol&vol=' + str(volume_number)
>         print 'Getting volume number:', volume_number
>         urllib.urlretrieve(remotefile,localfile)
>
> print 'Download complete.'
> -----------------------------------------
>
> Once I get the error, running the script again doesn't do much
> good. It usually gets two or three pages and then hangs again.
>
> What is causing this?

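The server is causing it: "Connection reset by peer" (errno 54, ECONNRESET
on Mac OS X) means the remote end abruptly closed the connection. You
could just alter your code to retry: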

import urllib
import time

volume_number = 149  # Volumes are numbered 150 to 544
localfile = '/Users/Chris/Desktop/Decisions/%s.html'
remotefile = ('http://caselaw.lp.findlaw.com/scripts/getcase.pl?'
              'court=us&navby=vol&vol=%s')

while volume_number < 544:
    volume_number += 1
    print 'Getting volume number:', volume_number
    try:
        urllib.urlretrieve(remotefile % volume_number,
                           localfile % volume_number)
    except IOError:
        # The download failed (e.g. the connection was reset): step
        # back so the same volume is fetched again, and pause first.
        volume_number -= 1
        time.sleep(5)

print 'Download complete.'

That way, if an attempt fails, it rolls back the volume number, pauses
for five seconds and tries the same volume again. Be aware that it
keeps retrying indefinitely if a volume never downloads.
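For anyone doing the same thing with Python 3, here is a minimal sketch of the same retry idea with a cap on attempts and a pause that doubles after each failure. It assumes Python 3, where IOError is an alias of OSError, and a reset connection surfaces as ConnectionResetError, a timed-out read as socket.timeout, and connection failures as urllib.error.URLError, all OSError subclasses. The names fetch_with_retry, download_volume and download_all, the limit of five attempts and the 30-second socket timeout are illustrative choices, not part of the reply above.

```python
import socket
import time
import urllib.request

# Local path pattern and URL taken from the script in this thread.
LOCALFILE = '/Users/Chris/Desktop/Decisions/%s.html'
REMOTEFILE = ('http://caselaw.lp.findlaw.com/scripts/getcase.pl?'
              'court=us&navby=vol&vol=%s')


def fetch_with_retry(fetch, item, max_attempts=5, base_delay=5.0,
                     sleep=time.sleep):
    """Return fetch(item), retrying on OSError with exponential backoff.

    With the defaults the pauses are 5 s, 10 s, 20 s, 40 s. Once
    max_attempts attempts have failed, the last error is re-raised.
    Errors that are not OSError propagate immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(item)
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def download_volume(number):
    """Download one volume to its local file (hypothetical helper)."""
    return urllib.request.urlretrieve(REMOTEFILE % number,
                                      LOCALFILE % number)


def download_all(first=150, last=544):
    """Fetch volumes first..last inclusive, retrying each one."""
    # Assumed 30 s timeout: a stalled read then raises socket.timeout
    # (an OSError subclass) instead of hanging indefinitely.
    socket.setdefaulttimeout(30)
    for number in range(first, last + 1):
        print('Getting volume number:', number)
        fetch_with_retry(download_volume, number)
    print('Download complete.')
```

Calling download_all() reproduces the original loop, but a volume that still fails after five attempts stops the run with the last error instead of looping forever. Note that urllib.error.HTTPError is also an OSError subclass, so HTTP errors such as a 404 would be retried too; catch it separately first if that is not wanted.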



More information about the Python-list mailing list