urllib (54, 'Connection reset by peer') error
mail at timgolden.me.uk
Wed Jun 18 11:24:26 EDT 2008
chrispoliquin at gmail.com wrote:
> Thanks for the help. The error handling worked to a certain extent
> but after a while the server does seem to stop responding to my
> requests.
> I have a list of about 7,000 links to pages I want to parse the HTML
> of (it's basically a web crawler) but after a certain number of
> urlretrieve() or urlopen() calls the server just stops responding.
> Anyone know of a way to get around this? I don't own the server so I
> can't make any modifications on that side.
I think someone's already mentioned this, but it's almost
certainly an explicit or implicit throttling on the remote server.
If you're pulling 7,000 pages from a single server, at the very
least you need to contact the maintainers as a courtesy to
confirm that this is acceptable.
If you don't, you may well cause your IP block to be banned on
their network, which could affect others as well as yourself.
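
For what it's worth, here is a rough sketch of what slowing the crawler
down might look like: pause between requests, and back off for longer
each time the connection is reset. It assumes Python 3's urllib.request
rather than the urllib.urlopen() / urlretrieve() calls discussed in this
thread. The names fetch_politely and _default_fetch and the delay and
retry values are made up for illustration. It doesn't replace asking the
maintainers first, and the right delay is whatever they tell you it is.

```python
import time
import urllib.request


def _default_fetch(url):
    # Hypothetical helper: plain urlopen with a timeout so a stalled
    # server cannot hang the crawler forever.
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_politely(urls, delay=2.0, max_retries=3,
                   fetch=_default_fetch, sleep=time.sleep):
    """Fetch URLs one at a time, pausing between requests.

    On a network error (URLError and ConnectionResetError are both
    OSError subclasses) wait, double the wait, and try again, up to
    max_retries attempts. URLs that never succeed map to None.
    """
    results = {}
    for url in urls:
        wait = delay
        for _attempt in range(max_retries):
            try:
                results[url] = fetch(url)
                break
            except OSError:
                sleep(wait)   # back off before retrying
                wait *= 2     # wait longer after each failure
        else:
            results[url] = None  # gave up on this URL
        sleep(delay)  # fixed pause before moving to the next URL
    return results
```

Called as fetch_politely(list_of_links, delay=5.0), it returns a dict
mapping each URL to the page bytes (or None). The fetch and sleep
parameters are only there so the loop can be tried out without
touching the network.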