[urllib2 + Tor] How to handle 404?
Steven McKay
shubalubdub at gmail.com
Fri Nov 7 13:20:13 EST 2008
On Fri, Nov 7, 2008 at 2:28 AM, Chris Rebert <clp at rebertia.com> wrote:
>
> On Fri, Nov 7, 2008 at 12:05 AM, Gilles Ganault <nospam at nospam.com> wrote:
> > Hello
> >
> > I'm using the urllib2 module and Tor as a proxy to download data
> > from the web.
> >
> > Occasionally, urllib2 returns 404, probably because of some issue
> > with the Tor network. This code doesn't solve the issue, as it just
> > loops through the same error indefinitely:
> >
> > =====
> *snip*
>
> Cheers,
> Chris
> --
> Follow the path of the Iguana...
> http://rebertia.com
>
> > =====
> >
> > Any idea of what I should do to handle this error properly?
> >
> > Thank you.
> > --
> > http://mail.python.org/mailman/listinfo/python-list
> >
> --
> http://mail.python.org/mailman/listinfo/python-list
It sounds like Gilles may be having an issue with persistent 404s, in
which case something like this could be more appropriate:
import time
import urllib2
from urllib2 import HTTPError

for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    retries = 0
    while retries < 10:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError, e:
            print 'Error code: ', e.code
            retries += 1
            time.sleep(2)
            continue
        else:
            break
    else:
        print 'Fetch of ' + url + ' failed after ' + str(retries) + ' tries.'
        continue
    handle_success(response)

Note the while/else: the else clause runs only when the loop ends without
hitting break, i.e. when all ten attempts failed, and the continue there
skips handle_success() for that id.
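If you do this for more than one URL, the retry logic can be factored out into a small helper. A minimal sketch (the name fetch_with_retries and its parameters are my own invention, not part of urllib2; it catches IOError because urllib2.HTTPError is an IOError subclass):

```python
import time

def fetch_with_retries(fetch, max_retries=10, delay=2):
    # Call fetch() until it succeeds, retrying on IOError
    # (urllib2.HTTPError subclasses IOError, so 404s land here).
    # Returns fetch()'s result, or None once max_retries is exhausted.
    for attempt in range(max_retries):
        try:
            return fetch()
        except IOError:
            time.sleep(delay)
    return None
```

You'd then call it per id with something like
fetch_with_retries(lambda: urllib2.urlopen(req).read()) and treat a
None result as a permanent failure.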