[urllib2 + Tor] How to handle 404?

Steven McKay shubalubdub at gmail.com
Fri Nov 7 13:20:13 EST 2008


On Fri, Nov 7, 2008 at 2:28 AM, Chris Rebert <clp at rebertia.com> wrote:
>
> On Fri, Nov 7, 2008 at 12:05 AM, Gilles Ganault <nospam at nospam.com> wrote:
> > Hello
> >
> >        I'm using the urllib2 module and Tor as a proxy to download data
> > from the web.
> >
> > Occasionally, urllib2 returns 404, probably because of some issue
> > with the Tor network. This code doesn't solve the issue, as it just
> > loops through the same error indefinitely:
> >
> > =====
> *snip*
>
> Cheers,
> Chris
> --
> Follow the path of the Iguana...
> http://rebertia.com
>
> > =====
> >
> > Any idea of what I should do to handle this error properly?
> >
> > Thank you.
> > --
> > http://mail.python.org/mailman/listinfo/python-list
> >
> --
> http://mail.python.org/mailman/listinfo/python-list

It sounds like Gilles may be having an issue with persistent 404s, in
which case something like this could be more appropriate:

import time
import urllib2
from urllib2 import HTTPError

for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    retries = 0
    while retries < 10:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError, e:
            print 'Error code: ', e.code
            retries += 1
            time.sleep(2)
            continue
        else:
            break
    else:
        # The while's else clause runs only when all 10 attempts
        # failed (i.e. the loop ended without a break).
        print 'Fetch of %s failed after %d tries.' % (url, retries)
        continue  # skip handle_success for this id
    handle_success(response)
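The same retry-with-limit pattern can also be factored into a small helper, which keeps the per-URL loop readable. A minimal sketch below; `fetch_with_retries` and its parameters are illustrative names, and the `fetch` argument stands in for a callable like `urllib2.urlopen` (catching IOError works because HTTPError is a subclass of it):

```python
import time

def fetch_with_retries(fetch, url, max_retries=10, delay=2):
    """Call fetch(url), retrying up to max_retries times on IOError.

    Returns the response on success, or None if every attempt failed.
    fetch is any callable that raises IOError (HTTPError's base class)
    on failure -- e.g. a wrapper around urllib2.urlopen.
    """
    for attempt in range(max_retries):
        try:
            return fetch(url)
        except IOError as e:
            print('Error on attempt %d: %s' % (attempt + 1, e))
            time.sleep(delay)
    return None  # caller decides how to handle a permanent failure
```

With that, the main loop shrinks to a call per id and a single `is None` check for the failure case.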
