n00b confusion re: local variable referenced before assignment error

Dave Angel davea at ieee.org
Fri Jun 19 15:34:23 EDT 2009


Wells Oliver wrote:
> Writing a class which essentially spiders a site and saves the files
> locally. On a URLError exception, it sleeps for a second and tries again (on
> 404 it just moves on). The relevant bit of code, including the offending
> method:
>
> class Handler(threading.Thread):
>         def __init__(self, url):
>                 threading.Thread.__init__(self)
>                 self.url = url
>
>         def save(self, uri, location):
>                 try:
>                         handler = urllib2.urlopen(uri)
>                 except urllib2.HTTPError, e:
>                         if e.code == 404:
>                                 return
>                         else:
>                                 print "retrying %s (HTTPError)" % uri
>                                 time.sleep(1)
>                                 self.save(uri, location)
>                 except urllib2.URLError, e:
>                         print "retrying %s" % uri
>                         time.sleep(1)
>                         self.save(uri, location)
>
>                 if not os.path.exists(os.path.dirname(location)):
>                         os.makedirs(os.path.dirname(location))
>
>                 file = open(location, "w")
>                 file.write(handler.read())
>                 file.close()
>
> ...
>
> But what I am seeing is that after a retry (on catching a URLError
> exception), I see bunches of "UnboundLocalError: local variable 'handler'
> referenced before assignment" errors on line 38, which is the
> "file.write(handler.read())" line..
>
> What gives?
>
>   
Your save() method is recursive in the case of that error, which is a 
poor excuse for what should have been a loop.  You should have some 
retry depth check, just in case you get hundreds of such errors on the 
same site.

But ignoring that issue, the specific symptom is caused when returning 
from the recursive call.  You still fall through to the end of your 
outer method call, and try to write stuff from that handler (also wiping 
out the file "location" in the process).  Without a handler, that causes 
an exception.  So you should follow those calls to self.save() with 
return's.

Much better would be to write a loop in the first place, and break out 
of the loop when you succeed.  Then the flow is much easier to follow, 
and you can easily avoid the risk of looping forever, nor of running out 
of stack space.

Something like (untested):

        def save(self, uri, location):
            for tries in xrange(10):                #try up to 10 times
                try:
                        handler = urllib2.urlopen(uri)
                except urllib2.HTTPError, e:
                        if e.code == 404:
                                break
                        else:
                                print "retrying %s (HTTPError)" % uri
                                time.sleep(1)
                                continue
                except urllib2.URLError, e:
                        print "retrying %s" % uri
                        time.sleep(1)
                        continue

                if not os.path.exists(os.path.dirname(location)):
                        os.makedirs(os.path.dirname(location))

                file = open(location, "w")
                file.write(handler.read())
                file.close()
                break

Other refinements are possible, of course.  For example, if you have any cleanup code at the end you may need an additional flag variable to tell whether you've succeeded or not.





More information about the Python-list mailing list