[Baypiggies] urllib2.urlopen() and exception layering
Gregory P. Smith
greg at krypto.org
Sun Apr 6 08:54:32 CEST 2008
On Sat, Apr 5, 2008 at 2:48 PM, Niall O'Higgins <niallo at unworkable.org> wrote:
> I have written a number of long-running Python programs which make heavy
> use of urllib2.urlopen() to fetch HTTP URLs in a convenient manner. While the
> documentation states that this function "Raises URLError on errors", I
> found this to be incorrect. So far I have encountered the following
> being thrown by urllib2.urlopen() on HTTP URLs:
> * urllib2.HTTPError
> * urllib2.URLError
> * httplib.BadStatusLine
> * httplib.InvalidURL
> * ValueError
> * IOError
> Looking at the urllib2 module source, it is unclear to me whether the
> intent is that all httplib errors should be caught and raised as URLError or
> whether the programmer is expected to handle the underlying exceptions himself.
> For example, at least socket.error is caught by urllib2.urlopen() and
> raised as a URLError. The comment in the code block suggests some
urllib2 is a pretty big mess and does unfortunately violate the general rule
of exception catching + retyping + reraising being a bad idea.
That said, you can reduce the above list of exceptions a bit thanks to their
inheritance. URLError inherits from IOError. All httplib errors inherit
from httplib.HTTPException. This leaves you with (IOError, socket.error,
httplib.HTTPException, ValueError) as the ones you should catch. I changed
socket.error to inherit from IOError in Python 2.6. That narrows the list
down to a sane pair of exceptions if the ValueErrors can be fixed.
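
A minimal sketch of catching that reduced list in one place. The names
fetch and FETCH_ERRORS are hypothetical helpers for illustration, not part of
urllib2. The import fallback is an assumption so the same code runs on
Python 2.6+ and on Python 3, where urllib2 and httplib were renamed to
urllib.request and http.client.

```python
import socket

try:
    # Python 2 module names, as used in this thread.
    import urllib2 as urlrequest
    import httplib
except ImportError:
    # Python 3 renamed the modules; the exception classes kept their names.
    import urllib.request as urlrequest
    import http.client as httplib

# The reduced list from the paragraph above. On Python 2.6+ socket.error
# is already an IOError, so listing it separately is harmless redundancy.
FETCH_ERRORS = (IOError, socket.error, httplib.HTTPException, ValueError)


def fetch(url):
    """Hypothetical helper: return (body, None) on success, (None, exc) on failure."""
    try:
        response = urlrequest.urlopen(url)
        try:
            return response.read(), None
        finally:
            response.close()
    except FETCH_ERRORS as exc:
        # HTTPError and URLError arrive here via IOError; BadStatusLine and
        # InvalidURL arrive via httplib.HTTPException.
        return None, exc
```

Returning the exception instead of re-raising it is only one possible design;
a long-running program might prefer to log it and retry.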
IMHO if you're ever seeing a ValueError out of urllib2 in normal sane use
please try to figure out why, make a simple test case and file it on
bugs.python.org if you believe it is a case that the library should handle.
Double plus good if it includes a patch to fix it.
> try:
>     h.request(req.get_method(), req.get_selector(), req.data, headers)
>     r = h.getresponse()
> except socket.error, err: # XXX what error?
>     raise URLError(err)
> I think this is a problem which needs to be addressed, at the very least
> through clearer documentation, and possibly by improving urllib2 to handle
> more of these exceptions and raise them as URLError.
> I'm new to the Python development community, but would be happy to submit
> a patch if there is some consensus on which approach to take.