[Baypiggies] urllib2.urlopen() and exception layering

Gregory P. Smith greg at krypto.org
Sun Apr 6 08:54:32 CEST 2008


On Sat, Apr 5, 2008 at 2:48 PM, Niall O'Higgins <niallo at unworkable.org>
wrote:

> Hi,
>
> I have written a number of long-running Python programs which make heavy
> use of
> urllib2.urlopen() to fetch HTTP URLs in a convenient manner.  While the
> documentation states that this function "Raises URLError on errors", I
> have
> found this to be incorrect.  So far I have encountered the following
> exceptions
> being thrown by urllib2.urlopen() on HTTP URLs:
>
>    * urllib2.HTTPError
>    * urllib2.URLError
>    * httplib.BadStatusLine
>    * httplib.InvalidURL
>    * ValueError
>    * IOError
>
> Looking at the urllib2 module source, it is unclear to me whether the
> intention
> is that all httplib errors should be caught and raised as URLError or
> whether
> the programmer is expected to handle the underlying exceptions himself.
>
> For example, at least socket.error is caught by urllib2.urlopen() and
> raised as a URLError.  The comment in the code block suggests some
> confusion:
>

urllib2 is a pretty big mess, and it unfortunately violates the general rule
that catching, retyping, and re-raising exceptions is a bad idea.

That said, you can reduce the above list of exceptions a bit thanks to their
inheritance.  URLError inherits from IOError.  All httplib errors inherit
from httplib.HTTPException.  This leaves you with (IOError, socket.error,
httplib.HTTPException, ValueError) as the ones you should catch.  I changed
socket.error to inherit from IOError in Python 2.6.  That narrows the list
down to a sane pair of exceptions if the ValueErrors can be fixed.
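To make that concrete, here is a minimal sketch of the consolidated catch.
The fetch() helper and its error handling are hypothetical, not part of
urllib2; the ImportError shim maps the 3.0 module renames onto the 2.x names
so the snippet is self-contained, and it assumes the except-as syntax and
urlopen() timeout parameter added in 2.6:

```python
# A sketch of the consolidated catch described above.  The ImportError
# shim maps the Python 3.0 module renames onto the 2.x names so the
# example is self-contained; fetch() is a hypothetical helper, not
# part of urllib2.
try:
    import urllib2 as url_request       # Python 2.x
    import httplib as http_client
except ImportError:                     # Python 3.0 renames
    import urllib.request as url_request
    import http.client as http_client

def fetch(url, timeout=10):
    # The inheritance does the work: IOError covers URLError and
    # HTTPError (and, from 2.6 on, socket.error); HTTPException covers
    # BadStatusLine, InvalidURL and friends; ValueError covers the
    # remaining stragglers.
    try:
        return url_request.urlopen(url, timeout=timeout).read()
    except (IOError, http_client.HTTPException, ValueError) as err:
        print("fetch of %s failed: %r" % (url, err))
        return None
```

Once the stray ValueErrors are fixed, the tuple above effectively shrinks to
(IOError, httplib.HTTPException) on 2.6 and later.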

IMHO, if you ever see a ValueError out of urllib2 in normal, sane use,
please try to figure out why, make a simple test case, and file it on
bugs.python.org if you believe it is a case the library should handle.
Double plus good if it includes a patch to fix it.


>        try:
>            h.request(req.get_method(), req.get_selector(), req.data, headers)
>            r = h.getresponse()
>        except socket.error, err: # XXX what error?
>            raise URLError(err)
>
> I think this is a problem which needs to be addressed, at the very least
> through clearer documentation, and possibly by improving urllib2 to handle
> more
> of these exceptions and raise them as URLError.
>
> I'm new to the Python development community, but would be happy to submit
> a
> patch if there is some consensus on which approach to take.
>
> Thanks!
>

