[Baypiggies] urllib2.urlopen() and exception layering

Sat Apr 5 23:48:06 CEST 2008

Hi,

I have written a number of long-running Python programs which make heavy use of
urllib2.urlopen() to fetch HTTP URLs in a convenient manner.  While the
documentation states that this function "Raises URLError on errors", I have
found that this is not the whole story.  So far I have encountered the following
exceptions being raised by urllib2.urlopen() on HTTP URLs:

    * urllib2.HTTPError
    * urllib2.URLError
    * httplib.BadStatusLine
    * httplib.InvalidURL
    * ValueError
    * IOError
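
To make the practical impact concrete, here is a minimal sketch of what a
caller currently has to write to survive everything in the list above.  The
names fetch, opener and FETCH_ERRORS are hypothetical, the "except ... as"
spelling assumes Python 2.6 or later, and the Python 3 import fallback is only
an assumption so that the sketch also runs on newer interpreters:

```python
try:
    # Python 2 module names, as used in this post.
    import urllib2
    import httplib
except ImportError:
    # Assumption: on Python 3 the same names live in these modules.
    import urllib.request as urllib2
    import http.client as httplib

# Everything from the list above, collapsed into one tuple.
# HTTPError is a subclass of URLError; BadStatusLine and InvalidURL
# both derive from httplib.HTTPException.
FETCH_ERRORS = (
    urllib2.URLError,
    httplib.HTTPException,
    ValueError,
    IOError,
)


def fetch(url, opener=urllib2.urlopen):
    """Return (body, None) on success or (None, error) on failure.

    fetch and its opener parameter are hypothetical; opener exists only
    so that the function can be exercised without network access.
    """
    try:
        response = opener(url)
        try:
            return response.read(), None
        finally:
            response.close()
    except FETCH_ERRORS as err:
        return None, err
```

For instance, fetch("no-scheme-here") comes back as (None, err) with err being
a ValueError, because the URL is rejected before any connection is attempted.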

Looking at the urllib2 module source, it is unclear to me whether the intention
is that all httplib errors should be caught and raised as URLError or whether
the programmer is expected to handle the underlying exceptions himself.

For example, socket.error at least is caught by urllib2.urlopen() and re-raised
as a URLError.  The XXX comment in that code block suggests some uncertainty
about which errors can actually occur there:

        try:
            h.request(req.get_method(), req.get_selector(), req.data, headers)
            r = h.getresponse()
        except socket.error, err: # XXX what error?
            raise URLError(err)

I think this is a problem which needs to be addressed, at the very least
through clearer documentation, and possibly by improving urllib2 to handle more
of these exceptions and raise them as URLError.
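
As a rough illustration of the second option, the following sketch wraps the
call from the outside and re-raises the stray exceptions as URLError, in the
same way the existing code treats socket.error.  It is not a patch against
urllib2 itself: urlopen_as_urlerror and its opener parameter are hypothetical
names, the "except ... as" spelling assumes Python 2.6 or later, and the
Python 3 import fallback is only an assumption so the sketch runs on newer
interpreters too:

```python
try:
    # Python 2 module names, as used in this post.
    import urllib2
    import httplib
except ImportError:
    # Assumption: on Python 3 the same names live in these modules.
    import urllib.request as urllib2
    import http.client as httplib


def urlopen_as_urlerror(url, opener=urllib2.urlopen):
    """Call opener(url) so that the only exception type escaping is URLError.

    Hypothetical wrapper; opener is injectable so the behaviour can be
    checked without touching the network.
    """
    try:
        return opener(url)
    except urllib2.URLError:
        # Already a URLError (or its subclass HTTPError): pass it through.
        raise
    except (httplib.HTTPException, ValueError, IOError) as err:
        # Same pattern urllib2 applies to socket.error: wrap the original
        # exception, which stays reachable as the .reason attribute.
        raise urllib2.URLError(err)
```

With this in place a caller needs a single "except urllib2.URLError" clause,
and can still inspect the .reason attribute to see what actually went wrong.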

I'm new to the Python development community, but would be happy to submit a
patch if there is some consensus on which approach to take.

Thanks!

--
Niall O'Higgins
Software Enthusiast
http://niallohiggins.com