[Tutor] urllib2.urlopen()

eryksun eryksun at gmail.com
Sun Oct 14 11:26:19 CEST 2012


On Sun, Oct 14, 2012 at 2:15 AM, Ray Jones <crawlzone at gmail.com> wrote:
>
> I can iterate through e.info() with a 'for' loop, but all I get as a
> result is:
>
> connection
> content-type
> www-authenticate
> content-length

urllib2.HTTPError inherits from both urllib2.URLError and
urllib.addinfourl (see help(e)). An instance of the latter is what
urlopen() returns. It would be weird, but you could catch
urllib.addinfourl as the exception type (please don't).
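
A quick way to check that inheritance, and the usual order of except
clauses that follows from it, might look like the sketch below. It
assumes a current interpreter, so it uses the Python 3 module names
(urllib.error, urllib.response); in Python 2 the same classes live in
urllib2 and urllib. The classify() helper and the example.invalid URL
are made up for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.response import addinfourl

# HTTPError is both an exception and a response object.
print(issubclass(HTTPError, URLError))
print(issubclass(HTTPError, addinfourl))


def classify(exc):
    """Hypothetical helper: re-raise exc and report which clause caught it."""
    try:
        raise exc
    except HTTPError as e:
        # Must come first: HTTPError is a subclass of URLError.
        return "HTTP %d" % e.code
    except URLError as e:
        return "URL error: %s" % e.reason


# Built by hand so that no network access is needed.
print(classify(HTTPError("http://example.invalid/", 404, "Not Found", None, None)))
print(classify(URLError("no route")))
```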

HTTPError provides a file-like interface (read, readlines, etc.), plus
the status code, headers, and URL. The info() method returns the
"headers" attribute, which is an instance of httplib.HTTPMessage (see
below). HTTPError also stores this in the attribute "hdrs". So you have
3 ways to access the headers: e.info(), e.headers, and e.hdrs.
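
To see that the three spellings give back the same object, here is a
minimal sketch that builds an HTTPError by hand instead of hitting a
server. It assumes Python 3 names (urllib.error.HTTPError and
http.client.parse_headers; in Python 2 these are urllib2.HTTPError and
httplib.HTTPMessage). The URL, status, and header values are invented.

```python
import io
from http.client import parse_headers
from urllib.error import HTTPError

# Made-up response pieces, so no network access is needed.
raw = b"Connection: close\r\nContent-Type: text/html\r\n\r\n"
hdrs = parse_headers(io.BytesIO(raw))
e = HTTPError("http://example.invalid/", 401, "Unauthorized",
              hdrs, io.BytesIO(b"denied"))

# All three names refer to the one header object.
same = e.info() is e.headers is e.hdrs
print(same)

# The file-like and response parts of the interface.
status, url, body = e.code, e.geturl(), e.read()
print(status, url, body)
e.close()
```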

(Actually, there are more ways since e.fp is the original addinfourl
instance, and e.fp.fp._sock.msg is the original HTTPResponse msg.)

Look at help(e.hdrs). HTTPMessage inherits a dict interface from
rfc822.Message (__getitem__, __setitem__, __delitem__, __contains__,
__len__, __iter__, get, setdefault, has_key, items, keys, values).
Like a dict, it iterates over its keys, which is why your 'for' loop
prints only the header names; index by name or loop over items() to
get the values. It also has the method getheaders(name), which returns
a list of values in case the header appears multiple times in the
message.

    >>> e.hdrs['connection']
    'close'
    >>> e.hdrs.getheaders('connection')
    ['close']
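
For a loop that prints the values along with the names, something like
the following sketch should do. It assumes Python 3, where the header
object is an http.client.HTTPMessage, the names keep their original
case when iterated, and getheaders(name) is spelled get_all(name). The
header block is invented for the example.

```python
import io
from http.client import parse_headers

# Invented header block; Set-Cookie appears twice on purpose.
raw = (b"Connection: close\r\n"
       b"Content-Type: text/html\r\n"
       b"Set-Cookie: a=1\r\n"
       b"Set-Cookie: b=2\r\n"
       b"\r\n")
hdrs = parse_headers(io.BytesIO(raw))

# Plain iteration yields the header names only.
names = list(hdrs)
print(names)

# items() yields (name, value) pairs.
for name, value in hdrs.items():
    print("%s: %s" % (name, value))

# Lookup by name is case-insensitive and returns one value.
print(hdrs["connection"])

# get_all() returns every value of a repeated header.
cookies = hdrs.get_all("set-cookie")
print(cookies)
```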

