Help extending BaseHandler from urllib2

Follower follower at iname.com
Tue Nov 13 12:25:19 EST 2001


Hi,

> I'm trying to extend the BaseHandler class from urllib2 to handle 302 
> redirects in a different way.  I need to grab the cookies from the page 
> and send them after being redirected.
Just today I was trying to do a similar thing, using a slightly
different approach but with no success.

Instead of subclassing the 'BaseHandler' class I subclassed the
'urllib2.HTTPRedirectHandler' class with the intention of overriding
the 'http_error_302' method.

The code looked a little like this:

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print headers
        urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)

It got called okay (i.e. it prints out the headers) but then the call
to the parent class's 'http_error_302' method generates an 'HTTPError'
exception (HTTPError: HTTP Error 302: Found) with the following trace:

Traceback (most recent call last):
[Snipped local details...]
  File "E:\LANGUAGES\PYTHON21\lib\urllib2.py", line 135, in urlopen
    return _opener.open(url, data)
  File "E:\LANGUAGES\PYTHON21\lib\urllib2.py", line 318, in open
    '_open', req)
  File "E:\LANGUAGES\PYTHON21\lib\urllib2.py", line 297, in _call_chain
    result = func(*args)
  File "E:\LANGUAGES\PYTHON21\lib\urllib2.py", line 824, in http_open
    return self.do_open(httplib.HTTP, req)
  File "E:\LANGUAGES\PYTHON21\lib\urllib2.py", line 818, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "E:\LANGUAGES\PYTHON21\lib\urllib2.py", line 344, in error
    return self._call_chain(*args)
  File "E:\LANGUAGES\PYTHON21\lib\urllib2.py", line 297, in _call_chain
    result = func(*args)
  File "E:\LANGUAGES\PYTHON21\lib\urllib2.py", line 425, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 302: Found

I worked around this (sort of) by copying the 'HTTPRedirectHandler'
class code from urllib2.py and making additions to that--so it no
longer generates the exception but it's not an ideal solution.

I readily admit I may have made a mistake completely unrelated to the
task at hand but thought you might want to consider this alternative
approach.
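
One candidate for such a mistake, going by the traceback above: the
override calls the parent's 'http_error_302' but does not return its
result, so the method itself returns None. As far as I can tell from
reading urllib2.py, the opener treats a None result as "not handled"
and falls through to 'http_error_default', which is where the
HTTPError in the trace is raised. Below is a minimal sketch of the
handler with the return added. The try/except import is only an
assumption so that the same code also runs on later Pythons, where
urllib2 became urllib.request; I have not tested it on 2.1.

```python
try:
    import urllib2  # the module used in this post
except ImportError:
    # Assumption: on Python 3 the same classes live in urllib.request.
    import urllib.request as urllib2


class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Look at the headers of the redirect response (e.g. Set-Cookie).
        print(headers)
        # Hand the parent's result back to the opener. Without this
        # 'return' the method returns None and the opener moves on to
        # the default error handler.
        return urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)
```

It would be installed with urllib2.build_opener(MyHTTPRedirectHandler)
and used through the resulting opener's open() method.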

There is one thought that's been nagging me which might apply to both
our situations:

The documentation for build_opener states:
"Instances of the following classes will be in the front of the
handlers, unless the handlers contain them, instances of them or
subclasses of them:
ProxyHandler, UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler,
HTTPRedirectHandler, FTPHandler, FileHandler"

This makes me wonder if, in your case, because you're subclassing
'BaseHandler' and not 'HTTPRedirectHandler', your 302-specific code
never gets called: the default 'HTTPRedirectHandler' is still
installed and, being ahead of your handler in the queue, handles the
redirect first. Just a thought.
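
The build_opener rule quoted above can be checked without touching
the network. This is a rough sketch, assuming the opener keeps its
handlers in a 'handlers' list (true of the urllib2.py I have looked
at, but an implementation detail). 'MyRedirectHandler',
'PlainHandler' and 'redirect_handler_names' are made-up names for
illustration, and the try/except import is again only there so the
code also runs where urllib2 has become urllib.request.

```python
try:
    import urllib2
except ImportError:
    # Assumption: on Python 3 the same classes live in urllib.request.
    import urllib.request as urllib2


class MyRedirectHandler(urllib2.HTTPRedirectHandler):
    """Subclass of a default handler: should replace the default."""
    pass


class PlainHandler(urllib2.BaseHandler):
    """Subclass of BaseHandler only: the default redirect handler stays."""
    def http_error_302(self, req, fp, code, msg, headers):
        return None


def redirect_handler_names(opener):
    # Names of all installed handlers that are HTTPRedirectHandler
    # instances (the default one or a subclass of it).
    return [h.__class__.__name__ for h in opener.handlers
            if isinstance(h, urllib2.HTTPRedirectHandler)]


print(redirect_handler_names(urllib2.build_opener(MyRedirectHandler)))
print(redirect_handler_names(urllib2.build_opener(PlainHandler)))
```

If the rule works as documented, the first opener has
'MyRedirectHandler' as its only redirect handler, while the second
still carries the stock 'HTTPRedirectHandler' next to 'PlainHandler'.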

In my case, I'm wondering if this means I can't actually call the
parent class's 'http_error_302' method, but my knowledge of Python's
class handling is mediocre at best, and besides, it's too
late/early...

As for my application, I'm (re)-implementing a script to archive email
from a web mail account (at mail.com -- don't use them, they suck and
lost two years' worth of my email!). It appears they also keep track
of the current value of the 'Host' header. I thought I needed to grab
the cookies from the initial redirect, but it turns out the cookies
come from the destination page. What does matter is that the value of
the 'Host' header changes on the redirect (otherwise the redirect
won't succeed), and I couldn't find an easy way to do that with
urllib2 either.
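
For the cookie half of the problem, later Python releases than the
2.1 used here ship a cookie jar ('cookielib', renamed http.cookiejar
in Python 3) and a matching 'HTTPCookieProcessor' handler. The
following is only a sketch under that assumption: it is not available
in the setup described above, and it does nothing about the 'Host'
header. As far as I understand it, the processor stores cookies from
responses and attaches matching ones to later requests made through
the same opener.

```python
try:
    import urllib2
    import cookielib
except ImportError:
    # Assumption: Python 3 names for the same modules.
    import urllib.request as urllib2
    import http.cookiejar as cookielib

# One jar shared by every request made through this opener.
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# No request has been made yet, so the jar starts out empty.
print(len(jar))
```

Requests would then go through opener.open(url) rather than
urllib2.urlopen(url), so that they all share the same jar.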

In the interim I'm using a raw httplib solution.

Hope this is of some use.

Phil.


