n00b with urllib2: How to make it handle cookies automatically?
7stud
bbxx789_05ss at yahoo.com
Sun Feb 24 16:46:22 EST 2008
On Feb 24, 4:41 am, est <electronix... at gmail.com> wrote:
> On Feb 23, 2:42 am, Rob Wolfe <r... at smsnet.pl> wrote:
>
>
>
> > est <electronix... at gmail.com> writes:
> > > Hi all,
>
> > > I need urllib2 to perform a series of HTTP requests with the cookies from
> > > the PREVIOUS request (like our browsers usually do). Many people suggest I
> > > use some library (e.g. pycURL) instead, but I guess it's good practice
> > > for a Python beginner to DIY something rather than use existing tools.
>
> > > So my problem is how to extend urllib2 with a class of my own
>
> > > import urllib2
> > > from cookielib import CookieJar
> > >
> > > class SmartRequest(object):
> > >     cj = CookieJar()
> > >
> > >     def __init__(self, strUrl, strContent=None):
> > >         self.Request = urllib2.Request(strUrl, strContent)
> > >         self.cj.add_cookie_header(self.Request)
> > >         self.Response = urllib2.urlopen(self.Request)
> > >         self.cj.extract_cookies(self.Response, self.Request)
> > >
> > >     def url(self):
> > >         return self.Response.geturl()
> > >
> > >     def read(self, intCount):
> > >         return self.Response.read(intCount)
> > >
> > >     def headers(self, strHeaderName):
> > >         return self.Response.headers[strHeaderName]
>
> > > The code does not work because each time SmartRequest is instantiated,
> > > the object 'cj' is cleared. How do I avoid that?
> > > The only stupid solution I figured out is to use a global CookieJar
> > > object. Is there any way to handle all this INSIDE the class?
>
> > > I am totally new to OOP & Python programming, so could anyone give me
> > > some suggestions? Thanks in advance.
>
> > Google for urllib2.HTTPCookieProcessor.
>
> > HTH,
> > Rob
>
>
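For reference, this is a minimal sketch of the pattern that hint points at, assuming Python 2's urllib2/cookielib and placeholder example.com URLs; none of it is code from the thread:

    import urllib2
    from cookielib import CookieJar

    cj = CookieJar()
    # HTTPCookieProcessor stores cookies from every response in cj and sends
    # them back on later requests made through this opener.
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

    opener.open("http://example.com/login")    # cookies captured here
    opener.open("http://example.com/private")  # cookies sent back automatically

The same opener can also be made the process-wide default with urllib2.install_opener(opener), after which plain urllib2.urlopen() calls get the cookie handling too.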
> Wow, thank you Rob Wolfe! Your reply is the shortest yet the most helpful! I
> solved this problem with the following code.
>
> import urllib2
> from cookielib import CookieJar
>
> class HTTPRefererProcessor(urllib2.BaseHandler):
>     """Add Referer header to requests.
>
>     This only makes sense if you use each RefererProcessor for a single
>     chain of requests only (so, for example, if you use a single
>     HTTPRefererProcessor to fetch a series of URLs extracted from a single
>     page, this will break).
>
>     There's a proper implementation of this in module mechanize.
>
>     """
>     def __init__(self):
>         self.referer = None
>
>     def http_request(self, request):
>         if ((self.referer is not None) and
>             not request.has_header("Referer")):
>             request.add_unredirected_header("Referer", self.referer)
>         return request
>
>     def http_response(self, request, response):
>         self.referer = response.geturl()
>         return response
>
>     https_request = http_request
>     https_response = http_response
>
> def main():
>     cj = CookieJar()
>     opener = urllib2.build_opener(
>         urllib2.HTTPCookieProcessor(cj),
>         HTTPRefererProcessor(),
>     )
>     urllib2.install_opener(opener)
>
>     # url1 and url2 are placeholders from the original post.
>     urllib2.urlopen(url1)
>     urllib2.urlopen(url2)
>
> if "__main__" == __name__:
>     main()
>
> And it's working great!
>
> Once again, thanks everyone!
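Coming back to the original question of handling the cookies INSIDE the class: here is a minimal sketch, assuming Python 2's urllib2/cookielib, of how the SmartRequest wrapper could keep one shared CookieJar-backed opener as a class attribute. The class and method names follow the original post; the rest is illustrative and not code from this thread.

    import urllib2
    from cookielib import CookieJar

    class SmartRequest(object):
        # One jar and one opener shared by all instances, so cookies set by an
        # earlier request are sent automatically with later ones.
        cj = CookieJar()
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

        def __init__(self, strUrl, strContent=None):
            self.Request = urllib2.Request(strUrl, strContent)
            self.Response = self.opener.open(self.Request)

        def read(self, intCount=-1):
            return self.Response.read(intCount)

        def headers(self, strHeaderName):
            return self.Response.headers[strHeaderName]

Unlike urllib2.install_opener(), which makes the cookie-aware opener the process-wide default for urllib2.urlopen(), keeping the opener on the class scopes the cookie handling to SmartRequest itself.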
How does the HTTPRefererProcessor class do anything useful for you?