urllib to cache 301 redirections?
John Nagle
nagle at animats.com
Mon Jul 16 15:34:00 EDT 2007
O.R.Senthil Kumaran wrote:
> Thank you for the reply, Mr. John, and I apologize for the very late response
> from my end.
>
> * John J. Lee <jjl at pobox.com> [2007-07-06 18:53:09]:
>
>
>>"O.R.Senthil Kumaran" <orsenthil at users.sourceforge.net> writes:
>>
>>
>>>Hi,
>>>There is an Open Tracker item against urllib2 library python.org/sf/735515
>>
>>>I am not completely getting what "cache - redirection" implies and what should
>>>be done with the urllib2 module. Any pointers?
>>
>>When a 301 redirect occurs after a request for URL U, via
>>urllib2.urlopen(U), urllib2 should remember the result of that
>>redirection, viz a second URL, V. Then, when another
>>urllib2.urlopen(U) takes place, urllib2 should send an HTTP request
>>for V, not U. urllib2 does not currently do this. (Obviously the
>>cache -- that is, the dictionary or whatever that stores the mapping
>>from URLs U to V -- should not be maintained by function urlopen
>>itself. Perhaps it should live on the redirect handler.)
>>
>
>
> I spent a little time thinking about a solution and figured out that the
> following changes to HTTPRedirectHandler might be helpful in implementing
> this.
>
> class HTTPRedirectHandler(BaseHandler):
>     # ... omitted ...
>
>     # Initialize a dictionary to hold the cache.
>     def __init__(self):
>         self.cache = {}
>
>     # Handle 301 errors in a separate method that maintains the cache.
>     def http_error_301(self, req, fp, code, msg, headers):
>         if req in self.cache:
>             # Look for a loop: if a particular URL appears as both a key
>             # and a value, there is a loop, so raise HTTPError.
>             if len(set(self.cache.keys()) & set(self.cache.values())) > 0:
>                 raise HTTPError(req.get_full_url(), code,
>                                 self.inf_msg + msg, headers, fp)
>             return self.cache[req]
>
>         self.cache[req] = self.http_error_302(req, fp, code, msg, headers)
>         return self.cache[req]
>
>
> John, let me know your comments on this approach.
> I have not yet tested this code in a real scenario with a 301 redirect.
> If it's okay, I shall test it and submit a patch for the tracker item.
That assumes you're reusing the same object to reopen another URL.
Is this thread-safe?
That's also an inefficient way to test for an empty dictionary.
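
To make those concerns concrete, here is a rough sketch of one way the cache
could be arranged. It is not the urllib2 patch, and it is written against
Python 3's urllib.request (the successor to urllib2). The names RedirectCache
and CachingRedirectHandler are made up for illustration. The assumptions are:
the cache is keyed by URL string rather than by request object, it is shared
by passing one instance around, a lock guards the dictionary, and a loop is
detected by walking the chain from the requested URL instead of comparing all
keys against all values.

```python
import threading
import urllib.request


class RedirectCache:
    """Hypothetical thread-safe map from a redirected URL to its target."""

    def __init__(self):
        self._lock = threading.Lock()
        self._targets = {}

    def add(self, url, target):
        with self._lock:
            self._targets[url] = target

    def resolve(self, url):
        """Follow cached redirects starting at url; raise ValueError on a loop."""
        seen = {url}
        with self._lock:
            while url in self._targets:
                url = self._targets[url]
                if url in seen:
                    raise ValueError("redirect loop at %s" % url)
                seen.add(url)
        return url


class CachingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that records 301 targets in a shared cache."""

    def __init__(self, cache):
        self.cache = cache

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock handler decide whether the redirect is allowed;
        # it raises HTTPError if not, and then nothing is cached.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if code == 301:
            # Only permanent redirects are remembered.
            self.cache.add(req.full_url, newurl)
        return new_req


cache = RedirectCache()
opener = urllib.request.build_opener(CachingRedirectHandler(cache))
```

A caller would then open `cache.resolve(url)` through `opener` rather than
`url` itself, so a second request for U goes straight to V. Whether the cache
belongs on the handler or on a separate object like this is a design choice;
the sketch keeps it separate so that several openers can share it.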
John Nagle