urllib to cache 301 redirections?
John J. Lee
jjl at pobox.com
Fri Jul 6 20:53:09 CEST 2007
"O.R.Senthil Kumaran" <orsenthil at users.sourceforge.net> writes:
> There is an open tracker item against the urllib2 library, python.org/sf/735515,
> which states:
> urllib / urllib2 should cache the results of 301 (permanent) redirections.
> This shouldn't break anything, since it's just an internal optimisation
> from one point of view -- but it's also what the RFC (2616, section 10.3.2, first para) says
> SHOULD happen.
> I am trying to understand what this means.
> Should the original URL be available to the user on request, given that urllib
> automatically calls redirect_request and provides only the redirected URL?
urllib2, you mean.
Regardless of this bug, Request.get_full_url() should be (and is)
whatever URL the request instance was originally constructed with.
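The point about get_full_url() can be checked without touching the network. The sketch below uses Python 3, where urllib2 became urllib.request, and calls HTTPRedirectHandler.redirect_request directly, roughly as the handler does internally when following a 301. The example.com URLs are placeholders. Following a redirect builds a *new* Request for the target, so the original Request keeps the URL it was constructed with.

```python
import urllib.request

# Request for the original URL U (example.com URLs are placeholders).
original = urllib.request.Request("http://example.com/old")

# On a redirect, the handler builds a *new* Request for the target V.
# Calling the method directly like this needs no network access.
handler = urllib.request.HTTPRedirectHandler()
followed = handler.redirect_request(
    original, None, 301, "Moved Permanently", {}, "http://example.com/new")

print(original.get_full_url())  # http://example.com/old
print(followed.get_full_url())  # http://example.com/new
```

After a real redirected urlopen(U), the response's geturl() reports the final URL, V.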
> I don't fully understand what "caching a redirection" implies, or what should
> change in the urllib2 module. Any pointers?
When a 301 redirect occurs after a request for URL U, via
urllib2.urlopen(U), urllib2 should remember the result of that
redirection, viz a second URL, V. Then, when another
urllib2.urlopen(U) takes place, urllib2 should send an HTTP request
for V, not U. urllib2 does not currently do this. (Obviously the
cache -- that is, the dictionary or whatever that stores the mapping
from URLs U to V -- should not be maintained by the urlopen function
itself. Perhaps it should live on the redirect handler.)
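Here is a minimal sketch of that idea, again written against Python 3's urllib.request rather than urllib2. CachingRedirectHandler, its permanent dict and resolve() are hypothetical names, not library API. redirect_request, the <scheme>_request pre-processor hook, handler_order and build_opener are standard urllib.request machinery. Following the suggestion above, the cache lives on the redirect handler, not in urlopen.

```python
import urllib.request


class CachingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical redirect handler that caches 301 (permanent) redirects.

    302/303/307 responses are still followed by the base class, but they
    are never recorded, because those redirections are temporary.
    """

    # Pre-processors run in handler_order; 400 puts this URL rewrite ahead
    # of HTTPHandler's own request pre-processing (default order 500).
    handler_order = 400

    def __init__(self):
        super().__init__()
        # Maps an original URL U to the URL V it was permanently moved to.
        self.permanent = {}

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # The base class builds the follow-up Request (or raises HTTPError
        # for redirects it refuses to follow automatically).
        new = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new is not None and code == 301:
            self.permanent[req.full_url] = new.full_url
        return new

    def resolve(self, url):
        # Follow cached 301 hops; `seen` stops a cycle in the cache.
        seen = set()
        while url in self.permanent and url not in seen:
            seen.add(url)
            url = self.permanent[url]
        return url

    def http_request(self, req):
        # OpenerDirector calls <scheme>_request on handlers before opening.
        # Return a *new* Request for the cached target, so the caller's
        # Request still reports U from get_full_url().
        target = self.resolve(req.full_url)
        if target == req.full_url:
            return req
        return urllib.request.Request(
            target,
            data=req.data,
            headers=dict(req.headers),
            origin_req_host=req.origin_req_host,
            unverifiable=req.unverifiable,
            method=req.get_method(),
        )

    # Apply the same rewrite to HTTPS requests.
    https_request = http_request
```

To use it, install it with `opener = urllib.request.build_opener(CachingRedirectHandler())`. Because it subclasses HTTPRedirectHandler, build_opener uses it in place of the default redirect handler. After one 301 from U, a later `opener.open(U)` sends its HTTP request straight to V, while a 302 is simply followed again each time. The cache here is per handler instance and never expires. RFC 2616 says a 301 response is cacheable unless indicated otherwise, so a fuller version would also need to honour the response's cache headers.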
302 redirections are temporary, and urllib2 already handles them
correctly in this respect.