setting Referer for urllib.urlretrieve
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sun Aug 9 10:41:05 EDT 2009
On Sun, 09 Aug 2009 06:13:38 -0700, samwyse wrote:
> Here's what I have so far:
>
> import urllib
>
> class AppURLopener(urllib.FancyURLopener):
>     version = "App/1.7"
>     referrer = None
>     def __init__(self, *args):
>         urllib.FancyURLopener.__init__(self, *args)
>         if self.referrer:
>             addheader('Referer', self.referrer)
>
> urllib._urlopener = AppURLopener()
>
> Unfortunately, the 'Referer' header potentially varies for each url that
> I retrieve, and the way the module is written, I can't change the calls
> to __init__ or open. The best idea I've had is to assign a new value to
> my class variable just before calling urllib.urlretrieve(), but that
> just seems ugly. Any ideas? Thanks.
[Aside: an int variable is an int. A str variable is a str. A list
variable is a list. A class variable is a class. You probably mean a
class attribute, not a variable. If other languages want to call it a
variable, or a sausage, that's their problem.]
If you're prepared for a bit of extra work, you could take over all the
URL handling instead of relying on automatic openers. This will give you
much finer control, but it will also require more effort on your part.
The basic idea is that, instead of installing an opener and then asking
the urllib module to handle the connection, you handle the connection
yourself:
    - make a Request object using urllib2.Request
    - make an Opener object using urllib2.build_opener
    - call opener.open(request) to connect to the server
    - deal with the connection (retry, fail or read)
Essentially, you pass a Request object where you would otherwise pass a
URL, and you add the appropriate Referer header to that Request before
opening it.
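
A minimal sketch of those steps, untested against a live server. The
helper names make_request and fetch are made up for illustration, and
the User-Agent value is just the "App/1.7" string from the original
class. The try/except import is an assumption so that the same code
also runs on Python 3, where Request and build_opener live in
urllib.request:

```python
try:
    # Python 2, as used in this thread
    from urllib2 import Request, build_opener
except ImportError:
    # Python 3 moved both names into urllib.request
    from urllib.request import Request, build_opener


def make_request(url, referer=None):
    """Build a Request carrying an optional per-URL Referer header."""
    request = Request(url)
    if referer:
        request.add_header('Referer', referer)
    return request


def fetch(url, referer=None, opener=None):
    """Open url with the given Referer and return the response body."""
    if opener is None:
        opener = build_opener()
        # replaces the opener's default User-Agent header
        opener.addheaders = [('User-Agent', 'App/1.7')]
    response = opener.open(make_request(url, referer))
    try:
        return response.read()
    finally:
        response.close()
```

Because the header is attached to each Request rather than to the
opener, every call can carry a different Referer, e.g.
fetch("http://example.com/page", referer="http://example.com/").
Retrying or handling failures would go around the opener.open() call.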
Another approach, perhaps a more minimal change than the above, would be
something like this:
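# untested
import urllib

class AppURLopener(urllib.FancyURLopener):
    version = "App/1.7"
    def __init__(self, *args):
        urllib.FancyURLopener.__init__(self, *args)
    def add_referrer(self, url=None):
        # drop any Referer set by an earlier call, so the headers
        # don't accumulate from one URL to the next
        self.addheaders = [h for h in self.addheaders if h[0] != 'Referer']
        if url:
            self.addheader('Referer', url)

urllib._urlopener = AppURLopener()
urllib._urlopener.add_referrer("http://example.com/")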
--
Steven