setting Referer for urllib.urlretrieve

E odieguard at gmail.com
Sat Sep 5 01:31:29 CEST 2009


On Aug 10, 10:21 am, samwyse <samw... at gmail.com> wrote:
> On Aug 9, 9:41 am, Steven D'Aprano <st... at REMOVE-THIS-
>
> cybersource.com.au> wrote:
> > On Sun, 09 Aug 2009 06:13:38 -0700,samwysewrote:
> > > Here's what I have so far:
>
> > > import urllib
>
> > > class AppURLopener(urllib.FancyURLopener):
> > >     version = "App/1.7"
> > >     referrer = None
> > >     def __init__(self, *args):
> > >         urllib.FancyURLopener.__init__(self, *args)
> > >         if self.referrer:
> > >             addheader('Referer', self.referrer)
>
> > > urllib._urlopener = AppURLopener()
>
> > > Unfortunately, the 'Referer' header potentially varies for each url that
> > > I retrieve, and the way the module is written, I can't change the calls
> > > to __init__ or open. The best idea I've had is to assign a new value to
> > > my class variable just before calling urllib.urlretrieve(), but that
> > > just seems ugly.  Any ideas?  Thanks.
>
> > [Aside: an int variable is an int. A str variable is a str. A list
> > variable is a list. A class variable is a class. You probably mean a
> > class attribute, not a variable. If other languages want to call it a
> > variable, or a sausage, that's their problem.]
>
> > If you're prepared for a bit of extra work, you could take over all the
> > URL handling instead of relying on automatic openers. This will give you
> > much finer control, but it will also require more effort on your part.
> > The basic idea is, instead of installing openers, and then ask the urllib
> > module to handle the connection, you handle the connection yourself:
>
> > make a Request object using urllib2.Request
> > make an Opener object using urllib2.build_opener
> > call opener.open(request) to connect to the server
> > deal with the connection (retry, fail or read)
>
> > Essentially, you use the Request object instead of a URL, and you would
> > add the appropriate referer header to the Request object.
>
> > Another approach, perhaps a more minimal change than the above, would be
> > something like this:
>
> > # untested
> > class AppURLopener(urllib.FancyURLopener):
> >     version = "App/1.7"
> >     def __init__(self, *args):
> >         urllib.FancyURLopener.__init__(self, *args)
> >     def add_referrer(self, url=None):
> >         if url:
> >             addheader('Referer', url)
>
> > urllib._urlopener = AppURLopener()
> > urllib._urlopener.add_referrer("http://example.com/")
>
> Thanks for the ideas.  I'd briefly considered something similar to
> your first idea, implementing my own version of urlretrieve to accept
> a Request object, but it does seem like a good bit of work.  Maybe
> over Labor Day.  :)
>
> The second idea is pretty much what I'm going to go with for now.  The
> program that I'm writing is almost a clone of wget, but it fixes some
> personal dislikes with the way recursive retrievals are done.  (Or
> maybe I just don't understand wget's full array of options well
> enough.)  This means that my referrer changes as I bounce up and down
> the hierarchy, which makes this less convenient.  Still, it does seem
> more convenient that re-writing the module from scratch.


Just wanted to add a note. I used the sample code posted above, and I
would get this syntax error:
	NameError: global name 'addheader' is not defined
The fix for the code is to change the line that references addheader
to say this:
	self.addheader('Referer', url)

~E



More information about the Python-list mailing list