Getting final url when original url redirects

Albert Hopkins marduk at letterboxes.org
Thu Mar 12 16:50:01 EDT 2009


On Thu, 2009-03-12 at 12:57 -0700, IanR wrote:
> I'm processing RSS content from a # of given sources.  Most of the
> time the url given by the RSS feed redirects to the real URL (I'm
> guessing they do this for tracking purposes)
> 
> For example.
> 
> This is a url that I get from an RSS feed,
> http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512
> It redirects to
> http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/
> 
> I want to record the final URL and not the URL I get from the RSS feed
> (However sometimes there is no redirect so I might want the original
> URL)
> 
> I've tried sniffing the header and don't see any "Location:"... I
> think sites are using different ways to redirect.  Does anyone have
> any suggestions on how I might handle this?

If you are using urllib[2], the object returned by urlopen() keeps the
URL it ended up at after following any redirects:

>>> import urllib2
>>> url = 'http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512'
>>> o = urllib2.urlopen(url)
>>> o.url
'http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/'
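
For anyone on Python 3, where urllib2 became urllib.request, the same
idea can be wrapped in a small helper. This is only a sketch: the name
resolve_final_url, the timeout parameter, and the choice to fall back to
the feed's URL on any error are my own assumptions, not something from
the session above.

```python
import urllib.request


def resolve_final_url(url, timeout=10):
    """Return the URL reached after HTTP redirects are followed.

    Falls back to the original URL if the request cannot be completed
    (an assumed policy for this sketch; adjust to taste).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # geturl() is the URL of the resource actually retrieved;
            # it equals `url` when no redirect happened.
            return response.geturl()
    except (OSError, ValueError):
        # OSError covers urllib.error.URLError/HTTPError and socket
        # timeouts; ValueError covers strings that are not valid URLs.
        return url
```

Note that this only follows HTTP-level redirects (3xx responses with a
Location header). A page that redirects through a meta refresh tag or
JavaScript will come back as its own URL, and you would have to parse
the body to go further.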




