Getting final url when original url redirects

Philip Semanchuk philip at semanchuk.com
Thu Mar 12 21:45:32 CET 2009


On Mar 12, 2009, at 3:57 PM, IanR wrote:

> I'm processing RSS content from a number of given sources.  Most of the
> time the url given by the RSS feed redirects to the real URL (I'm
> guessing they do this for tracking purposes)
>
> For example.
>
> This is a url that I get from an RSS feed,
> http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512
> It redirects to
> http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/
>
> I want to record the final URL and not the URL I get from the RSS feed
> (However sometimes there is no redirect so I might want the original
> URL)
>
> I've tried sniffing the header and don't see any "Location:"... I
> think sites are using different ways to redirect.  Does anyone have
> any suggestions on how I might handle this?


Hi Ian,
Using Firefox's Live HTTP Headers extension, I see a 302 redirect with  
a Location header (see session log below). Are you aware that urllib2  
resolves redirects for you? That might be why you're not seeing what  
you expect. If you want a record of each URL you'll have to implement  
an HTTPRedirectHandler.
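
Here is a minimal sketch of such a handler. It assumes Python 3, where urllib2's classes live in urllib.request; the names RecordingRedirectHandler and resolve are made up for illustration and are not part of the library. It lets the stock handler do the redirecting and only notes each hop along the way. If there is no redirect, geturl() simply gives back the original URL and the list of hops is empty.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as the stock handler does, but remember every hop."""

    def __init__(self):
        # Each entry is (status code, URL that redirected, URL it pointed to).
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the base class decide whether this redirect may be followed.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None:
            self.hops.append((code, req.full_url, newurl))
        return new_req


def resolve(url):
    """Return (final URL, list of hops) for the given URL."""
    handler = RecordingRedirectHandler()
    # Passing a subclass instance replaces the default redirect handler.
    opener = urllib.request.build_opener(handler)
    with opener.open(url) as response:
        return response.geturl(), handler.hops
```

Calling resolve() on the pheedcontent.com URL from your message should hand back the final URL plus one hop per redirect, though I have not run it against that site.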



http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512

GET /click.phdo?i=d22e9bc7641aab8a0566526f61806512 HTTP/1.1
Host: www.pheedcontent.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.7) Gecko/2009021906 Firefox/3.0.7
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.7,sv;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 302 Found
Date: Thu, 12 Mar 2009 20:41:29 GMT
Server: Apache
X-Powered-By: PHP/5.2.3-1ubuntu6.3
Pragma: no-cache
Cache-Control: no-cache, must-revalidate
Set-Cookie: phdo=1-tst%7Cv3%3Ac3cbcae440ff783381d0d9fa96f14d05%3Aa8t5sELbkk9oy3pXsrohSnPslqQxQKIhVP%2F8Ots%3D; expires=Fri, 13-Mar-2009 20:41:29 GMT; path=/; domain=pheedo.com
Location: http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 26
Connection: close
Content-Type: text/html
----------------------------------------------------------
http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/


etc. etc.




