Urllib vs. FireFox

Stefan Behnel stefan_ml at behnel.de
Fri Oct 24 15:04:12 EDT 2008


Gilles Ganault wrote:
> After scratching my head as to why I failed to find data in a web page
> using the "re" module, I discovered that a web page as downloaded by
> urllib doesn't match what is displayed when viewing the page source in
> Firefox.
> 
> For instance, when searching Amazon for "Wargames":
> 
> URLLIB:
> <a
> href="http://www.amazon.fr/Wargames-Matthew-Broderick/dp/B00004RJ7H"><span
> class="srTitle">Wargames</span></a>
>   
>    ~ Matthew Broderick, Dabney Coleman, John Wood,  et Ally Sheedy
> <span class="bindingBlock">(<span class="binding">Cassette
> vidéo</span> - 2000)</span></td></tr>
> 
> FIREFOX:
>  <div class="productTitle"><a
> href="http://www.amazon.fr/Wargames-Matthew-Broderick/dp/B00004RJ7H/ref=sr_1_1?ie=UTF8&s=dvd&qid=1224872998&sr=8-1">
> Wargames</a> <span class="binding"> ~ Matthew Broderick, Dabney
> Coleman, John Wood,  et Ally Sheedy</span><span class="binding">
> (<span class="format">Cassette vidéo</span> - 2000)</span></div>
> 
> Why do they differ?

The browser sends a different client identifier (the User-Agent HTTP
header) than urllib does, and the server sends back different page content
depending on which client is asking.
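
If you want urllib to receive something closer to what the browser gets,
one option is to send a browser-like User-Agent yourself. The following is
only a rough sketch, assuming a current Python 3 with urllib.request (the
urllib of 2008 had a different interface). The URL, the User-Agent string
and the helper names build_request and fetch are made up for illustration,
and a site may also vary its output on cookies or other headers, so this is
not guaranteed to reproduce the browser's page.

```python
import urllib.request

# Hypothetical values for illustration only; neither comes from the thread.
SEARCH_URL = "http://www.amazon.fr/"
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Build a Request that announces the given User-Agent.

    Without this header, urllib identifies itself as "Python-urllib/<version>".
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_USER_AGENT):
    """Download the page body as bytes, using the chosen User-Agent."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    # Performs a real network request; the result depends on the server.
    html = fetch(SEARCH_URL)
    print(len(html), "bytes received")
```

Building the request separately from sending it makes it easy to check
which headers will go out before touching the network.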

Stefan


