How to save a whole web page, including a "display: block" element

Daniel Fetchinson fetchinson at googlemail.com
Tue Aug 10 11:04:09 EDT 2010


> I want to save a web page. I use urllib to download the web page, but
> I find that some content is missing from the saved file. The missing
> part is a block from the original web page, such as this part: <div
> style="display: block;" id="GeneInts">...</div>. I don't know how to
> save the whole page including that block. Could you help me
> figure it out? Thank you!
>
>
> This is my program:
>
> import urllib
>
> url = ('http://receptome.stanford.edu/hpmr/SearchDB/getGenePage.asp?'
>        'Param=4502931&ProtId=1&ProtType=Receptor')
> f = urllib.urlretrieve(url, 'test.html')

A web server may send different output depending on the client that
requests the page. When you look at the source in your browser and then
at the file urllib saved, you have accessed the web server with two
different clients. I'm not saying this is your problem, but it could be.
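To illustrate the point, here is a minimal, self-contained sketch in Python 3 (not the original poster's Python 2 code). It starts a hypothetical toy server on localhost that only includes the GeneInts block when the User-Agent header looks like a browser's, then fetches the page twice with urllib. The real receptome server may or may not behave this way; the handler below is purely illustrative.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class UserAgentHandler(BaseHTTPRequestHandler):
    """Hypothetical toy handler: sends the extra <div> only to browser-like clients."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        body = "<html><body><p>Basic page</p>"
        if "Mozilla" in agent:
            # Pretend this server only sends the block to browsers.
            body += '<div style="display: block;" id="GeneInts">...</div>'
        body += "</body></html>"
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# An opener that ignores proxy environment variables, since we only talk to localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, headers=None):
    """Fetch url with optional extra headers and return the body as text."""
    req = urllib.request.Request(url, headers=headers or {})
    with opener.open(req) as resp:
        return resp.read().decode("utf-8")


# Port 0 lets the OS pick a free port for the toy server.
server = HTTPServer(("127.0.0.1", 0), UserAgentHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

plain = fetch(url)  # urllib's default User-Agent ("Python-urllib/3.x")
browser = fetch(url, {"User-Agent": "Mozilla/5.0"})

server.shutdown()
server.server_close()

print("GeneInts" in plain)    # False
print("GeneInts" in browser)  # True
```

Same URL, same server, two different pages: only the client changed.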

So you might want to make urllib look like a browser by sending the
appropriate HTTP headers, such as User-Agent.
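A minimal sketch of that idea, assuming Python 3's urllib.request (in Python 2 the equivalents are urllib2.Request and urllib2.urlopen). The header values are example strings, not anything the receptome server is known to require, and make_browser_request and save_page are hypothetical helper names.

```python
import urllib.request

# Example browser-like headers (assumption: any current browser's strings would do).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def make_browser_request(url, headers=BROWSER_HEADERS):
    """Build a Request that carries browser-like headers instead of urllib's defaults."""
    return urllib.request.Request(url, headers=dict(headers))


def save_page(url, filename):
    """Download url with browser-like headers and write the raw bytes to filename."""
    req = make_browser_request(url)
    with urllib.request.urlopen(req) as resp, open(filename, "wb") as out:
        out.write(resp.read())
```

For example, save_page(url, 'test.html') would take the place of the urlretrieve call above. Writing the raw bytes avoids guessing the page's character encoding.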

HTH,
Daniel

More information about the Python-list mailing list