download web pages that are updated by ajax
Jabba Laci
jabba.laci at gmail.com
Tue Apr 12 17:55:33 EDT 2011
> I've heard you can drive a web browser using Selenium
> (http://code.google.com/p/selenium/ ), have it visit the webpage and
> run the JavaScript on it, and then grab the final result.
Hi,
Thanks for the info. I tried Selenium: you can get the source with the
get_html_source() function, but it returns the original HTML, not the
DOM as modified by the JavaScript.
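One thing that might be worth trying is to ask the browser itself to
serialize the live DOM through JavaScript. The sketch below assumes a
WebDriver-style driver object that has an execute_script() method (this
is the newer Selenium WebDriver API, not the get_html_source() call
mentioned above). The helper name rendered_html is made up for
illustration.

```python
def rendered_html(driver):
    """Return the page markup as the browser currently sees it.

    `driver` is assumed to be a Selenium WebDriver instance (or any
    object exposing a compatible execute_script method). The script
    runs inside the page, so it reflects changes made by JavaScript.
    """
    return driver.execute_script("return document.documentElement.outerHTML;")
```

With Selenium installed, usage would look roughly like creating a driver
with webdriver.Firefox(), calling driver.get(url), waiting until the
AJAX update has finished, and then calling rendered_html(driver).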
For the problem, I found a _general solution_ at
http://simile.mit.edu/wiki/Crowbar :
"Crowbar is a web scraping environment based on the use of a
server-side headless mozilla-based browser. Its purpose is to allow
running javascript scrapers against a DOM..."
A _specific solution_ for my case is to use BioPython:
-------
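from Bio import Entrez

accession = 'CP002059.1'             # GenBank accession to download
Entrez.email = 'whatev at mail.com'  # NCBI asks for a contact address; use your own
# Fetch the record in GenBank flat-file format as plain text
handle = Entrez.efetch(db='nucleotide', id=accession, rettype='gb', retmode='text')
# Save it to a local file named after the accession
with open(accession, 'w') as local_file:
    local_file.write(handle.read())
handle.close()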
-------
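For reference, Entrez.efetch talks to NCBI's E-utilities efetch
endpoint, so the same request can be expressed as a plain URL. The
following is only a sketch using the standard library. The function
name build_efetch_url is hypothetical, and the parameters simply mirror
the ones used in the snippet above.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nucleotide", rettype="gb", retmode="text"):
    """Build the efetch URL for one record (no network access here)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return EFETCH_URL + "?" + urlencode(params)


print(build_efetch_url("CP002059.1"))
```

Opening the resulting URL in a browser or with urllib.request.urlopen
should return the same GenBank record, which shows that no JavaScript
rendering is needed for this particular case.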
Laszlo