Python- javascript

Douglas Alan darkwater42 at gmail.com
Sat Aug 15 22:12:06 EDT 2009


On Aug 15, 8:02 pm, Mike Paul <paul.mik... at gmail.com> wrote:

> I'm trying to scrape a dynamic page with a lot of JavaScript in it.
> In order to get all the data from the page I need to access the
> JavaScript, but I have no idea how to do it.

I'm not sure exactly what you are trying to do, but scraping websites
that use a lot of JavaScript is often very problematic. The last time
I did so, I had to write some pretty funky regular expressions to pick
data out of the JavaScript. Fortunately, the data was directly in the
JavaScript, so I didn't have to reproduce the Ajax calling
chain. If you need to do that, then you almost certainly want to use
a package designed for doing such things. One such package is
HtmlUnit. It is a "GUI-less browser" with a built-in JavaScript engine
that is designed for such scraping tasks.
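
To give a rough idea of the regular-expression approach, here is a
minimal sketch. It assumes the data sits in the page as a JSON-compatible
literal assigned to a JavaScript variable. The page source, the variable
names (priceData, pageTitle) and the helper extract_js_literal are all
made up for illustration; a real page will almost certainly need a
different pattern.

```python
import json
import re

# Hypothetical page source; in practice this would be the HTML you fetched.
PAGE = """
<html><head><script type="text/javascript">
var pageTitle = "Prices";
var priceData = [{"item": "widget", "price": 9.5}, {"item": "gadget", "price": 12}];
</script></head><body></body></html>
"""


def extract_js_literal(source, var_name):
    """Return the literal assigned to `var_name`, parsed as JSON, or None.

    Assumes a statement of the form `var NAME = <literal>;` that ends at
    the end of a line, and that the literal is valid JSON. A literal that
    is not valid JSON (single quotes, unquoted keys) makes json.loads
    raise ValueError.
    """
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(.*?);\s*$",
        re.MULTILINE | re.DOTALL,
    )
    match = pattern.search(source)
    if match is None:
        return None
    return json.loads(match.group(1))


if __name__ == "__main__":
    for row in extract_js_literal(PAGE, "priceData"):
        print(row["item"], row["price"])
```

This only works while the data is embedded in the page itself. Once the
page pulls its data in with further Ajax requests, a regular expression
over the initial HTML has nothing to match.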

Unfortunately, you have to program it in Java, rather than Python.
(You might be able to use Jython instead of Java, but I don't know for
sure.)

|>ouglas


P.S. For scraping tasks, you probably want to parse the pages you fetch
with BeautifulSoup rather than working directly with the raw output of
urllib2.




