parsing a site/page that uses/calls javascript functions...

bruce bedouglas at
Sun Sep 28 19:31:02 CEST 2008


I've got a couple of test apps that I use to parse/test different HTML
webpages. However, I'm now looking at how to parse a given site/page that
uses JavaScript calls to dynamically create/display the resulting HTML.

I can see the HTML in the browser page if I manually click the button that
invokes the JavaScript function, but I have no idea how to create an app
that can effectively parse the page.
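To illustrate the gap: a plain urllib fetch returns only the server-sent source, before any JavaScript has run, so elements that the script would fill in come back empty. A minimal Python 3 sketch (the sample HTML string is a stand-in for what `urllib.request.urlopen(url).read()` would return from a real page):

```python
from html.parser import HTMLParser

class DivCollector(HTMLParser):
    """Collect the ids of <div> elements seen in the raw HTML."""
    def __init__(self):
        super().__init__()
        self.div_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_ids.append(dict(attrs).get("id"))

# Stand-in for the server-sent source, *before* any JavaScript has run.
RAW_SOURCE = """
<html><body>
  <div id="static-content">Always present</div>
  <button onclick="loadResults()">Load</button>
  <div id="results"></div>  <!-- filled in later by JavaScript -->
</body></html>
"""

parser = DivCollector()
parser.feed(RAW_SOURCE)
print(parser.div_ids)  # ['static-content', 'results'] -- 'results' exists but is empty
```

The parser sees the `results` div, but its JavaScript-generated contents never arrive over a plain HTTP fetch, which is why mechanize/urllib alone can't get at them.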

My test apps use Python, along with mechanize's Browser and urllib. I've seen
sites/docs that discuss Selenium, SpiderMonkey, etc. If possible, I'm
trying to find a complete example (one that walks through everything from
setting up the environment to finally extracting the DOM elements of a given
JavaScript page), or I'm looking for someone I can work with to create
a complete example that can then be posted to the 'net.

I'd really rather have a headless browser solution, as my overall goal is to
run a parsing/crawling process over a number of pages that utilize JavaScript.
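Whatever headless renderer ends up being used, the crawl loop itself can be kept separate from it. A sketch, where `fetch_rendered` and `extract_links` are placeholders for a real renderer and link parser:

```python
from collections import deque

def crawl(seed_urls, fetch_rendered, extract_links, max_pages=50):
    """Breadth-first crawl. fetch_rendered(url) returns post-JavaScript
    HTML; extract_links(url, html) returns URLs found in that HTML.
    Both callables are placeholders for a real backend."""
    seen, pages = set(), {}
    queue = deque(seed_urls)
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        html = fetch_rendered(url)
        pages[url] = html
        for link in extract_links(url, html):
            if link not in seen:
                queue.append(link)
    return pages

# Usage with a stand-in in-memory "site" (a real fetch_rendered would
# drive a headless browser instead of a dict lookup):
site = {"a": ("<html>A</html>", ["b"]), "b": ("<html>B</html>", ["a"])}
pages = crawl(["a"],
              lambda u: site[u][0],
              lambda u, h: site[u][1])
print(sorted(pages))  # ['a', 'b']
```

Keeping the renderer behind a plain callable also makes it easy to swap backends later without touching the crawl logic.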

Pointers, thoughts, comments, etc. will be greatly appreciated.



More information about the Python-list mailing list