parsing javascript generated files...

bruce bedouglas at earthlink.net
Sun Jan 11 01:58:23 EST 2009


Hi.

I'm looking to parse some web pages that use embedded JavaScript (jQuery).
I'm trying to get a better understanding of exactly how such a page is
generated and displayed in the browser.

I've seen various references to python-spidermonkey, as well as
Watir/FireWatir. Is there a way to fetch the text from JavaScript-generated
pages?

It appears that driving Firefox through "jssh" could let me retrieve the
complete page as the browser displays it. I'm not sure whether this is
Pythonic!

I suspect that I would have to load the page using Firefox/jssh,
SpiderMonkey, or some other JavaScript engine, and then invoke the
JavaScript function that fills in the 'div' within the page...

Is this even doable?

It would be great if there were some way of calling an external browser/app
to which one could pass the target URL, and get back the resulting HTML as
displayed by the browser!
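
To make that concrete, here is a rough sketch of the kind of wrapper I have
in mind. It only assumes some external program that takes a URL as its last
argument and writes the rendered HTML to stdout. The command name in the
usage comment is a made-up placeholder, not a real tool; finding a tool that
actually does this is exactly what I'm asking about.

```python
import subprocess


def rendered_html(url, browser_cmd, timeout=60):
    """Run an external program with the URL appended; return its stdout.

    browser_cmd is a list such as ["some-tool", "--some-flag"]. Whatever
    goes in it is an assumption: it stands in for any program that prints
    the page's HTML after its JavaScript has run.
    """
    result = subprocess.run(
        list(browser_cmd) + [url],
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode output to str
        timeout=timeout,      # don't hang forever on a stuck browser
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return result.stdout


# Hypothetical usage (the command below does not exist, it is a placeholder):
# html = rendered_html("http://web-app.usc.edu/soc/term_20091.html",
#                      ["some-browser", "--dump-rendered-html"])
```

The returned string could then be handed to any ordinary HTML parser.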

A target site is http://web-app.usc.edu/soc/term_20091.html where the
'dept' list is completely generated by JavaScript functions...
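
To illustrate why a plain fetch-and-parse doesn't help, here is a small
sketch using only the standard library. The markup is made up for the
example (it is not the actual USC page, and the 'dept' id and the CSCI entry
are assumptions): the div is empty in the HTML source and is only filled in
when a browser runs the script.

```python
from html.parser import HTMLParser

# Made-up markup standing in for a page whose list is filled in by script.
SAMPLE_HTML = """
<html><body>
<div id="dept"></div>
<script>
  document.getElementById("dept").innerHTML = "<ul><li>CSCI</li></ul>";
</script>
</body></html>
"""


class DivText(HTMLParser):
    """Collect the text found inside <div id=target_id> in the static HTML."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Enter the target div, or track a div nested inside it.
            if self.depth or dict(attrs).get("id") == self.target_id:
                self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def div_text(html, target_id):
    parser = DivText(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# The parser never executes the script, so the div comes back empty.
print(repr(div_text(SAMPLE_HTML, "dept")))
```

The parser only sees the markup as served, so anything the script would
insert is missing unless something actually runs the JavaScript first.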

When I researched this a while ago, there didn't appear to be a really good
solution to this situation. I'm curious whether someone knows of a solution
that's now available and that works!

Thanks for any thoughts/comments on this issue...

-bruce




More information about the Python-list mailing list