Ideas on how to parse a dynamically generated html pages

Tim Harig usernet at ilthio.net
Thu Oct 21 22:49:30 EDT 2010


On 2010-10-22, chad <cdalten at gmail.com> wrote:
> Let's say there is a site that uses javascript to generate menus. More
> or less what happens is when a person clicks on a url, a pop-up menu
> appears asking the user for some data. How would I go about
> automating this? Just curious because the web spider doesn't actually
> pick up the urls that generate the menu. I'm assuming the actual url
> link is dynamically generated?

You have two options:

1. Look at the JavaScript to see what interfaces it uses.  If it is
	generating menus, then it is getting the data it uses to generate
	those menus from somewhere.  Once you have found that resource,
	you can access it yourself with a request from your Python code.
	This is generally the best approach if possible.

2. You can automate a browser through a COM/XPCOM/etc. interface
	which allows you to access the DOM object in real time as it is
	modified by the JavaScript and even to trigger JavaScript events.
	There are libraries that will do this as well.  I have used
	this on heavy AJAX-style interfaces with mountains of spaghetti
	JavaScript that were simply too large and poorly designed to
	try to understand.
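
As a rough sketch of option 1: suppose that reading the JavaScript shows
the menu data coming from a single URL that returns JSON.  The endpoint,
the query parameter, and the "items"/"label"/"url" field names below are
all made up for illustration; substitute whatever the real script
actually requests and receives.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical endpoint: stands in for whatever resource the site's
# JavaScript requests when it builds the menu.
MENU_ENDPOINT = "http://example.com/ajax/menu"


def build_menu_url(base_url, params):
    """Return the URL the script would request for the given parameters."""
    return base_url + "?" + urlencode(params)


def parse_menu(payload):
    """Extract (label, url) pairs from a JSON payload.

    The payload shape ({"items": [{"label": ..., "url": ...}]}) is an
    assumption; adjust it to match what the server really sends.
    """
    data = json.loads(payload)
    return [(item["label"], item["url"]) for item in data["items"]]


def fetch_menu(base_url, params, timeout=10):
    """Request the menu data directly, bypassing the JavaScript."""
    url = build_menu_url(base_url, params)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_menu(response.read().decode("utf-8"))
```

Calling fetch_menu(MENU_ENDPOINT, {"id": 42}) would then hand the spider
the links it could not see in the static HTML.  If the server returns
XML or an HTML fragment instead of JSON, only parse_menu needs to change.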


