Ideas on how to parse a dynamically generated html pages
Tim Harig
usernet at ilthio.net
Thu Oct 21 22:49:30 EDT 2010
On 2010-10-22, chad <cdalten at gmail.com> wrote:
> Let's say there is a site that uses javascript to generate menus. More
> or less what happens is when a person clicks on url, a pop up menu
> appears asking the users for some data. How would I go about
> automating this? Just curious because the web spider doesn't actually
> pick up the urls that generate the menu. I'm assuming the actual url
> link is dynamically generated?
You have two options:
1. Look at the JavaScript to see what interfaces it uses. If it is
generating menus, then it is getting the data it uses to generate
those menus from somewhere. Once you have found that resource,
you can access it yourself with a request from your Python code.
This is generally the best approach if possible.
2. You can automate a browser through a COM/XPCOM/etc. interface,
which allows you to access the DOM in real time as it is
modified by the JavaScript and even to trigger JavaScript events.
There are libraries that will do this as well. I have used
this on heavy AJAX-style interfaces with mountains of spaghetti
JavaScript that were simply too large and poorly designed to
try to understand.
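
As a rough illustration of option 1, here is a minimal sketch using only
the standard library. It assumes the menu script fetches its data with a
plain GET request and gets JSON back; the endpoint URL, the query
parameters, and the "items"/"label"/"url" field names are all
hypothetical and have to be replaced with whatever you actually find in
the site's JavaScript.

```python
import json
import urllib.parse
import urllib.request


def build_menu_request(base_url, params):
    """Build a GET request like the one the page's script might send.

    base_url and params are placeholders for whatever the real
    JavaScript turns out to use.
    """
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{base_url}?{query}",
        # Some servers only answer when the request looks like AJAX.
        headers={"X-Requested-With": "XMLHttpRequest"},
    )


def parse_menu(payload):
    """Pull (label, url) pairs out of a JSON payload.

    The {"items": [{"label": ..., "url": ...}]} layout is an assumed
    format, not something any particular site is known to return.
    """
    data = json.loads(payload)
    return [(item["label"], item["url"]) for item in data["items"]]


def fetch_menu(base_url, params, timeout=10):
    """Request the menu data directly and return the parsed entries."""
    request = build_menu_request(base_url, params)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_menu(response.read().decode("utf-8"))
```

Keeping the request building and the parsing separate from the network
call makes it easy to check each piece against a response captured from
the browser before pointing the spider at the live site.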