Ideas on how to parse dynamically generated HTML pages

chad cdalten at gmail.com
Thu Oct 21 22:38:22 EDT 2010


Let's say there is a site that uses JavaScript to generate its menus.
More or less what happens is that when a person clicks on a link, a
pop-up menu appears asking the user for some data. How would I go
about automating this? I'm curious because my web spider doesn't
actually pick up the URLs that generate the menu. I'm assuming the
actual link URL is generated dynamically?

Here is the code I'm using to get the URLs...

>>> from HTMLParser import HTMLParser
>>> from urllib2 import urlopen
>>> class Spider(HTMLParser):
...     def __init__(self, url):
...         HTMLParser.__init__(self)
...         req = urlopen(url)
...         self.feed(req.read())
...     def handle_starttag(self, tag, attrs):
...         # attrs is a list of (name, value) pairs; href is not
...         # necessarily the first one, so look it up by name.
...         if tag == 'a':
...             href = dict(attrs).get('href')
...             if href:
...                 print "Found Link => %s" % href
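
One way to see why the menu links might be missed is to check whether
those anchors carry a real href at all. The sketch below is only a
guess at what the markup looks like: the class name LinkAuditor and the
SAMPLE markup are made up for illustration, and it is written for
Python 3 (html.parser) rather than the Python 2 modules used above. It
separates ordinary links from anchors that seem to depend on
JavaScript (an onclick handler, a "javascript:" href, or an empty or
"#" href).

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Split anchors into plain links and likely script-driven ones."""

    def __init__(self):
        super().__init__()
        self.plain_links = []    # hrefs a simple spider can follow
        self.script_links = []   # (href, onclick) pairs that need JavaScript

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        href = attrs.get('href') or ''   # missing or valueless href -> ''
        onclick = attrs.get('onclick')
        # Heuristic only: an onclick handler, a javascript: URL, or an
        # empty/"#" href suggests the anchor is driven by a script.
        if onclick or href.lower().startswith('javascript:') or href in ('', '#'):
            self.script_links.append((href, onclick))
        else:
            self.plain_links.append(href)


# Hypothetical markup standing in for the real site.
SAMPLE = '''
<a href="/about.html">About</a>
<a href="#" onclick="showMenu('settings')">Settings</a>
<a href="javascript:openPopup()">Feedback</a>
'''

auditor = LinkAuditor()
auditor.feed(SAMPLE)
print("Plain links:", auditor.plain_links)
print("Script-driven anchors:", auditor.script_links)
```

If the menu anchors show up in the second list, the URL the spider is
after is probably built by the script at click time and never appears
in the static HTML that HTMLParser sees.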




