Ask how to use HTMLParser
Water Lin
WaterLin at ymail.invalid
Thu Jan 7 22:44:48 EST 2010
I am new to Python, but I want to parse an HTML page. I
tried to use HTMLParser. Here is my sample code:
----------------------
from HTMLParser import HTMLParser
from urllib2 import urlopen

class MyParser(HTMLParser):
    title = ""
    is_title = ""

    def __init__(self, url):
        HTMLParser.__init__(self)
        req = urlopen(url)
        self.feed(req.read())

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and attrs[0][1] == 'articleTitle':
            print "Found link => %s" % attrs[0][1]
            self.is_title = 1

    def handle_data(self, data):
        if self.is_title:
            print "here"
            self.title = data
            print self.title
            self.is_title = 0
-----------------------
For the tag
-------
<div class="articleTitle">open article title</div>
-------
I use my code to parse it. I can locate the div tag, but I don't know how
to get the text inside the tag, which is "open article title" in my example.
How can I get the HTML content? What's wrong with my handle_data function?
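To make the goal concrete, here is a minimal sketch of the behaviour I am after, run against an in-memory string instead of a URL. It assumes the target is a div whose class attribute is exactly articleTitle and that no other div is nested inside it. The class name TitleParser and the sample variable are made up for illustration, and the import fallback is only there so the snippet runs on both Python 2 and Python 3. It may not be the best way to do this:

```python
try:
    from html.parser import HTMLParser  # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser   # Python 2 module name


class TitleParser(HTMLParser):  # hypothetical name, for illustration only
    def __init__(self):
        HTMLParser.__init__(self)
        self.in_title = False  # True while inside the wanted div
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; look up "class" by name
        # so a div with no attributes does not raise an IndexError.
        if tag == 'div' and dict(attrs).get('class') == 'articleTitle':
            self.in_title = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title += data

    def handle_endtag(self, tag):
        # Assumption: no nested div inside the title div.
        if tag == 'div' and self.in_title:
            self.in_title = False


sample = '<div class="articleTitle">open article title</div>'
parser = TitleParser()
parser.feed(sample)
parser.close()
print(parser.title)
```

In this sketch the flag is reset in handle_endtag rather than in handle_data, so the text is collected only between the opening and closing div.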
Thanks
Water Lin
--
Water Lin's notes and pencils: http://en.waterlin.org
Email: WaterLin at ymail.com