Water Lin WaterLin at ymail.invalid
Fri Jan 8 04:44:48 CET 2010

I am new to Python and want to parse an HTML page. I tried to use
HTMLParser. Here is my sample code:
from HTMLParser import HTMLParser
from urllib2 import urlopen

class MyParser(HTMLParser):
    title = ""
    is_title = ""
    def __init__(self, url):
        req = urlopen(url)

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and attrs[0][1] == 'articleTitle':
            print "Found link => %s" % attrs[0][1]
            self.is_title = 1

    def handle_data(self, data):
        if self.is_title:
            print "here"
            self.title = data
            print self.title
            self.is_title = 0

For the tag
<div class="articleTitle">open article title</div>

When I parse it with my code, I can locate the div tag, but I don't know
how to get the text inside the tag, which is "open article title" in my
example.

How can I get the HTML content? What's wrong with my handle_data function?
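
For reference, here is a minimal sketch of one way the text could be collected. It is not a definitive fix. It assumes the posted class never sees any input because its __init__ does not call HTMLParser.__init__ and nothing is ever passed to feed(). The names TitleParser, in_title and titles are made up for illustration, and the page is fed in as a string literal rather than downloaded. The import is wrapped so the same code loads on Python 2 (HTMLParser module, as in the post) and Python 3 (html.parser).

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2, as in the post


class TitleParser(HTMLParser):
    """Collects the text of each <div class="articleTitle"> element."""

    def __init__(self):
        # The base class must be initialised before feed() can be used.
        HTMLParser.__init__(self)
        self.in_title = False   # hypothetical flag, True while inside the div
        self.titles = []        # hypothetical list of collected title strings

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs and may be empty,
        # so look the attribute up by name instead of by position.
        if tag == 'div' and dict(attrs).get('class') == 'articleTitle':
            self.in_title = True
            self.titles.append('')

    def handle_endtag(self, tag):
        # Simplification: any closing div ends the title (no nesting support).
        if tag == 'div':
            self.in_title = False

    def handle_data(self, data):
        # Text can arrive in several pieces, so append rather than assign.
        if self.in_title:
            self.titles[-1] += data


parser = TitleParser()
parser.feed('<div class="articleTitle">open article title</div>')
parser.close()
print(parser.titles[0])
```

To parse a live page as in the post, the string passed to feed() would come from urlopen(url).read() instead; on Python 3 the bytes returned by read() would need decoding to text first.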


