HTMLParser question
Benjamin Niemann
b.niemann at betternet.de
Thu Aug 19 11:51:08 EDT 2004
Rajarshi Guha wrote:
> Hi,
> I have some HTML that essentially consists of a series of <div>s,
> each <div> having one of two classes (tnt-question or tnt-answer).
> I'm using HTMLParser to handle the tags as follows:
>
> import HTMLParser
>
> class MyHTMLParser(HTMLParser.HTMLParser):
>
>     def handle_starttag(self, tag, attrs):
>         if len(attrs) == 1:
>             cls, whichcls = attrs[0]
>             if whichcls == 'tnt-question':
>                 print self.get_starttag_text(), self.getpos()
>
>     def handle_endtag(self, tag):
>         pass
>
>     def handle_data(self, data):
>         print data
>
> if __name__ == '__main__':
>     # read() keeps the file as-is; string.join(readlines()) would
>     # insert an extra space in front of every line after the first
>     htmldata = open('tt.html', 'r').read()
>     parser = MyHTMLParser()
>     parser.feed(htmldata)
>     parser.close()
>
> However, what I would like is that when the parser reaches some HTML
> like this:
>
> <div class="tnt-question">
> How do I add a user to a MySQL system?
> </div>
>
> I should get back only the data between the opening and closing tags.
> However, the above code prints the text contained between all tags,
> not just the <div> tags with class='tnt-question'.
>
> Is there a way to call handle_data() only when a specific tag is being
> handled? Placing a call to handle_data() in handle_starttag() seems to
> be the way, but I'm not sure how to actually do it: what data should I
> pass to the call?
Set a flag when the parser calls handle_starttag() and the tag matches
your criteria, and unset it when the corresponding end tag is found.
You'll probably have to count the nesting depth, so for
<div class="printme">Yo <div>man</div>!</div>
the flag is unset on the second </div>, not the first. Then, in
handle_data(), print the data only when the flag is set.
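The flag-plus-nesting-depth idea can be sketched roughly as follows. Note that you never call handle_data() yourself; the parser calls it for every run of text, so the filtering happens there. This is a sketch for Python 3, where the HTMLParser module is named html.parser. The class name QuestionParser, the wanted_class parameter and the results list are hypothetical names chosen for illustration. Instead of printing, it collects the text of each matching element. A depth counter replaces the boolean flag (0 means "flag unset"). It also assumes the <div>s may carry other attributes or several class names, so it looks up "class" via dict(attrs) instead of relying on len(attrs) == 1. It only counts <div> tags and assumes they are properly closed; unclosed nested <div>s would throw the count off.

```python
from html.parser import HTMLParser  # Python 3 name of the HTMLParser module


class QuestionParser(HTMLParser):
    """Collect the text inside <div class="..."> elements of a given class.

    Hypothetical sketch: self.depth plays the role of the flag.
    0 means "not inside a matching div"; a positive value counts how many
    <div>s are currently open inside it, so nested </div>s don't end the
    capture early.
    """

    def __init__(self, wanted_class="tnt-question"):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0      # nesting depth inside a matching <div>
        self.current = []   # text fragments of the element being captured
        self.results = []   # one stripped string per matching element

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # Already inside a matching div: only track nesting.
            self.depth += 1
            return
        # attrs is a list of (name, value) pairs; value is None for a bare
        # attribute, and "class" may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.depth = 1  # "set the flag"
            self.current = []

    def handle_endtag(self, tag):
        if tag != "div" or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # End tag of the matching div itself: "unset the flag".
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        # Called by the parser for every text run; keep it only while
        # we are inside a matching div.
        if self.depth > 0:
            self.current.append(data)


if __name__ == "__main__":
    parser = QuestionParser()
    parser.feed('<div class="tnt-question">\n'
                'How do I add a user to a MySQL system?\n'
                '</div>\n'
                '<div class="tnt-answer">\n...\n</div>')
    parser.close()
    print(parser.results)  # ['How do I add a user to a MySQL system?']
```

With QuestionParser(wanted_class="printme") fed the nested example above, results ends up as ['Yo man!']: the inner </div> only lowers the depth from 2 to 1, and the capture ends on the second </div>.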
More information about the Python-list mailing list