Ask how to use HTMLParser

Water Lin WaterLin at ymail.invalid
Fri Jan 8 01:44:16 EST 2010


h0uk <vardan.pogosyan at gmail.com> writes:

> On 8 Jan, 08:44, Water Lin <Water... at ymail.invalid> wrote:
>> I am new to Python, but I want to parse an HTML page now. I
>> tried to use HTMLParser. Here is my sample code:
>> ----------------------
>> from HTMLParser import HTMLParser
>> from urllib2 import urlopen
>>
>> class MyParser(HTMLParser):
>>     title = ""
>>     is_title = False  # True while inside the target div
>>
>>     def __init__(self, url):
>>         HTMLParser.__init__(self)
>>         req = urlopen(url)
>>         self.feed(req.read())
>>
>>     def handle_starttag(self, tag, attrs):
>>         # attrs is a list of (name, value) pairs and may be empty,
>>         # so look up "class" by name instead of indexing attrs[0]
>>         css_class = dict(attrs).get('class')
>>         if tag == 'div' and css_class == 'articleTitle':
>>             print "Found link => %s" % css_class
>>             self.is_title = True
>>
>>     def handle_data(self, data):
>>         if self.is_title:
>>             print "here"
>>             self.title = data
>>             print self.title
>>             self.is_title = False
>> -----------------------
>>
>> For the tag
>> -------
>> <div class="articleTitle">open article title</div>
>> -------
>>
>> I use my code to parse it. I can locate the div tag, but I don't know how
>> to get the text inside the tag, which is "open article title" in my example.
>>
>> How can I get the HTML content? What's wrong with my handle_data function?
>>
>> Thanks
>>
>> Water Lin
>>
>> --
>> Water Lin's notes and pencils:http://en.waterlin.org
>> Email: Water... at ymail.com
>
> I want to say your code works well

But in handle_data I can't print self.title. I don't know why I can't set
self.title in handle_data.

Thanks

Water Lin

-- 
Water Lin's notes and pencils: http://en.waterlin.org
Email: WaterLin at ymail.com
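
For reference, here is a minimal sketch of one way to collect the text of
the div. It is not taken from the thread. It assumes Python 3, where the
module is named html.parser rather than HTMLParser, and it assumes the real
page may have whitespace or nested tags inside the div, which the thread
does not confirm. The names TitleParser and extract_title are made up for
this example. Instead of keeping only the first chunk passed to
handle_data, it gathers every chunk until the matching </div> arrives.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <div class="articleTitle">."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # > 0 while inside the target div
        self.parts = []    # text chunks seen inside the target div
        self.title = None  # set once the target div has closed

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # a nested div inside the title div
        elif (self.title is None and tag == "div"
              and dict(attrs).get("class") == "articleTitle"):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                # The target div just closed: join and trim the chunks.
                self.title = "".join(self.parts).strip()

    def handle_data(self, data):
        # May be called several times per element (whitespace, nested tags).
        if self.depth:
            self.parts.append(data)


def extract_title(html):
    """Return the title text, or None if no matching div is found."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

Calling extract_title on the sample tag from the question should give back
the text between the opening and closing div. Fetching the page is left out
on purpose so that the parser can be tried on a plain string first.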



More information about the Python-list mailing list