[Tutor] problems with HTMLParser
Sean 'Shaleh' Perry
shaleh@valinux.com
Mon, 22 Jan 2001 14:29:51 -0800 (PST)
I am trying to write a program that will parse the table of contents in a
HOWTO's index.html.
Initial code:
from formatter import NullFormatter
from htmllib import HTMLParser

class myHTML(HTMLParser):
    def __init__(self, formatter, verbose = 0):
        HTMLParser.__init__(self, formatter, verbose)
        self.section_list = [] # to be filled in later

    # sgmllib calls start_* handlers with the tag's attribute list
    def start_ul(self, attributes):
        pass

    # sgmllib calls end_* handlers with no arguments
    def end_ul(self):
        pass

    def start_li(self, attributes):
        pass

    def end_li(self):
        pass

    def start_a(self, attributes):
        print attributes
I call this simply:
foo = myHTML(NullFormatter)
foo.feed(data)
foo.close()
The goal is to insert the href and the text of each anchor into
section_list. Of course, this should only happen when the anchor occurs inside
an <li> tag, so I will use a flag variable to track that case.
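As a rough sketch of that goal (not a fix for the htmllib code above), here is the same idea written against Python 3's html.parser module, which replaced htmllib. The class name TocParser and the li_depth counter are hypothetical choices for illustration. It records (href, text) pairs only for anchors that appear inside an <li>, and skips anchors without an href.

```python
from html.parser import HTMLParser


class TocParser(HTMLParser):
    """Hypothetical example: collect (href, text) for <a> tags inside <li>.

    Uses Python 3's html.parser, not the htmllib module from the post.
    """

    def __init__(self):
        super().__init__()
        self.section_list = []    # collected (href, text) tuples
        self.li_depth = 0         # > 0 while inside an <li> element
        self.current_href = None  # href of the <a> being read, if any
        self.current_text = []    # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "a" and self.li_depth > 0:
            # attrs is a list of (name, value) pairs; None if no href
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.section_list.append((self.current_href, text))
            self.current_href = None

    def handle_data(self, data):
        # Only keep text while we are inside a captured anchor
        if self.current_href is not None:
            self.current_text.append(data)


page = ('<ul><li><a href="intro.html">Introduction</a></li></ul>'
        '<p><a href="x.html">skip</a></p>')
p = TocParser()
p.feed(page)
p.close()
print(p.section_list)  # [('intro.html', 'Introduction')]
```

Because the closing </li> tag is optional in HTML, the depth counter is only a rough guard; a page that omits </li> would need extra handling (for example, resetting at </ul>).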
The problem is that when I run the code I get:
Traceback (innermost last):
  File "./import_toc.py", line 28, in ?
    foo.feed(data)
  File "/usr/lib/python1.5/sgmllib.py", line 83, in feed
    self.goahead(0)
  File "/usr/lib/python1.5/sgmllib.py", line 104, in goahead
    if i < j: self.handle_data(rawdata[i:j])
  File "/usr/lib/python1.5/htmllib.py", line 42, in handle_data
    self.formatter.add_flowing_data(data)
TypeError: unbound method must be called with class instance 1st argument
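The last frame shows htmllib calling self.formatter.add_flowing_data(data). Since myHTML(NullFormatter) passes the NullFormatter class itself rather than an instance, that call has no instance to bind as its first argument, so the fix is most likely foo = myHTML(NullFormatter()). The sketch below reproduces the mistake in Python 3 with hypothetical stand-in classes (StubFormatter and StubParser), because htmllib is not available there; Python 3 has no unbound methods, so it reports a missing-argument TypeError instead of the "unbound method" message.

```python
class StubFormatter:
    """Hypothetical stand-in for formatter.NullFormatter: discards text."""

    def add_flowing_data(self, data):
        pass


class StubParser:
    """Hypothetical stand-in for the handle_data path of htmllib.HTMLParser."""

    def __init__(self, formatter):
        self.formatter = formatter

    def handle_data(self, data):
        # The same call htmllib's handle_data makes in the traceback
        self.formatter.add_flowing_data(data)


# Passing the class: add_flowing_data is a plain function here, so "text"
# is taken as `self`, `data` is missing, and Python raises TypeError.
try:
    StubParser(StubFormatter).handle_data("text")
except TypeError as exc:
    print("class passed:", exc)

# Passing an instance: `self` is bound automatically and the call works.
StubParser(StubFormatter()).handle_data("text")
print("instance passed: ok")
```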