[Tutor] problems with HTMLParser

Sean 'Shaleh' Perry shaleh@valinux.com
Mon, 22 Jan 2001 14:29:51 -0800 (PST)


I am trying to write a program that will parse the table of contents in a
HOWTO's index.html.

Initial code:

from formatter import NullFormatter
from htmllib import HTMLParser

class myHTML(HTMLParser):
    def __init__(self, formatter, verbose = 0):
        HTMLParser.__init__(self, formatter, verbose)
        self.section_list = [] # to be filled in later
        
    def start_ul(self, attributes):
        pass

    def end_ul(self):  # end-tag handlers are called without attributes
        pass

    def start_li(self, attributes):
        pass

    def end_li(self):  # end-tag handlers are called without attributes
        pass

    def start_a(self, attributes):
        print attributes

I call this simply:

foo = myHTML(NullFormatter)
foo.feed(data)
foo.close()

The goal is to have the href and the text from each anchor inserted into
section_list.  Of course this should only happen when the anchor occurs inside
an <li> element, so I will keep a flag variable to track that.
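
A minimal sketch of that goal, under some assumptions: it uses Python 3's
html.parser.HTMLParser as a stand-in, because htmllib is not available in
Python 3. The class name TocParser, the li_depth counter, and the sample
markup are all hypothetical, and the sketch assumes every <li> is explicitly
closed.

```python
from html.parser import HTMLParser


class TocParser(HTMLParser):
    """Collect (href, text) pairs for anchors that appear inside <li>."""

    def __init__(self):
        super().__init__()
        self.section_list = []    # filled with (href, text) tuples
        self.li_depth = 0         # > 0 while inside at least one <li>
        self.current_href = None  # href of the anchor being read, if any
        self.current_text = []    # text chunks seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "a" and self.li_depth > 0:
            href = dict(attrs).get("href")
            if href is not None:
                self.current_href = href
                self.current_text = []

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.section_list.append((self.current_href, text))
            self.current_href = None

    def handle_data(self, data):
        # Only keep text while an in-list anchor is open.
        if self.current_href is not None:
            self.current_text.append(data)


# Hypothetical sample input: one anchor outside the list, two inside it.
sample = (
    '<a href="top.html">Top</a>'
    '<ul><li><a href="intro.html">Introduction</a></li>'
    '<li><a href="install.html">Installing</a></li></ul>'
)
parser = TocParser()
parser.feed(sample)
parser.close()
print(parser.section_list)
```

The anchor outside the list is skipped because li_depth is still zero when
it is seen. A counter is used instead of a single true/false flag so that
nested lists do not switch collection off too early.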

The problem is that when I run the code I get:

Traceback (innermost last):
  File "./import_toc.py", line 28, in ?
    foo.feed(data)
  File "/usr/lib/python1.5/sgmllib.py", line 83, in feed
    self.goahead(0)
  File "/usr/lib/python1.5/sgmllib.py", line 104, in goahead
    if i < j: self.handle_data(rawdata[i:j])
  File "/usr/lib/python1.5/htmllib.py", line 42, in handle_data
    self.formatter.add_flowing_data(data)
TypeError: unbound method must be called with class instance 1st argument
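
The last frame calls self.formatter.add_flowing_data(data), and
self.formatter is whatever was handed to the constructor. One possible
reading of the error is that NullFormatter was passed as a class rather than
as an instance, so the method had no instance to bind to. The sketch below
reproduces that kind of failure in Python 3 with a hypothetical StubFormatter
class, which only imitates the one method named in the traceback and is not
the real formatter module. The exception message differs from the Python 1.5
wording, but it is still a TypeError.

```python
class StubFormatter:
    """Hypothetical stand-in for a formatter class; not the real NullFormatter."""

    def __init__(self):
        self.chunks = []

    def add_flowing_data(self, data):
        self.chunks.append(data)


def handle_data(formatter, data):
    # Mirrors the call in the traceback: formatter.add_flowing_data(data)
    formatter.add_flowing_data(data)


# Passing the class itself: the method has no instance to bind to.
try:
    handle_data(StubFormatter, "text")
    failed = False
except TypeError:
    failed = True

# Passing an instance (note the parentheses) works.
working = StubFormatter()
handle_data(working, "text")
print(failed, working.chunks)
```

If that reading is right, the call would become foo = myHTML(NullFormatter()),
with parentheses after NullFormatter.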