[Tutor] problems with HTMLParser

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Wed, 24 Jan 2001 00:30:50 -0800 (PST)


On Tue, 23 Jan 2001, Sean 'Shaleh' Perry wrote:

> gah, HTMLParser should not be this hard.  Also, when I get this working 
> eventually I want the text wrapped in the anchor too, how do i get that?

Hmmm... let's take another look.  From the last version of your source:

###
class myHTML(HTMLParser):
    def __init__(self, formatter, verbose = 0):
        HTMLParser.__init__(self, formatter, verbose)
        self.section_list = [] # to be filled in later

    def start_ul(self, attributes):
        pass

    def end_ul(self, attributes):
        pass

    def start_li(self, attributes):
        pass

    def end_li(self, attributes):
        pass

    def start_a(self, attributes):
        print attributes
###

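So start_ul() and start_li() are paired with end_ul() and end_li(), but
those two end handlers are declared as taking an 'attributes' argument.
End tags don't carry attributes, so sgmllib calls an end_*() method with
nothing but self.  That's probably why it's complaining (the "expected 2,
got 1" count includes self):

>   File "/usr/lib/python1.5/sgmllib.py", line 336, in handle_endtag
>     method()  # seems to die on end_a?
> TypeError: not enough arguments; expected 2, got 1

Drop the 'attributes' parameter from end_ul() and end_li(), so that they
read 'def end_ul(self):' and 'def end_li(self):', and with luck, that
should fix that bug.  You don't need to write an end_a() yourself, by the
way: htmllib's HTMLParser already defines one to pair with start_a().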
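As for getting the text wrapped in the anchor: here's a minimal sketch of
one way to do it.  sgmllib and htmllib are gone in Python 3, so this swaps
in Python 3's html.parser.HTMLParser, where, just as in sgmllib, start-tag
callbacks receive the attributes and end-tag callbacks receive only the tag
name.  The class name AnchorCollector and its links list are made up for
illustration.  (In the old htmllib, the base HTMLParser's save_bgn() and
save_end() methods offer a similar way to buffer text between two tags.)

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical example: collect (href, text) pairs for each <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the anchor we're currently inside, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # Start tags arrive with their attributes as (name, value) pairs.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        # End tags carry no attributes: only the tag name is passed in.
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

    def handle_data(self, data):
        # Buffer text only while we're between <a href=...> and </a>,
        # including text inside nested tags such as <b>.
        if self._href is not None:
            self._text.append(data)


parser = AnchorCollector()
parser.feed('<ul><li><a href="intro.html">Intro</a></li>'
            '<li><a href="ch1.html">Chapter <b>One</b></a></li></ul>')
parser.close()
print(parser.links)  # [('intro.html', 'Intro'), ('ch1.html', 'Chapter One')]
```

Anchors without an href (like <a name="...">) are skipped here; that's
just a choice for the sketch.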


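About the line justification: I think that one of the standard writers
will at least handle line wrapping for you, but I haven't looked too
closely into them yet.  Take a look at DumbWriter, which word-wraps its
output to maxcol columns (72 by default), though it won't pad lines out to
full justification: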

    http://python.org/doc/current/lib/writer-interface.html

and see if it's suitable for your program.
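The formatter module, and DumbWriter with it, was removed in Python 3.10,
so as a rough modern stand-in, here's a sketch using the standard textwrap
module.  Like DumbWriter, textwrap.fill() word-wraps to a column limit but
doesn't pad lines to full justification.  The sample text and the
40-column width are arbitrary.

```python
import textwrap

# Hypothetical sample text standing in for the parsed document body.
text = ("HTMLParser hands character data to a formatter, and the formatter "
        "hands it to a writer, which decides where the line breaks go.")

# Word-wrap at 40 columns, roughly what DumbWriter(maxcol=40) would do.
wrapped = textwrap.fill(text, width=40)
print(wrapped)
```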