[Tutor] help with HTMLParseError

Peter Kim peateyk at gmail.com
Fri Feb 18 07:22:47 CET 2005


I'm using HTMLParser.py to parse XHTML, and an invalid tag is throwing
an exception.  How do I handle this?

1. Below is the faulty markup.  Notice the missing > on the start tag.
Both Firefox and IE6 correct this automatically, but HTMLParser is less
forgiving.  My code has to handle this gracefully because I don't have
control over the XHTML source.

###/
<A NAME='anchor'</a>
/###

2. Below is the current code.  Because of the invalid markup, it fails
with self.error("malformed start tag") at line 301 in HTMLParser.py.

###/
from HTMLParser import HTMLParser

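def parseHTML(htmlsource):
    class MyHTMLParser(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)
            self.tags = []  # tag strings, in document order
        def handle_starttag(self, tag, attrs):
            self.tags.append("<%s>" % tag)
        def handle_endtag(self, tag):
            self.tags.append("</%s>" % tag)
        def output(self):
            # HTMLParser has no output() of its own, so define one here
            return " ".join(self.tags)
    MyParser = MyHTMLParser()
    MyParser.feed(htmlsource)
    MyParser.close()
    return MyParser.output()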

if __name__ == "__main__":
    htmlsource = r"<P><A NAME='anchor'</a></P>"
    result = parseHTML(htmlsource)
/###

3. I think the ideal solution would be something like the code below,
but I don't know how to make it work.

###/
class MyHTMLParseError(HTMLParseError):
    if self.message == "malformed start tag":
        text.append(">")
    else:
        raise
/###
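
One direction that might avoid touching the exception class at all is
to repair the markup before it reaches the parser.  The sketch below is
only an illustration under assumptions: repair_start_tags, TagCollector
and parse_tags are hypothetical names, and the regular expression covers
just the case shown in item 1 (a start tag that runs into the next "<"
without its own ">").  It would likely misfire on a literal "<" inside
an attribute value, a comment, or a script, so treat it as a starting
point rather than a general fix.

```python
import re

try:
    from HTMLParser import HTMLParser   # Python 2 module, as used above
except ImportError:
    from html.parser import HTMLParser  # Python 3 location of the class

# A start tag whose text reaches the next "<" before any ">" appears.
_UNCLOSED_START_TAG = re.compile(r"<([A-Za-z][^<>]*)(?=<)")


def repair_start_tags(htmlsource):
    """Hypothetical helper: append the missing ">" to unclosed start tags."""
    return _UNCLOSED_START_TAG.sub(r"<\1>", htmlsource)


class TagCollector(HTMLParser):
    """Hypothetical parser that records tags instead of printing them."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append("<%s>" % tag)

    def handle_endtag(self, tag):
        self.tags.append("</%s>" % tag)


def parse_tags(htmlsource):
    """Repair the source first, then parse it and return the tag list."""
    parser = TagCollector()
    parser.feed(repair_start_tags(htmlsource))
    parser.close()
    return parser.tags
```

With the markup from item 1, repair_start_tags turns
<A NAME='anchor'</a> into <A NAME='anchor'></a> before the parser sees
it, and already well-formed tags are left unchanged.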

Thanks in advance for the help!

