[Tutor] help with HTMLParseError
Peter Kim
peateyk at gmail.com
Fri Feb 18 07:22:47 CET 2005
I'm using HTMLParser.py to parse XHTML, and an invalid tag is throwing
an exception. How do I handle this?
1. Below is the faulty markup. Notice the missing > on the start tag.
Both Firefox and IE6 correct this automatically, but HTMLParser is less
forgiving. My code has to handle this gracefully because I don't have
control over the XHTML source.
###/
<A NAME='anchor'</a>
/###
2. Below is the current code. Because of the invalid markup, it fails
with self.error("malformed start tag") at line 301 in HTMLParser.py.
###/
from HTMLParser import HTMLParser

def parseHTML(htmlsource):
    class MyHTMLParser(HTMLParser):
        # Print each start and end tag as the parser encounters it.
        def handle_starttag(self, tag, attrs):
            print "<%s>" % tag,
        def handle_endtag(self, tag):
            print "</%s>" % tag,
    MyParser = MyHTMLParser()
    MyParser.feed(htmlsource)  # the malformed <A ...</a> fails in here
    MyParser.close()
    # HTMLParser has no output() method; the handlers above print directly.

if __name__ == "__main__":
    htmlsource = r"<P><A NAME='anchor'</a></P>"
    parseHTML(htmlsource)
/###
3. I think the ideal solution is to be able to do something like
below, but I don't know how.
###/
class MyHTMLParseError(HTMLParseError):
    if self.message == "malformed start tag":
        text.append(">")
    else:
        raise
/###
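One possible workaround, sketched here as an assumption rather than a confirmed fix, is to repair the markup before it reaches the parser instead of patching things up inside the exception. The helper name repair_unclosed_start_tags and the pattern _UNCLOSED_START_TAG are hypothetical and not part of HTMLParser. The regular expression only covers the specific case shown above (a start tag that runs into the next "<" without its closing ">"), and it would misfire on a literal "<" inside a quoted attribute value.

```python
import re

# Hypothetical helper, not part of HTMLParser: matches a start tag that is
# followed directly by another "<" before its own ">" has appeared.
_UNCLOSED_START_TAG = re.compile(r"<([A-Za-z][^<>]*)(?=<)")

def repair_unclosed_start_tags(htmlsource):
    """Insert the missing '>' on start tags that run into the next tag."""
    return _UNCLOSED_START_TAG.sub(r"<\1>", htmlsource)

if __name__ == "__main__":
    htmlsource = r"<P><A NAME='anchor'</a></P>"
    print(repair_unclosed_start_tags(htmlsource))
    # prints: <P><A NAME='anchor'></a></P>
```

The repaired string could then be passed to MyParser.feed() as before. Wrapping the feed() call in try/except HTMLParseError is another option for failing gracefully, though that alone would not recover the rest of the document.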
Thanks in advance for the help!