intolerant HTML parser
Jim
jim.hefferon at gmail.com
Sat Feb 6 14:09:31 EST 2010
I generate some HTML, and I want my unit tests to include a syntax
check. So I am looking for a program that will complain about any
syntax irregularities.
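One possible way to get such a check into a unit test is sketched below in Python 3. It assumes the generated markup is XHTML (that is, well-formed XML) and passes it to the standard library's xml.etree.ElementTree parser, which raises ParseError on mismatched tags. The names generate_html, is_well_formed and HtmlSyntaxTest are hypothetical placeholders. Plain-HTML constructs such as an unclosed <br> or the &nbsp; entity are also rejected, so this approach only fits if the output really is XHTML.

```python
import unittest
import xml.etree.ElementTree as ET


def generate_html():
    # Hypothetical stand-in for whatever code produces the HTML under test.
    return "<html><body><p>a<b>b</b></p></body></html>"


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # Raised for mismatched tags, undefined entities, unclosed elements, etc.
        return False
    return True


class HtmlSyntaxTest(unittest.TestCase):
    def test_generated_html_is_well_formed(self):
        self.assertTrue(is_well_formed(generate_html()))
```

Running it with "python -m unittest" picks up HtmlSyntaxTest like any other test case.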
I am familiar with Beautiful Soup (I use it all the time), but it is
intended to cope with bad syntax. I just tried feeding
HTMLParser.HTMLParser some HTML containing '<p>a<b>b</p></b>', and it
didn't complain.
That is, this:
# Python 2: the HTMLParser module provides the HTMLParser class
import HTMLParser

h = HTMLParser.HTMLParser()
try:
    h.feed('<p>a<b>b</p></b>')
    h.close()
    print "I expect not to see this line"
except Exception, err:
    print "exception:", str(err)
prints "I expect not to see this line".
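A plausible explanation for the silence, shown as a Python 3 sketch (where the module is named html.parser): HTMLParser is an event-driven tokenizer. It reports each start tag, end tag and run of text to handler methods whose default implementations do nothing, so the base class never compares end tags against the tags that are open. The hypothetical EventRecorder subclass below just logs those callbacks.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record every callback HTMLParser makes, without judging it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


recorder = EventRecorder()
recorder.feed("<p>a<b>b</p></b>")
recorder.close()
print(recorder.events)
```

For this input the end tags arrive as p, then b. Nothing in the base class objects to that order, so no exception is raised.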
Am I using that routine incorrectly? Is there a natural Python choice
for this job?
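If the output is ordinary HTML rather than XHTML, one option is to add the missing nesting check yourself on top of the same parser. The Python 3 sketch below keeps a stack of open tags and raises on the first mismatched, stray or unclosed one. StrictNestingChecker, NestingError, check_nesting and the VOID_ELEMENTS list are hypothetical names, not part of any library. The check is deliberately stricter than HTML itself, which allows some end tags such as </p> and </li> to be omitted, and it does not validate attributes or which elements may appear inside which.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML (assumed list; extend as needed).
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class NestingError(Exception):
    """Hypothetical exception raised for misnested, stray or unclosed tags."""


class StrictNestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []  # names of currently open elements, innermost last

    def _where(self):
        line, col = self.getpos()
        return "line %d, column %d" % (line, col)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: opened and closed at once,
        # so there is nothing to push onto the stack.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            raise NestingError("end tag </%s> for void element at %s"
                               % (tag, self._where()))
        if not self.stack:
            raise NestingError("unexpected </%s> at %s" % (tag, self._where()))
        expected = self.stack.pop()
        if tag != expected:
            raise NestingError("expected </%s> but found </%s> at %s"
                               % (expected, tag, self._where()))

    def close(self):
        super().close()
        if self.stack:
            raise NestingError("unclosed tags at end of input: %s"
                               % ", ".join(self.stack))


def check_nesting(markup):
    """Raise NestingError if markup has misnested or unclosed tags."""
    checker = StrictNestingChecker()
    checker.feed(markup)
    checker.close()
```

In a unit test, calling check_nesting on the generated page and letting any NestingError fail the test gives the "complain at any irregularity" behaviour, within the limits above. For '<p>a<b>b</p></b>' it stops at the </p>, reporting that </b> was expected.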
Thanks,
Jim
More information about the Python-list mailing list