intolerant HTML parser

Jim jim.hefferon at gmail.com
Sat Feb 6 14:09:31 EST 2010


I generate some HTML and I want my unit tests to include a syntax
check.  So I am looking for a program that will complain about any
syntax irregularity.

I am familiar with Beautiful Soup (use it all the time) but it is
intended to cope with bad syntax.  I just tried feeding
HTMLParser.HTMLParser some HTML containing '<p>a<b>b</p></b>' and it
didn't complain.

That is, this:
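        import HTMLParser  # Python 2 module name

        h = HTMLParser.HTMLParser()
        try:
            # mismatched nesting: </p> arrives while <b> is still open
            h.feed('<p>a<b>b</p></b>')
            h.close()
            print "I expect not to see this line"
        except Exception, err:
            print "exception:", str(err)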
gives me "I expect not to see this line".
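
For what it's worth, as far as I can tell the parser only tokenizes the
input and calls handle_starttag/handle_endtag for each tag; it does not
track which tags are open.  So one possible workaround is to keep a stack
of open tags myself.  Below is a rough sketch of that idea, not a real
validator.  It is written for Python 3, where the module is html.parser.
The names NestingError, StrictNestingParser and check_nesting are my own
inventions, and it assumes XHTML-style markup in which every non-void
element is closed explicitly (so HTML's optional end tags, such as an
omitted </p>, would be flagged).

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NestingError(Exception):
    """Hypothetical exception: start and end tags do not pair up."""


class StrictNestingParser(HTMLParser):
    """Hypothetical helper: tracks open tags and rejects bad nesting."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: opens and closes at once.
        pass

    def handle_endtag(self, tag):
        if not self.open_tags:
            raise NestingError("unexpected </%s>" % tag)
        expected = self.open_tags.pop()
        if expected != tag:
            raise NestingError("expected </%s>, got </%s>" % (expected, tag))

    def close(self):
        super().close()
        if self.open_tags:
            raise NestingError("unclosed <%s>" % self.open_tags[-1])


def check_nesting(html):
    """Raise NestingError if tags in html are not properly nested."""
    parser = StrictNestingParser()
    parser.feed(html)
    parser.close()
```

With this, check_nesting('<p>a<b>b</p></b>') raises NestingError, while
check_nesting('<p>a<b>b</b></p>') returns quietly.  It only checks tag
nesting; it knows nothing about attributes, entities, or which elements
are allowed where.
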

Am I using that routine incorrectly?  Is there a natural Python choice
for this job?

Thanks,
Jim


