and parsing malformed HTML

KC nskhcarlso at
Tue Sep 2 14:15:03 CEST 2003

Thomas Güttler wrote:
> Hi,
> You could use tidy ( before you
> parse the html.

I appreciate the suggestion, but unfortunately it will not work well 
for me: the parser runs as part of a cron job, so I wouldn't be able to 
review the tidy error log in a timely fashion if there were a problem.

What would be really nice is a way to tell the parser it is "inside" a 
<TR> whenever it encounters a <TD> after a closing </TR>.  Browsers still 
display the HTML correctly when the starting <TR> is missing, but if the 
closing </TR> is omitted everything gets mangled.
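
A minimal sketch of that idea, assuming Python's standard html.parser.HTMLParser (the post does not say which parser is actually in use). RowFixingParser and its rows attribute are hypothetical names made up for illustration. The parser tracks whether a row is open and silently opens one when a <TD> shows up outside any <TR>:

```python
from html.parser import HTMLParser


class RowFixingParser(HTMLParser):
    """Collect table cells row by row, opening a row implicitly when a
    <td> turns up outside any <tr>. Hypothetical helper, not a library API."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self.row = None   # cells of the row being built, or None
        self.cell = None  # text pieces of the cell being built, or None

    def _close_cell(self):
        if self.cell is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None

    def _close_row(self):
        self._close_cell()
        if self.row is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TR> arrives as "tr".
        if tag == "tr":
            self._close_row()  # a new <tr> also ends an unclosed row
            self.row = []
        elif tag in ("td", "th"):
            if self.row is None:
                self.row = []  # pretend a <tr> was seen
            self._close_cell()
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def close(self):
        super().close()
        self._close_row()  # flush a row left open at end of input


parser = RowFixingParser()
# The second row has no opening <tr>.
parser.feed("<table><tr><td>a</td><td>b</td></tr>"
            "<td>c</td><td>d</td></tr></table>")
parser.close()
print(parser.rows)  # [['a', 'b'], ['c', 'd']]
```

Because the row state lives in the parser itself, nothing has to be reviewed by hand afterwards, which suits an unattended cron job. The sketch ignores nested tables.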

Any other suggestions?
