[Tutor] finding mismatched or unpaired html tags

A.T.Hofkamp a.t.hofkamp at tue.nl
Tue Apr 28 15:44:35 CEST 2009


Dinesh B Vadhia wrote:
> I'm processing tens of thousands of html files and a few of them contain mismatched tags and ElementTree throws the error:
> 
> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag: line 124, column 8"
> 
> I now want to scan each file and simply identify each mismatched or unpaired
> tag (by line number) in each file. I've read the ElementTree docs and cannot
> see an obvious way to do this. I know this is a common problem but I'm
> feeling a bit clueless here - any ideas?
> 

Don't use ElementTree for this; use BeautifulSoup instead.

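ElementTree is an XML parser. It expects well-formed input, typically
generated by another program, and gives up at the first error it meets (such
as the mismatched tag in your message). BeautifulSoup is designed to handle
your everyday HTML page, filled with errors of all possible kinds.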
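
If you still want a list of the offending tags by line number, as you asked,
one possible approach is a small stack-based scan with the standard library's
html.parser module. Note that this is not BeautifulSoup, and it is only a
rough sketch: the names TagChecker, find_tag_problems and VOID_TAGS are made up
for this example, and the check is stricter than HTML itself (it assumes every
non-void element must be closed explicitly).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class TagChecker(HTMLParser):
    """Hypothetical helper: records mismatched or unclosed tags by line."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of (tag, line) for tags still open
        self.problems = []    # list of (line, message)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            # getpos() returns (line, column) of the tag being handled.
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        line = self.getpos()[0]
        if self.open_tags and self.open_tags[-1][0] == tag:
            self.open_tags.pop()          # properly nested, nothing to report
            return
        if tag in [name for name, _ in self.open_tags]:
            # Everything opened after the matching tag was never closed.
            while self.open_tags[-1][0] != tag:
                name, opened = self.open_tags.pop()
                self.problems.append((opened, "<%s> is never closed" % name))
            self.open_tags.pop()
        else:
            self.problems.append((line, "</%s> has no opening tag" % tag))


def find_tag_problems(html_text):
    """Return a sorted list of (line, message) pairs for one document."""
    checker = TagChecker()
    checker.feed(html_text)
    checker.close()
    # Whatever is still on the stack at end of input was never closed.
    for name, opened in checker.open_tags:
        checker.problems.append((opened, "<%s> is never closed" % name))
    return sorted(checker.problems)
```

Calling find_tag_problems() on the text of each file gives you the line
numbers to look at. Because HTML allows some closing tags (</p> and </li>,
for example) to be left out, expect this strict check to flag those as well.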


Sincerely,
Albert


