Tidy HTML, was: "<!" in SGMLParser - an error ?
Walter Dörwald
walter at livinglogic.de
Thu Nov 15 08:48:24 EST 2001
Hernan M. Foffani wrote:
> Since Python makes it soooo easy to grab and extract data from
> remote pages, it is very annoying when such pages aren't valid HTML.
>
> It's unfair to require htmllib & co. to parse invalid HTML, though.
> This problem can be solved with a simple routine that calls tidy
> through a pipe before calling the parser.
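A minimal sketch of the pipe approach quoted above. It assumes the HTML Tidy command-line program is installed as `tidy` and that Python 3.7+ is available. It uses `subprocess` plus the standard `html.parser` module, because `htmllib` was removed in Python 3, and it does not use mxTidy. The names `tidy_html`, `LinkCollector`, `extract_links` and the exact `TIDY_CMD` flags are hypothetical choices for illustration.

```python
import subprocess
from html.parser import HTMLParser

# Assumed HTML Tidy invocation: quiet, no warnings, emit output even on errors.
TIDY_CMD = ("tidy", "-q", "--show-warnings", "no", "--force-output", "yes")


def tidy_html(html, cmd=TIDY_CMD):
    """Pipe `html` through an external cleaner (HTML Tidy by default) and return its stdout."""
    proc = subprocess.run(list(cmd), input=html, capture_output=True, text=True)
    # Tidy exits with 0 (clean), 1 (warnings) or 2 (errors); any other
    # status means the command itself failed.
    if proc.returncode not in (0, 1, 2):
        raise RuntimeError("tidy failed: " + proc.stderr.strip())
    return proc.stdout


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag seen by the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Feed `html` to LinkCollector and return the hrefs in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Typical use would be `extract_links(tidy_html(page_source))`. Keeping the cleaning step separate from the parser lets you swap `tidy` for another cleaner without touching the parsing code.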
Better yet, use Marc-André Lemburg's mxTidy
(http://www.lemburg.com/files/python/mxTidy.html)
Bye,
Walter Dörwald