HTMLParser and Quotes

Richard West rwest2 at
Thu Jan 2 22:06:57 CET 2003

Thank you!  Fortunately my app is not high volume.  This should do


On Thu, 02 Jan 2003 13:43:04 -0700, Andrew Dalke
<adalke at> wrote:

>Richard Brodie:
>> HTMLParser is a fairly straightforward parser: it mostly follows the SGML
>> syntax rules. That means that it is of little use for most of the HTML out on
>> the web. Whilst a DWIM parser might be useful, it could get out of hand,
>> and I'm fairly happy that the standard library one stops on the first error.
>> In a few years the XML ones will error anyway.
>
>In the meantime, you can use something like HTML Tidy
>and Marc-André Lemburg's Python interface to it, mxTidy,
>to clean up input HTML, like this:
>
> >>> from mx import Tidy
> >>> from HTMLParser import HTMLParser
> >>> text = """<html>
> ... <body>
> ... <font face=arial,helvetica>test</font>
> ... </body>
> ... </html>"""
> >>>
> >>> print Tidy.Tidy.tidy(text)[2]
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
> <font face="arial,helvetica">test</font>
> >>>
> >>> x = HTMLParser()
>					Andrew
>					dalke at
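
The quoted session stops at x = HTMLParser(). As a rough sketch of the
next step, the tidied markup can be fed to a small parser subclass that
records what it sees. This assumes Python 3, where the module is named
html.parser rather than HTMLParser as in the session above; TagCollector
and collect_start_tags are hypothetical names made up for illustration,
and mxTidy is left out so the example runs on the standard library alone.

```python
# Assumption: Python 3. The quoted session uses Python 2, where the
# import would be "from HTMLParser import HTMLParser".
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with quotes removed
        self.start_tags.append((tag, attrs))


def collect_start_tags(html_text):
    """Parse html_text and return the start tags that were found."""
    parser = TagCollector()
    parser.feed(html_text)
    parser.close()
    return parser.start_tags


# The quoted attribute is the form shown in the Tidy output above.
tidied = '<font face="arial,helvetica">test</font>'
print(collect_start_tags(tidied))
# prints [('font', [('face', 'arial,helvetica')])]
```

Collecting the tags in a list keeps the subclass small; a real application
would more likely act on each tag inside handle_starttag.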
