HTMLParser and Quotes

Richard West rwest2 at opti.cgi.net
Thu Jan 2 16:04:27 EST 2003


On Thu, 2 Jan 2003 12:35:57 -0000, "Richard Brodie"
<R.Brodie at rl.ac.uk> wrote:

>
>"Richard West" <rwest2 at opti.cgi.net> wrote in message
>news:qpj71v0as06msdmoj6nep17dptnvtqlml6 at 4ax.com...
>
>> The face should obviously have quotes around its value, but under the
>> circumstances I would think HTMLParser should take anything up until
>> the next space or end of the tag as its value.
>
>HTMLParser is a fairly straightforward parser: it mostly follows the SGML
>syntax rules. That means that it is of little use for most of the HTML out on
>the web. Whilst a DWIM parser might be useful, it could get out of hand,
>and I'm fairly happy that the standard library one stops on the first error.
>In a few years the XML ones will error anyway.
>


Well, standards compliance is nice, but a non-strict mode would be
extremely helpful in this not-so-perfect world.  I'm putting together
a home-grown spam filter, and the unfortunate truth I'm finding is
that Outlook Express does not always generate standards-compliant HTML
emails.  The web at large does not look much rosier.  I wish I
could say that pointing at the standards and proclaiming
it's-not-our-fault does us good, but all I can see is how it
diminishes the value of an otherwise fine part of our beloved
language.
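
To make the rule I suggested concrete (take an unquoted value up to
the next space or the end of the tag), here is a minimal sketch.  It
is not how HTMLParser works internally; the names lenient_attrs and
_ATTR_RE are made up for illustration, and the regular expression is
only one plausible way to express the rule.

```python
import re

# Hypothetical helper, not part of the standard library.
_ATTR_RE = re.compile(
    r"""([a-zA-Z_][-.:\w]*)        # attribute name
        (?:\s*=\s*
           (?:"([^"]*)"            # double-quoted value
             |'([^']*)'            # single-quoted value
             |([^\s>]*)            # unquoted: up to whitespace or '>'
           )
        )?""",
    re.VERBOSE,
)


def lenient_attrs(tag_text):
    """Return (name, value) pairs from the text of a single start tag."""
    # Drop the surrounding angle brackets, then split off the tag name.
    inner = tag_text.strip().lstrip("<").rstrip(">")
    parts = inner.split(None, 1)
    if len(parts) < 2:
        return []  # tag has no attributes
    attrs = []
    for m in _ATTR_RE.finditer(parts[1]):
        name = m.group(1).lower()
        # Exactly one of the three value groups matches, or none at all
        # for a bare attribute such as "disabled".
        value = next((g for g in m.group(2, 3, 4) if g is not None), None)
        attrs.append((name, value))
    return attrs


print(lenient_attrs("<font face=Arial,Helvetica size=2>"))
```

Note that an unquoted value containing spaces is still cut at the
first space under this rule, so the remaining words come out as bare
attributes with no value.  That is wrong, but it does not stop the
parse, which is all a spam filter really needs.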

-Richard