[Spambayes] Mail with problem

Tim Peters tim.one@comcast.net
Thu Nov 14 20:33:47 2002


[Tim Stone]
> Point taken.  Makes me wonder, though, if we might not have a
> problem like this when this starts getting used by regular folks, like
> with the proxy...

The OP attached the email in question to his msg.  It tokenized fine on my
box at the time (Win2K), and if I don't hear about it causing problems on
other boxes either, then I'll assume it's Just Another Glitch specific to
Mac OS 9.

> I suppose the reason we're not using python's html parser is
> performance...?

Flyswatters versus dynamite, mostly.  We're not doing anything with HTML
except throwing it away.  Half-assed regexps do a fine job of that, and
they're very robust against ill-formed HTML, against damaged email that
intended to call itself text/html but forgot to, etc.  If we need to do
fancier things with HTML, then a real parser becomes correspondingly more
attractive.
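
As a rough illustration of the throw-it-away approach, here is a minimal
sketch in Python.  The pattern and the names html_tag_re and
strip_html_tags are hypothetical, made up for this example; they are not
the regexp the Spambayes tokenizer actually uses.  The idea is only that a
regexp never has to understand the markup: it deletes anything tag-shaped
and leaves the rest alone, so unclosed tags or a missing text/html
declaration don't matter.

```python
import re

# Hypothetical pattern (an assumption, not Spambayes' own regexp):
# "<", one non-space character, then up to 255 more characters that are
# neither "<" nor ">", then ">".  Requiring a non-space after "<" keeps
# plain text like "1 < 2 and 3 > 2" intact; the length cap keeps a stray
# "<" from swallowing a long stretch of text.
html_tag_re = re.compile(r"<[^\s<>][^<>]{0,255}>")

def strip_html_tags(text):
    """Replace every tag-shaped run with a single space."""
    return html_tag_re.sub(" ", text)

# Works the same whether or not the markup is well formed.
print(strip_html_tags("<p>unclosed <b>bold").split())
```

A real parser would have to decide what to do about the unclosed tags in
that input; the regexp has nothing to decide, which is the robustness
being described.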



