[Spambayes] Suggestion for HTML analysis

Tim Peters tim.one at comcast.net
Mon Sep 15 13:34:04 EDT 2003

[wsy at merl.com]
> Removing spurious HTML comments is one of the two things that
> CRM114 Mailfilter does that isn't totally learned-behavior (the
> other is popping open base64's)
> It helps a lot.  SpamBayes should at least consider it.

We already remove all HTML tags (including nonsense tags), comment sections,
and style sections.  base64 attachments get decoded if and only if they have
MIME type text/* (so, e.g., we don't bother decoding images or audio files).
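The two steps just described can be sketched in Python: drop style sections, comments, and any tags, and decode transfer encodings only for text/* parts. This is a minimal illustration using the standard-library `email` package, not the actual spambayes tokenizer. The regexes and the helper names `strip_html` and `visible_text_parts` are hypothetical. Comments are deleted outright, so a spurious comment inside a word ("FR<!-- x -->EE") rejoins it. Remaining tags become a space so adjacent words stay apart.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical patterns; the real spambayes tokenizer uses its own regexes.
STYLE_RE = re.compile(r"<style\b.*?</style\s*>", re.IGNORECASE | re.DOTALL)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")  # matches real and nonsense tags alike


def strip_html(text: str) -> str:
    """Remove style sections, then comments, then any remaining tags."""
    text = STYLE_RE.sub(" ", text)
    text = COMMENT_RE.sub("", text)  # deleting comments rejoins split words
    return TAG_RE.sub(" ", text)     # other tags become word separators


def visible_text_parts(msg: Message) -> list[str]:
    """Decode only text/* parts; images, audio, etc. are never decoded."""
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and non-text attachments
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "us-ascii"
        text = payload.decode(charset, errors="replace")
        if part.get_content_subtype() == "html":
            text = strip_html(text)
        texts.append(text)
    return texts


if __name__ == "__main__":
    raw = (
        'Content-Type: multipart/mixed; boundary="XX"\n\n'
        "--XX\n"
        "Content-Type: text/html; charset=us-ascii\n"
        "Content-Transfer-Encoding: base64\n\n"
        "PHA+SGk8L3A+\n"  # base64 of "<p>Hi</p>"
        "--XX\n"
        "Content-Type: image/png\n"
        "Content-Transfer-Encoding: base64\n\n"
        "AAAA\n"
        "--XX--\n"
    )
    print(visible_text_parts(message_from_string(raw)))
```

Here the image part is skipped without being decoded, and the HTML part comes back as just its visible word.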

These are all instances of the principle that spambayes tries to score what
the user sees, rather than how it happened to be encoded.  Special parsing of
embedded URLs is a big exception to that.
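URLs are the exception because they usually sit inside `href` attributes that the reader never sees, so stripping tags would throw them away. A rough sketch of the idea is to scan the raw text for URLs before tag removal and turn their pieces into their own tokens. The `url_tokens` helper, the regex, and the `proto:`/`url:` prefixes are illustrative assumptions, not a description of spambayes' actual URL handling.

```python
import re

# Hypothetical URL pattern: scheme, then everything up to whitespace or a quote/bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
SPLIT_RE = re.compile(r"[/.?=&:#-]+")  # punctuation that separates URL pieces


def url_tokens(text: str) -> list[str]:
    """Break each embedded URL into lowercase, prefixed tokens."""
    tokens = []
    for url in URL_RE.findall(text):
        scheme, rest = url.split("://", 1)
        tokens.append("proto:" + scheme.lower())
        for piece in SPLIT_RE.split(rest):
            if piece:  # skip empty strings left by leading/trailing separators
                tokens.append("url:" + piece.lower())
    return tokens


if __name__ == "__main__":
    html = '<a href="http://www.Example.com/buy?id=7">click here</a>'
    print(url_tokens(html))
```

Running this on the raw HTML, before any tag stripping, yields tokens such as `url:example` and `url:buy` even though the reader only sees "click here".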
