[Spambayes] Table munging defeats SpamBayes

Mathew Hendry mdhpub at blueyonder.co.uk
Sun Dec 7 06:23:50 EST 2003

Mathew Hendry wrote:

>Here's a new one on me. I spotted it just now when looking through my spam
>corpus for "low scorers". The rendered version of the spam (in Outlook 2003
>anyway) looks like this:
>All Rx Products
>Consultation at no cost
>No embarassing M.D. visits
>I want to know more <link>
>Most of that text is broken up into tiny pieces and inserted into tables,
>followed by a huge <a> block filled mostly with randomized English text but
>also, at the end, containing the web site href and "I want to know more".
>SpamBayes gobbles up all the text inside the <a> but doesn't spot the
>contents of the table because each apparent token is only 1-3 characters

I've received several more of these, and they're all scoring pretty low.
Once more spammers realize that they can slip past filters this way, I'm sure
we'll see more of them.

Does anyone know of a Python text-based HTML renderer (or just an HTML table
renderer) that could help in trapping them?
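
Short of a full renderer, a rough sketch of the idea might look like the
following. It is not SpamBayes code and it does not render anything. It only
glues the text of adjacent cells in each table row back together so that the
result could be tokenized as ordinary words. It uses the standard library's
html.parser module (Python 3 naming). The names TableTextExtractor and
table_text are made up for illustration, and it assumes the fragments of a
word sit in neighbouring cells of a single row, which may not hold for every
spam of this kind. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableTextExtractor(HTMLParser):
    """Collect the text of each table row, joining cell fragments together.

    Hypothetical helper for illustration only; not part of SpamBayes.
    """

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, one string per <tr>
        self._cells = None     # fragments of the row being read, or None
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("td", "th") and self._cells is not None:
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            # Join fragments with no separator, then normalize whitespace.
            text = "".join(self._cells)
            if text.strip():
                self.rows.append(" ".join(text.split()))
            self._cells = None

    def handle_data(self, data):
        # Text outside table cells is ignored here.
        if self._in_cell and self._cells is not None:
            self._cells.append(data)


def table_text(html):
    """Return a list with one reassembled string per non-empty table row."""
    parser = TableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Feeding the reassembled rows to the tokenizer alongside the normal token
stream would be one way to try it. Whether that actually improves scores on
these messages is untested.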

-- Mat.

