[Spambayes] Wonderful tool!

Tony Meyer tameyer at ihug.co.nz
Wed Mar 2 03:26:12 CET 2005


> The limited spam that makes it thru all seems to use the same
> countermeasure:
> This spam contains a high percentage of unrelated text that
> appears to either be from a novel/book or words from a dictionary.

This is often called "word salad".

> In some cases, this text is mildly concealed thru small and/or
> light font or else placing well beneath the main spam message.

SpamBayes currently ignores basically all HTML tags, including those that
change the formatting in this way, so all text is treated equally regardless
of how the mail client renders it.
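A minimal sketch of why hidden formatting makes no difference, assuming a
deliberately simplified tokenizer. The function `tokenize` and the regex
`TAG_RE` are hypothetical illustrations, not the real SpamBayes tokenizer,
which does considerably more work. Once every tag is discarded, text in a tiny
white font yields exactly the same tokens as plainly visible text:

```python
import re
from typing import List

# Hypothetical, simplified tokenizer (NOT the actual SpamBayes one):
# it discards every HTML tag, attributes included, and keeps only words.
TAG_RE = re.compile(r"<[^>]*>")

def tokenize(html: str) -> List[str]:
    """Replace each tag with a space, then split the text into lowercase words."""
    text = TAG_RE.sub(" ", html)
    return re.findall(r"\w+", text.lower())

# Word salad hidden in a tiny white font vs. the same words shown normally.
hidden = '<p>Buy now</p><font size="1" color="#ffffff">the old man and the sea</font>'
visible = '<p>Buy now</p><p>the old man and the sea</p>'

print(tokenize(hidden) == tokenize(visible))  # True: formatting is invisible to the classifier
```

Because the font size and colour live inside the tags, they never reach the
classifier, so "concealed" text counts exactly like any other text.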

> Any advice on counter-countermeasures?
> Will the algorithm learn formatting anomalies such as this and
> I should just be patient?

If you just keep training on any mistakes, SpamBayes should cope quite well.
As long as the text is more or less random, it shouldn't be a problem.  The
theory is that random words are just as likely to be in your spam training
as in your ham training, so their overall effect is negligible.  There
should be remaining clues (in the headers, if nowhere else) that can be used
to get the right classification.  (Tailored spam, where the random text is
aimed personally at you, is a different matter, and spam whose content is an
image isn't handled as well.)
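A rough sketch of that reasoning, loosely modelled on the Robinson-style
per-word probability that SpamBayes is based on. The function names
`word_prob` and `is_clue` are hypothetical, and the constants `S`, `X` and
`MIN_STRENGTH` are assumed values for illustration, not read from SpamBayes'
configuration. A word seen equally often (relative to training size) in spam
and ham ends up at 0.5 and is ignored as a clue, while a genuinely spammy
token keeps a strong score:

```python
# Assumed constants for illustration only.
S = 0.45            # strength of the prior for rarely seen words
X = 0.5             # prior probability assigned to unseen words
MIN_STRENGTH = 0.1  # tokens with |prob - 0.5| below this are ignored

def word_prob(spamcount: int, hamcount: int, nspam: int, nham: int) -> float:
    """Robinson-style spam probability for one token.

    spamcount/hamcount: training messages of each kind containing the token.
    nspam/nham: total training messages of each kind (both assumed > 0).
    """
    spamratio = spamcount / nspam
    hamratio = hamcount / nham
    if spamratio + hamratio == 0:
        return X  # never seen in training: fall back to the prior
    p = spamratio / (spamratio + hamratio)
    n = spamcount + hamcount
    # Shrink p toward X when the token has been seen only a few times.
    return (S * X + n * p) / (S + n)

def is_clue(prob: float) -> bool:
    """Only tokens far enough from 0.5 are used when scoring a message."""
    return abs(prob - 0.5) >= MIN_STRENGTH

# A random dictionary word: in 5 of 100 spam and 5 of 100 ham.
salad = word_prob(5, 5, 100, 100)
# A header token seen in 40 of 100 spam and no ham.
header = word_prob(40, 0, 100, 100)
print(round(salad, 3), is_clue(salad))    # 0.5 False
print(round(header, 3), is_clue(header))  # 0.994 True
```

Since the word salad contributes almost nothing, the classification is decided
by whatever real clues remain, such as those in the headers.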

Anyway, lots of people have said lots of things about this sort of attack,
but at the moment SpamBayes ought to manage reasonably well.

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
