[Spambayes] Spammer countermeasures against bayesian filters

Tue Jul 29 22:14:10 EDT 2003

[Sean True]
> ...
> Using statistical summaries of message properties has been
> intriguing, but every time I try it it seems to be
>
>   1) marginal and
>   2) not robust for some reason like this

WRT #1, a statistical summary token is just one token, so it pretty much can't
have a strong effect.  The intent of giving everything the same "weight" is
to avoid creating an especially good thing to attack.  Lots of statistical
summary tokens could have a strong effect in concert -- although not
necessarily a good effect.
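To make the "one token can't do much, lots of them can" point concrete, here's a minimal Python sketch of chi-squared (Fisher-style) combining, modeled on the scheme SpamBayes uses for scoring.  Each clue contributes exactly one term to each sum, which is the "same weight" idea.  Assumptions, loudly: the token probabilities are made up for illustration, and the sketch skips the real classifier's steps of capping how many clues it looks at and ignoring near-neutral ones.  The function names are hypothetical.

```python
import math


def chi2q(x2: float, v: int) -> float:
    """P(chi-squared with v degrees of freedom >= x2), for even v only."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_combined_score(probs: list[float]) -> float:
    """Hypothetical helper: combine per-token spam probabilities into one score.

    Every token contributes one log term; no token gets extra weight.
    Returns a value near 0 for ham-like evidence, near 1 for spam-like.
    """
    n = len(probs)
    if n == 0:
        return 0.5
    # Spam evidence: large when many probabilities are close to 1.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Ham evidence: large when many probabilities are close to 0.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


# Made-up example: ten hammy word tokens, then one or thirty spammy
# "statistical summary" tokens added on top.
hammy = [0.05] * 10
one_summary = hammy + [0.95]
many_summaries = hammy + [0.95] * 30

for label, probs in [("hammy", hammy), ("one summary", one_summary),
                     ("30 summaries", many_summaries)]:
    print(label, round(chi_combined_score(probs), 4))
```

With these invented numbers a single strong summary token barely moves a hammy message off zero, while thirty of them acting together push it well into spam territory.  Whether that's a *good* effect depends entirely on whether the summaries carry real signal.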

> I'm suspecting that the purely lexical tokenization tricks may be more
> interesting. Or figuring out a way to recognize 'Nigerian Spam' which
> is the _only_ spam I still see on anything like a regular basis.

Hmm.  You need to advertise your email address more <0.9 wink>.  Are
Nigerian S[cp]ams getting misclassified for you?  That would be interesting.
I haven't trained on one of those in months, cuz they're all nailed for me.