[Spambayes] To think like a spammer...

Guido van Rossum guido@python.org
Sat, 28 Sep 2002 22:33:13 -0400


> The spambayes scheme (and others like it that I've seen) can be defeated
> easily, with something like this...
> 
> THIS  IS  A   F A N T A S T I C   O P P O R T U N I T Y ! !

It's an arms race.  I expect that the classification scheme can be
kept relatively constant (the math doesn't change) but the tokenizing
(better called feature extraction) scheme can and should be adapted
occasionally, to deal with new ways of hiding spam.  This particular
style can easily be recognized[*] *if* it becomes popular among
spammers; for anything you can come up with there's a tokenizer that
recognizes it.

But spammers will only start worrying if their return rates go down,
and that will only happen once almost everybody is using anti-spam
technology.  We've got a long way to go before that's the case.  So
let's not be stymied by worries about what the spammers can do.

[*] I wouldn't even bother trying to recover the words FANTASTIC
OPPORTUNITY.  This style is so completely unseen in ham that simply
looking for many consecutive one-letter words and inserting a token
that flags their presence would most likely be enough.
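
A rough sketch of what the footnote describes, in case it helps. This
is not the actual spambayes tokenizer: the function name, the token
string, and the threshold of 5 are all made up for illustration, and
the right threshold would have to come from testing against real ham.

```python
def spaced_letter_tokens(text, min_run=5):
    """Return a one-element list holding a synthetic token if the text
    contains at least min_run consecutive one-letter words, else [].

    Hypothetical helper: min_run=5 is an arbitrary guess, not a tuned value.
    """
    run = 0
    for word in text.split():
        if len(word) == 1 and word.isalpha():
            run += 1
            if run >= min_run:
                # No attempt to recover the hidden words; the presence
                # of the run is itself the feature.
                return ["spaced-letters:present"]
        else:
            # Any longer word or lone punctuation mark ends the run.
            run = 0
    return []
```

The returned token would simply be added to whatever tokens the
existing tokenizer already produces, so the classifier itself stays
unchanged.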

--Guido van Rossum (home page: http://www.python.org/~guido/)