[Spambayes] interesting paper: _Exploiting Machine Learning to Subvert Your Spam Filter_ (fwd)
Justin Mason
jm at jmason.org
Sun Apr 20 23:13:01 CEST 2008
Hi folks -- in case you haven't yet seen this:
'This paper shows how an adversary can exploit statistical machine
learning, as used in the SpamBayes spam filter, to render it useless--even
if the adversary's access is limited to only 1% of the training messages.
We further demonstrate a new class of focused attacks that successfully
prevent victims from receiving specific email messages. Finally, we
introduce two new types of defenses against these attacks.'
http://www.usenix.org/event/leet08/tech/full_papers/nelson/nelson_html/
Basically, it measures the effect of loading spam with huge dictionaries
of words in order to increase the false-positive rate once that mail has
been trained on.
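To make the mechanism concrete, here is a rough sketch of how a single dictionary-laden spam can drag up the score of ordinary ham once it has been trained on. This is a toy token-count filter, not the SpamBayes classifier: the function names, the smoothing constants and the plain averaging of token scores are all assumptions made for illustration.

```python
from collections import Counter


def token_scores(train_set):
    """Build a token -> smoothed spam probability function (toy model)."""
    spam, ham = Counter(), Counter()
    n_spam = n_ham = 0
    for tokens, is_spam in train_set:
        if is_spam:
            spam.update(set(tokens))
            n_spam += 1
        else:
            ham.update(set(tokens))
            n_ham += 1

    def score(tok):
        n = spam[tok] + ham[tok]
        if n == 0:
            return 0.5  # unseen token: neutral
        s = spam[tok] / max(n_spam, 1)
        h = ham[tok] / max(n_ham, 1)
        p = s / (s + h)
        # Smooth towards 0.5 when there is little evidence
        # (0.45 and 0.5 are illustrative constants).
        return (0.45 * 0.5 + n * p) / (0.45 + n)

    return score


def message_score(train_set, tokens):
    """Average the token scores; a simplification of real score combining."""
    score = token_scores(train_set)
    toks = set(tokens)
    return sum(score(t) for t in toks) / len(toks)


# Hypothetical training data: 10 ham and 10 spam messages.
clean = [(["meeting", "report", "lunch"], False)] * 10
clean += [(["cheap", "pills"], True)] * 10

# One attack spam stuffed with a "dictionary" that includes ham vocabulary.
attack = (["cheap", "meeting", "report", "lunch", "budget", "agenda"], True)

ham_message = ["meeting", "lunch"]
before = message_score(clean, ham_message)
after = message_score(clean + [attack], ham_message)
print(f"ham score before attack: {before:.3f}, after: {after:.3f}")
```

In this toy setup one attack message only nudges the ham score upwards; the paper's point is about what happens when enough of the training set is made up of such messages.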
Would be interested to hear what people think -- personally:
- 1. this is very similar to
http://www.cs.dal.ca/research/techreports/2004/CS-2004-06.shtml , and I
haven't seen spammers using those attacks in the intervening 4 years.
- 2. I wonder how big the messages have to be in order to affect training
with a relatively small number of messages. Maybe limiting the number of
tokens trained on per message might help.
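A minimal sketch of the token-limit idea from point 2, assuming the cap is applied to the token stream before it reaches the trainer. The helper name cap_tokens and the default of 150 are made up for illustration; neither is an existing SpamBayes option.

```python
def cap_tokens(tokens, max_tokens=150):
    """Keep at most max_tokens distinct tokens, in order of first appearance.

    Hypothetical helper; the cap value is an arbitrary illustrative choice.
    """
    seen = set()
    kept = []
    for tok in tokens:
        if len(kept) >= max_tokens:
            break
        if tok in seen:
            continue  # repeated tokens do not use up the budget
        seen.add(tok)
        kept.append(tok)
    return kept


# A dictionary-stuffed message with 10,000 filler words.
attack_tokens = ["cheap"] + [f"word{i}" for i in range(10000)]
capped = cap_tokens(attack_tokens)
print(len(attack_tokens), "->", len(capped))
```

Keeping the first tokens is only one possible policy, and an attacker could still spread the dictionary over more messages, so this would raise the cost of the attack rather than rule it out.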
It might be worthwhile implementing the described "RONI" (Reject On
Negative Impact) scheme to avoid the less targeted form of the attack anyway.
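As I read it, the RONI idea is to train with and without each candidate message and refuse to train on the candidate if it makes a held-out set look noticeably worse. The sketch below is a rough approximation of that under stated assumptions: ToyFilter is a stand-in rather than the SpamBayes classifier, the impact measure (shift in mean score of held-out ham) and the 0.05 threshold are my own simplifications, and all names are hypothetical.

```python
from collections import Counter


class ToyFilter:
    """Tiny token-count filter; a stand-in, not the SpamBayes classifier."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}  # keyed by is_spam
        self.totals = {True: 0, False: 0}

    def train(self, tokens, is_spam):
        self.counts[is_spam].update(set(tokens))
        self.totals[is_spam] += 1

    def token_score(self, tok):
        n_spam, n_ham = self.counts[True][tok], self.counts[False][tok]
        n = n_spam + n_ham
        if n == 0:
            return 0.5  # unseen token: neutral
        s = n_spam / max(self.totals[True], 1)
        h = n_ham / max(self.totals[False], 1)
        p = s / (s + h)
        return (0.45 * 0.5 + n * p) / (0.45 + n)  # illustrative smoothing

    def score(self, tokens):
        toks = set(tokens)
        if not toks:
            return 0.5
        return sum(map(self.token_score, toks)) / len(toks)


def mean_ham_score(train_set, validation_ham):
    """Train a fresh filter and return the mean score of held-out ham."""
    f = ToyFilter()
    for tokens, is_spam in train_set:
        f.train(tokens, is_spam)
    return sum(f.score(t) for t in validation_ham) / len(validation_ham)


def roni_accepts(candidate, train_set, validation_ham, max_shift=0.05):
    """Accept the candidate only if it barely moves held-out ham scores."""
    before = mean_ham_score(train_set, validation_ham)
    after = mean_ham_score(train_set + [candidate], validation_ham)
    return (after - before) <= max_shift


# Small hypothetical training and validation sets.
train_set = [(["meeting", "report", "lunch"], False)] * 2
train_set += [(["cheap", "pills"], True)] * 2
validation_ham = [["meeting", "lunch"]]

ordinary_spam = (["cheap", "pills", "offer"], True)
dictionary_spam = (["cheap", "meeting", "report", "lunch", "budget"], True)

print("ordinary spam accepted:", roni_accepts(ordinary_spam, train_set, validation_ham))
print("dictionary spam accepted:", roni_accepts(dictionary_spam, train_set, validation_ham))
```

Retraining for every candidate is expensive, which is why the training set here is kept deliberately small; a real implementation would need to choose the sample sizes and the rejection threshold with some care.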
--j.