[Spambayes] Bayesian drawbacks?

Brendon spambayes at whateley.com
Mon Sep 29 11:09:14 EDT 2003


Hi Gerrit,

I'm a new convert to Spambayes and I have to tell you that the training is not
much of a drawback.  I started getting useful reductions in spam after
training with just a handful each of good and bad email.  I didn't do any
training BEFORE I started to use it; I just let it classify all the email as
unsure until I had a few of each.  By the time I had trained on 10 each of
spam and ham it caught 23% of the spam.  The next day I was up to 58% of the
spam caught, and by the 3rd day it was catching 98% with NO FALSE POSITIVES!!
Currently I've trained it on 119 each of spam and ham, all using the web
interface, and I get only a few "unsure" emails a day.  My previous product
used fixed filters and got about 60% of the spam with a trickle of false
positives.

Since spam is always evolving, I just check the email that is classified as
"unsure" every so often and train on those.  To keep the counts balanced, I
also train on a matching number of the other type (good or bad) of email.
Fixed filters slowly start to be fooled as spammers start changing w0rds
l1ke th1s.  With Spambayes, as soon as you train it on even one message like
this, it KNOWS that words like that never (except in this email) appear in
anything other than spam!
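
To show why a single training message is enough to flag w0rds l1ke th1s,
here is a rough sketch of per-token scoring.  It is not the Spambayes code.
The class name TokenStats, the whitespace tokenizer, and the smoothing values
s=0.45 and x=0.5 are assumptions made up for illustration; the smoothing
formula is a Robinson-style adjustment of the kind Bayesian filters use.

```python
from collections import Counter


class TokenStats:
    """Hypothetical per-token counter; a sketch, not the Spambayes classifier."""

    def __init__(self):
        self.spam_counts = Counter()  # number of spam messages containing a token
        self.ham_counts = Counter()   # number of ham messages containing a token
        self.nspam = 0                # spam messages trained so far
        self.nham = 0                 # ham messages trained so far

    def train(self, text, is_spam):
        # Count each token at most once per message (naive whitespace split).
        tokens = set(text.lower().split())
        if is_spam:
            self.nspam += 1
            self.spam_counts.update(tokens)
        else:
            self.nham += 1
            self.ham_counts.update(tokens)

    def spam_prob(self, token, s=0.45, x=0.5):
        # s and x are assumed smoothing values: x is the score given to a
        # never-seen token, s is how strongly that prior is weighted.
        n = self.spam_counts[token] + self.ham_counts[token]
        if n == 0:
            return x
        # Dividing by message totals keeps unequal spam/ham counts from
        # skewing the score.
        spam_ratio = self.spam_counts[token] / self.nspam if self.nspam else 0.0
        ham_ratio = self.ham_counts[token] / self.nham if self.nham else 0.0
        p = spam_ratio / (spam_ratio + ham_ratio)
        # Pull the raw estimate toward x when there is little evidence.
        return (s * x + n * p) / (s + n)


stats = TokenStats()
stats.train("please send me the words for this song", is_spam=False)
stats.train("cheap w0rds l1ke th1s buy now", is_spam=True)

print(stats.spam_prob("w0rds"))   # seen once, only in spam: scores high
print(stats.spam_prob("words"))   # seen once, only in ham: scores low
print(stats.spam_prob("hello"))   # never seen: falls back to x
```

In this sketch a token seen in just one spam message and no ham already
scores well above neutral, while its normal spelling stays on the ham side.
Because the ratios are divided by the number of spam and ham messages
trained, keeping the two counts roughly even keeps the scores comparable.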

So, since training consists of selecting a radio button on a web page next to
each message I want to train on, the training takes only a few seconds per
day.

Brendon.

On Monday 29 September 2003 03:37 am, Gerrit Holl wrote:
> Hi,
>
> I am considering to install a spamfilter on my machine. However, I don't
> know which one to choose. I have read about the Bayesian approach Spambayes
> is using. As I understand it, Spambayes needs to be trained in order to be
> useful. Isn't this a major drawback? Is it possible that a non-Bayesian
> approach would much better suit my needs, or did I misunderstand the
> Bayesian technique then?
>
> yours,
> Gerrit Holl.



