[Spambayes] How low can you go?

Skip Montanaro skip at pobox.com
Tue Dec 9 17:42:35 EST 2003

Okay, time for a little contest.  We've recently seen several users tout the
size of their training database.  I used to be one of those "enlarged
database" types, but no more.

Not long ago I dumped it all in favor of a more minimalist approach.  In the
past few days I noticed SB seemed to be leaving a large number of spams in
my unsure box, so today I deleted that database (~450 spams and 250 hams)
and started from scratch again.  I figure I had either introduced some
outright mistakes into the database or trained on some messages which are
sort of legitimately both ham and spam.  At any rate, it seemed easier to
just start from scratch than to really figure out what was wrong.

At the moment I have trained on 14 spams and 20 hams and am quite pleased
with how it's performing so far.  I've received mail for a half dozen or so
different mailing lists, and it's catching spams left and right.  I
anticipate a slew of unsures overnight as I get new kinds of email (both ham
and spam), but I will be damned selective about what I add to my database.

So, how small is yours? <wink>

