[Spambayes] rebuild database?
kenny.pitt at gmail.com
Tue Oct 11 16:07:42 CEST 2005
On 10/10/05, mgleich <mgleich at nyc.rr.com> wrote:
> I've just realized that although my database is 536kb and that is not so
> large, it is composed of 702 spam and 110 ham. I gather this is extremely
> unbalanced and may explain why I'm getting false negatives.
Actually, 7 to 1 is really not an unusually high imbalance. We've seen
reports from people who have 100 to 1 or higher imbalances.
If you are getting false positives then imbalance is the most common
cause. A few false negatives are not uncommon, though, because spam is
constantly changing. If a relatively high percentage of your spam is
coming in as false negatives, then you might have an imbalance
problem. The best way to tell for sure is to see the spam clues for
one of the false negatives, which you can generate from the SpamBayes
> Do I need to begin from scratch? If so, do I just delete the db file and
> will Spambayes just create a new one?
For a 7 to 1 imbalance, I would usually say there is no need to begin
from scratch. However, SpamBayes learns quickly so it shouldn't hurt
to start over and see what happens. Since you know the size of your
DB, you've obviously located the file. You will probably see two files
with the *.db extension: one is the training data and the other
contains information about the messages that have been processed. Just
close Outlook, delete these two files, then restart Outlook and
SpamBayes should recreate the databases.
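
If you would rather keep the old databases around in case you want them back, a small script can move them aside instead of deleting them. This is only a rough sketch, not part of SpamBayes: the function name set_aside_db_files and the backup folder name are made up for illustration, and you have to pass in the folder where you found your *.db files yourself. Run it only while Outlook is closed.

```python
import shutil
from pathlib import Path


def set_aside_db_files(data_dir, backup_name="spambayes_db_backup"):
    """Move every *.db file in data_dir into a backup subfolder.

    data_dir is the folder holding the SpamBayes databases (you supply it;
    this sketch does not try to locate it). backup_name is a hypothetical
    folder name chosen for this example. Returns the new file locations.
    """
    data_dir = Path(data_dir)
    backup_dir = data_dir / backup_name
    backup_dir.mkdir(exist_ok=True)

    moved = []
    # glob("*.db") only looks at the top level of data_dir, so files already
    # sitting in the backup subfolder are not picked up again.
    for db_file in sorted(data_dir.glob("*.db")):
        target = backup_dir / db_file.name
        shutil.move(str(db_file), str(target))
        moved.append(target)
    return moved
```

Calling set_aside_db_files with your data folder leaves that folder with no *.db files, which should have the same effect as deleting them. To go back to the old training, close Outlook again and move the files out of the backup folder. Clear out the backup folder before running the script a second time, since an existing file of the same name there may be overwritten or cause an error depending on the platform.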