[Spambayes] rebuild database?
mgleich at nyc.rr.com
Wed Oct 12 06:50:23 CEST 2005
Thanks a lot for your help.
From: Kenny Pitt [mailto:kenny.pitt at gmail.com]
Sent: Tuesday, October 11, 2005 10:08 AM
Cc: Tony Meyer; spambayes at python.org
Subject: Re: [Spambayes] rebuild database?
On 10/10/05, mgleich <mgleich at nyc.rr.com> wrote:
> I've just realized that although my database is 536kb and that is not
> so large, it is composed of 702 spam and 110 ham. I gather this is
> extremely unbalanced and may explain why I'm getting false negatives.
Actually, 7 to 1 is really not an unusually high imbalance. We've seen
reports from people who have 100 to 1 or higher imbalances.
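For reference, the arithmetic behind that "7 to 1" figure can be checked with a few lines of Python. This is only an illustrative sketch: the function name `imbalance_ratio` is made up here and is not part of SpamBayes. The counts are the ones quoted above.

```python
def imbalance_ratio(n_spam: int, n_ham: int) -> float:
    """Return how many spam messages were trained per ham message."""
    if n_ham <= 0:
        raise ValueError("need at least one trained ham message")
    return n_spam / n_ham


# Counts from the original post: 702 spam and 110 ham.
ratio = imbalance_ratio(702, 110)
print(f"roughly {ratio:.1f} spam per ham")
```

The result is a little under 7, so "7 to 1" is a fair rounding.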
If you are getting false positives then imbalance is the most common cause.
A few false negatives are not uncommon, though, because spam is constantly
changing. If a relatively high percentage of your spam is coming in as false
negatives, then you might have an imbalance problem. The best way to tell
for sure is to see the spam clues for one of the false negatives, which you
can generate from the SpamBayes menu.
> Do I need to begin from scratch? If so, do I just delete the db file
> and will Spambayes just create a new one?
For a 7 to 1 imbalance, I would usually say there is no need to begin from
scratch. However, SpamBayes learns quickly so it shouldn't hurt to start
over and see what happens. Since you know the size of your DB, you've
obviously located the file. You will probably see two files with the *.db
extension: one holds the training data and the other contains information about
the messages that have been processed. Just close Outlook, delete these two
files, then restart Outlook, and SpamBayes should recreate the databases.
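If you would rather keep the old files around in case you change your mind, a small script can rename them instead of deleting them. The sketch below is a hedged illustration, not a SpamBayes tool: the function name `set_aside_db_files` and the `.bak` suffix are invented for this example, and you have to supply the path to your own data directory. Run it only while Outlook is closed.

```python
from pathlib import Path


def set_aside_db_files(data_dir, suffix=".bak"):
    """Rename every *.db file in data_dir to <name>.db<suffix>.

    Returns the list of new paths. Nothing is deleted, so the old
    databases can be restored by renaming them back.
    """
    moved = []
    # sorted() builds the full list first, so renaming does not
    # interfere with the directory scan.
    for db_file in sorted(Path(data_dir).glob("*.db")):
        target = db_file.with_name(db_file.name + suffix)
        db_file.rename(target)  # may fail if target already exists
        moved.append(target)
    return moved
```

Calling `set_aside_db_files(path_to_your_data_directory)` with Outlook closed leaves no *.db files in that directory, which should have the same effect as deleting them by hand.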