[Spambayes] rebuild database?

mgleich mgleich at nyc.rr.com
Mon Oct 10 15:21:36 CEST 2005

I've just realized that although my database is only 536kb, which is not
that large, it contains 702 spam and 110 ham.  I gather this is extremely
unbalanced and may explain why I'm getting false negatives.

I'm using the Outlook plugin.

Do I need to begin from scratch?  If so, can I just delete the db file and
let SpamBayes create a new one?
Or, if not, what is the best way to fix this?


-----Original Message-----
From: Tony Meyer [mailto:tameyer at ihug.co.nz] 
Sent: Monday, October 10, 2005 4:50 AM
To: mgleich
Cc: spambayes at python.org
Subject: Re: [Spambayes] rebuild database?

> Recently I've been getting some false negatives - essentially,
> email from friends that is suddenly placed in the questionable, or  
> suspect folder when many other emails from these friends have been  
> processed and seen as ham.
> I began by training as I went, no prior collection of email on
> which to train.

How balanced is your database?  SpamBayes works best with a roughly  
equal number of ham and spam trained.  Unfortunately, retaining this  
balance can be difficult to do, depending on the incoming mail  
stream.  Since SpamBayes learns quickly, periodically retraining from
scratch is sometimes a good idea; adjusting the thresholds so that
fewer messages are trained might also help.
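
As a rough illustration of what "roughly equal" means in practice, here
is a minimal Python sketch that turns the ham and spam counts into a
single imbalance ratio.  It is not part of SpamBayes: the function names
and the 2-to-1 cutoff are assumptions made up for this example, and the
counts are the ones reported in this thread.

```python
def training_balance(nspam, nham):
    """Return larger count / smaller count (1.0 means perfectly balanced)."""
    if nspam < 0 or nham < 0:
        raise ValueError("counts must be non-negative")
    smaller, larger = sorted((nspam, nham))
    if smaller == 0:
        # Nothing trained at all counts as balanced; one empty class does not.
        return float("inf") if larger else 1.0
    return larger / smaller


# Hypothetical cutoff chosen for this example; it is NOT a SpamBayes setting.
MAX_RATIO = 2.0


def is_unbalanced(nspam, nham, max_ratio=MAX_RATIO):
    """True when one class outnumbers the other by more than max_ratio."""
    return training_balance(nspam, nham) > max_ratio


# Counts reported in this thread: 702 spam, 110 ham.
ratio = training_balance(702, 110)
print(f"imbalance is about {ratio:.1f} to 1")
print("unbalanced" if is_unbalanced(702, 110) else "roughly balanced")
```

The ratio is symmetric on purpose, so too much ham is flagged the same
way as too much spam.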

> I'm wondering - should the box that says "rebuild entire database"
> be checked by default?

That only has any effect when you are training via the "Training" tab  
in the SpamBayes Manager.  If you tick it, then the training that you  
do will completely replace any existing training.  If you don't, then  
the training will be added on to whatever previous training has been
done.
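
The difference between the two modes can be sketched with a toy model
that only tracks message counts.  This is not the plugin's code:
ToyTrainingStore and its rebuild parameter are hypothetical names used
purely to illustrate "replace" versus "add on to" behaviour.

```python
class ToyTrainingStore:
    """Toy stand-in for a training database; it only counts messages."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0

    def train(self, spam_msgs, ham_msgs, rebuild=False):
        """Train on the given messages and return (nspam, nham).

        rebuild=True discards all earlier training first (hypothetical
        flag mirroring the "rebuild entire database" checkbox).
        rebuild=False adds the new messages to the existing totals.
        """
        if rebuild:
            self.nspam = 0
            self.nham = 0
        self.nspam += len(spam_msgs)
        self.nham += len(ham_msgs)
        return self.nspam, self.nham


store = ToyTrainingStore()
store.train(["s1", "s2", "s3"], ["h1"])
added = store.train(["s4"], ["h2", "h3"])              # adds to earlier training
rebuilt = store.train(["s5"], ["h4"], rebuild=True)    # replaces everything
print(added, rebuilt)
```

With the box unticked the counts accumulate; with it ticked only the
messages from the latest training run remain.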

> Also, my database is now 2,536 kb - is this too large?

How many messages is that?  (The General tab of the SpamBayes Manager  
dialog tells you this).  It doesn't sound that large, but I don't use  
the same database system, so I'm a bit rusty on what a normal bsddb  
database size is.


Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
