[Spambayes] Data file "out of balance"...?

Tony Meyer tameyer at ihug.co.nz
Sun Jul 25 04:03:41 CEST 2004


> As a result, the ratio of spams to hams in my database 
> quickly goes up over 2:1 which I understand from the FAQs is 
> not the best way to have things set up.

I wouldn't worry about a 2:1 (or 1:2) imbalance.  Anything up to around
5:1 is probably fine, and if you keep getting the results that you want,
then don't worry about the imbalance at all.
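The rule of thumb above is plain arithmetic on the number of trained hams and spams. Here is a minimal Python sketch for checking where a database sits against it. The function names are hypothetical, and the counts are assumed to be whatever totals you have trained. They are not read from any SpamBayes API, and the 5:1 limit is a rough guideline, not a SpamBayes setting.

```python
def training_imbalance(nham: int, nspam: int) -> float:
    """Return the larger-to-smaller ratio of trained messages.

    Hypothetical helper: 2:1 and 1:2 both give 2.0.
    """
    if nham <= 0 or nspam <= 0:
        raise ValueError("need at least one trained ham and one trained spam")
    return max(nham, nspam) / min(nham, nspam)


def is_probably_fine(nham: int, nspam: int, limit: float = 5.0) -> bool:
    # The default of 5.0 is only the rough "up to around 5:1" guideline,
    # not a value SpamBayes itself enforces.
    return training_imbalance(nham, nspam) <= limit


print(training_imbalance(100, 210))  # 2.1
print(is_probably_fine(100, 210))    # True
print(is_probably_fine(50, 400))     # False (8:1)
```

Even when the ratio is past the guideline, the classification results you actually see are the better guide.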

> When this happens, I have thought to train on more hams in 
> the hope of getting the DB into better "balance" but I can't 
> figure out how to train on hams only. I can't move hams from 
> the spam folder because none are in there.
> 
> What is the best way for me to handle the situation I 
> describe?

Put the hams you want to train on in an otherwise empty folder, and use
the Training tab of the SpamBayes Manager dialog to "Train Now".

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.


