[spambayes-dev] FW: Overnight shipping on xãnax,valíum and more

Glenn Brown gbrown at alumni.caltech.edu
Mon May 3 13:33:38 EDT 2004


> "Bayes database has X, message database has Y" error message.

You're right.  Retraining fixed this and the unscored messages problem went
away.

> In any case, retraining does look like the best option,

Retraining worked well following Kenny Pitt's advice, which was to
retrain on just 5 spam and 5 ham and then train only on the hammiest spam
and spammiest ham in my archive, using "Filter messages" to rescore the
collection after each iteration.  It was horrifically slow, but it took
only about 30 ham and 40 spam messages to score all ham as ham and catch
90% of spam.
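
For anyone who wants to script the same procedure, here is a rough Python
sketch of the loop described above.  It is not SpamBayes code: ToyClassifier
is a made-up stand-in scorer, train_on_worst and its parameters are
hypothetical names, and the 0.2 ham cutoff is an assumption (the 0.9 spam
cutoff matches the 90% threshold I was using).  The rescoring step stands in
for running "Filter messages" over the whole collection.

```python
import math
from collections import Counter


class ToyClassifier:
    """Stand-in word-count scorer (an assumption, not the SpamBayes classifier)."""

    def __init__(self):
        self.spam_counts, self.ham_counts = Counter(), Counter()
        self.nspam = self.nham = 0

    def train(self, text, is_spam):
        words = set(text.lower().split())
        if is_spam:
            self.spam_counts.update(words)
            self.nspam += 1
        else:
            self.ham_counts.update(words)
            self.nham += 1

    def score(self, text):
        """Return a spam probability in [0, 1]."""
        log_odds = 0.0
        for w in set(text.lower().split()):
            p_spam = (self.spam_counts[w] + 1) / (self.nspam + 2)
            p_ham = (self.ham_counts[w] + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid exp() overflow
        return 1.0 / (1.0 + math.exp(-log_odds))


def train_on_worst(clf, ham, spam, seed=5, batch=1,
                   ham_cutoff=0.2, spam_cutoff=0.9, max_rounds=100):
    """Seed with a few messages, then train only on the worst mistakes."""
    for msg in ham[:seed]:
        clf.train(msg, is_spam=False)
    for msg in spam[:seed]:
        clf.train(msg, is_spam=True)
    ham_left, spam_left = list(ham[seed:]), list(spam[seed:])
    n_ham, n_spam = len(ham[:seed]), len(spam[:seed])

    for _ in range(max_rounds):
        # Rescore everything not yet trained on.
        spam_scored = sorted((clf.score(m), m) for m in spam_left)
        ham_scored = sorted(((clf.score(m), m) for m in ham_left), reverse=True)
        # Hammiest spam and spammiest ham that are still misjudged.
        worst_spam = [m for s, m in spam_scored if s < spam_cutoff][:batch]
        worst_ham = [m for s, m in ham_scored if s > ham_cutoff][:batch]
        if not worst_spam and not worst_ham:
            break  # nothing left outside the cutoffs
        for m in worst_spam:
            clf.train(m, is_spam=True)
            spam_left.remove(m)
            n_spam += 1
        for m in worst_ham:
            clf.train(m, is_spam=False)
            ham_left.remove(m)
            n_ham += 1
    return n_ham, n_spam
```

The return value is the number of ham and spam messages actually trained on,
which is the figure I quoted above for my own archive.  A larger batch would
mean fewer rescoring passes at the cost of training on more messages.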

If I were to do it again, I would first lower the spam threshold from 90% to
70% or lower.  Methinks all those random words make it hard to be so sure of
spam.
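
To illustrate what lowering the threshold does, here is a small Python
sketch.  The classify function, the 0.2 ham cutoff, and the example scores
are all made up for illustration; only the 90% and 70% spam thresholds come
from my setup.

```python
def classify(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Map a spam probability to a label using two cutoffs (hypothetical helper)."""
    if score >= spam_cutoff:
        return "spam"
    if score < ham_cutoff:
        return "ham"
    return "unsure"


# Invented scores: spam padded with random words tends to land below 0.9.
scores = [0.95, 0.82, 0.74, 0.40, 0.05]
for cutoff in (0.9, 0.7):
    labels = [classify(s, spam_cutoff=cutoff) for s in scores]
    print(cutoff, labels)
```

With these invented scores, the messages at 0.82 and 0.74 move from "unsure"
to "spam" when the cutoff drops from 0.9 to 0.7.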

Thanks to Kenny for his suggestions.  Sorry for the slow reply, but I was in
Hawaii. :)

> the imbalance can be addressed

In the hopes it will inspire me to code a fix for the imbalance problem, I
refuse to manually balance the database.  IMHO, SpamBayes will not be mature
until it doesn't require manual balancing, and the problem will not be
addressed if all potential developers manually balance.

Thanks for the help,
--Glenn




