[Spambayes] Better default configuration for SpamBayes?
logies at logies.de
Sat Oct 29 16:21:51 CEST 2005
I switched from POPFile to Spambayes about 6 months ago, because I had
false positives with POPFile from time to time and POPFile is lacking the
I didn't change the default configuration when I started with SpamBayes,
because I didn't know the product. But I was already wondering about the
default configuration on the "untrained messages" page, which is "defer"
for unsure messages, "ham" for presumably ham messages and "spam" for
presumably spam messages. POPFile was more restrictive: it only trained on
messages that were manually selected for training.
The problem I have with this configuration is that overtraining and wrong
classification happen too easily. I get hundreds of emails every day, and I
often found myself checking only the "unsure" messages, then pressing the
"train" button. I think I made mistakes this way, and the performance of
SpamBayes didn't improve over time.
Luckily my hammie.db got corrupted recently: I had come back from
holidays and was training messages when the computer did a scheduled reboot
in the middle of the training...
The hammie.db had grown to over 20 MB, and classification had become a bit
Now I have a different default configuration: "defer" for unsure, "discard"
for ham and spam messages. This way I avoid training on classification
errors. hammie.db is only 650 kB now, SpamBayes is faster and seems to be
as reliable as ever.
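
To illustrate the difference between the two sets of defaults, here is a
minimal Python sketch. It is not SpamBayes code: the dictionaries, the
function name and the sample inbox are all hypothetical and only model what
one click on "train" would add to the database under each configuration.

```python
# Hypothetical sketch, not SpamBayes code: all names are made up for illustration.

# Default action per classifier verdict on the "untrained messages" page.
OLD_DEFAULTS = {"ham": "ham", "spam": "spam", "unsure": "defer"}
NEW_DEFAULTS = {"ham": "discard", "spam": "discard", "unsure": "defer"}


def messages_trained(classified, defaults):
    """Return the (message, label) pairs that an unreviewed click on
    "train" would add to the database under the given defaults."""
    trained = []
    for message, verdict in classified:
        action = defaults[verdict]
        # "defer" and "discard" never add anything to the database.
        if action in ("ham", "spam"):
            trained.append((message, action))
    return trained


# "invoice" stands for a false positive: real mail the classifier called spam.
inbox = [
    ("newsletter", "ham"),
    ("invoice", "spam"),
    ("pills", "spam"),
    ("odd mail", "unsure"),
]

print(len(messages_trained(inbox, OLD_DEFAULTS)))  # 3, including the false positive
print(len(messages_trained(inbox, NEW_DEFAULTS)))  # 0
```

With the old defaults, the misclassified "invoice" is trained as spam unless
it is corrected by hand first; with the new defaults nothing is trained
unless a label is chosen explicitly.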
So perhaps you should change the default configuration of SpamBayes
Keep up the good work!
http://www.logies.de/ (among other things, _the_ mailing list for the dental industry)
PGP key (RSA/IDEA) is sent with a requested return receipt