[Spambayes] Better default configuration for SpamBayes?

Michael Logies logies at logies.de
Sat Oct 29 16:21:51 CEST 2005


Hello,

I switched from POPFile to SpamBayes about 6 months ago, because POPFile 
gave me false positives from time to time and lacks an "unsure" 
classification.

I didn't change the default configuration when I started with SpamBayes, 
because I didn't know the product. But I was already wondering about the 
default configuration on the "untrained messages" page, which is "defer" 
for unsure messages, "ham" for presumably ham messages and "spam" for 
presumably spam messages. POPFile was more restrictive: it only trained on 
messages that had been manually selected for training.

The problem I have with this configuration is that overtraining and wrong 
classification happen too easily. I get hundreds of emails every day, and I 
often found myself checking only the "unsure" messages and then pressing the 
"train" button. I think I made mistakes this way, and the performance of 
SpamBayes didn't improve over time.

Luckily, my hammie.db got corrupted recently: I had just come back from 
holiday and was training messages when the computer did a scheduled reboot 
in the middle of the training.
By then hammie.db had grown to over 20 MB, and classification had become a 
bit slow.
Now I use a different default configuration: "defer" for unsure messages and 
"discard" for ham and spam messages. This way I avoid training on wrongly 
classified messages. hammie.db is only 650 kB now, SpamBayes is faster and 
seems to be as reliable as ever.
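
As a rough illustration of the difference between the two setups described 
above, here is a minimal Python sketch. It is not SpamBayes code: the 
dictionary names, the function and the message counts are all hypothetical 
and only model what happens when "train" is pressed without reviewing the 
ham and spam entries.

```python
# Hypothetical model of the "untrained messages" page defaults.
# None of these names come from SpamBayes itself.
SHIPPED_DEFAULTS = {"ham": "ham", "spam": "spam", "unsure": "defer"}
CONSERVATIVE_DEFAULTS = {"ham": "discard", "spam": "discard", "unsure": "defer"}


def messages_trained(classifications, defaults):
    """Count messages that would be trained if the defaults are accepted as-is."""
    return sum(1 for c in classifications if defaults[c] in ("ham", "spam"))


# Made-up example day: the counts are illustrative, not measured.
day = ["ham"] * 120 + ["spam"] * 300 + ["unsure"] * 15

print(messages_trained(day, SHIPPED_DEFAULTS))       # 420
print(messages_trained(day, CONSERVATIVE_DEFAULTS))  # 0
```

With the first setup every automatically classified message, including any 
misclassified one, ends up in the training database unless it is corrected 
by hand. With the second, only messages that are explicitly marked get 
trained.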

So perhaps you should change the default configuration of SpamBayes 
accordingly?

Keep up the good work!

Best Regards

Michael
--
http://www.logies.de/ (among other things, _the_ mailing list for the dental industry)
PGP key (RSA/IDEA) is sent along with a requested return receipt


