[Spambayes] Spambayes works so well, it's hard to keep training balanced

Claude Jones claudejones at tehogeeservices.com
Tue Oct 17 09:13:54 CEST 2006


I've been meaning to ask this for a long time. 
I run Spambayes on a Fedora Core 5 Linux machine, among others.
It works extremely well. After two days of training, it had pretty 
much figured out all my mail-list traffic, which is about 98% of 
my mail, and always classifies it as ham.
It catches about 90-95% of my spam.
On a typical day I get 600-800 messages.
Every day, I go through the review process in the Spambayes web 
interface and check the unsures, which are nearly all spam, 
and classify them properly. 
Over time, the result is that I've built up a huge imbalance of 
trained messages: nearly 1000 trained spam vs. 150 trained ham.

So, how to regain balance? Should I just train on a group of mails 
that have already been correctly classified as ham, say an equal 
number from each of my mail-lists, to get things back in balance? 
Somehow, that seems counterintuitive to me - but I can't think of 
any other way. Spambayes just works too well...
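
To put rough numbers on that idea, here is a small sketch of the 
arithmetic in Python. It only does the counting and does not touch 
Spambayes itself; the function name and the figure of 10 mail-lists 
are made up for illustration, and only the 1000/150 counts come from 
my setup.

```python
import math


def ham_needed_per_list(trained_spam, trained_ham, num_lists):
    """Return how many extra ham messages to train from each list
    so that the trained ham count roughly matches trained spam."""
    # No extra ham is needed if ham already equals or exceeds spam.
    deficit = max(trained_spam - trained_ham, 0)
    # Round up so the total across all lists covers the deficit.
    return math.ceil(deficit / num_lists)


# 1000 and 150 are the counts quoted above; 10 lists is hypothetical.
per_list = ham_needed_per_list(trained_spam=1000, trained_ham=150, num_lists=10)
print(per_list)  # 85
```

So with ten lists, that would mean training on about 85 
already-correct ham messages from each one.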
-- 
Claude Jones
Brunswick, MD, USA

