[Spambayes] Spambayes repeatedly classifies messages from mailing list as SPAM despite multiple (20+) recoveries from spam folder

Ryan Malayter rmalayter at bai.org
Fri Sep 5 12:50:35 EDT 2003


Meyer, Tony wrote:
> Spambayes works best trained with roughly equal numbers of ham & spam;
> we're still trying to come up with a good method of working with
> unbalanced training data.  At the moment there is an option (defaults

I have several dozen folders in my mailbox that contain different types
of ham. All told, this is about 7000 messages, and I have about 1500
spam messages. I used these as my training corpus with plug-in version
007.

Should I instead create a "sample" folder of ham that contains about
1500 messages and train with that?

What about adding a feature to the plug-in that would count the number
of messages in each training folder, then use a random subsample of each
folder (spam or ham) as necessary to create a balanced training corpus?
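A minimal sketch of what such a feature might do, assuming each training folder can be reduced to a list of message identifiers. The names `allocate` and `balance_corpus` are hypothetical, and nothing here reflects the plug-in's actual code or API. The sampling is proportional: the larger side is cut down to the size of the smaller side, and each folder gets a share of the draws proportional to its size (largest-remainder rounding). That way all several dozen ham folders stay represented, rather than one big folder dominating the sample.

```python
import random
from typing import Dict, List, Sequence, Tuple


def allocate(sizes: Dict[str, int], target: int) -> Dict[str, int]:
    """Split `target` draws across folders in proportion to folder size.

    Uses largest-remainder rounding so the counts sum exactly to `target`.
    Assumes target <= sum(sizes.values()).
    """
    total = sum(sizes.values())
    if total == 0:
        return {name: 0 for name in sizes}
    exact = {name: n * target / total for name, n in sizes.items()}
    counts = {name: int(x) for name, x in exact.items()}
    leftover = target - sum(counts.values())
    # Give the remaining draws to the folders with the largest fractional parts.
    by_fraction = sorted(sizes, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts


def balance_corpus(
    ham: Dict[str, Sequence[str]],
    spam: Dict[str, Sequence[str]],
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (ham_sample, spam_sample) of equal size.

    The larger side is randomly subsampled folder by folder. The smaller
    side is kept whole (only shuffled).
    """
    rng = random.Random(seed)
    n_ham = sum(len(msgs) for msgs in ham.values())
    n_spam = sum(len(msgs) for msgs in spam.values())
    target = min(n_ham, n_spam)

    def draw(folders: Dict[str, Sequence[str]]) -> List[str]:
        counts = allocate({name: len(msgs) for name, msgs in folders.items()}, target)
        sample: List[str] = []
        for name, msgs in folders.items():
            sample.extend(rng.sample(list(msgs), counts[name]))
        return sample

    return draw(ham), draw(spam)


# Usage with made-up folder names, roughly matching the 7000 ham / 1500 spam above.
ham = {
    "Inbox": [f"inbox-{i}" for i in range(4000)],
    "Lists": [f"lists-{i}" for i in range(2500)],
    "Work": [f"work-{i}" for i in range(500)],
}
spam = {"Spam": [f"spam-{i}" for i in range(1500)]}
ham_sample, spam_sample = balance_corpus(ham, spam)
print(len(ham_sample), len(spam_sample))  # 1500 1500
```

Here the three ham folders receive 857, 536 and 107 draws respectively, so even the smallest folder still contributes training examples.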


