[Spambayes] Spambayes repeatedly classifies messages from mailing
list as SPAM despite multiple (20+) recoveries from spam folder
Ryan Malayter
rmalayter at bai.org
Fri Sep 5 12:50:35 EDT 2003
Meyer, Tony wrote:
> Spambayes works best trained with roughly equal numbers of ham & spam;
> we're still trying to come up with a good method of working with
> unbalanced training data. At the moment there is an option (defaults
I have several dozen folders in my mailbox that contain different types
of ham. All told, this is about 7000 messages, and I have about 1500
spam messages. I used these as my training corpus with plug-in version
007.
Should I instead create a "sample" folder of ham that contains about
1500 messages and train with that?
What about adding a feature to the plug-in that would count the number
of messages in each training folder, then use a random subsample of each
folder (spam or ham) as necessary to create a balanced training corpus?
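A rough sketch of the proposed balancing step follows. It assumes each folder has already been read into a plain list of message IDs; `allocate` and `balanced_subsample` are hypothetical helpers, not part of the Spambayes plug-in. The target size per class is the smaller class total, which is about 1500 with the numbers above. Each folder's share of that target is proportional to its size (largest-remainder rounding), so no ham folder drops out of the sample.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def allocate(counts: Dict[str, int], target: int) -> Dict[str, int]:
    """Split `target` across folders in proportion to their sizes.

    Uses integer largest-remainder rounding, so the quotas sum to exactly
    `target` and no folder is asked for more messages than it holds.
    """
    total = sum(counts.values())
    if target >= total:
        # Nothing to trim: use every message in every folder.
        return dict(counts)
    quota = {name: n * target // total for name, n in counts.items()}
    leftover = target - sum(quota.values())
    # Give the remaining slots to folders with the largest fractional share.
    by_remainder = sorted(counts, key=lambda name: (counts[name] * target) % total,
                          reverse=True)
    for name in by_remainder[:leftover]:
        quota[name] += 1
    return quota


def balanced_subsample(ham: Dict[str, Sequence[str]],
                       spam: Dict[str, Sequence[str]],
                       seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Return (ham_sample, spam_sample) of equal size, drawn randomly per folder."""
    rng = random.Random(seed)
    target = min(sum(len(m) for m in ham.values()),
                 sum(len(m) for m in spam.values()))

    def pick(folders: Dict[str, Sequence[str]]) -> List[str]:
        quota = allocate({name: len(msgs) for name, msgs in folders.items()}, target)
        chosen: List[str] = []
        for name, msgs in folders.items():
            # Random sample without replacement from this folder.
            chosen.extend(rng.sample(list(msgs), quota[name]))
        return chosen

    return pick(ham), pick(spam)


# Illustrative data only: 7000 ham messages in two folders, 1500 spam.
ham = {"inbox": [f"inbox-{i}" for i in range(5000)],
       "lists": [f"lists-{i}" for i in range(2000)]}
spam = {"spam": [f"spam-{i}" for i in range(1500)]}
ham_sample, spam_sample = balanced_subsample(ham, spam, seed=1)
print(len(ham_sample), len(spam_sample))  # 1500 1500
```

Here the ham side is trimmed to 1071 messages from "inbox" and 429 from "lists", while all 1500 spam messages are kept. Pooling all ham and sampling 1500 at random would also balance the corpus, but small folders could then end up under-represented by chance.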