[Spambayes] Spambayes repeatedly classifies messages
from mailing list as SPAM despite multiple (20+) recoveries
from spam folder
Meyer, Tony
T.A.Meyer at massey.ac.nz
Mon Sep 8 22:20:34 EDT 2003
> I have several dozen folders in my mailbox that contain
> different types of ham. All told, this is about 7000
> messages, and I have about 1500 spam messages. I used these
> as my training corpus with plug-in version 007.
>
> Should I instead create a "sample" folder of ham that
> contains about 1500 messages and train with that?
That's probably a good idea. A little imbalance doesn't hurt (you could
have 2000, for example), but equal numbers are best.
> What about adding a feature to the plug-in that would count
> the number of messages in each training folder, then use a
> random subsample of each folder (spam or ham) as necessary to
> create a balanced training corpus?
An interesting idea. I've opened a feature request here:
<http://sourceforge.net/tracker/index.php?func=detail&aid=802341&group_id=61702&atid=498106>
We'll see what Mark has to say ;)
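
As a rough illustration of the subsampling idea (this is not plug-in code; the function name `balance_corpus` and its arguments are made up for the example, and messages are stood in for by plain strings), balancing could look something like this in Python:

```python
import random


def balance_corpus(ham, spam, seed=None):
    """Return (ham_sample, spam_sample) with equal sizes.

    Hypothetical helper: the larger collection is randomly subsampled
    down to the size of the smaller one. Messages are assumed to have
    already been gathered from all folders into two lists.
    """
    rng = random.Random(seed)  # a seed makes the selection repeatable
    n = min(len(ham), len(spam))
    # random.sample picks n distinct items without replacement
    return rng.sample(ham, n), rng.sample(spam, n)


# Sizes taken from the question above: about 7000 ham and 1500 spam.
ham = [f"ham-{i}" for i in range(7000)]
spam = [f"spam-{i}" for i in range(1500)]
ham_sample, spam_sample = balance_corpus(ham, spam, seed=1)
print(len(ham_sample), len(spam_sample))
```

Sampling from the pooled ham, rather than taking a fixed number from each folder, keeps each folder represented roughly in proportion to its size, though a small folder may end up with very few messages in the sample.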
=Tony Meyer