[Spambayes] Ham:Spam ratio
tameyer at ihug.co.nz
Sat Feb 21 22:04:21 EST 2004
> Ok, using OL2k, I've rebuilt the database with 5+5. Now I
> want to be clear on what I should do with the Suspects. I
> have a suspect now with a score of 47% (limits are 80 and
> 15). If I understand, I should move this message to deleted
> items and NOT use the Delete As Spam button. Only use the
> buttons when ham ends up in Junk or when Spam ends up in Inbox.
> > You'll probably find it's better to train *less spam* than *more
> > ham*.
> Based on that, maybe I should also "Recover From Spam" and
> ham that ends up in Suspects because I'm likely to get way
> more spam in my Inbox than ham in my Junk (due to the big
> difference in my actual ham:spam ratio).
For the moment, use "Delete As Spam"/"Recover From Spam" on *any* mail that
appears in your 'Suspects' folder, on any spam that isn't caught and ends up
in your Inbox, and on any ham that ends up in your 'Junk' folder (this last
case should be rare). You can probably just keep doing that, and your results
should be fine: even though you don't receive equal amounts of ham and spam,
the mix of ham and spam that ends up in your 'Suspects' folder (and so gets
trained on) will probably be much closer to even.
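As a rough illustration of the training rule above, here is a minimal Python sketch. It assumes the 15%/80% cutoffs from the question and that you know each message's true type. The names `Folder`, `classify` and `should_train` are hypothetical and not part of the SpamBayes API. How a score that lands exactly on a cutoff is handled is also a guess.

```python
from enum import Enum

# Cutoffs taken from the question (15% and 80%); adjust to your own settings.
HAM_CUTOFF = 0.15
SPAM_CUTOFF = 0.80


class Folder(Enum):
    INBOX = "Inbox"
    SUSPECTS = "Suspects"
    JUNK = "Junk"


def classify(score: float) -> Folder:
    """Route a message by its spam score (0.0 = ham, 1.0 = spam).

    Boundary handling (>= / <=) is an assumption for this sketch.
    """
    if score >= SPAM_CUTOFF:
        return Folder.JUNK
    if score <= HAM_CUTOFF:
        return Folder.INBOX
    return Folder.SUSPECTS


def should_train(score: float, is_spam: bool) -> bool:
    """Return True if the message should be trained on
    ("Delete As Spam" for spam, "Recover From Spam" for ham)."""
    folder = classify(score)
    if folder is Folder.SUSPECTS:
        return True  # train on *any* unsure message
    if is_spam and folder is Folder.INBOX:
        return True  # spam that slipped through to the Inbox
    if not is_spam and folder is Folder.JUNK:
        return True  # ham wrongly filed as Junk (should be rare)
    return False  # correctly filed: leave it alone
```

With the 47% message from the question, `classify(0.47)` gives `Folder.SUSPECTS`, so `should_train` returns True whichever type the message turns out to be.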
If you do find that the imbalance in your training data is starting to get
large (say, more than 5:1 either way), then you could stop training on some
of the 'Suspects' messages of whichever type you have more of: move the spam
straight to the 'Deleted Items' folder, or the ham straight to your Inbox,
without training on it. I suspect that this won't be necessary, though, and
that the imbalance won't get this high.
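A similarly hedged sketch of that imbalance check follows. It compares how many ham and spam messages you have trained on. Once one type outnumbers the other by more than 5:1, it says to file 'Suspects' messages of that type without training. The function name `skip_training` and its counters are hypothetical, and the plug-in doesn't necessarily expose these numbers this way.

```python
# The 5:1 limit comes from the message; the helper itself is hypothetical.
MAX_RATIO = 5.0


def skip_training(n_ham: int, n_spam: int, is_spam: bool,
                  max_ratio: float = MAX_RATIO) -> bool:
    """Return True if a 'Suspects' message of this type should be filed
    (spam -> Deleted Items, ham -> Inbox) without training, because its
    type already outnumbers the other by more than max_ratio."""
    if is_spam:
        # Multiply instead of dividing, so a zero count needs no special case.
        return n_spam > max_ratio * n_ham
    return n_ham > max_ratio * n_spam
```

Starting from the 5+5 database, `skip_training(5, 5, is_spam=True)` is False, so you would keep training normally. It only becomes True once, for example, 60 spam have been trained against 10 ham.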
I hope this is starting to make sense! (There is a lot of ongoing
discussion about the best way to train, and how to make that easiest for
users, so this should hopefully get simpler as time goes on.)
> I'm sorry I require so much hand-holding, but I appreciate your time.
No worries, glad to help.
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.