[Spambayes] Ham:Spam ratio

Tony Meyer tameyer at ihug.co.nz
Sat Feb 21 22:09:09 EST 2004


[Tony Meyer]
> You'll probably find it's better to train *less spam* than 
> *more ham*.

[Russ Foster]
> I only train spam that is either misclassified to 'unsure'. Are you 
> recommending just deleting some of those without training?

I think it would be worth trying, if it would keep your database balanced,
yes.  Do you tend to get a lot of similar spam ending up in the 'unsure'
folder at the same time?  It might be that training on one or two of those
would be enough to classify all the rest (and ones arriving in the future)
correctly.  You can test this by training on a couple and then doing a
"Filter Now" on the 'unsure' folder, although that's a rather cumbersome
process.

Does most of the spam that ends up in the 'unsure' folder score above a certain
level?  If they were mostly > 80%, for example, you could also move the spam
threshold down a bit (it's still *very* unlikely that a ham could score over
80%).
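
If you want to check that before touching the threshold, here is a rough
sketch in plain Python.  It doesn't use any SpamBayes code; the helper name
and the scores are made up for illustration, so substitute the scores you
actually see on the spam in your 'unsure' folder.

```python
def share_above(scores, cutoff):
    """Return the fraction of scores at or above cutoff (0.0 if there are none)."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= cutoff) / len(scores)


# Hypothetical scores (in percent) of spam that ended up in 'unsure'.
unsure_spam_scores = [82.0, 85.5, 79.0, 88.1, 83.0, 60.2, 84.4, 86.9]

# See how much of that spam each candidate spam threshold would catch.
for cutoff in (85, 80):
    caught = share_above(unsure_spam_scores, cutoff)
    print(f"spam threshold {cutoff}%: {caught:.0%} of the unsure spam caught")
```

It would be worth running the same check over your ham scores, to confirm
that nothing legitimate comes close to the new threshold.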

> I wish I had less spam to train!

:)

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.



