[Spambayes] Incremental Training for ham in Outlook Plugin?
jsp at PKC.com
Tue Apr 25 15:26:47 CEST 2006
This is one for the training gurus. You can find a discussion of various
training approaches on the SpamBayes wiki.
That said, I'll put my oar in. In general, the recommendation of the
gurus is along the lines of "don't worry, be happy": as long as you're
getting satisfactory results, just use the training buttons to correct
classification errors. The bottom line is the quality of the results
you're getting; the suggestion to keep the ham:spam ratio close to 1 is
a guideline that seems to help achieve that result. I follow that
approach, and when I notice that I'm getting unsatisfactory results over
a period of time, I just discard my training database and start over.
SpamBayes learns very quickly, so I don't find it worthwhile to try to
tune the database over time.
Another thing to look at is the threshold scores for possible and
certain spam. I've dropped my certain spam threshold somewhat as I've
become more confident in my training data (it's now 0.70). Fewer messages
now land in possible spam, so I train fewer of them as spam, which reduces
the ham:spam imbalance. I'm currently getting good results (>95%
correctly classified) with 53 ham and 171 spam trained on.
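
To make the threshold effect concrete, here is a minimal Python sketch. It
is not SpamBayes code: the function name, the sample scores, the 0.20
possible-spam threshold, and the 0.90 "before" value are all assumptions
for illustration. Only the 0.70 certain-spam threshold and the 53/171
training counts come from my setup.

```python
def bucket(score, possible_spam=0.20, certain_spam=0.70):
    """Map a spam score in [0, 1] to a folder name (hypothetical helper)."""
    if score >= certain_spam:
        return "spam"
    if score >= possible_spam:
        return "possible spam"
    return "ham"

# Made-up scores, just to show how lowering the certain-spam threshold
# moves messages out of the possible-spam bucket.
scores = [0.05, 0.45, 0.75, 0.85, 0.95]
before = [bucket(s, certain_spam=0.90) for s in scores]  # assumed old value
after = [bucket(s, certain_spam=0.70) for s in scores]   # value from this post

print("possible spam before:", before.count("possible spam"))
print("possible spam after: ", after.count("possible spam"))

# Ham:spam training ratio for the counts quoted above.
ham_trained, spam_trained = 53, 171
print("ham:spam ratio = %.2f" % (ham_trained / spam_trained))
```

With these made-up scores, the two messages scoring 0.75 and 0.85 move from
possible spam to spam, so there is less left over to train on by hand.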
From: spambayes-bounces at python.org [mailto:spambayes-bounces at python.org]
On Behalf Of Gil Hurlbut
Sent: Monday, April 24, 2006 4:35 PM
To: spambayes at python.org
Subject: Re: [Spambayes] Incremental Training for ham in Outlook Plugin?
The question addresses the fact that SpamBayes is far better at
classifying ham once it is trained than it is at keeping up with
classifying new spam. I find it necessary to remove many spam messages
until I get to the point where the Manager has far more spam than ham.
Until I hear a different recommendation, I'm going to get back to a
balance by moving known ham to my Unsure folder and clicking on "Recover
from Spam" to do the incremental training.