[spambayes-dev] RE: [Spambayes] question regarding training

Kenny Pitt kennypitt at hotmail.com
Wed Aug 11 17:06:27 CEST 2004


This discussion seems more appropriate for the dev list, so I'm moving it there.

Coe, Bob wrote:
> To me, the solution to the problem seems obvious and almost absurdly
> easy to implement: When the imbalance reaches a certain level
> (determined by the Spambayes gurus), have the program start training
> on every nth message it classifies as ham. Do this until the desired
> balance is restored.

A while back, I devised a scheme that I wanted to try out, in which SpamBayes
would automatically train on random messages of whichever classification is
under-represented, with the probability of training based on the extent of
the imbalance.  The probability would be 0 until the imbalance reaches a
starting threshold such as 2:1, and would increase exponentially to 1 as the
imbalance moves toward a maximum threshold, say 5:1.
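
To make the idea concrete, here is a rough sketch in Python (the language
SpamBayes is written in).  It is only an illustration under assumptions: the
function names, the "steepness" parameter, and the exact shape of the
exponential curve are hypothetical and not taken from the SpamBayes code.
Only the 2:1 and 5:1 thresholds come from the description above.

```python
import math
import random


def train_probability(nham, nspam, start=2.0, maximum=5.0, steepness=3.0):
    """Return (under-represented class, probability of auto-training on it).

    Hypothetical helper: 'start' and 'maximum' are the imbalance ratios at
    which the probability leaves 0 and reaches 1.  'steepness' is an assumed
    knob controlling how sharply the curve bends.
    """
    if nham == nspam:
        return None, 0.0
    minority = "ham" if nham < nspam else "spam"
    low, high = sorted((nham, nspam))
    # With nothing trained on one side, treat the imbalance as unbounded.
    ratio = float("inf") if low == 0 else high / low
    if ratio <= start:
        return minority, 0.0
    if ratio >= maximum:
        return minority, 1.0
    # Position between the two thresholds, rescaled to the range 0..1.
    t = (ratio - start) / (maximum - start)
    # Exponential ramp: 0 at t == 0, 1 at t == 1, slow at first.
    return minority, math.expm1(steepness * t) / math.expm1(steepness)


def should_train(classification, nham, nspam, rng=random.random):
    """Decide whether to auto-train on a message just classified."""
    minority, p = train_probability(nham, nspam)
    return classification == minority and rng() < p
```

With 100 ham and 350 spam trained, the ratio of 3.5:1 sits halfway between
the thresholds, and this particular curve gives a training probability of
roughly 0.18 for messages classified as ham.  Messages classified as spam
would never be auto-trained until the balance tips the other way.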

Unfortunately, automatic training is not as easy as it would seem in the
Outlook add-in.  The add-in uses a different trainer so that it can keep
track of the Outlook IDs of trained messages, rescore trained messages,
etc.  This makes it difficult to call the trainer from inside the classifier
because the classifier is shared with the other SpamBayes apps such as
sb_server.  I'm sure there are some simple modifications that could be made
to the code structure so that this could be implemented, but I just haven't
found the time yet to work out the details.

-- 
Kenny Pitt


