[spambayes-dev] Another incremental training idea...

Kenny Pitt kennypitt at hotmail.com
Tue Jan 13 18:10:40 EST 2004


Skip Montanaro wrote:
> For some reason, my ham/spam ratio is getting out-of-whack faster
> than it seemed to in the past.

This is just an unsubstantiated guess based on my experience with my own
e-mail mix.  I get ham scores near 0.00 a lot more than I get spam
scores near 1.00.  Maybe the non-edge training is discarding a higher
percentage of hams than of spams.  I suppose you could correct for
that by setting different edge thresholds, but maybe you've already done
that?

I've also been kicking around some auto-training ideas hoping for time
to try them.  One idea I had was based on a "sliding non-edge" scale.
You would set a max imbalance, say 2:1, beyond which you would train on
everything on the low side.  As your imbalance falls back below the
maximum, auto-train would start skipping the "edge" messages with
near-perfect classification scores.  The closer you get to a perfect 1:1
balance, the closer to the cutoff score the message would need to be
before it would get auto-trained.  Anyone see any obvious holes in this
idea?
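
To make the sliding scale concrete, here is a rough, untested-in-practice
sketch of the decision rule.  Everything in it is an assumption for
illustration: the function name should_autotrain, the linear interpolation
between 1:1 and the max imbalance, the cutoff values 0.20 and 0.90, and
the choice to never auto-train "unsure" messages.  It is not existing
SpamBayes code.

```python
def should_autotrain(score, n_ham, n_spam, max_ratio=2.0,
                     ham_cutoff=0.20, spam_cutoff=0.90):
    """Decide whether to auto-train a message (hypothetical helper).

    score          -- classifier score in [0.0, 1.0]
    n_ham, n_spam  -- number of hams/spams trained so far
    max_ratio      -- max imbalance, e.g. 2.0 for 2:1
    """
    # Assumption: unsure messages are left for manual training.
    if ham_cutoff <= score < spam_cutoff:
        return False

    if score < ham_cutoff:
        # Classified as ham: "own" side is ham.
        own, other = n_ham, n_spam
        distance = ham_cutoff - score      # how far from the cutoff
        full_width = ham_cutoff            # distance of a 0.00 score
    else:
        # Classified as spam: "own" side is spam.
        own, other = n_spam, n_ham
        distance = score - spam_cutoff
        full_width = 1.0 - spam_cutoff     # distance of a 1.00 score

    # Nothing trained on this side yet: always train.
    if own == 0:
        return True

    # How under-represented this message's side is.
    ratio = other / own
    if ratio >= max_ratio:
        return True                        # at max imbalance: train on everything

    # Assumption: the window of trainable scores shrinks linearly from
    # the full width (at max_ratio) down to zero (at 1:1 or better).
    fraction = max(0.0, (ratio - 1.0) / (max_ratio - 1.0))
    return distance <= fraction * full_width
```

With 100 hams and 150 spams trained (1.5:1), a ham scoring 0.15 would be
trained but one scoring 0.00 would be skipped; at 100 hams and 200 spams
(2:1), both would be trained.  A linear window is only one option, and a
small minimum window might be needed so that training doesn't stop
entirely at 1:1.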

-- 
Kenny Pitt



