[spambayes-dev] Another incremental training idea...
kennypitt at hotmail.com
Tue Jan 13 18:10:40 EST 2004
Skip Montanaro wrote:
> For some reason, my ham/spam ratio is getting out-of-whack faster
> than it seemed to in the past.
This is just an unsubstantiated guess based on my experience with my own
e-mail mix: I get ham scores near 0.00 a lot more often than I get spam
scores near 1.00. Maybe the non-edge training is discarding a higher
percentage of hams than of spams. I suppose you could correct for that
by setting different edge thresholds for ham and spam, but maybe you've
already done that.
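To illustrate the different-thresholds suggestion, here is a minimal Python sketch of a non-edge filter with separate ham and spam edges. The function name, the `ham_edge`/`spam_edge` parameters, their values, and the sample scores are all hypothetical (they are not SpamBayes options or real data); the point is only that tightening the ham edge keeps more of the near-0.00 hams in training.

```python
def train_on_non_edge(score: float, is_spam: bool,
                      ham_edge: float = 0.05, spam_edge: float = 0.95) -> bool:
    """Hypothetical non-edge rule: skip messages whose score is already at the 'edge'."""
    if is_spam:
        # A spam scoring at or above spam_edge is treated as too easy to learn from.
        return score < spam_edge
    # A ham scoring at or below ham_edge is treated as too easy to learn from.
    return score > ham_edge


if __name__ == "__main__":
    # Made-up ham scores, skewed toward 0.00 as described above.
    ham_scores = [0.00, 0.01, 0.02, 0.04, 0.12]
    for edge in (0.05, 0.01):
        kept = sum(train_on_non_edge(s, False, ham_edge=edge) for s in ham_scores)
        print(f"ham_edge={edge}: trains on {kept} of {len(ham_scores)} hams")
```

Lowering only the ham edge leaves spam training unchanged while letting more hams through, which is one way to offset a mix where hams cluster at the extreme more than spams do.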
I've also been kicking around some auto-training ideas, hoping to find
time to try them. One idea I had was based on a "sliding non-edge" scale.
You would set a maximum imbalance, say 2:1, beyond which you would train
on everything on the low side. As your imbalance falls back below that
maximum, auto-train would start skipping the "edge" messages with
near-perfect classification scores. The closer you get to a perfect 1:1
balance, the closer to the cutoff score a message would need to be
before it would get auto-trained. Anyone see any obvious holes in this?
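A minimal Python sketch of the sliding scale might look like the following. Several details are assumptions not spelled out above: the window shrinks linearly from "everything" at the maximum imbalance down to a hypothetical `min_window` at 1:1; the over-represented class always uses `min_window`; and `should_autotrain`, `imbalance`, `min_window`, and the cutoff values are illustrative names and numbers, not SpamBayes code.

```python
def imbalance(nham: int, nspam: int) -> float:
    """Ratio of the larger trained count to the smaller one (1.0 means balanced)."""
    lo, hi = sorted((nham, nspam))
    if lo == 0:
        return float("inf") if hi > 0 else 1.0
    return hi / lo


def should_autotrain(score: float, is_spam: bool, nham: int, nspam: int,
                     ham_cutoff: float = 0.20, spam_cutoff: float = 0.90,
                     max_ratio: float = 2.0, min_window: float = 0.05) -> bool:
    """Hypothetical sliding non-edge rule for a message classified as ham or spam."""
    # Distance inside the cutoff, and the widest possible window for this class.
    if is_spam:
        distance = score - spam_cutoff
        full_window = 1.0 - spam_cutoff
    else:
        distance = ham_cutoff - score
        full_window = ham_cutoff
    if distance < 0:
        return False  # score is on the wrong side of this class's cutoff

    # Only the under-represented ("low side") class gets a wider window.
    low_side = nspam < nham if is_spam else nham < nspam
    if not low_side:
        t = 0.0
    else:
        # t = 0 at a 1:1 balance, t = 1 at (or beyond) max_ratio.
        t = min(1.0, (imbalance(nham, nspam) - 1.0) / (max_ratio - 1.0))
    if t >= 1.0:
        return True  # at or past the maximum imbalance: train on everything

    # Linearly shrink the window toward min_window as the counts approach 1:1,
    # so a message must sit closer to the cutoff to be trained on.
    window = min_window + t * (full_window - min_window)
    return distance <= window


if __name__ == "__main__":
    # Balanced: a near-perfect ham (0.01) is skipped, a borderline one (0.17) is kept.
    print(should_autotrain(0.01, False, 100, 100), should_autotrain(0.17, False, 100, 100))
    # Hams at 2:1 under-represented: everything on the ham side is trained.
    print(should_autotrain(0.01, False, 50, 100))
```

The linear interpolation is just the simplest way to connect the two endpoints described in the idea; a step function or a curve that stays wide longer would also fit the description.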