[Spambayes] Feature idea: Autobalancing ham/spam
thruska at cubiclesoft.com
Tue Nov 27 15:01:44 CET 2007
I've been thinking about how I'm going to balance my ham (10,641
messages) and spam (60,230 messages). What I plan on doing is
discarding spam and then just training on ham until the two are balanced.
It will take a while because the incoming ratio of ham to spam is fairly
low.
While this approach will work, I think it would be nice for
Spambayes to balance itself automatically when some configurable
percentage is hit at either end of the spectrum, so that I wouldn't have
to worry about it. There will ALWAYS be more spam than ham. Most
Spambayes users think like me: keep training on spam in the hope
that it will go away completely. Why burden users with balance issues
that should, IMO, be handled automatically?
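To make the idea concrete, here is a minimal sketch of such a policy, assuming the "configurable percentage" is expressed as a maximum ratio between the larger and smaller class. `AutoBalancer`, `should_train`, `offer`, and `max_ratio` are hypothetical names for illustration only; they are not part of Spambayes 1.0.4. The helper simply refuses to train a message if doing so would push its class past `max_ratio` times the size of the other class.

```python
class AutoBalancer:
    """Hypothetical helper (not part of SpamBayes): gates training so that
    neither class outgrows the other by more than max_ratio."""

    def __init__(self, nham=0, nspam=0, max_ratio=1.5):
        if max_ratio < 1:
            raise ValueError("max_ratio must be >= 1")
        self.nham = nham
        self.nspam = nspam
        self.max_ratio = max_ratio

    def should_train(self, is_spam):
        # Compare the class we would grow against the other class.
        # max(other, 1) lets training start from an empty database.
        if is_spam:
            mine, other = self.nspam, self.nham
        else:
            mine, other = self.nham, self.nspam
        return mine + 1 <= self.max_ratio * max(other, 1)

    def offer(self, is_spam):
        """Record a training request; return True if it was accepted."""
        if not self.should_train(is_spam):
            return False
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        return True


# With the counts from this post, new spam is skipped and new ham is trained.
balancer = AutoBalancer(nham=10641, nspam=60230, max_ratio=1.5)
print(balancer.offer(True))   # False: spam already far outnumbers ham
print(balancer.offer(False))  # True: ham is the minority class
```

Skipping training, rather than discarding existing spam, means the database converges toward balance on its own as ham trickles in, which is the "don't make me worry about it" behaviour described above.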
Another option could be to calculate the ratio of ham to spam and alter
the "strength" of the ham/spam clues according to the ratio. However,
this is probably a bad idea.
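One way to read the ratio-based alternative is to judge each clue by how often it appears *relative to the size* of each corpus, instead of by raw counts. The sketch below assumes that interpretation; `token_spamprob` and its parameters are hypothetical names, not the Spambayes API. It returns a neutral 0.5 when there is no evidence.

```python
def token_spamprob(hamcount, spamcount, nham, nspam):
    """Hypothetical sketch: spam probability of a token from per-class
    frequencies, so a bigger spam corpus does not by itself inflate it."""
    if nham == 0 or nspam == 0:
        return 0.5  # no messages in one class: stay neutral
    hamratio = hamcount / nham      # fraction of ham containing the token
    spamratio = spamcount / nspam   # fraction of spam containing the token
    if hamratio + spamratio == 0:
        return 0.5  # token never seen in either class
    return spamratio / (hamratio + spamratio)


# A token seen in 100 ham and 566 spam looks spammy from raw counts
# (566 / 666 is about 0.85), but relative to corpora of 10,641 ham and
# 60,230 spam it appears about equally often in each, so it scores near 0.5.
print(round(token_spamprob(100, 566, 10641, 60230), 2))
```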
I'm running Spambayes 1.0.4.