[Spambayes] Feature idea: Autobalancing ham/spam

Amedee Van Gasse amedee at amedee.be
Wed Nov 28 11:22:35 CET 2007

On Tue, November 27, 2007 15:01, Thomas Hruska wrote:
> I've been thinking about how I'm going to balance my ham (10,641
> messages) and spam (60,230 messages).  What I plan on doing is
> discarding spam and training only on ham until they are balanced.  It
> will take a while because the incoming ratio of ham to spam is fairly
> ridiculous.
> While this approach will work, I'm thinking it would be nice for
> Spambayes to automatically balance itself when some configurable
> percentage is hit on either end of the spectrum so that I wouldn't have
> to worry about it.  There will ALWAYS be more spam than ham.  Most users
> of Spambayes think like me:  Continue training on the spam in the hope
> that it will completely go away.  Why concern users with balance issues
> that should be, IMO, handled automatically?

I think it is easier to accept that spam won't go away and that no
solution is perfect. When your ham/spam ratio becomes ridiculous, it is
less work to retrain from scratch.
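
To put a number on "ridiculous", here is a minimal sketch of the kind of
check being discussed. It is not Spambayes code: the function names and
the max_ratio threshold of 4.0 are assumptions made up for illustration,
and the counts are the ones quoted above.

```python
def imbalance_ratio(nham, nspam):
    """Return the larger training count divided by the smaller one."""
    if nham <= 0 or nspam <= 0:
        raise ValueError("both counts must be positive")
    return max(nham, nspam) / min(nham, nspam)


def needs_rebalance(nham, nspam, max_ratio=4.0):
    """True when the counts differ by more than max_ratio.

    max_ratio is a hypothetical, user-configurable threshold,
    not a value taken from Spambayes.
    """
    return imbalance_ratio(nham, nspam) > max_ratio


# Counts from the quoted message: 10,641 ham and 60,230 spam.
print(round(imbalance_ratio(10641, 60230), 2))
print(needs_rebalance(10641, 60230))
```

A check like this could either trigger the automatic balancing Thomas
asks for, or simply tell you when it is time to retrain from scratch.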

Amedee Van Gasse
amedee at amedee.be

