[spambayes-dev] Training DB question

Skip Montanaro skip at pobox.com
Sun Oct 12 11:33:38 EDT 2003

    Dan> I deployed your product across our enterprise and it works very
    Dan> well, but new employees who haven't collected any SPAM would seem
    Dan> to benefit from having an extensive DB like mine to start off
    Dan> of.

Maybe, maybe not.  In a work environment you're probably correct, as long as
your organization isn't too big.  The bigger it is, the more varied people's
interests will be.  I might be interested in unsolicited mailings from
someone like O'Reilly, while our group's administrative assistant probably
wouldn't be.  The advantage of having something to start with is that
SpamBayes will start classifying incoming mail immediately.  The
disadvantage is that if your notion of ham and spam differs from the new
employee's, that employee will have to do more training to counteract your
initial training in those areas where you disagree.  In short, you should
probably not start them off with a huge database.  I'm supposed to be
working on a "starter database".  What I have (about 20 hams and spams) is
available at


(pickle format) though I haven't actually gotten around to testing it.
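To make the "counteract your initial training" cost concrete, here is a toy
model.  It is NOT the SpamBayes classifier or its API.  It assumes a
simplified per-token score: the ratio of spam and ham frequencies, smoothed
toward 0.5 in the Robinson style with an assumed strength of 0.45.  The
token "oreilly" and all counts are made-up illustration values.  The sketch
counts how many hams containing the token a user must train before its score
drops below 0.5, first on an empty database and then on one seeded by
someone who treated that token as spammy.

```python
from dataclasses import dataclass, field

@dataclass
class TokenCounts:
    """Hypothetical per-token counts (a toy, not SpamBayes's data structure)."""
    spam: dict = field(default_factory=dict)
    ham: dict = field(default_factory=dict)
    nspam: int = 0
    nham: int = 0

    def train(self, tokens, is_spam):
        # Count each token at most once per message.
        counts = self.spam if is_spam else self.ham
        for tok in set(tokens):
            counts[tok] = counts.get(tok, 0) + 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def spamprob(self, tok, s=0.45, x=0.5):
        # Assumed scoring: frequency ratio, smoothed toward x with strength s.
        sc, hc = self.spam.get(tok, 0), self.ham.get(tok, 0)
        spam_ratio = sc / self.nspam if self.nspam else 0.0
        ham_ratio = hc / self.nham if self.nham else 0.0
        if spam_ratio + ham_ratio == 0:
            return x  # unseen token: neutral score
        p = spam_ratio / (spam_ratio + ham_ratio)
        n = sc + hc
        return (s * x + n * p) / (s + n)

def make_seeded_db():
    # Hypothetical "starter" database: 10 of 20 spams mention the token,
    # and none of the 20 hams do.
    db = TokenCounts()
    for i in range(20):
        db.train(["oreilly", "book"] if i < 10 else ["offer"], is_spam=True)
    for _ in range(20):
        db.train(["meeting"], is_spam=False)
    return db

def hams_needed(db, token, threshold=0.5, limit=1000):
    # Train hams containing `token` until its score falls below threshold.
    k = 0
    while db.spamprob(token) >= threshold and k < limit:
        db.train([token], is_spam=False)
        k += 1
    return k

if __name__ == "__main__":
    print("empty database: ", hams_needed(TokenCounts(), "oreilly"))
    print("seeded database:", hams_needed(make_seeded_db(), "oreilly"))
```

In this toy, the empty database flips after a single ham, while the seeded one
needs roughly as many hams as it saw spams mentioning the token.  The exact
numbers depend entirely on the assumed formula and counts.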


