[spambayes-dev] Training DB question
Skip Montanaro
skip at pobox.com
Sun Oct 12 11:33:38 EDT 2003
Dan> I deployed your product across our enterprise and it works very
Dan> well, but new employees who haven't collected any SPAM would seem
Dan> to benefit from having an extensive DB like mine to start off
Dan> of.
Maybe, maybe not. In a work environment you're probably correct, as long as
your organization isn't too big. The bigger it is, the more varied people's
interests will be. I might be interested in unsolicited mailings from
someone like O'Reilly, while our group's administrative assistant probably
wouldn't be.

The advantage of having something to start with is that SpamBayes will
start classifying incoming mail immediately. The disadvantage is that if
your notion of ham and spam differs from a new employee's, that user will
have to perform more training to counteract your initial training in those
areas where you disagree. In short, you should probably not start them off
with a huge database.

I'm supposed to be working on a "starter database". What I have (about 20
hams and spams) is available at

http://manatee.mojam.com/~skip/training.pck

(pickle format), though I haven't actually gotten around to testing it.
Skip