[Spambayes] To label or not to label, a practical question

Michael D. Adams mdmkolbe at gmail.com
Fri Jul 8 05:25:45 CEST 2005


My ISP provides a server-side spam filtering service that labels the
messages it thinks are spam by adding a string to the subject line
(e.g. "--Spam--" at the front).  Their filters don't
catch everything so I want to also use SpamBayes to eliminate the spam
that my ISP doesn't label.  My question is whether or not I should
train SpamBayes with the spams that get labeled by my ISP.  I could
easily see SpamBayes picking up on the "--Spam--" string in the
subject line and filtering just based on that.  On the other hand,
training on those messages might introduce some selection bias or a
bad spam-to-ham ratio (e.g. I might get 50 ham, 40 spam caught by my
ISP, and 10 spam not caught by my ISP).  I don't know what the actual
ratio is yet, since I only just started using my ISP's filter.

Does anyone have any advice on whether the two filters might interfere
with each other, or how to avoid that interference?  Should I even be
using my ISP's filter along with SpamBayes, or just SpamBayes by itself?

Michael D. Adams
mdmkolbe at gmail.com

