[Spambayes] progress on POP+VM+ZODB deployment
Guido van Rossum
guido@python.org
Mon Oct 28 17:38:18 2002
> But for the vast majority of people, just knowing that a
> particular email has Bruce-spam-like content would be enough to want
> to filter it into a lower-priority folder, or even directly into
> Trash. At least, I see it as the job of the postmaster to provide a
> flag that could be used like that.
>
> To summarize: I think it's the job of a spam filter (or "flagger")
> to identify those messages universally accepted as being spam --
> whether or not any one person likes that kind of mail. And although
> for any given spam there is _somebody_ on Earth who would want to
> read it, it would be up to them to set up their client-app filter
> rules to work how they want them to -- even if that includes running
> a local installation of SpamBayes to do personalized
> (high-resolution) filtering.
That would be a laudable goal, but the techniques pursued here don't
work like that. They can only do a good job if you train them on
*both* spam and non-spam. That's how the math of a Bayesian
classifier works, alas. Someone can probably prove that you can't
reduce false positives any further without knowing what *your* non-spam
looks like. It sounds like SpamAssassin might be your best bet if you
don't want to train on your non-spam (and even SpamAssassin requires
an elaborate "whitelist" setup to avoid flagging the most flagrant
spammish-looking non-spam).
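
To make the point about training on both kinds of mail concrete, here is
a minimal sketch of a toy word-based Bayesian scorer. It is not the
SpamBayes algorithm; the class name ToyBayes, the add-one smoothing, the
equal priors, and the sample messages are all assumptions made up for
illustration. With spam-only training, a legitimate message that shares
words with spam scores as spammy, because the classifier has no evidence
that those words also show up in your non-spam.

```python
import math
from collections import Counter


class ToyBayes:
    """Toy naive Bayes scorer (illustrative only, not SpamBayes)."""

    def __init__(self):
        # Per-label count of messages containing each word.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Per-label count of training messages.
        self.nmsgs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.nmsgs[label] += 1
        # Count each word once per message.
        self.counts[label].update(set(text.lower().split()))

    def _word_prob(self, word, label):
        # Add-one smoothed fraction of `label` messages containing `word`.
        return (self.counts[label][word] + 1) / (self.nmsgs[label] + 2)

    def spam_probability(self, text):
        log_odds = 0.0  # equal priors assumed
        for word in set(text.lower().split()):
            seen = self.counts["spam"][word] + self.counts["ham"][word]
            if seen == 0:
                continue  # ignore words never seen in training
            log_odds += math.log(self._word_prob(word, "spam"))
            log_odds -= math.log(self._word_prob(word, "ham"))
        return 1.0 / (1.0 + math.exp(-log_odds))


clf = ToyBayes()
clf.train("free money now", "spam")
clf.train("free registration bonus now", "spam")

message = "free python conference registration"

# Spam-only training: "free" is known only as a spam word.
before = clf.spam_probability(message)

# Now teach the classifier what this user's non-spam looks like.
clf.train("free python conference registration open", "ham")
clf.train("python conference free tutorial", "ham")
after = clf.spam_probability(message)

print(f"spam-only training:    {before:.2f}")
print(f"spam and ham training: {after:.2f}")
```

In this sketch the score for the same message starts above 0.5 with
spam-only training and falls well below 0.5 once two non-spam messages
have been added, since "free" stops counting as evidence either way and
"python" and "conference" become evidence of non-spam.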
--Guido van Rossum (home page: http://www.python.org/~guido/)