[Spambayes] progress on POP+VM+ZODB deployment

Guido van Rossum guido@python.org
Mon Oct 28 17:38:18 2002


> 	But for the vast majority of people, just knowing that a
> particular email has Bruce-spam-like content would be enough to want
> to filter it into a lower-priority folder, or even directly into
> Trash.  At least, I see it as the job of the postmaster to provide a
> flag that could be used like that.
> 
> 	To summarize: I think it's the job of a spam filter (or "flagger")
> to identify those messages universally accepted as being spam --
> whether or not any one person likes that kind of mail.  And although
> for any given spam there is _somebody_ on Earth who would want to
> read it, it would be up to them to set up their client-app filter
> rules to work how they want them to -- even if that includes running
> a local installation of SpamBayes to do personalized
> (high-resolution) filtering.

That would be a laudable goal, but the techniques pursued here don't
work like that.  They can only do a good job if you train them on
*both* spam and non-spam.  That's how the math of a Bayesian
classifier works, alas.  Someone can probably prove that you can't
reduce the false positives any further without knowing what *your* non-spam
looks like.  It sounds like SpamAssassin might be your best bet if you
don't want to train on your non-spam (and even SpamAssassin requires
an elaborate "whitelist" setup to avoid flagging the most flagrant
spammish-looking non-spam).
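
To make the "both kinds" point concrete, here is a rough sketch of a
Graham-style per-token spam probability.  It is only an illustration,
not the actual SpamBayes code: the function name, the toy counts and the
0.5 default for unseen tokens are all assumptions made for the example.

```python
from collections import Counter


def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | token) from per-class message counts.

    Hypothetical helper for illustration; not part of SpamBayes.
    """
    # Fraction of trained spam / non-spam messages containing the token.
    spam_ratio = spam_counts[token] / n_spam if n_spam else 0.0
    ham_ratio = ham_counts[token] / n_ham if n_ham else 0.0
    if spam_ratio + ham_ratio == 0:
        return 0.5  # never-seen token: no evidence either way
    return spam_ratio / (spam_ratio + ham_ratio)


# Toy training data (made up): number of messages containing each token.
spam_counts = Counter({"free": 8, "python": 1})
ham_counts = Counter({"free": 2, "python": 9})

# Trained on 10 spam and 10 non-spam: "python" looks innocent.
trained = token_spam_prob("python", spam_counts, ham_counts, 10, 10)

# Trained on spam only: the same token now looks like pure spam.
spam_only = token_spam_prob("python", spam_counts, Counter(), 10, 0)

print(trained, spam_only)
```

With no non-spam counts, every token that has ever appeared in spam
gets probability 1.0, so a word that is common in *your* legitimate
mail cannot pull a message's score back down.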

--Guido van Rossum (home page: http://www.python.org/~guido/)