Graham's spam filter

Erik Max Francis max at
Fri Aug 23 02:44:37 CEST 2002

Heiko Wundram wrote:

> That's what I propose...

What you describe below is not the same thing as I was suggesting.  I
was suggesting having a static, never-changing "seed" of a nominal
sample of typical spam messages and (perhaps faked) legitimate messages
that every user could start with to jump-start their analysis.  Everyone
starts on the same footing, because they need a starting point.
Thereafter, they build up their own private good and bad corpora
independently of the initial seed and all other users.
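
As a rough sketch of the seed idea (not a real implementation): the
seed contents, the `UserFilter` class, and the whitespace tokenizer
below are all hypothetical, chosen only to show that each user copies
the fixed seed once and then trains privately.

```python
from collections import Counter

# Hypothetical, made-up seed token counts; a real seed would be built
# from a nominal sample of spam and (perhaps faked) legitimate mail.
SEED_SPAM = Counter({"viagra": 3, "free": 4, "click": 2})
SEED_GOOD = Counter({"python": 3, "meeting": 2, "free": 1})


class UserFilter:
    """Per-user corpora that start from the static seed."""

    def __init__(self):
        # Counter(...) makes a private copy, so the seed never changes
        # and no user's training affects anyone else.
        self.spam = Counter(SEED_SPAM)
        self.good = Counter(SEED_GOOD)

    def train(self, text, is_spam):
        # Naive tokenizer (an assumption): lowercase, split on whitespace.
        corpus = self.spam if is_spam else self.good
        corpus.update(text.lower().split())


alice = UserFilter()
bob = UserFilter()

# The same message, judged differently by two users.
alice.train("free shipping on your order", is_spam=False)
bob.train("free shipping on your order", is_spam=True)
```

After this, alice's good corpus and bob's spam corpus have diverged,
while SEED_SPAM and SEED_GOOD are untouched.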

> Keeping a central database for typical spam
> words (a public database containing the SPAM-Corpus), and a private
> database containing the non-spam word occurrences (non-spam corpus).
> The word probability database is kept separate on each computer...

This then raises the issue of who decides what goes into the spam corpus.
If that's decided by third parties, then someone has the potential of
reading private mail.  And it still comes down to a matter of
individuality:  _Both_ corpora need to vary with my own personal
taste, or otherwise it's not going to accurately reflect what _I_ as an
individual want to see.  Someone who gets a lot of (legitimate)
commercial email may get a lot of false positives and have difficulty
doing anything about it, since he can't control the spam corpus, only
his (private) non-spam corpus.
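
To make the false-positive concern concrete, here is a small sketch
using the per-token probability from Graham's article as I understand
it (good counts doubled, ratios capped at 1, result clamped to
[0.01, 0.99], tokens seen fewer than 5 times ignored).  The function
name and all the counts are made up for illustration.

```python
def token_spam_probability(word, good, bad, ngood, nbad):
    """Per-token spam probability; assumes ngood > 0 and nbad > 0."""
    g = 2 * good.get(word, 0)   # good occurrences are doubled
    b = bad.get(word, 0)
    if g + b < 5:
        return None             # too rare to say anything about
    bad_ratio = min(1.0, b / nbad)
    good_ratio = min(1.0, g / ngood)
    return max(0.01, min(0.99, bad_ratio / (good_ratio + bad_ratio)))


# Hypothetical user who gets legitimate commercial mail mentioning "sale".
private_good = {"sale": 5}      # out of 50 good messages
shared_bad = {"sale": 80}       # central corpus, out of 100 spam messages
private_bad = {"sale": 2}       # this user's own spam, out of 100 messages

p_shared = token_spam_probability("sale", private_good, shared_bad, 50, 100)
p_private = token_spam_probability("sale", private_good, private_bad, 50, 100)
```

With the shared spam corpus "sale" scores about 0.8; with the user's
own spam corpus it scores under 0.1.  The user can only push the first
number down by adding to the good corpus, not by fixing the bad one.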

> By the way, the people I write this thing for are not my customers;
> the admin-job I do here isn't paid for. Just for fun. :) (I live in
> this dorm)

The client/server model also brings up another obvious issue:  Ask your
roommates what they think about the potential for you to be able to
read all of their email.

 Erik Max Francis / max at /
 __ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/  \ There is nothing so subject to the inconstancy of fortune as war.
\__/ Miguel de Cervantes
    Church /
 A lambda calculus explorer in Python.
