Graham's spam filter

Heiko Wundram heikowu at ceosg.de
Thu Aug 22 23:38:08 CEST 2002


On Thu, 2002-08-22 at 22:16, Erik Max Francis wrote:
> This doesn't sound like the right approach to me.  Instead, you should
> perhaps start with a "global" database that is a sample of fairly
> typical mail from your clients and typical spam.  These should be used
> as an initial "seed" to the system only; once a user starts actually
> actively using the system to filter his mail, it can tailor itself to
> his specific needs.  The "global" database is simply a seed, so it never
> needs to be updated; it's just to get the customer user-specific
> databases started.

That's what I'm proposing: a central, public database holding the typical
spam words (the spam corpus), and a private database holding each user's
occurrence counts for non-spam words (the non-spam corpus). The
word-probability database is then computed and kept separately on each
computer.
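A minimal sketch of that split, assuming the token-probability formula
from Graham's "A Plan for Spam": non-spam counts doubled, words seen
fewer than 5 times ignored, probabilities clamped to [0.01, 0.99],
unknown words scored 0.4, and the 15 most extreme tokens combined. The
function names and the layout (spam counts from the shared database,
non-spam counts from the private one) are hypothetical illustrations,
not code from this thread.

```python
import math
from collections import Counter


def word_probabilities(spam_counts, nspam, ham_counts, nham, min_count=5):
    """Build the per-machine word -> P(spam) table.

    Hypothetical inputs: spam_counts/nspam come from the shared public
    database, ham_counts/nham from the user's private one. Both message
    totals must be positive.
    """
    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        g = 2 * ham_counts.get(word, 0)  # Graham doubles non-spam counts
        b = spam_counts.get(word, 0)
        if g + b < min_count:
            continue  # too rare to trust
        bad = min(1.0, b / nspam)
        good = min(1.0, g / nham)
        # g + b >= min_count guarantees good + bad > 0
        probs[word] = max(0.01, min(0.99, bad / (good + bad)))
    return probs


def spam_probability(tokens, probs, unknown=0.4, n_interesting=15):
    """Combine the most 'interesting' token probabilities (naive Bayes)."""
    ps = [probs.get(t, unknown) for t in set(tokens)]
    # Keep the tokens whose probabilities are farthest from neutral 0.5
    ps.sort(key=lambda p: abs(p - 0.5), reverse=True)
    ps = ps[:n_interesting]
    prod = math.prod(ps)
    inv = math.prod(1 - p for p in ps)
    return prod / (prod + inv)


# Usage with made-up counts:
spam_counts = Counter({"viagra": 40, "free": 30, "python": 1})   # shared
ham_counts = Counter({"python": 50, "free": 5, "lecture": 20})   # private
probs = word_probabilities(spam_counts, 50, ham_counts, 100)
print(spam_probability(["viagra", "free"], probs))      # close to 1
print(spam_probability(["python", "lecture"], probs))   # close to 0
```

Because only raw counts are shared, each machine can rebuild its own
probability table whenever its private non-spam counts change, without
touching the public spam corpus.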

I guess this would help.

By the way, the people I'm writing this for are not my customers; the
admin job I do here is unpaid. It's just for fun. :) (I live in this
dorm.)

Yours,

	Heiko W.

More information about the Python-list mailing list