Graham's spam filter
heikowu at ceosg.de
Thu Aug 22 23:38:08 CEST 2002
On Thu, 2002-08-22 at 22:16, Erik Max Francis wrote:
> This doesn't sound like the right approach to me. Instead, you should
> perhaps start with a "global" database that is a sample of fairly
> typical mail from your clients and typical spam. These should be used
> as an initial "seed" to the system only; once a user starts actually
> actively using the system to filter his mail, it can tailor itself to
> his specific needs. The "global" database is simply a seed, so it never
> needs to be updated; it's just to get the customer user-specific
> databases started.
That's what I propose... Keeping a central database for typical spam
words (a public database containing the spam corpus), and a private
database containing the non-spam word occurrences (the non-spam corpus).
The word-probability database is kept separately on each computer...
Guess this would help.
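
To make the split concrete, here is a minimal sketch of how a per-machine
word-probability database could be derived from the shared spam corpus and a
user's private non-spam corpus. It follows the per-word formula from Paul
Graham's essay "A Plan for Spam" (good counts doubled, words seen fewer than
five times ignored, probabilities clamped to 0.01..0.99). The function and
parameter names are hypothetical, and the corpora are assumed to be plain
word-count mappings; this is not code from the thread.

```python
def word_probabilities(spam_counts, ham_counts, n_spam, n_ham, min_occurrences=5):
    """Build a per-user word -> spam probability table.

    spam_counts: word counts from the shared (public) spam corpus
    ham_counts:  word counts from the user's private non-spam corpus
    n_spam, n_ham: number of messages in each corpus (both must be > 0)
    """
    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        # Good counts are doubled to bias the filter against false positives.
        good = 2 * ham_counts.get(word, 0)
        bad = spam_counts.get(word, 0)
        if good + bad < min_occurrences:
            continue  # too rare to say anything useful about this word
        bad_ratio = min(1.0, bad / n_spam)
        good_ratio = min(1.0, good / n_ham)
        p = bad_ratio / (good_ratio + bad_ratio)
        # Clamp so that no single word is ever treated as absolute proof.
        probs[word] = max(0.01, min(0.99, p))
    return probs
```

Only ham_counts differs from user to user in this sketch, so the spam counts
can be distributed centrally while each machine recomputes its own table.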
By the way, the people I write this thing for are not my customers; the
admin job I do here isn't paid for. Just for fun. :) (I live in this