Graham's spam filter (was Lisp to Python translation criticism?)

Erik Max Francis max at
Sun Aug 18 01:29:00 CEST 2002

"John E. Barham" wrote:

> But I don't think that a pickled dictionary/database would be unmanageably
> huge, even w/ a large set of input messages, since the rate of growth of the
> "vocabulary" (i.e., set of tokens) would slow as more messages were input.
> The spam probability database in particular is smaller than the "good" and
> "bad" ones since it has a frequency threshold.

That's true, but if your spam filter acts as a standalone program (i.e.,
one that is simply invoked from your .qmail and/or .forward file), it's
going to have to read that probability database each time an email comes
in.  Updating the database is much more intensive, but can happen much
less often.
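
To make the split concrete, here is a rough sketch of the two paths, assuming
the probability table is a plain dict pickled to a single file. The function
names, the min_count default, and the simple b / (g + b) ratio are all
placeholders of mine for illustration, not Graham's actual formula or
anybody's real filter.

```python
import os
import pickle
import tempfile


def rebuild_probabilities(good, bad, min_count=5):
    """Expensive, infrequent step: build a token -> probability table.

    `good` and `bad` map tokens to occurrence counts. Tokens seen fewer
    than `min_count` times in total are dropped, which is why this table
    stays smaller than the raw count tables. The ratio used here is a
    stand-in, not Graham's formula.
    """
    probs = {}
    for token in set(good) | set(bad):
        g = good.get(token, 0)
        b = bad.get(token, 0)
        if g + b >= min_count:
            probs[token] = b / (g + b)
    return probs


def save_probabilities(path, probs):
    """Write to a temporary file and rename it over the old table, so a
    filter process started mid-update never reads a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(probs, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)


def load_probabilities(path):
    """Cheap but frequent step: this runs once for every incoming message
    when the filter is invoked as a standalone program."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Only load_probabilities() sits on the per-message path, so the size of the
thresholded table is what determines the startup cost of each invocation.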

 Erik Max Francis / max at /
 __ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/  \ There is nothing so subject to the inconstancy of fortune as war.
\__/ Miguel de Cervantes
    Church /
 A lambda calculus explorer in Python.
