Graham's spam filter (was Lisp to Python translation criticism?)

Paul Rubin phr-n2002b at
Sat Aug 17 23:03:29 CEST 2002

Erik Max Francis <max at> writes:
> One obvious and immediate issue is that for an industrial-strength
> filter, the database gets _huge_ (Graham's basic setup involved 4000
> messages each in the spam and nonspam corpora), and reading and writing
> the database (even with cPickle) each time a spam message comes through
> starts to become intensive.

Why not use dbhash?  I think there's also a Python cdb wrapper somewhere.
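
A minimal sketch of the on-disk hash idea, not anyone's actual filter code. It assumes the database only needs per-token counts, and it uses the standard `dbm` module as a stand-in for dbhash, which exists only in Python 2. The names `add_tokens` and `token_count` and the file name `spam_counts` are made up for illustration. Each message updates only the keys it touches, so the whole table is never reloaded or rewritten the way a pickled dictionary would be.

```python
import dbm
import os
import tempfile


def add_tokens(db_path, tokens):
    """Increment the stored count for each token in an on-disk hash."""
    db = dbm.open(db_path, "c")  # "c": open read/write, create if missing
    try:
        for token in tokens:
            key = token.encode("utf-8")
            # Values are stored as ASCII digits; a missing key counts as 0.
            count = int(db[key]) if key in db else 0
            db[key] = str(count + 1).encode("ascii")
    finally:
        db.close()


def token_count(db_path, token):
    """Look up a single token's count without loading the whole table."""
    db = dbm.open(db_path, "r")
    try:
        key = token.encode("utf-8")
        return int(db[key]) if key in db else 0
    finally:
        db.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "spam_counts")
    add_tokens(path, ["free", "offer", "free"])
    print(token_count(path, "free"))
```

A separate file per corpus (spam and nonspam) would keep the two sets of counts apart; that layout is also an assumption here.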
