Graham's spam filter (was Lisp to Python translation criticism?)

Paul Rubin phr-n2002b at NOSPAMnightsong.com
Sat Aug 17 17:03:29 EDT 2002


Erik Max Francis <max at alcyone.com> writes:
> One obvious and immediate issue is that for an industrial-strength
> filter, the database gets _huge_ (Graham's basic setup involved 4000
> messages each in the spam and nonspam corpora), and reading and writing
> the database (even with cPickle) each time a spam message comes through
> starts to become intensive.

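Why not use dbhash?  It keeps the table on disk, so each update touches
only the keys that changed instead of rereading and rewriting the whole
database.  I think there's also a Python cdb wrapper somewhere.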
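
For illustration, here is a minimal sketch of the on-disk approach. It
assumes Python 3, where dbhash no longer exists, so it uses the standard
dbm module in its place. The function names (bump_counts, get_count) and
the one-count-per-token layout are hypothetical and not taken from
Graham's filter.

```python
import dbm

def bump_counts(db_path, tokens):
    """Increment the stored count of each token, touching only those keys."""
    # "c" opens the database read-write and creates it if it is missing.
    with dbm.open(db_path, "c") as db:
        for tok in tokens:
            key = tok.encode("utf-8")          # dbm keys and values are bytes
            current = int(db.get(key, b"0"))   # absent tokens count as zero
            db[key] = str(current + 1).encode("ascii")

def get_count(db_path, token):
    """Look up one token's count without loading the rest of the table."""
    with dbm.open(db_path, "r") as db:
        return int(db.get(token.encode("utf-8"), b"0"))
```

A filter built this way would keep one such file per corpus (spam and
nonspam) and call bump_counts once per incoming message, so the cost of
an update scales with the size of the message, not the size of the
database.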