Graham's spam filter (was Lisp to Python translation criticism?)
phr-n2002b at NOSPAMnightsong.com
Sat Aug 17 23:03:29 CEST 2002
Erik Max Francis <max at alcyone.com> writes:
> One obvious and immediate issue is that for an industrial-strength
> filter, the database gets _huge_ (Graham's basic setup involved 4000
> messages each in the spam and nonspam corpora), and reading and writing
> the database (even with cPickle) each time a spam message comes through
> starts to become intensive.
Why not use dbhash? I think there's also a Python cdb wrapper somewhere.
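A minimal sketch of the dbhash suggestion follows. It assumes Python 3, where the Python 2 `dbhash` module no longer exists and the standard `dbm` package fills the same role. Token counts live in an on-disk hash file, so training on a message reads and writes only the keys for that message's tokens. Nothing unpickles and re-pickles the whole table. The helper names `record_message` and `token_count` are hypothetical, and so is the plain occurrence-counting scheme. They illustrate the storage idea and are not Graham's exact bookkeeping.

```python
import dbm
import os
import tempfile


def record_message(db, tokens):
    """Add one message's tokens to an on-disk count table (hypothetical helper).

    Each occurrence of a token bumps its count by one. Only the keys for
    these tokens are read and rewritten; the rest of the file is untouched.
    """
    for token in tokens:
        key = token.encode("utf-8")
        # dbm stores bytes; a missing key is treated as a count of zero.
        count = int(db.get(key, b"0"))
        db[key] = str(count + 1).encode("ascii")


def token_count(db, token):
    """Fetch a single token's count without loading the whole table."""
    return int(db.get(token.encode("utf-8"), b"0"))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "spam_counts")
    # "c" creates the database file if it does not exist yet.
    with dbm.open(path, "c") as db:
        record_message(db, "buy cheap pills now".split())
        record_message(db, "cheap pills cheap".split())
    # Reopen read-only, as a filter would when scoring incoming mail.
    with dbm.open(path, "r") as db:
        print(token_count(db, "cheap"))  # 3
        print(token_count(db, "hello"))  # 0
```

The same access pattern suits a cdb-style store if the corpus is rebuilt in batches. cdb files are written once and then only read, which fits a filter that scores far more messages than it trains on.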