Lisp to Python translation criticism?
John E. Barham
jbarham at jbarham.com
Sat Aug 17 06:23:45 CEST 2002
Erik Max Francis wrote:
> ... I already have a Python-based spam filter (which uses qmail)...
I'm using qmail myself. Its "every email is a file" maildir format shines
here since you don't have to mess around parsing mbox files.
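To illustrate the point, here is a minimal sketch of walking a maildir with the standard library's mailbox module. The function name iter_subjects is made up for this example, and the sketch assumes a current Python 3 rather than anything that was available when this was posted.

```python
import mailbox


def iter_subjects(maildir_path):
    """Yield (key, subject) for every message stored in a maildir."""
    # create=False: raise an error rather than silently creating an empty maildir.
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    # Each message is its own file under new/ or cur/, so there is no
    # "From " separator parsing as with mbox.
    for key, msg in box.items():
        yield key, msg.get("Subject", "")
```

Each msg is an email.message.Message subclass, so the usual header and payload accessors are available for tokenizing.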
> The real "difficulty" (if you could call it that) with this scheme is
> not so much the implementation, but rather with the collection and
> maintenance of the sample set of spam and non-spam emails. This
> requires digging through emails, and/or altering the way you process
> email as a user, and how you give that processing feedback back to the
> spam filtering system. The Python implementation itself, based on those
> samples, seems pretty straightforward.
I'm working on launching a commercial email/Web hosting service, so
spam filtering would obviously be a useful feature. Initially I'd start with
a pool of tokens taken from my own legitimate email and spam, plus whatever
my friends have been collecting. Over time, depending on their email usage
patterns, my clients would develop custom rulesets that would carry more
weight than the generic one.
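One way the generic/custom weighting could work, as a rough sketch only: keep two token tables and blend them per token. The function name, the custom_weight parameter and its 0.75 default are all assumptions made up for illustration; the 0.4 fallback is the value Graham's essay assigns to words never seen before.

```python
def token_spam_probability(token, generic, custom, custom_weight=0.75):
    """Blend generic and per-client spam probabilities for one token.

    generic and custom map token -> spam probability in [0, 1].
    custom_weight is a hypothetical knob: how much the client's own
    table counts when both tables know the token.
    """
    g = generic.get(token)
    c = custom.get(token)
    if g is None and c is None:
        return 0.4  # unseen token: the default used in Graham's essay
    if c is None:
        return g    # client has no data yet, fall back to the generic table
    if g is None:
        return c
    return custom_weight * c + (1.0 - custom_weight) * g
```

A new client starts out entirely on the generic table and drifts toward their own data as the custom table fills in.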
Paul Graham also suggests having "Delete" and "Delete as Spam" commands to
help in classification.
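A sketch of how those two commands might feed the sample set, assuming simple per-token counters. The Corpus class and its method names are hypothetical; the token pattern follows the essay's description of tokens as runs of alphanumerics, dashes, apostrophes and dollar signs, and this version does not fold case.

```python
import re
from collections import Counter

# Tokens: runs of letters, digits, dollar signs, apostrophes and dashes.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


class Corpus:
    """Token counts for legitimate mail and spam, updated by user actions."""

    def __init__(self):
        self.good = Counter()  # token -> occurrences in legitimate mail
        self.bad = Counter()   # token -> occurrences in spam
        self.ngood = 0         # number of legitimate messages seen
        self.nbad = 0          # number of spam messages seen

    def delete(self, text):
        """Plain "Delete": treat the message as legitimate."""
        self.good.update(TOKEN_RE.findall(text))
        self.ngood += 1

    def delete_as_spam(self, text):
        """"Delete as Spam": treat the message as spam."""
        self.bad.update(TOKEN_RE.findall(text))
        self.nbad += 1
```

The user's normal mail handling then doubles as training, with no separate sorting step.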
> In either case I'll probably put something together as a sample and see
> what it thinks about spam as I get it.
Nice of the spammers to be giving us so much data to work with!