Graham's spam filter (was Lisp to Python translation criticism?)

David LeBlanc whisper at oz.net
Wed Aug 21 04:15:56 CEST 2002


> -----Original Message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org]On Behalf Of Christopher Browne
> Sent: Tuesday, August 20, 2002 17:15
> To: python-list at python.org
> Subject: Re: Graham's spam filter (was Lisp to Python translation
> criticism?)
>
>
<snip>
> I'd suggest the thought of doing message header associations as
> tokens, so that you might get, out of:
>
>   Subject: Re: Graham's spam filter (was Lisp to Python translation criticism?)
>
> the set of tokens:
> subject::re
> subject::graham's
<snip>
> subject::Python
>
> Then do something similar with .signature material:
>
> signature::a
> signature::ago
> signature::been
<snip>

What's the advantage of this?
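
For concreteness, here is a rough sketch of how I read the tagging scheme
quoted above. The names tag_tokens and TOKEN_RE are made up for illustration,
and both the token character set and the lowercasing are my assumptions, not
anything specified in the thread. The one thing it does show is that a word
in the Subject line and the same word in the body or signature end up as
different tokens, so they would be counted separately.

```python
import re

# Assumed token characters: letters, digits, apostrophes, dollar signs, dashes.
TOKEN_RE = re.compile(r"[A-Za-z0-9'$-]+")


def tag_tokens(prefix, text):
    """Return the set of tokens found in text, each tagged as 'prefix::token'.

    Hypothetical helper: lowercasing is an assumption made for this sketch.
    """
    return {f"{prefix}::{tok.lower()}" for tok in TOKEN_RE.findall(text)}


subject = "Re: Graham's spam filter (was Lisp to Python translation criticism?)"
tokens = tag_tokens("subject", subject)

# The same word seen in a signature gets a different tag, hence a different token.
sig_tokens = tag_tokens("signature", "Python")
```

With this, "subject::python" and "signature::python" are unrelated entries as
far as the filter's word counts are concerned.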

<snip>

> > One thing I don't see how to do is to add a corpus containing a new
> > message (good or bad) to the database - i.e. update the
> > database. Maybe Database.addGood() and Database.addBad()?
>
> It works a whopping lot better if there's a whopping lot more than
> just two categories...

I agree that a complete mail program should be able to sort mail into many
categories, but this phase of the operation is not the place to do it.
This is a pass/fail filtering step, not a sorting step.
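
To make the update question quoted above a bit more concrete, this is roughly
what I have in mind. Only the names Database, addGood and addBad come from the
thread; the internals (two token counters plus message counts) are my guess at
a minimal two-category store, not the code under discussion.

```python
from collections import Counter


class Database:
    """Hypothetical token-count store for a pass/fail filter."""

    def __init__(self):
        self.good = Counter()   # token -> occurrences in good mail
        self.bad = Counter()    # token -> occurrences in bad mail
        self.ngood = 0          # number of good messages seen
        self.nbad = 0           # number of bad messages seen

    def addGood(self, tokens):
        """Fold the tokens of one good message into the counts."""
        self.good.update(tokens)
        self.ngood += 1

    def addBad(self, tokens):
        """Fold the tokens of one bad message into the counts."""
        self.bad.update(tokens)
        self.nbad += 1


db = Database()
db.addGood("meeting notes attached".split())
db.addBad("free money click here".split())
db.addBad("free offer".split())
```

Each new message only increments counts, so the database can be updated one
message at a time without rescanning the whole corpus.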

> --
> (reverse (concatenate 'string "gro.mca@" "enworbbc"))
> http://www3.sympatico.ca/cbbrowne/unix.html
> Trivialize   a user's bug report  by  pointing out that   it was fixed
> independently long ago in a system that hasn't been released yet.
> -- from the Symbolics Guidelines for Sending Mail


Dave LeBlanc
Seattle, WA USA




