Graham's spam filter

Erik Max Francis max at alcyone.com
Thu Aug 22 13:46:17 EDT 2002


Sean 'Shaleh' Perry wrote:

> Since I read that article I created a spam folder and moved all spam
> there rather than delete it.  I now have 400 or so messages in that
> folder.  Should be a sufficient corpus and it grows daily.

Fortunately, I do have a possible solution:  I keep all mail that comes
in through my new rule-based filter, and the filter logs whether or not
each mail was considered spam and why.

max@sade:~/etc% ls -l
total 190424
-rw-r--r--   1 max      users    190592594 Aug 20 13:13 Mailbox.backup
-rw-r--r--   1 max      users     4201081 Aug 20 13:13 unspamlog

This does mean writing something to process that data and separate it
into the two corpora, but that isn't too unreasonable.  It is a little
strange, though, in that I'd be using one (successful) spam filter to
provide input to another, nascent spam filter that is, at this point, of
questionable value.
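
As a rough illustration of that processing step, here is a minimal
Python sketch.  It rests on assumptions not stated above: that the saved
mail is in mbox format, and that the log holds one tab-separated line per
message of the form "message-id, verdict (spam or ham), reason".  The
real layout of unspamlog is not shown here, so load_verdicts() and the
other function names are hypothetical and would need adapting.

```python
import mailbox


def load_verdicts(log_lines):
    """Map Message-ID -> True (spam) / False (ham).

    Assumes a hypothetical log format, one line per message:
        <message-id> TAB spam|ham TAB reason
    Lines that do not fit this shape are skipped.
    """
    verdicts = {}
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue
        msg_id = parts[0].strip()
        verdict = parts[1].strip().lower()
        if verdict in ("spam", "ham"):
            verdicts[msg_id] = (verdict == "spam")
    return verdicts


def split_corpora(messages, verdicts):
    """Sort messages into (spam, ham, unknown) lists by logged verdict.

    Each message only needs a .get(header, default) method, which
    mailbox message objects provide.
    """
    spam, ham, unknown = [], [], []
    for msg in messages:
        msg_id = str(msg.get("Message-ID", "")).strip()
        if msg_id not in verdicts:
            unknown.append(msg)  # the log has no record of this message
        elif verdicts[msg_id]:
            spam.append(msg)
        else:
            ham.append(msg)
    return spam, ham, unknown


def run(log_path, mbox_path):
    """Read the log and the mailbox, and report the corpus sizes."""
    with open(log_path) as log:
        verdicts = load_verdicts(log)
    # create=False: fail rather than create an empty mailbox if missing
    box = mailbox.mbox(mbox_path, create=False)
    spam, ham, unknown = split_corpora(box, verdicts)
    print(len(spam), "spam,", len(ham), "ham,", len(unknown), "unknown")
    return spam, ham, unknown
```

Under those assumptions, run("unspamlog", "Mailbox.backup") would yield
the two corpora plus a list of messages the log does not mention.  For a
mailbox this size, writing each message straight out to one of two
output mailboxes would likely be kinder to memory than building lists.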

As a benchmark, Graham's indication was that each of his corpora had
about 4000 messages in it.

-- 
 Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
 __ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/  \ There is nothing so subject to the inconstancy of fortune as war.
\__/ Miguel de Cervantes
    Church / http://www.alcyone.com/pyos/church/
 A lambda calculus explorer in Python.


