Graham's spam filter
Erik Max Francis
max at alcyone.com
Thu Aug 22 13:46:17 EDT 2002
Sean 'Shaleh' Perry wrote:
> Since I read that article I created a spam folder and moved all spam
> rather than delete it. I now have 400 or so messages in that folder.
> be a sufficient corpus and it grows daily.
Fortunately, I do have a possible solution: I keep all mail that comes
in through my new rule-based filter, and the filter logs whether or not
each mail was considered spam and why.
max at sade:~/etc% ls -l
-rw-r--r-- 1 max users 190592594 Aug 20 13:13 Mailbox.backup
-rw-r--r-- 1 max users 4201081 Aug 20 13:13 unspamlog
This does mean writing something to process that data and separate it
into the two corpora, but that isn't too unreasonable. It does make
things a little strange, though, in that I'd be using one (successful)
spam filter to provide input to another, nascent spam filter, at this point of
As a benchmark, Graham indicated that each of his corpora held
about 4000 messages.
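A minimal sketch of the splitting step described above, using Python's
standard mailbox module. It assumes a hypothetical log format of one
line per message, "<Message-ID> SPAM|HAM optional reason", and matches
log entries to mails by their Message-ID header. The real unspamlog
layout isn't shown here, so parse_log() and split_mailbox() are
illustrative names only, and the parser would need adapting to the
actual log.

```python
import mailbox


def parse_log(lines):
    """Map Message-ID -> True (spam) or False (ham).

    ASSUMED log format (hypothetical, not the real unspamlog layout):
        <message-id> SPAM|HAM optional reason...
    Blank or malformed lines are skipped.
    """
    verdicts = {}
    for line in lines:
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue  # blank or malformed line
        msg_id, verdict = fields[0], fields[1].upper()
        if verdict in ("SPAM", "HAM"):
            verdicts[msg_id] = verdict == "SPAM"
    return verdicts


def split_mailbox(src_path, verdicts, spam_path, ham_path):
    """Copy each message in the src_path mbox into a spam or ham mbox.

    Messages whose Message-ID has no log entry are counted but not
    copied, so unclassified mail never pollutes either corpus.
    Returns (spam_count, ham_count, unknown_count).
    """
    src = mailbox.mbox(src_path, create=False)  # must already exist
    spam = mailbox.mbox(spam_path)              # created if missing
    ham = mailbox.mbox(ham_path)
    counts = {"spam": 0, "ham": 0, "unknown": 0}
    try:
        for msg in src:
            msg_id = (msg["Message-ID"] or "").strip()
            if msg_id not in verdicts:
                counts["unknown"] += 1
            elif verdicts[msg_id]:
                spam.add(msg)
                counts["spam"] += 1
            else:
                ham.add(msg)
                counts["ham"] += 1
    finally:
        # close() flushes pending additions and closes the files
        for box in (src, spam, ham):
            box.close()
    return counts["spam"], counts["ham"], counts["unknown"]
```

Usage would be along the lines of
split_mailbox("Mailbox.backup", parse_log(open("unspamlog")), "spam.mbox",
"ham.mbox"). Skipping unlogged messages rather than guessing keeps the
two corpora clean, at the cost of a slightly smaller training set.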
Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/ \ There is nothing so subject to the inconstancy of fortune as war.
\__/ Miguel de Cervantes
Church / http://www.alcyone.com/pyos/church/
A lambda calculus explorer in Python.