Graham's spam filter (was Lisp to Python translation criticism?)
Erik Max Francis
max at alcyone.com
Wed Aug 21 05:11:11 CEST 2002
David LeBlanc wrote:
> > signature::a
> > signature::ago
> > signature::been
> What's the advantage of this?
Presumably he's trying to make a distinction between words that appear
in different places, which seems a reasonable approach (although trying
to divide things based on the _signature_ is probably not going to be
very useful in spam). I know for my own rules-based approach, it's
significant as to whether certain key words are within the (say) Subject
line or the body, and presumably this would be helpful for a statistical
filtration system as well. It may well be that Graham's approach simply
doesn't need this level of detail, but it certainly couldn't hurt to
think about it when designing a system from scratch.
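As a rough illustration of what location-tagged tokens like the
"signature::ago" ones quoted above might look like in practice, here is a
minimal sketch. It is not Graham's code or anyone's posted implementation:
the function name tag_tokens, the section names, the token pattern, and the
use of the conventional "-- " line to find the signature are all assumptions
made for the example.

```python
import re

# Hypothetical token pattern: runs of letters, digits, "$", "'" and "-".
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def tag_tokens(subject, body):
    """Return lowercase tokens prefixed with the section they came from."""
    # Assumption: the signature starts after the conventional "-- " line.
    # If no such line exists, the signature part is simply empty.
    main, _sep, signature = body.partition("\n-- \n")
    sections = (("subject", subject), ("body", main), ("signature", signature))
    tokens = []
    for section, text in sections:
        for word in TOKEN_RE.findall(text.lower()):
            tokens.append(f"{section}::{word}")
    return tokens


if __name__ == "__main__":
    for token in tag_tokens("Free money", "Hello there\n-- \nIt has been a while"):
        print(token)
```

A statistical filter would then count "subject::free" and "body::free" as
separate features, so the same word can carry different weight depending on
where it shows up.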
> I agree that a complete mail program should have the ability to sort
> into many categories and this phase of the operation is not where to
> do it.
> This is a pass/fail filtration step, not a sort step.
Yes, all that's being discussed here is a distinction between spam and
non-spam; any other filtering should be done by rules later on.
Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/ \ There is nothing so subject to the inconstancy of fortune as war.
\__/ Miguel de Cervantes
Church / http://www.alcyone.com/pyos/church/
A lambda calculus explorer in Python.