[Mailman-Developers] Anti-spam "killer app"?

Ron Jarrell jarrell@vt.edu
Fri, 16 Aug 2002 20:30:34 -0400


At 01:10 PM 8/16/02 -0700, you wrote:

>Hey, all. 
>
>Take a look at this --
>
><http://www.paulgraham.com/spam.html>
>
>It's a new technique for identifying spam. The more I look into the details,
>the more I think we have the "anti-spam killer app", because it tunes itself
>to the individual (or site), adapts as the anti-spammers adapt, and the
>technique used is fairly easy to implement and damn difficult for a spammer
>to avoid....
>
>Damn, I wish I'd thought of this.
>
>(I've dropped a pointer to it at
>http://www.chuqui.com/cgi-bin/mwf/topic_show.pl?tid=389)


Yeah, I read that... It really started wheels turning in my head.  Now, if I'd coded
in Lisp more recently than 20 years ago it would have been a little easier to follow; I haven't
taken stats any more recently either, and couldn't remember Bayesian analysis to save my
soul...

Of course, it still requires that you either accept someone else's starting corpus or
have someone tag the spam as it arrives.

But it wouldn't be *that* hard to add to Mailman...  Add a button to the admindb page for
"this was spam".  Ideally add a button into the pipermail interface (yeah, I know, change
pipermail; ugh feh) that the admin can use that says "This message is spam.  Treat it as such".

Both would make the message go away and add it to the list's spam corpus, correspondingly
deleting it from the list's good-corpus stats if it was in the archives (why?  Because if it made it that
far then it was considered valid email, and we need to get its keywords *out* of that database).
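
A rough sketch of that bookkeeping, in Python since that's what Mailman is written in.  Nothing here
is an existing Mailman API: ListCorpus, add_good, mark_spam and the tokenizer are all made-up names,
and the token pattern is just a guess at something reasonable.

```python
import re
from collections import Counter

# Hypothetical tokenizer: words, digits, and a few punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split message text into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class ListCorpus:
    """Per-list token counts for good mail and spam (illustrative only)."""

    def __init__(self):
        self.good = Counter()   # token -> occurrences in good mail
        self.spam = Counter()   # token -> occurrences in spam
        self.ngood = 0          # number of good messages counted
        self.nspam = 0          # number of spam messages counted

    def add_good(self, text):
        """Count a message that was accepted and archived."""
        self.good.update(tokenize(text))
        self.ngood += 1

    def mark_spam(self, text, was_archived=False):
        """Handle a "this was spam" click from admindb or the archives."""
        tokens = tokenize(text)
        if was_archived:
            # It was counted as good mail earlier, so back its tokens out.
            self.good.subtract(tokens)
            self.good = +self.good          # drop zero and negative counts
            self.ngood = max(0, self.ngood - 1)
        self.spam.update(tokens)
        self.nspam += 1
```

The was_archived flag is the only difference between the two buttons: a message held in admindb
never reached the good corpus, so there is nothing to subtract.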

We'd also archive the entire note for future reference.

At some point, when we had enough data to trust the corpus sample size, the list admin would be
given the option of turning on the spam filter, which could just return the same results that the
moderation checks do: discard, hold, pass, etc.
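
Something like the following could do the scoring and pick the action.  It loosely follows the
approach in the linked article but is simplified, so treat it as a sketch rather than a faithful
port.  The function names, the MIN_MESSAGES cutoff, the 0.4 default for unseen tokens and the
thresholds are all assumptions for illustration, not Mailman settings.

```python
from math import prod

MIN_MESSAGES = 200  # assumed: messages needed per corpus before we trust it


def token_prob(tok, good, spam, ngood, nspam):
    """Estimate the probability that a message containing tok is spam."""
    g = good.get(tok, 0) / max(ngood, 1)
    b = spam.get(tok, 0) / max(nspam, 1)
    if g + b == 0:
        return 0.4                      # never seen: lean slightly innocent
    return max(0.01, min(0.99, b / (g + b)))


def spam_score(tokens, good, spam, ngood, nspam, top_n=15):
    """Combine the most telling token probabilities into one score."""
    probs = [token_prob(t, good, spam, ngood, nspam) for t in set(tokens)]
    # Keep the tokens whose probability is furthest from neutral (0.5).
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:top_n]
    if not probs:
        return 0.5
    s = prod(probs)
    h = prod(1.0 - p for p in probs)
    return s / (s + h)


def filter_action(score, ngood, nspam, enabled, discard_at=0.99, hold_at=0.9):
    """Map a score onto the same outcomes the moderation checks use."""
    if not enabled or min(ngood, nspam) < MIN_MESSAGES:
        return "pass"                   # filter off, or corpus too small
    if score >= discard_at:
        return "discard"
    if score >= hold_at:
        return "hold"
    return "pass"
```

Until both corpora reach the cutoff, filter_action always passes the message, which matches the
idea that the admin only gets to switch the filter on once the sample size is trustworthy.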

Meanwhile, back at the ranch, the site owner could run a site-level filter if they liked, independent
of the list-level one (that's why we save the messages: it lets the site admin review messages
tagged as spam and agree or disagree on their inclusion, preventing the psycho-moderator syndrome),
which could be used to seed new lists with a better starting corpus...
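
For the seeding part, a minimal sketch assuming each saved message has an id and a token list, and
that the site admin's review is recorded as a simple agree/disagree per id.  build_seed and both
data shapes are invented for illustration.

```python
from collections import Counter


def build_seed(tagged, site_verdicts):
    """Build a starting spam corpus from messages the site admin confirmed.

    tagged:        iterable of (msg_id, tokens) that list moderators marked as spam
    site_verdicts: dict of msg_id -> True (agree) or False (disagree)
    Messages the site admin has not reviewed yet are left out.
    """
    seed = Counter()
    count = 0
    for msg_id, tokens in tagged:
        if site_verdicts.get(msg_id) is True:
            seed.update(tokens)
            count += 1
    return seed, count
```

Leaving out anything the site admin rejected or hasn't looked at yet is what keeps one overzealous
moderator's tagging from leaking into every new list.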