[Spambayes] from a new member

Four Stones Forum tim@fourstonesforum.com
Sun Oct 20 15:15:43 2002


I've recently become aware of the Spambayes project, and I'm quite interested, so I subscribed to the mailing list, and I've been reading for a 
while, trying to get my head around the solution you're working on.  I think I (kinda) have the idea now, and I figured I'd post to introduce 
myself and to ask a few questions.

I'm Tim Stone. I've never worked on an open source project before, though I've used lots of open source software. I've been in the IT industry 
since 1975 (which makes me a geezer), I know 30-odd languages (Python isn't one of them), and I've worked on all kinds of systems under lots of 
different architectures...  so enough about me.

First of all: I HATE SPAM.  It is an insidious evil, and I'm glad to see some truly progressive thinking about how to deal with it: not only 
dealing with the mail, but dealing with the PROBLEM.

Second of all: I run a website (www.fourstonesExpressions.com) that has a mailing list (I say these words at risk of having this mail 
rejected by simplistic filters) that I feel to be completely legit.  It's a completely voluntary opt-in, there are no checkboxes with 'Yes' 
defaults, etc. etc.  I don't sell the list or give it away.  I only send mailings occasionally, perhaps 3 or 4 times a year.  I've only ever had one 
opt-out.  I think that speaks well of how the list is run.  I say all that to say this: I take great pains in my mailings to ensure that things like 
SpamAssassin doesn't label my mailings as spam.  SpamAssassin is very popular, and it does some great things.  It also documents its reasoning in 
the headers of incoming mail, so you can see how it arrived at its conclusion about a message.  This lets me optimize my mailings by simply 
sending one to myself, seeing how SA rates it, and then fixing the problems.  It shocks me that not all spammers do this, but I'm 
certainly glad that they don't, because that allows SA to work for me.  However, as our ability to block spam gets better and better, I 
think they'll be forced to use this strategy more and more.  As someone who sends mailings that *could* be thought of as spam, these are 
the things that I'm sure spammers will think about.  How do you defeat Spambayes?  Well, if I'm a spammer, I get me a copy and train it on 
a vast number of spams that are like mine, then I start tweaking....

As such, I think that Spambayes will work BEST in conjunction with other technologies.  One of the best ideas I've seen in the discussions thus 
far is to keep a PUBLIC list of URLs that spammers actively promote.  This should probably be done at the domain level.  The keeper of this 
list could very well use a crawler and a Bayesian approach to rate the website itself, which provides a double safety net; without that check, a spammer 
could include URLs that are unrelated to the spam and do (at least public-relations) damage to other sites.  Using this in conjunction with 
Spambayes actually defeats several other simple (temporary) workarounds that spammers could employ, such as including spaces between letters 
(s u c h  a s  t h i s), which is still quite human-readable but breaks the document down into a large number of single-character words, 
or sending the spam as a single JPG image.
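To make the domain-level idea concrete, here's a rough sketch of how a filter might pull the promoted domains out of a message and check them against such a public list.  It's only an illustration under some assumptions: the blocklist contents and the helper names (BLOCKLIST, registered_domain, promoted_domains, hits_blocklist) are all hypothetical, and the "last two labels" domain reduction is deliberately crude (real code would need a public-suffix list for things like .co.uk).  The point it shows is that spacing out the letters turns the text into single-character words, but the URL itself has to stay intact to be clickable, so the domain check still fires.

```python
import re
from urllib.parse import urlsplit

# Hypothetical public blocklist of domains that spammers actively promote.
# In the proposal, a list keeper would maintain this (possibly vetted by a crawler).
BLOCKLIST = {"spammy-pills.example", "cheap-loans.example"}

# Matches http/https URLs up to the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def registered_domain(host):
    """Crude domain-level reduction: keep only the last two labels.

    Assumption: real code would consult a public-suffix list (e.g. for .co.uk).
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def promoted_domains(body):
    """Return the set of domains for every http(s) URL found in the message body."""
    domains = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname  # already lowercased by urlsplit
        if host:
            domains.add(registered_domain(host))
    return domains


def hits_blocklist(body, blocklist=BLOCKLIST):
    """Return the domains in the message that appear on the blocklist."""
    return promoted_domains(body) & blocklist


message = ("s u c h  a s  i n c l u d i n g  s p a c e s "
           "visit http://www.spammy-pills.example/buy?id=42 today")

# The spaced-out text degrades into single-character words...
print([w for w in message.split() if len(w) == 1][:5])
# ...but the promoted URL is still intact, so the domain check still fires.
print(hits_blocklist(message))
```

Working at the domain level rather than the full URL means a spammer can't dodge the list just by varying paths or query strings.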

Well, that's enough for now.  Is anybody working on the Bayesian crawler idea?

- Tim