[Spambayes] Introducing myself

Matt Sergeant msergeant@startechgroup.co.uk
Mon Nov 11 09:49:38 2002


Robert Woodhead said the following on 10/11/02 00:32:

> * My personal bias (as I think Guido mentioned) is for a multifaceted 
> approach, using Bayesian, rules-based (attacking things that Bayesian 
> isn't good at, like looking for obfuscated url structures), DNSBL, 
> and whitelisting heuristics to generate an overall ranking.  So a 
> hammy mail from a guy in your address book would bubble up to highest 
> priority, whereas something spammy from him would stay neutral. 
> There's lots of room for cooperation between the various approaches 
> and multiple agents mean it's less likely that a spam will get by. 
> In particular, whitelisting heuristics can almost eliminate false 
> positives.

That's the approach SpamAssassin now takes, fwiw (including the Bayesian 
stuff). It's all done in 2.50 CVS.
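
For illustration only, here is a minimal Python sketch of the kind of
combined ranking Robert describes. It is not SpamAssassin's or Spambayes'
actual scoring code; the Evidence fields, the rank() function, and all the
weights are hypothetical, chosen only to show how a whitelist can promote
hammy mail while leaving spammy mail from a known sender neutral.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical per-message signals; field names are assumptions."""
    bayes_spam_prob: float    # 0.0 (hammy) .. 1.0 (spammy), from a Bayesian classifier
    rule_hits: int            # count of hand-written rules matched (e.g. obfuscated URLs)
    on_dnsbl: bool            # sending host appears on a DNS blocklist
    sender_whitelisted: bool  # sender is in the recipient's address book


def rank(e: Evidence) -> float:
    """Return a priority score: positive is hammy, negative is spammy."""
    # Map the Bayesian probability onto [-1, 1], positive meaning ham.
    score = 1.0 - 2.0 * e.bayes_spam_prob
    # Rule hits and a DNSBL listing push toward spam (illustrative weights).
    score -= 0.3 * e.rule_hits
    if e.on_dnsbl:
        score -= 0.5
    if e.sender_whitelisted:
        if score > 0:
            # Hammy mail from a known sender bubbles up in priority.
            score += 1.0
        else:
            # Spammy mail from a known sender stays neutral.
            score = 0.0
    return score


if __name__ == "__main__":
    print(rank(Evidence(0.1, 0, False, True)))   # hammy, whitelisted
    print(rank(Evidence(0.9, 2, True, True)))    # spammy, whitelisted
    print(rank(Evidence(0.9, 0, False, False)))  # spammy, unknown sender
```

The whitelist only ever raises a score or clamps it to zero, which is one
way to get the "almost eliminate false positives" property mentioned above.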

> * Finally, if anyone needs more spam, I get over 300 a day (I've been 
> around a while!) and have a cleaned corpus of over 130MB of spam and 
> foreign email.  Also, given all the legit web-marketing email I get 
> because of the url registration work I've done, I've got tons of the 
> spammiest ham you could imagine.

I'm always looking for more corpora. Stick the data on an FTP/HTTP 
server somewhere (password-protect it if you need to). Or contact me 
privately if that's not possible.

Matt.



