[Spambayes] Tokenising clues

Matt Sergeant msergeant@startechgroup.co.uk
Tue, 01 Oct 2002 14:36:35 +0100


It seems everyone is slowly stumbling on "tokenising clues" here. A 
"date" header issue here, a "message-id" issue there, and a particular 
way to format body text as another possible clue.

This seems like a vast waste of your time to me. There are a couple of 
projects out there that have already put vast amounts of time and 
programming effort into figuring out these other clues that spambayes 
misses out on. Rather than repeating that work, why not just rip all the 
rules out of SpamAssassin or some other spam checking project wholesale, 
and stuff those into your database?
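
As a rough sketch of what that could look like: each rule that fires on a 
message contributes one extra synthetic token, which the classifier can then 
learn from like any other word. The rule names and patterns below are made up 
for illustration (they are not copied from SpamAssassin), and rule_tokens is 
a hypothetical helper, not part of spambayes' tokenizer.

```python
import re

# Hypothetical rule table: names and patterns are illustrative assumptions,
# not SpamAssassin's actual rules.
RULES = [
    # Subject line containing at least one capital and no lowercase letters.
    ("SUBJ_ALL_CAPS", re.compile(r"^Subject: [^a-z\n]*[A-Z][^a-z\n]*$", re.MULTILINE)),
    # The phrase "click here" anywhere in the message, in any case.
    ("CLICK_HERE", re.compile(r"click here", re.IGNORECASE)),
]


def rule_tokens(raw_message):
    """Yield one synthetic token for each rule that matches the raw message text."""
    for name, pattern in RULES:
        if pattern.search(raw_message):
            # The "rule:" prefix keeps these apart from ordinary word tokens.
            yield "rule:" + name
```

The resulting tokens could simply be appended to whatever the existing 
tokeniser already produces, so the training database decides how much 
weight each rule deserves instead of a hand-assigned score.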

Sorry, I don't want to demean any of your work, but we need to work 
together to fight spam, and I'd rather not see so much time wasted on 
individual clues when SpamAssassin already extracts about 800 of them!

Matt.