[Spambayes] Tokenising clues
Matt Sergeant
msergeant@startechgroup.co.uk
Tue, 01 Oct 2002 14:36:35 +0100
It seems everyone is slowly stumbling on "tokenising clues" here. A
"date" header issue here, a "message-id" issue there, and a particular
way to format body text as another possible clue.
This seems like a vast waste of your time to me. There are a couple of
projects out there that have already put huge amounts of time and
programming effort into figuring out these other clues that spambayes
misses out on. Rather than repeating that work, why not just rip all the
rules out of SpamAssassin or some other spam checking project wholesale,
and stuff those into your database?
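
As a rough sketch of what "stuffing rules into the database" could mean: each
rule that fires on a message becomes one extra synthetic token, which the
classifier then scores like any word. The two rules below, the "rule:" prefix,
and the function names are made up for illustration; they are not real
SpamAssassin rules and not the spambayes tokenizer API.

```python
import re

# Hypothetical rules for illustration only; these are NOT taken from
# SpamAssassin's actual rule set.
EXAMPLE_RULES = {
    "SUBJECT_ALL_CAPS": re.compile(
        r"^Subject: [^a-z\n]*[A-Z][^a-z\n]*$", re.MULTILINE
    ),
    "MENTIONS_FREE_MONEY": re.compile(r"free\s+money", re.IGNORECASE),
}


def rule_tokens(message_text, rules=EXAMPLE_RULES):
    """Yield one synthetic token per rule that matches the message."""
    for name, pattern in sorted(rules.items()):
        if pattern.search(message_text):
            yield "rule:" + name


def tokenize(message_text):
    """Yield lowercased whitespace-split words, then rule-hit tokens."""
    for word in message_text.split():
        yield word.lower()
    yield from rule_tokens(message_text)
```

The point of doing it this way is that the rules need no hand-assigned scores:
a rule that fires mostly on spam picks up a spammy probability from training,
and a useless rule drifts toward neutral.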
Sorry, I don't want to demean any of your work, but we need to work
together to fight spam, and I'd rather not see so much time wasted on
individual clues when SpamAssassin already extracts about 800 of them!
Matt.