[Spambayes] Images of commercial text with decoy text are mushing my index
skip at pobox.com
Mon Jan 1 16:00:15 CET 2007
Jamie> since the decoy text is completely non-commercial in nature, it
Jamie> seems to be polluting my index and making detection less
Jamie> accurate. With OCR, will this continue to be an issue?
Sure, if the decoy text actually turns out to be relevant from a scoring
standpoint. By default the SpamBayes classifier only considers tokens
(words) which score <= 0.4 or >= 0.6. My guess is that most of the words in
the decoy text are clustered around 0.5, so they aren't even considered.
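
To make the cutoff concrete, here is a rough sketch of that kind of filter.
This is not the actual SpamBayes code; the function name, the parameter
names and the sample probabilities are all made up for illustration. Only
the 0.4 and 0.6 thresholds come from the description above.

```python
def significant_tokens(token_probs, low=0.4, high=0.6):
    """Keep only tokens whose spam probability is <= low or >= high.

    token_probs maps a token (word) to its estimated spam probability.
    Hypothetical helper for illustration, not part of SpamBayes.
    """
    return {
        tok: prob
        for tok, prob in token_probs.items()
        if prob <= low or prob >= high
    }


# Made-up probabilities: two strong clues and three bland decoy words.
sample = {
    "viagra": 0.98,
    "meeting": 0.05,
    "weather": 0.52,
    "garden": 0.47,
    "the": 0.50,
}

kept = significant_tokens(sample)
print(sorted(kept))
```

With these numbers only "meeting" and "viagra" survive; the decoy-style
words near 0.5 never reach the scoring step.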
Skip