[Spambayes] More on 'Spammer Attempts to Circumvent Bayesian Filter'
kennypitt at hotmail.com
Mon Jul 19 21:05:44 CEST 2004
Richard B Barger ABC APR wrote:
> One more thought: It would intuitively seem that a longer, flowing
> text narrative from a spammer would be slightly more likely to
> include neutral and ham words than spam words. I won't attempt to do
> math on this, and there are probably lots of theoretical and
> practical reasons why I'm wrong, but my gut tells me that, the longer
> and more coherent the narrative, the more likely it would be to score
> as ham.
You are probably quite correct on this in general, but as usual it depends
on your personal training data. The random gibberish that spammers
sometimes insert to fool hashing filters such as SpamNet has proven
completely ineffective at fooling SpamBayes. Random nonsense isn't going to
appear hammy to anyone. Whether a narrative appears hammy, on the other hand,
depends on how similar its topic is to what you typically discuss in your e-mail.
For me, text taken from a political news story would probably be far less
likely to appear hammy than an excerpt from a computer mag or a sci-fi
novel. For others, it would probably be exactly the opposite.
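To make the "it depends on your training data" point concrete, here is a toy sketch. It is NOT SpamBayes's code. It only borrows the general shape of the approach SpamBayes is known for: smoothed per-token spam probabilities, with near-neutral tokens ignored, combined by a chi-squared test. The class `ToyClassifier`, the helper `trained`, the tokenizer, the constants, and the tiny corpora are all hypothetical and chosen for illustration. Two users share the same spam training but have different ham. The same political narrative then scores very hammy for one user and neutral for the other. Gibberish tokens have never been seen by anyone, so they get the neutral prior and contribute nothing at all.

```python
import math
import re

# Illustrative constants (assumptions for this sketch, Robinson-style smoothing).
UNKNOWN_PROB = 0.5       # prior probability for a token with no history
UNKNOWN_STRENGTH = 0.45  # weight of that prior, in pseudo-observations
MIN_STRENGTH = 0.1       # ignore tokens whose probability is within this of 0.5


def tokenize(text):
    # Hypothetical tokenizer: lowercase words, each counted once per message.
    return set(re.findall(r"[a-z']+", text.lower()))


def chi2q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


class ToyClassifier:
    def __init__(self):
        self.ham_count, self.spam_count = {}, {}
        self.n_ham = self.n_spam = 0

    def train(self, text, is_spam):
        counts = self.spam_count if is_spam else self.ham_count
        for tok in tokenize(text):
            counts[tok] = counts.get(tok, 0) + 1
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def token_prob(self, tok):
        h = self.ham_count.get(tok, 0)
        s = self.spam_count.get(tok, 0)
        n = h + s
        if n == 0:
            return UNKNOWN_PROB  # never seen: neutral, so it will be ignored
        hr = h / max(self.n_ham, 1)
        sr = s / max(self.n_spam, 1)
        raw = sr / (hr + sr)
        # Shrink toward the prior when the token has been seen only a few times.
        return (UNKNOWN_STRENGTH * UNKNOWN_PROB + n * raw) / (UNKNOWN_STRENGTH + n)

    def score(self, text):
        probs = [self.token_prob(t) for t in tokenize(text)]
        clues = [p for p in probs if abs(p - 0.5) >= MIN_STRENGTH]
        if not clues:
            return 0.5  # no evidence either way
        n = len(clues)
        spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clues), 2 * n)
        hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clues), 2 * n)
        return (spamminess - hamminess + 1.0) / 2.0  # 0 = ham, 1 = spam


SPAM = ["buy cheap pills now", "cheap pills online now"]
POLITICS_HAM = ["senate vote on the budget bill", "the senate passed the budget"]
TECH_HAM = ["new compiler release notes", "compiler optimizes inner loops"]


def trained(ham_texts, spam_texts):
    c = ToyClassifier()
    for t in ham_texts:
        c.train(t, is_spam=False)
    for t in spam_texts:
        c.train(t, is_spam=True)
    return c


if __name__ == "__main__":
    narrative = "the senate debated the budget"
    gibberish = "xqzv plorth wibnak"
    for name, ham in [("politics user", POLITICS_HAM), ("tech user", TECH_HAM)]:
        c = trained(ham, SPAM)
        print(name, round(c.score(narrative), 3), c.score(gibberish))
```

The design choice that matters here is discarding near-neutral clues. Unseen words can neither help nor hurt, so padding a spam with random nonsense buys nothing. A coherent narrative helps only if its vocabulary already shows up in that particular user's ham.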