[Spambayes] More on 'Spammer Attempts to Circumvent Bayesian Filter'

Richard B Barger ABC APR Rich at RBarger.com
Mon Jul 19 18:11:39 CEST 2004


Thank you, Toby.

I only look at spam clues occasionally, and do not really understand them well
enough to recognize their effect on scoring a particular message.

As an example: For me, the word "baseball" in this text has a spam probability
of only 15 percent, while the words "plain" and "quality" rate 68 percent and
"did" rates 77 percent. I sent the entire text in an email from me to me (I
realize those me-to-me headers will skew the results), and it scored a spam
probability of 0.000161.

That said, >some< word or words always will be the highest-probability spam clue,
but the precise corpus obviously differs from person to person.  That's what
makes SpamBayes work so well.

As a writer, editor, avid reader, and participant in 15 discussion groups, I
receive many narratives on many topics, and, except for isolated words and the
nonsense characters the spammer has put at the end of my example, there is
nothing in such text that sounds or looks particularly different from my normal
message stream.

In general, I'd think that such neutral text would tend to lower a message's spam
probability, and the effect of one or a few suspect words would be
insignificant.  Even if the word "baseball" rated at 300 percent likely to be
spam <g>, the rest of the more "normal" words would, it seems to me, offset the
spammy effect of "baseball" or other seldom-seen general words.
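A minimal sketch of how a SpamBayes-style filter might combine per-word clues
into one score. It assumes Gary Robinson's chi-squared combining and what I
understand the defaults to be: words within 0.1 of a neutral 0.5 are ignored,
and at most 150 of the most extreme words are used. The names `chi2q`,
`select_clues`, and `chi2_score` are hypothetical, not SpamBayes' own API. The
four clue values are the ones quoted above; the filler words are made up.

```python
import math


def chi2q(x2: float, v: int) -> float:
    """P(X >= x2) for a chi-squared variable X with an even number v of
    degrees of freedom, using the closed-form series for even v."""
    assert v % 2 == 0, "series only valid for even degrees of freedom"
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding just above 1


def select_clues(probs, min_strength=0.1, max_clues=150):
    """Hypothetical helper: drop words within min_strength of the neutral
    0.5, then keep the max_clues most extreme survivors."""
    strong = [p for p in probs if abs(p - 0.5) >= min_strength]
    strong.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return strong[:max_clues]


def chi2_score(probs, min_strength=0.1, max_clues=150):
    """Combine per-word spam probabilities (each strictly between 0 and 1)
    into one score in [0, 1]; 0.5 means no evidence either way."""
    clues = select_clues(probs, min_strength, max_clues)
    n = len(clues)
    if n == 0:
        return 0.5
    # S approaches 1 when the clues collectively look spammy ...
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clues), 2 * n)
    # ... and H approaches 1 when they collectively look hammy.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clues), 2 * n)
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    # Clue values quoted above: baseball, plain, quality, did.
    quoted = [0.15, 0.68, 0.68, 0.77]
    neutral = [0.5] * 50  # hypothetical perfectly neutral words
    hammy = [0.05] * 20   # hypothetical strongly ham-leaning words
    print(chi2_score(quoted))
    print(chi2_score(quoted + neutral))  # same as the line above
    print(chi2_score(quoted + hammy))    # pulled sharply toward 0
```

Under these assumptions, words sitting right at 0.5 drop out of the
calculation entirely, so it is words with a definite ham lean, rather than
merely neutral ones, that offset a spammy word such as "baseball".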

But, as Dennis Miller says, "I could be wrong."

Toby, thanks very much for shedding some new light on these spam-filter-avoidance
narratives.  I'm still not convinced, but I certainly appreciate you stirring my
thinking.

Cheers!

Rich Barger
Kansas City

---

Toby Dickenson wrote:

> On Saturday 17 July 2004 03:33, Richard B Barger ABC APR wrote:
>
> > I get the sense that "legitimate-appearing" text isn't easily caught by
> > SpamBayes .... I've appended an example at the
> > end of this message.
>
> That example includes the word 'baseball' in an attempt to appear legitimate to
> those users whose ham email contains that word.
>
> For me that word appears almost exclusively in this type of passage, and it
> was the strongest spam clue in your email. The attempt to appear legitimate
> will fail for all users except the minority who are baseball fans.
>
> --
> Toby Dickenson
