[Spambayes] RE: chi-combining

Rob W.W. Hooft rob@hooft.net
Tue Nov 19 10:27:22 2002

Tim Peters wrote:
> In an offline thread with Greg Louis (who's working on bogofilter), I tried
> an experiment using just the S, then just the H, components of our spamprob
> calculation.  We currently return (1+S-H)/2.  The "justs" result here just
> returns S, the "justh" just returns 1-H.  justs is a comparative disaster,
> but the more I stare at it, the more I think justh did surprisingly well:

Try your "invisible ham" spam with this; I'm sure it will score as
rock-solid ham. By using "justh" you're basically telling spammers that
you're not sensitive to spam words, as long as enough of the message
looks like ham!

The two cases where this makes a difference are:

    H=1 S=1 : This is the case I just described. A message that looks
              like both ham and spam used to score unsure, but will now
              get a ham score.
    H=0 S=0 : A message that doesn't look like anything seen before used
              to score unsure, but will now get a spam score.
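A minimal sketch of the two corner cases above, comparing the current
(1+S-H)/2 score with the "justh" score 1-H. It takes S and H as given
numbers in [0, 1] and does not compute them from token probabilities.
The ham/spam cutoffs are hypothetical values chosen only for
illustration; they do not come from this thread.

```python
# Hypothetical cutoffs, for illustration only (not from the thread).
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.80


def combined(S: float, H: float) -> float:
    """Current scheme: (1 + S - H) / 2."""
    return (1.0 + S - H) / 2.0


def just_h(S: float, H: float) -> float:
    """The "justh" experiment: ignore S and return 1 - H."""
    return 1.0 - H


def classify(score: float) -> str:
    """Map a score in [0, 1] to ham/unsure/spam using the hypothetical cutoffs."""
    if score < HAM_CUTOFF:
        return "ham"
    if score > SPAM_CUTOFF:
        return "spam"
    return "unsure"


if __name__ == "__main__":
    # The two cases where the schemes disagree.
    for S, H in [(1.0, 1.0), (0.0, 0.0)]:
        for name, fn in [("combined", combined), ("justh", just_h)]:
            score = fn(S, H)
            print(f"S={S} H={H} {name}: {score:.2f} -> {classify(score)}")
```

With H=1 S=1 the combined score is 0.5 (unsure) while "justh" gives 0.0
(ham). With H=0 S=0 the combined score is again 0.5 while "justh" gives
1.0 (spam).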

I suspect that 1-H is easier for the ephemeral "smart spammer" to
counter than (1+S-H)/2 is. It is another form of cancellation disease.


Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/

