[Spambayes] How low can you go?

Tim Peters tim.one at comcast.net
Sat Dec 13 23:53:14 EST 2003

[Bill Yerazunis]
> I tried that too - for each window stepping, only the most extreme
> probability was used.  Essentially this decorrelated the incoming
> stream so that Bayesian modeling was a little more accurate.

I think details matter a lot here, and I doubt they left your results
predictive of ours.  Two things in particular:

1. We do Bayesian modeling of individual word spamprobs, but there's
   nothing Bayesian about the way we combine spamprobs.  As the graphs on


   show, we had relatively enormous "spectacular failure" rates when
   using Bayesian combining in the very early days, but those dropped
   so low after moving to chi-combining that there are no instances
   of a spectacular failure at all on the third graph.  By
   "spectacular failure" I mean an extremely low-scoring spam or
   extremely high-scoring ham.  Graham's scheme produced tons of
   these (compared to what eventually proved possible).

2. Gary suggested a window-based scoring gimmick, but I didn't
   implement it that way because it was too poor an approximation to
   "strongest over all possible tilings".  Instead it was done like:

    throw all the features into a bag
    while the bag isn't empty:
        pick a feature F with maximal strength among all
            features still in the bag (meaning a feature whose
            spamprob is maximally distant from 0.5, in either direction)
        feed F's spamprob into scoring
        remove every feature from the bag that intersects with
            F in at least one position (in particular, that
            removes F from the bag, and possibly other features too)
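
For concreteness, here's a minimal Python sketch of that loop.  It is not
the code that was actually tested: the Feature tuple, its start/length
fields, and the names strength, overlaps and select_tiling are all made
up for illustration, and it assumes each feature covers a contiguous run
of at least one token position.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    # Hypothetical representation: a feature covers token positions
    # start, start+1, ..., start+length-1 and carries a spamprob.
    start: int
    length: int
    spamprob: float


def strength(f: Feature) -> float:
    # Distance of the spamprob from the neutral 0.5, in either direction.
    return abs(f.spamprob - 0.5)


def overlaps(a: Feature, b: Feature) -> bool:
    # True if the two features share at least one token position.
    return a.start < b.start + b.length and b.start < a.start + a.length


def select_tiling(features: List[Feature]) -> List[Feature]:
    bag = list(features)
    chosen = []
    while bag:
        # Pick a strongest remaining feature (ties go to the earliest).
        best = max(bag, key=strength)
        chosen.append(best)
        # Drop best itself and everything intersecting it.
        bag = [f for f in bag if f is not best and not overlaps(f, best)]
    return chosen
```

The spamprobs of the features returned by select_tiling are what would be
fed into scoring.  Filtering the bag on each pass is quadratic in the
worst case, which is fine for a sketch but not tuned for speed.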

> But the results were a statistical failure.
> The error rate on my standard test corpus jumped from 68 (using
> no correction) to 80 using this "tiling" method.

We didn't do enough tests to say anything with confidence; the initial tests
showed better performance than what we do now given the same (small) amount
of training data, but there wasn't enough coverage in the initial tests to
have confidence in the results.  The tests were especially weak because they
were only done on one corpus.

> What _has_ worked better is to use a Markov model instead of a
> Bayesian model; that actually gets me down to 56.
> I haven't tried tiling Markov yet... oh dear... another CPU-day
> down the tubes.  :)

Since Markov models come in 150 flavors of their own, I'll wait until you
write a paper about it <wink>.
