[Spambayes] Getting rid of max_spamprob and min_spamprob

Neil Schemenauer nas@python.ca
Tue, 17 Sep 2002 21:19:40 -0700


Tim Peters wrote:
> Quick idea:  this kind of thing is why a classic Bayesian classifier works
> in log space.

The idea of working with logs had crossed my mind but I didn't pursue
it.  I've now changed my code.  Much nicer.  Thanks for the nudge.

> To avoid massive cancellation at the end, it's probably numerically
> better to do
> 
>     sum(for i = 1 to n, log(h_i/s_i) + log(S/H))


How about calculating and storing log(h_i/s_i * S/H) with the rest of the
word info?
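
A minimal sketch of that precomputation, not the actual Spambayes code. The names log_ratio, build_table and score are hypothetical, and so is the layout of counts as word -> (h_i, s_i). It assumes both counts are positive; what to do with a word seen in only one corpus is left open here.

```python
import math

def log_ratio(h_i, s_i, H, S):
    # log(h_i/s_i * S/H) for one word.
    # Assumes h_i > 0 and s_i > 0 (zero counts are not handled in this sketch).
    return math.log(h_i / s_i * S / H)

def build_table(counts, H, S):
    # counts maps word -> (h_i, s_i); H and S are the ham and spam message totals.
    # The log ratio is computed once per word and stored with the word info.
    return {word: log_ratio(h, s, H, S) for word, (h, s) in counts.items()}

def score(words, table):
    # Sum the stored per-word log ratios; words missing from the table are skipped.
    # With this sign convention a positive sum leans ham, a negative one leans spam.
    return sum(table[w] for w in words if w in table)
```

Scoring a message is then a dictionary lookup and an addition per word, with no logs taken at classification time.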

> It's certainly open to experiment!

I've played with it for about an hour tonight.  I can't achieve the
accuracy that the CVS code is getting.  Here's the latest result:

    total unique false pos 49
    total unique false neg 1
    average fp % 2.72222222222
    average fn % 0.0555555555555

That one false negative is pretty odd.¹  The false positives look like
reasonable mistakes.

  Neil

¹ http://arctrix.com/nas/bizarre_fn.txt