[Spambayes] Getting rid of max_spamprob and min_spamprob
Neil Schemenauer
nas@python.ca
Tue, 17 Sep 2002 21:19:40 -0700
Tim Peters wrote:
> Quick idea: this kind of thing is why a classic Bayesian classifier works
> in log space.
The idea of working with logs had crossed my mind, but I didn't pursue
it. I've now changed my code. Much nicer. Thanks for the nudge.
> To avoid massive cancellation at the end, it's probably numerically
> better to do
>
> sum(for i = 1 to n, log(h_i/s_i) + log(S/H))
How about calculating and storing log(h_i/s_i * S/H) with the rest of the
word info?
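
A minimal sketch of that idea in Python, not the actual Spambayes code. It assumes h_i and s_i are the ham and spam counts for word i and that H and S are the total ham and spam message counts, which this message does not spell out. The names log_ratio, build_word_info and score are hypothetical, and skipping words with a zero count is a simplification made here to avoid log(0).

```python
import math


def log_ratio(h_i, s_i, H, S):
    """Return log(h_i/s_i * S/H), written as a sum of logs.

    Assumes all four counts are positive. Under the assumed meaning of
    the symbols, a positive value leans toward ham.
    """
    return math.log(h_i) - math.log(s_i) + math.log(S) - math.log(H)


def build_word_info(ham_counts, spam_counts, H, S):
    """Precompute the per-word log ratio once, at training time."""
    info = {}
    # Only words seen in both corpora; zero counts are skipped (assumption).
    for word in ham_counts.keys() & spam_counts.keys():
        h_i, s_i = ham_counts[word], spam_counts[word]
        if h_i > 0 and s_i > 0:
            info[word] = log_ratio(h_i, s_i, H, S)
    return info


def score(words, word_info):
    """Sum the stored log ratios for the distinct words in a message.

    Words without a stored value contribute 0.0 (no evidence either way).
    """
    return math.fsum(word_info.get(w, 0.0) for w in set(words))


# Hypothetical toy counts, purely for illustration.
ham_counts = {"meeting": 30, "viagra": 1}
spam_counts = {"meeting": 2, "viagra": 40}
word_info = build_word_info(ham_counts, spam_counts, H=100, S=100)
total = score(["meeting", "viagra", "unknown"], word_info)
```

With the ratio stored per word, scoring a message is just a lookup and an addition for each word.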
> It's certainly open to experiment!
I've played with it for about an hour tonight, but so far I can't match
the accuracy that the CVS code is getting. Here's the latest result:
    total unique false pos 49
    total unique false neg 1
    average fp % 2.72222222222
    average fn % 0.0555555555555
That one false negative is pretty odd.¹ The false positives look like
reasonable mistakes.
Neil
¹ http://arctrix.com/nas/bizarre_fn.txt