[spambayes-dev] Mozilla SpamBayes "porting"
Miguel Vargas
miguel at vargas.com
Sun Feb 22 22:46:10 EST 2004
Tim Peters wrote:
> which is an excellent match to what you reported. You can verify that the
Great. I just confirmed that when I fixed my off-by-one error I got the
correct value (0.822...).
This points to a problem in the section where I calculate the
probability per token. I then noticed the two assertions from the
probability function that I had left out of my code:
assert hamcount <= nham
assert spamcount <= nspam
That is when I realized that we are counting the tokens differently. It
looks like SpamBayes only counts a token once per message no matter how
many times it appears. Mozilla counts every instance of a token, so
hamcount can easily be greater than nham. That is evident in the email
I sent before:
>> ngood = 861, nbad = 759
...
>> token 5: hamcount = 5802 spamcount = 4680
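To make the difference concrete, here is a minimal sketch of the two counting schemes. The function names and the toy messages are made up for illustration; this is not the actual SpamBayes or Mozilla code, only my reading of the behaviour described above.

```python
from collections import Counter

def count_per_message(messages):
    """Count each token at most once per message (the behaviour
    attributed to SpamBayes above). Hypothetical helper."""
    counts = Counter()
    for tokens in messages:
        counts.update(set(tokens))  # set() drops repeats within one message
    return counts

def count_per_occurrence(messages):
    """Count every instance of a token (the behaviour attributed
    to Mozilla above). Hypothetical helper."""
    counts = Counter()
    for tokens in messages:
        counts.update(tokens)
    return counts

# Two toy ham messages, already tokenized.
ham = [["the", "the", "meeting"], ["the", "lunch", "the", "the"]]
nham = len(ham)

per_message = count_per_message(ham)
per_occurrence = count_per_occurrence(ham)

# With per-message counting, hamcount can never exceed nham.
assert per_message["the"] <= nham
# With per-occurrence counting, "the" is seen 5 times in 2 messages,
# so the same assertion would fail.
print(per_message["the"], per_occurrence["the"], nham)
```

With per-occurrence counting, a figure like hamcount = 5802 against 861 good messages is exactly what you would expect, and it would trip the first assertion.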
I'm off to patch Mozilla...