[spambayes-dev] Mozilla SpamBayes "porting"

Miguel Vargas miguel at vargas.com
Sun Feb 22 22:46:10 EST 2004


Tim Peters wrote:
> which is an excellent match to what you reported.  You can verify that the

Great.  I just confirmed that when I fixed my off-by-one error I got the 
correct value (0.822...).

This points to a problem in the section where I calculate the 
probability per token.  I then noticed the two assertions from the 
probability function that I had left out of my code:

         assert hamcount <= nham
         assert spamcount <= nspam

That is when I realized that we are counting tokens differently.  It 
looks like SpamBayes counts a token only once per message, no matter how 
many times it appears.  Mozilla counts every instance of a token, so 
hamcount can easily be greater than nham.  That is evident in the email 
I sent before:

 >> ngood = 861, nbad = 759
...
 >> token 5: hamcount = 5802 spamcount = 4680
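
To illustrate the mismatch, here is a minimal sketch of the two counting 
schemes.  It is not taken from either code base: the function names and 
the toy corpus are hypothetical, and it only assumes that each message 
has already been tokenized into a list of strings.

```python
from collections import Counter

def count_once_per_message(messages):
    """Count each token at most once per message (the behaviour the
    SpamBayes assertions appear to assume)."""
    counts = Counter()
    for tokens in messages:
        counts.update(set(tokens))   # set() drops repeats within a message
    return counts

def count_every_instance(messages):
    """Count every occurrence of a token (the behaviour described for
    Mozilla in this thread)."""
    counts = Counter()
    for tokens in messages:
        counts.update(tokens)        # repeats within a message all count
    return counts

# Hypothetical toy corpus of two ham messages.
ham = [["free", "free", "free", "lunch"], ["free", "meeting"]]
nham = len(ham)

once = count_once_per_message(ham)
every = count_every_instance(ham)

# Per-message counting can never exceed the number of messages.
assert once["free"] <= nham
# Per-instance counting can, which would trip "assert hamcount <= nham".
assert every["free"] > nham
```

With per-message counting, hamcount is bounded by nham by construction, 
which is why the assertions hold in one scheme and fail in the other.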

I'm off to patch Mozilla...


