[Spambayes] proposed changes to hammie & co.

T. Alexander Popiel popiel@wolfskeep.com
Fri Nov 22 18:49:58 2002


In message:  <3DDDF19E.3000305@hooft.net>
             Rob Hooft <rob@hooft.net> writes:
>
>Is this [spamprob] calculation for the few words in one message
>really time-determining?

No, as I went on to admit in the part you snipped. ;-)

>There is another way of caching: Make a dictionary 
>that maps count-tuples to spam probabilities.
>
>  (1,0) -> 0.155
>  (0,1) -> 0.844
>etc.

I'm not sure this is better; it would definitely have a
higher cache hit rate, but the lookups are significantly
more expensive (fetch the wordinfo, extract the counts,
then fetch the probability).

Something to measure...
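For anyone who wants to measure it, here is a minimal Python sketch of the count-tuple cache. It rests on several assumptions. The tuples are taken to be (hamcount, spamcount). The probability is a Robinson-style smoothed ratio with assumed parameters x = 0.5 and s = 0.45. WordInfo here is a stand-in, and CountTupleProbCache and set_totals are hypothetical names, not the real Spambayes classes. The probability also depends on the total ham and spam message counts, so the sketch flushes the cache whenever those totals change.

```python
from dataclasses import dataclass

# Assumed Robinson-style smoothing parameters (illustrative values only).
UNKNOWN_WORD_PROB = 0.5       # x: prior probability for a word with no data
UNKNOWN_WORD_STRENGTH = 0.45  # s: how much weight the prior gets


@dataclass
class WordInfo:
    """Stand-in pure data class: per-word training counts, no behaviour."""
    hamcount: int = 0
    spamcount: int = 0


class CountTupleProbCache:
    """Hypothetical cache mapping (hamcount, spamcount) -> spam probability."""

    def __init__(self, nham, nspam):
        self.nham = nham
        self.nspam = nspam
        self._cache = {}

    def set_totals(self, nham, nspam):
        # Cached values depend on the message totals, so flush on change.
        if (nham, nspam) != (self.nham, self.nspam):
            self.nham, self.nspam = nham, nspam
            self._cache.clear()

    def _compute(self, hamcount, spamcount):
        hamratio = hamcount / self.nham if self.nham else 0.0
        spamratio = spamcount / self.nspam if self.nspam else 0.0
        if hamratio + spamratio == 0:
            prob = UNKNOWN_WORD_PROB
        else:
            prob = spamratio / (hamratio + spamratio)
        # Shrink toward the prior when the word has been seen only a few times.
        n = hamcount + spamcount
        s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
        return (s * x + n * prob) / (s + n)

    def spamprob(self, info):
        # The caller has already fetched the WordInfo; extract the counts...
        key = (info.hamcount, info.spamcount)
        # ...then fetch the probability, computing it only on a miss.
        try:
            return self._cache[key]
        except KeyError:
            prob = self._cache[key] = self._compute(*key)
            return prob


cache = CountTupleProbCache(nham=100, nspam=100)
print(cache.spamprob(WordInfo(hamcount=1, spamcount=0)))
print(cache.spamprob(WordInfo(hamcount=0, spamcount=1)))
```

With these assumed parameters, (1, 0) comes out at about 0.1552 and (0, 1) at about 0.8448, which is consistent with the figures quoted. Every hit still pays for the WordInfo fetch plus a second dict lookup on the tuple, and that extra cost is what would need timing against a per-word cache.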

>I definitely wouldn't move the calculation into the wordinfo class. It 
>is a different task, so it "should" (design) be a separate class....

I moderately agree, but OOP folks tend to have an aversion
to pure data classes (as I think WordInfo should be). ;-)

- Alex


