[Spambayes] proposed changes to hammie & co.
T. Alexander Popiel
popiel@wolfskeep.com
Fri Nov 22 18:49:58 2002
In message: <3DDDF19E.3000305@hooft.net>
Rob Hooft <rob@hooft.net> writes:
>
>Is this [spamprob] calculation for the few words in one message
>really time-determining?
No, which I went on to admit in the stuff you snipped. ;-)
>There is another way of caching: Make a dictionary
>that maps count-tuples to spam probabilities.
>
> (1,0) -> 0.155
> (0,1) -> 0.844
>etc.
I'm not sure this is better; it would definitely have a
higher cache hit rate, but the lookups are significantly
more expensive (fetch the wordinfo, extract the counts,
then fetch the probability).
Something to measure...
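A minimal sketch of the count-tuple cache under discussion, to make the three-step lookup concrete. Everything here is an assumption for illustration: the class name, the (hamcount, spamcount) tuple order, and the adjustment formula with constants 0.45 and 0.5, which were picked only because they roughly reproduce the 0.155 and 0.844 values quoted above. It is not taken from the Spambayes source.

```python
# Hypothetical sketch; names and constants are assumptions, not Spambayes code.
S = 0.45  # assumed weight given to the prior
X = 0.5   # assumed probability for a word with no evidence


class CountTupleCache:
    """Cache spam probabilities keyed by (hamcount, spamcount) tuples."""

    def __init__(self, nham, nspam):
        # Numbers of trained ham and spam messages; both assumed positive.
        self.nham = nham
        self.nspam = nspam
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def _compute(self, hamcount, spamcount):
        hamratio = hamcount / self.nham
        spamratio = spamcount / self.nspam
        total = hamratio + spamratio
        if total == 0:
            return X  # no evidence either way
        p = spamratio / total
        n = hamcount + spamcount
        # Pull the raw ratio toward the prior X when n is small.
        return (S * X + n * p) / (S + n)

    def lookup(self, wordinfo, word):
        # Steps 1 and 2: fetch the word's record and extract its counts.
        counts = wordinfo.get(word, (0, 0))
        # Step 3: fetch the probability, computing it on a miss.
        prob = self._cache.get(counts)
        if prob is None:
            self.misses += 1
            prob = self._cache[counts] = self._compute(*counts)
        else:
            self.hits += 1
        return prob


# Words with identical counts share one cache entry.
wordinfo = {"viagra": (0, 1), "python": (1, 0), "popiel": (1, 0)}
cache = CountTupleCache(nham=100, nspam=100)
probs = {word: cache.lookup(wordinfo, word) for word in wordinfo}
```

Because the result depends on the message totals as well as the counts, a cache like this would have to be discarded whenever training changes nham or nspam. The hits and misses counters are there only to support the measurement.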
>I definitely wouldn't move the calculation into the wordinfo class. It
>is a different task, so it "should" (design) be a separate class....
I moderately agree, but OOP folks tend to have an aversion
to pure data classes (as I think WordInfo should be). ;-)
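One way the split might look, as a rough sketch: WordInfo holds only counts, and a separate class owns the calculation. The field names, the SpamProbCalculator class, and the simplified unadjusted ratio are all hypothetical, chosen for illustration rather than taken from the Spambayes code.

```python
from dataclasses import dataclass


@dataclass
class WordInfo:
    """Pure data: per-word training counts, no behaviour."""
    hamcount: int = 0
    spamcount: int = 0


class SpamProbCalculator:
    """Separate class that owns the probability calculation (hypothetical)."""

    def __init__(self, nham, nspam):
        # Numbers of trained ham and spam messages; both assumed positive.
        self.nham = nham
        self.nspam = nspam

    def spamprob(self, info):
        # Simplified raw ratio with no small-sample adjustment.
        hamratio = info.hamcount / self.nham
        spamratio = info.spamcount / self.nspam
        total = hamratio + spamratio
        if total == 0:
            return 0.5  # assumed neutral value for an unseen word
        return spamratio / total


calc = SpamProbCalculator(nham=10, nspam=10)
prob = calc.spamprob(WordInfo(hamcount=1, spamcount=3))
```

With this layout the formula can change without touching the stored records, at the cost of the data class having no methods of its own.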
- Alex