[Spambayes] proposed changes to hammie & co.
Rob Hooft
rob@hooft.net
Fri Nov 22 08:58:06 2002
T. Alexander Popiel wrote:
> In message: <w53y97nxxof.fsf@woozle.org>
> Neale Pickett <neale@woozle.org> writes:
>
>>So then, "T. Alexander Popiel" <popiel@wolfskeep.com> is all like:
>>
>>
>>>In message: <w53d6ozzhyt.fsf@woozle.org>
>>> Neale Pickett <neale@woozle.org> writes:
>>>
>>>>I'm currently entwined with mucking the heck out of WordInfo. I've got
>>>>a neato scheme based on Alex's patch and comments where the WordInfo
>>>>classes still compute their own probabilities, but also keep a revision
>>>>number which is compared against a MetaInfo class.
>>>
>>>Eww, do we gotta? I thought I was trying to make the DB smaller. ;-)
>>
>>Ah, but the only thing *stored* is (spamcount, hamcount). The
>>probability is calculated the first time you ask for it. If you don't
>>update nspam or nham, the next time you ask for it it gives the cached
>>value. So the database is small, but you still get the in-memory
>>probability caching if you're using a pickle or ZODB.
>
>
> Sounds like there is no caching benefit for one-message-per-invocation
> situations like running out of procmail, then.
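A minimal sketch of the revision scheme Neale describes, under stated assumptions. Only the class names WordInfo and MetaInfo come from the thread. The methods `set_counts`, `probability` and `_compute` and the `revision` attribute are hypothetical, and `_compute` is a placeholder ratio, not Spambayes' real estimate. The sketch shows that only (spamcount, hamcount) is pickled, while the probability is recomputed only when the MetaInfo revision changes.

```python
class MetaInfo:
    """Hypothetical holder for corpus totals; bumps a revision on every change."""

    def __init__(self, nspam=0, nham=0):
        self.nspam = nspam
        self.nham = nham
        self.revision = 0

    def set_counts(self, nspam, nham):
        self.nspam = nspam
        self.nham = nham
        self.revision += 1  # invalidates every WordInfo's cached probability


class WordInfo:
    """Persists only (spamcount, hamcount); caches its probability in memory."""

    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount
        self._prob = None
        self._revision = None

    def __getstate__(self):
        # Only the two counts go into the pickle/ZODB record.
        return (self.spamcount, self.hamcount)

    def __setstate__(self, state):
        self.spamcount, self.hamcount = state
        self._prob = None       # cache starts empty after loading
        self._revision = None

    def probability(self, meta):
        # Recompute only if nspam/nham changed since the cached value was made.
        if self._revision != meta.revision:
            self._prob = self._compute(meta)
            self._revision = meta.revision
        return self._prob

    def _compute(self, meta):
        # Placeholder (assumption): spam frequency over total frequency.
        spamratio = self.spamcount / meta.nspam if meta.nspam else 0.0
        hamratio = self.hamcount / meta.nham if meta.nham else 0.0
        if spamratio + hamratio == 0:
            return 0.5
        return spamratio / (spamratio + hamratio)
```

Because the cached value never reaches the database, a process that loads it, scores one message and exits starts with every cache empty.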
Is this calculation, for the few words in one message, really the
time-determining step? There is another way of caching: make a dictionary
that maps count tuples to spam probabilities:
(1,0) -> 0.155
(0,1) -> 0.844
etc.
I definitely wouldn't move the calculation into the WordInfo class. It
is a different task, so by design it "should" be a separate class.
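A sketch of the count-tuple cache kept in its own class, separate from WordInfo. It assumes Robinson-style smoothing, f = (s*x + n*p)/(s + n), where p = spamratio/(hamratio + spamratio) and n = hamcount + spamcount. The constants s = 0.45 and x = 0.5 are assumptions, chosen because they give roughly 0.155 and 0.845 for the two example tuples when these are read as (hamcount, spamcount). The class name SpamProbabilityCache and its methods are hypothetical.

```python
class SpamProbabilityCache:
    """Hypothetical classifier-side helper mapping count tuples to probabilities."""

    def __init__(self, nham, nspam, strength=0.45, unknown_prob=0.5):
        self.nham = nham
        self.nspam = nspam
        self.strength = strength          # s: weight given to the prior
        self.unknown_prob = unknown_prob  # x: probability for a never-seen word
        self._cache = {}                  # (hamcount, spamcount) -> probability

    def set_totals(self, nham, nspam):
        # Cached values depend on the corpus totals, so drop them on change.
        if (nham, nspam) != (self.nham, self.nspam):
            self.nham, self.nspam = nham, nspam
            self._cache.clear()

    def probability(self, hamcount, spamcount):
        key = (hamcount, spamcount)
        prob = self._cache.get(key)
        if prob is None:
            prob = self._compute(hamcount, spamcount)
            self._cache[key] = prob
        return prob

    def _compute(self, hamcount, spamcount):
        hamratio = hamcount / self.nham if self.nham else 0.0
        spamratio = spamcount / self.nspam if self.nspam else 0.0
        if hamratio + spamratio == 0:
            return self.unknown_prob
        p = spamratio / (hamratio + spamratio)
        n = hamcount + spamcount
        s, x = self.strength, self.unknown_prob
        return (s * x + n * p) / (s + n)
```

Since most tokens in a single message have small counts such as (0,1) or (1,0), each distinct tuple is computed once per run, even when the process handles only one message.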
Rob
--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/