[Spambayes] proposed changes to hammie & co.

Rob Hooft rob@hooft.net
Fri Nov 22 08:58:06 2002


T. Alexander Popiel wrote:
> In message:  <w53y97nxxof.fsf@woozle.org>
>              Neale Pickett <neale@woozle.org> writes:
> 
>>So then, "T. Alexander Popiel" <popiel@wolfskeep.com> is all like:
>>
>>
>>>In message:  <w53d6ozzhyt.fsf@woozle.org>
>>>             Neale Pickett <neale@woozle.org> writes:
>>>
>>>>I'm currently entwined with mucking the heck out of WordInfo.  I've got
>>>>a neato scheme based on Alex's patch and comments where the WordInfo
>>>>classes still compute their own probabilities, but also keep a revision
>>>>number which is compared against a MetaInfo class.
>>>
>>>Eww, do we gotta?  I thought I was trying to make the DB smaller. ;-)
>>
>>Ah, but the only thing *stored* is (spamcount, hamcount).  The
>>probability is calculated the first time you ask for it.  If you don't
>>update nspam or nham, the next time you ask for it it gives the cached
>>value.  So the database is small, but you still get the in-memory
>>probability caching if you're using a pickle or ZODB.
> 
> 
> Sounds like there is no caching benefit for one-message-per-invocation
> situations like running out of procmail, then.  

Is computing the probabilities for the few words in one message really 
the bottleneck? There is another way to cache: build a dictionary 
that maps count tuples to spam probabilities.

  (1,0) -> 0.155
  (0,1) -> 0.844
etc.
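
A minimal sketch of such a count-tuple cache, to make the idea concrete.
Everything here is an assumption for illustration: the class name
ProbabilityCache, its methods, and the formula (a Robinson-style
estimate with strength 0.45 and unknown-word probability 0.5) are
hypothetical, not the actual Spambayes code. Since the probability also
depends on the total message counts nspam and nham, the sketch clears
the cache whenever those totals change.

```python
class ProbabilityCache:
    """Hypothetical helper: caches probabilities keyed by (spamcount, hamcount)."""

    def __init__(self, nspam, nham, strength=0.45, unknown_prob=0.5):
        # nspam/nham are the total numbers of trained spam and ham messages.
        # strength and unknown_prob are assumed smoothing parameters.
        self.nspam = nspam
        self.nham = nham
        self.strength = strength
        self.unknown_prob = unknown_prob
        self._cache = {}

    def set_totals(self, nspam, nham):
        # Cached values depend on the totals, so drop them when totals change.
        if (nspam, nham) != (self.nspam, self.nham):
            self.nspam, self.nham = nspam, nham
            self._cache.clear()

    def probability(self, spamcount, hamcount):
        # Many words share the same counts, so one entry serves all of them.
        key = (spamcount, hamcount)
        if key not in self._cache:
            self._cache[key] = self._compute(spamcount, hamcount)
        return self._cache[key]

    def _compute(self, spamcount, hamcount):
        # Assumed formula: raw spam ratio, pulled toward unknown_prob
        # when the word has been seen only a few times.
        spamratio = spamcount / self.nspam if self.nspam else 0.0
        hamratio = hamcount / self.nham if self.nham else 0.0
        total = spamratio + hamratio
        if total == 0:
            return self.unknown_prob
        raw = spamratio / total
        n = spamcount + hamcount
        return (self.strength * self.unknown_prob + n * raw) / (self.strength + n)


cache = ProbabilityCache(nspam=100, nham=100)
p_spammy = cache.probability(1, 0)  # seen once in spam, never in ham
p_hammy = cache.probability(0, 1)   # seen once in ham, never in spam
```

Under these assumed parameters the two lookups come out near 0.845 and
0.155, which would match the figures above if the tuples there are read
as (hamcount, spamcount). The cache lives outside the per-word records,
so the stored data stays just the two counts.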

I definitely wouldn't move the calculation into the WordInfo class. It 
is a different task, so as a matter of design it should live in a 
separate class.

Rob


-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/



