[Spambayes] Re: [Spambayes-checkins] spambayes classifier.py,1.53.2.6,1.53.2.7

Fri Nov 22 20:26:54 2002

11/22/2002 1:16:02 PM, "T. Alexander Popiel" <popiel@wolfskeep.com> wrote:

>In message:  <w53wun5v3np.fsf@woozle.org>
>             Neale Pickett <neale@woozle.org> writes:
>>
>>What do you think of this idea:
>>
>>probcache is kept as a property of Classifier.  Make a
>>classifier.probability(self, word) method which looks up that word's
>>(spamcount, hamcount) tuple in probcache.  If it's not there, compute it
>>and add it.  Whenever Classifier.learn or Classifier.unlearn are called,
>>probcache is blown away.
>>
>>This will effectively cache probabilities on demand, and make sure they
>>are current.  No need for a revision anymore.
>>
>>Sound good?
>
>Sounds good to me.  If you split the probability computation itself
>into a separate method from the cache management stuff, then it makes
>it easier to subclass to replace just the counts->probability formula.
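
For reference, a minimal sketch of the scheme quoted above, with Alex's split between cache management and the counts->probability formula. Classifier, probcache, probability, learn, unlearn, nham and nspam are names from this thread; wordinfo, compute_probability, _update and the formula itself are placeholders assumed here for illustration, not the real spambayes code.

```python
class Classifier:
    """Toy stand-in for the real classifier; only the caching shape matters."""

    def __init__(self):
        self.nspam = 0        # number of spam messages trained on
        self.nham = 0         # number of ham messages trained on
        self.wordinfo = {}    # hypothetical: word -> [spamcount, hamcount]
        self.probcache = {}   # (spamcount, hamcount) -> probability

    def probability(self, word):
        """Cache management only: look up, or compute and remember."""
        spamcount, hamcount = self.wordinfo.get(word, (0, 0))
        key = (spamcount, hamcount)
        try:
            return self.probcache[key]
        except KeyError:
            prob = self.probcache[key] = self.compute_probability(*key)
            return prob

    def compute_probability(self, spamcount, hamcount):
        """counts -> probability. Placeholder formula; subclasses override."""
        spamratio = spamcount / self.nspam if self.nspam else 0.0
        hamratio = hamcount / self.nham if self.nham else 0.0
        total = spamratio + hamratio
        return spamratio / total if total else 0.5

    def learn(self, words, is_spam):
        self._update(words, is_spam, +1)

    def unlearn(self, words, is_spam):
        self._update(words, is_spam, -1)

    def _update(self, words, is_spam, delta):
        if is_spam:
            self.nspam += delta
        else:
            self.nham += delta
        column = 0 if is_spam else 1
        for word in set(words):
            self.wordinfo.setdefault(word, [0, 0])[column] += delta
        # nspam or nham changed, so every cached value is stale.
        self.probcache.clear()
```

A subclass that only wants a different formula would override compute_probability and leave probability alone.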

From my careful and time-consuming examination of the code <wink>, it appeared 
to me that the meta revision only changed when nham or nspam changed.  Therefore, 
caching on the ratios rather than on nham and nspam keeps the cache valid 
all the time.  Nuking a cache is expensive...
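
A rough sketch of what caching on the ratios could look like. RatioCache, placeholder_formula and the misses counter are hypothetical names made up for this illustration; only nham, nspam and the idea of keying on the ratios come from the discussion above.

```python
class RatioCache:
    """Hypothetical helper that memoizes on (spamratio, hamratio)."""

    def __init__(self, formula):
        self.formula = formula   # callable: (spamratio, hamratio) -> probability
        self.cache = {}          # (spamratio, hamratio) -> probability
        self.misses = 0          # how often the formula actually ran

    def probability(self, spamcount, hamcount, nspam, nham):
        spamratio = spamcount / nspam if nspam else 0.0
        hamratio = hamcount / nham if nham else 0.0
        key = (spamratio, hamratio)
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.formula(spamratio, hamratio)
        return self.cache[key]


def placeholder_formula(spamratio, hamratio):
    """Stand-in formula, not the one the classifier really uses."""
    total = spamratio + hamratio
    return spamratio / total if total else 0.5


cache = RatioCache(placeholder_formula)
p1 = cache.probability(1, 2, nspam=10, nham=10)
# nspam and nham have changed, but the ratios are the same, so this is a hit.
p2 = cache.probability(2, 4, nspam=20, nham=20)
```

Nothing is ever thrown away here, so an entry stays correct however nham and nspam move. The likely cost is that the table keeps growing, and entries are only reused when the same pair of ratios comes up again.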

As for indexing on an integer vs. a float: both are immutable types, so I 
believe you're really indexing on an object reference, not the value.  I think 
Python is smart enough to realize this and not waste time hashing on the value 
in this instance... correct me if I'm wrong.
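
One way to check this is a small experiment. As far as I can tell, a dict always asks the key for its hash and then compares with ==, with object identity used only as a shortcut in that comparison. Probe below is a hypothetical key class written just to count the hash calls.

```python
class Probe:
    """Hypothetical key type that records each time its hash is requested."""

    def __init__(self, value):
        self.value = value
        self.hash_calls = 0

    def __hash__(self):
        self.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, Probe) and self.value == other.value


table = {}
key = Probe(0.25)
table[key] = "cached"            # storing asks for the hash
found_same = table[key]          # looking up the very same object asks again
found_equal = table[Probe(0.25)] # a different object with an equal value also hits

# An int and a float that compare equal land in the same slot.
mixed = {1: "int"}
mixed[1.0] = "float"
```

So the lookup goes by value in both cases. Hashing a single int or float is cheap, though, so keying on one or the other should not matter much for speed.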

- TimS
>
>- Alex
>
>
- Tim
www.fourstonesExpressions.com