[Spambayes]
Re: [Spambayes-checkins] spambayes classifier.py,1.53.2.6,1.53.2.7
Tim Stone - Four Stones Expressions
tim@fourstonesExpressions.com
Fri Nov 22 20:26:54 2002
11/22/2002 1:16:02 PM, "T. Alexander Popiel" <popiel@wolfskeep.com> wrote:
>In message: <w53wun5v3np.fsf@woozle.org>
> Neale Pickett <neale@woozle.org> writes:
>>
>>What do you think of this idea:
>>
>>probcache is kept as a property of Classifier. Make a
>>classifier.probability(self, word) method which looks up that word's
>>(spamcount, hamcount) tuple in probcache. If it's not there, compute it
>>and add it. Whenever Classifier.learn or Classifier.unlearn are called,
>>probcache is blown away.
>>
>>This will effectively cache probabilities on demand, and make sure they
>>are current. No need for a revision anymore.
>>
>>Sound good?
>
>Sounds good to me. If you split the probability computation itself
>into a separate method from the cache management stuff, then it makes
>it easier to subclass to replace just the counts->probability formula.

From my careful and time-consuming examination of the code <wink>, it appeared
to me that meta revision only changed when nham or nspam changed. Therefore,
caching on the ratios rather than on nham and nspam allowed the cache to stay
valid all the time. Nuking a cache is expensive...
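
To make the proposal concrete, here is a minimal sketch of the on-demand
cache with the counts->probability formula split into its own method, as
Alex suggests. Everything in it is an assumption for illustration: the class
layout, the wordinfo dict, the _update helper, and especially the formula in
compute_probability, which is a plain ratio placeholder and not the formula
used in classifier.py.

```python
class Classifier:
    """Hypothetical sketch only; not the actual SpamBayes classifier."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0
        self.wordinfo = {}   # word -> (spamcount, hamcount)
        self.probcache = {}  # (spamcount, hamcount) -> probability

    def compute_probability(self, spamcount, hamcount):
        # Placeholder counts->probability formula (an assumption).
        # A subclass can override just this method.
        spamratio = spamcount / self.nspam if self.nspam else 0.0
        hamratio = hamcount / self.nham if self.nham else 0.0
        total = spamratio + hamratio
        return spamratio / total if total else 0.5

    def probability(self, word):
        # Cache management only: look up, compute on a miss, store.
        counts = self.wordinfo.get(word, (0, 0))
        try:
            return self.probcache[counts]
        except KeyError:
            prob = self.compute_probability(*counts)
            self.probcache[counts] = prob
            return prob

    def learn(self, words, is_spam):
        self._update(words, is_spam, +1)

    def unlearn(self, words, is_spam):
        self._update(words, is_spam, -1)

    def _update(self, words, is_spam, delta):
        if is_spam:
            self.nspam += delta
        else:
            self.nham += delta
        for word in set(words):
            spamcount, hamcount = self.wordinfo.get(word, (0, 0))
            if is_spam:
                spamcount += delta
            else:
                hamcount += delta
            self.wordinfo[word] = (spamcount, hamcount)
        # nspam or nham changed, so every cached value may be stale.
        self.probcache.clear()
```

The ratio-keyed alternative would instead use something like
(spamcount/nspam, hamcount/nham) as the cache key, so the clear() call goes
away. That only holds if the formula depends on nothing but those two ratios,
which is true of the placeholder here.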

As for indexing on an integer vs. a float: both are immutable types, so you're
really indexing on an object reference, not the value. I think Python is
smart enough to realize this and not waste time hashing on the value in
this instance... correct me if I'm wrong.
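
For what it's worth, this is easy to check at the interpreter. The sketch
below relies only on standard dict behaviour: keys are located by hash() and
==, so two numbers that are equal in value find the same entry even when they
were produced separately. The variable names are made up for the example.

```python
# Two floats built by different computations but equal in value.
a = 0.25
b = 1.0 / 4.0

cache = {a: "cached probability"}

# The lookup through b succeeds because dicts match keys by hash() and ==.
found_via_equal_float = cache.get(b)

# Equal int and float values hash alike and land on the same entry.
same_hash = hash(1) == hash(1.0)
mixed = {1: "int key"}
found_via_float = mixed.get(1.0)
```

So a lookup does hash the value, whether the key is an integer or a float.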
- TimS
>
>- Alex
>
>
- Tim
www.fourstonesExpressions.com