[Spambayes] Re: [Spambayes-checkins] spambayes classifier.py,1.53.2.6,1.53.2.7

Neale Pickett neale@woozle.org
Fri Nov 22 18:51:38 2002


So then, "T. Alexander Popiel" <popiel@wolfskeep.com> is all like:

> In message:  <E18FGk5-00006z-00@sc8-pr-cvs1.sourceforge.net>
>              "Tim Stone" <timstone4@users.sourceforge.net> writes:
> >Update of /cvsroot/spambayes/spambayes
> >In directory sc8-pr-cvs1:/tmp/cvs-serv400
> >
> >Modified Files:
> >      Tag: hammie-playground
> >	classifier.py 
> >Log Message:
> >Added probability calculation result caching.  No benchmark available to see
> >how much, if any, performance gain is achieved, but it seems like it could
> >be significant, particularly in training large corpora, or with long running
> >processes.
> 
> You need to nuke the probcache when meta.revision changes. :-)
> 
> Also, wouldn't the cache implemented by this patch be more
> efficient if it indexed by hamcount and spamcount (both
> integers) instead of hamratio and spamratio (both floats)?

I should think so.  What do you think of this idea:

probcache is kept as a property of Classifier.  Make a
classifier.probability(self, word) method which looks up that word's
(spamcount, hamcount) tuple in probcache.  If it's not there, compute it
and add it.  Whenever Classifier.learn or Classifier.unlearn is called,
probcache is blown away.

This will effectively cache probabilities on demand, and make sure they
are current.  No need for a revision anymore.
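
Roughly, the idea could be sketched like this.  This is only an
illustration, not the code in classifier.py: the wordinfo layout, the
_compute and _update helpers, and the plain ratio formula are all
assumptions standing in for whatever the real classifier does.  Note that
a probability keyed by (spamcount, hamcount) also depends on the total
message counts, which is why learn and unlearn have to throw the whole
cache away.

```python
class Classifier:
    """Hypothetical sketch of an on-demand probability cache."""

    UNKNOWN_PROB = 0.5  # assumed value for words never seen in training

    def __init__(self):
        self.nspam = 0        # number of spam messages trained on
        self.nham = 0         # number of ham messages trained on
        self.wordinfo = {}    # word -> [spamcount, hamcount] (assumed layout)
        self.probcache = {}   # (spamcount, hamcount) -> probability

    def probability(self, word):
        counts = self.wordinfo.get(word)
        if counts is None:
            return self.UNKNOWN_PROB
        key = tuple(counts)   # integer key, shared by words with equal counts
        try:
            return self.probcache[key]
        except KeyError:
            prob = self.probcache[key] = self._compute(*key)
            return prob

    def _compute(self, spamcount, hamcount):
        # Stand-in formula (simple ratio of ratios), NOT the real
        # spambayes computation.
        spamratio = spamcount / self.nspam if self.nspam else 0.0
        hamratio = hamcount / self.nham if self.nham else 0.0
        total = spamratio + hamratio
        if total == 0:
            return self.UNKNOWN_PROB
        return spamratio / total

    def learn(self, words, is_spam):
        self._update(words, is_spam, +1)

    def unlearn(self, words, is_spam):
        self._update(words, is_spam, -1)

    def _update(self, words, is_spam, delta):
        # nspam/nham are about to change, so every cached value is stale.
        self.probcache.clear()
        if is_spam:
            self.nspam += delta
        else:
            self.nham += delta
        index = 0 if is_spam else 1
        for word in set(words):
            counts = self.wordinfo.setdefault(word, [0, 0])
            counts[index] += delta
            if counts == [0, 0]:
                del self.wordinfo[word]
```

Since the cache is keyed on the count pair rather than the word, every
word with the same counts shares one entry, and nothing has to be
tracked per revision.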

Sound good?


