[Spambayes] Re: caching stuff

T. Alexander Popiel popiel@wolfskeep.com
Fri Nov 22 20:50:55 2002


In message:  <B7WSUSLKKEKGJITQ3CBB9ECTPNHA6ZU.3dde92fe@riven>
             <tim@fourstonesExpressions.com> writes:
>
>From my careful and time consuming examination of the code <wink>, it
>appeared to me that meta revision only changed when nham or nspam changed.
>Therefore, caching on the ratios rather than nham and nspam allowed the
>cache to be pertinent all the time.  Nuking a cache is expensive...

Unfortunately, preserving the cache when nham or nspam changes is bad,
because the bayesian adjustment changes, even if the ham and spam
ratios don't.  :-(
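To make the objection concrete, here is a minimal Python 3 sketch. It assumes a Robinson-style adjustment that pulls each word's ratio-based probability toward a prior X with strength S, weighted by the word's total count n. The names `spamprob` and `cached_by_ratio` and the values of S and X are illustrative assumptions, not SpamBayes code. Two training states with identical ratios but different counts then yield different probabilities, so a cache keyed on the ratios hands back a stale value.

```python
# Hypothetical sketch (Python 3). The adjustment formula and the S, X
# constants are assumptions for illustration, not quoted from SpamBayes.
S = 0.45  # assumed strength given to the prior
X = 0.5   # assumed prior probability for a word with no evidence


def spamprob(hamcount, spamcount, nham, nspam):
    """Ratio-based probability, shrunk toward X by the word's total count."""
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    raw = spamratio / (hamratio + spamratio)
    n = hamcount + spamcount  # how much evidence this word has
    return (S * X + n * raw) / (S + n)


ratio_cache = {}


def cached_by_ratio(hamcount, spamcount, nham, nspam):
    """Cache keyed on the ham and spam ratios only."""
    key = (hamcount / nham, spamcount / nspam)
    if key not in ratio_cache:
        ratio_cache[key] = spamprob(hamcount, spamcount, nham, nspam)
    return ratio_cache[key]


# Same ratios (0.1, 0.3) before and after training doubles every count.
first = cached_by_ratio(1, 3, 10, 10)
stale = cached_by_ratio(2, 6, 20, 20)   # cache hit: reuses the old value
fresh = spamprob(2, 6, 20, 20)          # what the adjustment now gives
print(first, stale, fresh)
```

The second lookup hits the cache because the ratios are unchanged, yet the adjusted probability has moved because n doubled.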

Nuking a cache in toto is a lot less expensive than individually
invalidating or updating records (which was update_probabilities'
downfall).  Either is a lot less expensive than giving the wrong
answer.
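One way to get the cheap whole-cache flush described here is sketched below, assuming (hypothetically) a cache keyed on a word's (hamcount, spamcount) and valid for a single (nham, nspam) pair. `ProbCache` and its `compute` callback are made-up names for illustration; the point is that a change in either total discards every entry in one step rather than visiting records one at a time.

```python
class ProbCache:
    """Hypothetical probability cache keyed on (hamcount, spamcount).

    Entries are only valid for the (nham, nspam) totals they were computed
    with, so any change to either total throws the whole dict away at once
    instead of invalidating or updating records one by one.
    """

    def __init__(self, compute):
        self._compute = compute   # compute(hamcount, spamcount, nham, nspam)
        self._totals = None       # (nham, nspam) the current entries belong to
        self._entries = {}

    def get(self, hamcount, spamcount, nham, nspam):
        if (nham, nspam) != self._totals:
            self._entries = {}    # nuke in toto
            self._totals = (nham, nspam)
        key = (hamcount, spamcount)
        if key not in self._entries:
            self._entries[key] = self._compute(hamcount, spamcount, nham, nspam)
        return self._entries[key]


calls = []


def compute(hamcount, spamcount, nham, nspam):
    """Stand-in scorer that records each time it actually runs."""
    calls.append((hamcount, spamcount, nham, nspam))
    spamratio = spamcount / nspam
    return spamratio / (hamcount / nham + spamratio)


cache = ProbCache(compute)
cache.get(1, 3, 10, 10)   # computed
cache.get(1, 3, 10, 10)   # served from the cache
cache.get(1, 3, 11, 10)   # nham changed: cache flushed, value recomputed
print(len(calls))         # 2
```

The design accepts recomputing entries after any training change in exchange for never serving a value computed against old totals.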

>As for indexing on an integer vs a float.  Both are immutable types, so
>you're really indexing on an object reference, not the value.

Eh, I don't think so... but I don't know enough python internals to
be sure.  (Sure, they are immutable types, but I strongly doubt that
they're hashed as objects; that would imply that all references to
a float value 3.0 were references to the same object... which means
some sort of search for the 3.0 object when you added 2.5 and 0.5...
which would be a severe performance loss.  It seems far more likely
that they're hashed by value instead (even if that value is currently
boxed in an object).)
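For what it's worth, a quick check in a modern Python 3 interpreter supports the by-value reading: dicts locate keys via `hash()` and `==`, and the language requires numbers that compare equal to hash equal. The snippet uses only built-ins; whether two equal floats happen to be the same object is a CPython detail that the lookup does not depend on.

```python
# Python 3 sketch: dict lookups match keys by hash() and ==, i.e. by value.
three = 3.0
computed = sum([2.5, 0.5])        # a float produced at run time, equal to 3.0

table = {three: "cached"}

print(table[computed])            # found by value, not by object identity
print(hash(3) == hash(3.0))       # equal numbers must hash equal
print(table[3])                   # so the int 3 also finds the float key 3.0
```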

Does anyone with more python mojo have a definitive answer?  Guido?

- Alex


