[Tutor] memoize, lookup, or KIS?

Albert-Jan Roskam fomcl at yahoo.com
Mon Nov 19 18:38:22 CET 2012



> Emphasis on "might". Unless you have timed the code with or without a 
> lookup
> table, you're just guessing whether it is an optimization or a 
> pessimization.
> 

See my earlier reply to Oscar's mail. I used cProfile, and memoizing was almost twice as fast in the fastest implementation.
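
For anyone who wants to repeat such a comparison, a minimal sketch could look like the code below (slow_convert is a hypothetical stand-in, not the real SPSS date conversion, and the numbers say nothing about my actual timings):

import timeit

def slow_convert(value):
    # hypothetical stand-in for the real, more expensive conversion
    return sum(i * i for i in range(500)) + value

_cache = {}
def cached_convert(value):
    # dict-based memoization of slow_convert
    if value not in _cache:
        _cache[value] = slow_convert(value)
    return _cache[value]

values = [1, 2, 3] * 10000  # heavily repeated arguments favour the cache
print("uncached:", timeit.timeit(lambda: [slow_convert(v) for v in values], number=5))
print("cached:  ", timeit.timeit(lambda: [cached_convert(v) for v in values], number=5))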

> On the other hand, my intuition is that it *will* benefit from memoization,
> so my guess is the same as your guess :)
> 
> 
>>  On the other hand, it might complicate the code too much, so it might
>>  be better to Keep It Simple (KIS). Is the code below a sound approach?
> 
> 
> No. You should separate the cache code from the non-cache code. Decorators
> are ideal for that, but even without decorators you should split the code.
> See below.

Yes, it is a lot less mentally taxing to read the code when the memoizing and the actual job are separated.
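
For example, a decorator along these lines (a rough sketch, not the code I am actually using) keeps the caching completely out of the conversion function:

import functools

def memoize(func):
    # keeps a private cache; the decorated function stays cache-free
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):  # keyword arguments ignored for brevity
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize
def spss2strDate(gregorianDate, fmt="%Y-%m-%d", recodeSysmisTo=""):
    # stand-in body so the example runs; the real conversion goes here
    if gregorianDate is None:
        return recodeSysmisTo
    return format(gregorianDate, ".1f")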

> Also, you should have some way to stop the lookup table from growing forever.
> If you are running Python 3.3, you can use functools.lru_cache, which
> implements a Least Recently Used cache. Once the cache reaches a certain size,
> the element which was least recently used is flushed.
> 

I read about that. Time to upgrade! Wish I could use this in the office!
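
For reference, the lru_cache variant would be little more than a one-line change (a sketch with a stand-in body; functools.lru_cache has been available since Python 3.2):

import functools

@functools.lru_cache(maxsize=1000)  # bounded cache: least recently used entries are evicted
def spss2strDate(gregorianDate, fmt="%Y-%m-%d", recodeSysmisTo=""):
    # stand-in body; the real conversion goes here
    return format(gregorianDate, ".1f")

# spss2strDate.cache_info() reports hits, misses, maxsize and current size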

<snip>
 
> class Test(object):
> 
>     def __init__(self):
>         self._cache = {}
> 
>     def clear_cache(self):
>         self._cache.clear()
> 
>     def spss2strDate(self, gregorianDate, fmt="%Y-%m-%d", 
> recodeSysmisTo=""):
>         # cached wrapper
>         t = (gregorianDate, fmt, recodeSysmisTo)
>         if t in self._cache:
>             # cache hit
>             return self._cache[t]
>         # otherwise cache miss
>         result = self._calculate_spss2strDate(*t)  # this does the real work
>         if len(self._cache) > 1000:
>             self._cache.popitem()  # discard an arbitrary (key,value) pair
>         self._cache[t] = result
>         return result
> 
>     def _calculate_spss2strDate(self, gregorianDate, fmt, recodeSysmisTo):
>         # uncached version
>         return some_calculation(...)
> 
> 
> You can fill in the details for the uncached version :)
> 
> One good improvement would be to add instrumentation[1] to the cache so you can
> tell whether or not the cache is worthwhile. E.g. record how many times each
> set of arguments are used. If you find that hardly any triple of arguments is
> used multiple times (i.e. nearly every call is unique), then the cache is a
> waste of time.

It would be cool if the memoizer turned itself off when hardly any items are retrieved from it (i.e. the calls are almost always unique). Something like:
stop_memoizing = (n_cached_version / float(n_newly_calculated_version)) < some_threshold and len(cache) > some_value
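
A rough sketch of how such a self-disabling memoizer could look (hypothetical names, thresholds picked arbitrarily; the hit rate here is counted against total calls rather than against misses as in the line above):

import functools

def self_disabling_memoize(func, min_calls=1000, min_hit_rate=0.1):
    # stops caching once enough calls have been seen with a poor hit rate
    cache = {}
    stats = {"hits": 0, "misses": 0, "disabled": False}

    @functools.wraps(func)
    def wrapper(*args):
        if stats["disabled"]:
            return func(*args)
        if args in cache:
            stats["hits"] += 1
            return cache[args]
        stats["misses"] += 1
        result = cache[args] = func(*args)
        calls = stats["hits"] + stats["misses"]
        if calls >= min_calls and stats["hits"] / float(calls) < min_hit_rate:
            stats["disabled"] = True  # nearly every call is unique: give up
            cache.clear()             # free the now-useless cache
        return result

    wrapper.stats = stats  # expose the counters for inspection
    return wrapper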

