[Tutor] memoize, lookup, or KIS?
Albert-Jan Roskam
fomcl at yahoo.com
Mon Nov 19 18:38:22 CET 2012
> Emphasis on "might". Unless you have timed the code with or without a
> lookup
> table, you're just guessing whether it is an optimization or a
> pessimization.
>
See my earlier reply to Oscar's mail. I used cProfile, and memoizing was almost twice as fast in the fastest implementation.
> On the other hand, my intuition is that it *will* benefit from memoization,
> so my guess is the same as your guess :)
>
>
>> On the other hand, it might complicate the code too much, so it might
>> be better to Keep It Simple (KIS). Is the code below a sound approach?
>
>
> No. You should separate the cache code from the non-cache code. Decorators
> are ideal for that, but even without decorators you should split the code.
> See below.
Yes, it is a lot less mentally straining to read code where the memoizing and the actual work are kept separate.
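The decorator approach mentioned above might look something like this. A minimal sketch: the function name matches the thread, but the body is just a placeholder, and the bare-dict cache (no size limit) is a simplification.

```python
import functools

def memoize(func):
    """Cache results keyed by the call's positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed so callers can inspect or clear it
    return wrapper

@memoize
def spss2strDate(gregorianDate, fmt="%Y-%m-%d", recodeSysmisTo=""):
    # placeholder for the real date conversion
    return "%s|%s|%s" % (gregorianDate, fmt, recodeSysmisTo)
```

Note that keying on `*args` means a call relying on the defaults and one passing them explicitly produce different cache entries; harmless for correctness, but worth knowing.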
> Also, you should have some way to stop the lookup table from growing forever.
> If you are running Python 3.3, you can use functools.lru_cache, which
> implements a Least Recently Used cache. Once the cache reaches a certain size,
> the element which was least recently used is flushed.
>
I read about that. Time to upgrade! Wish I could use this in the office!
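For reference, functools.lru_cache (available since Python 3.2) reduces the whole cached-wrapper class to one decorator line. The function body below is only an illustration, assuming the usual SPSS convention of storing dates as seconds since 1582-10-14:

```python
import datetime
import functools

@functools.lru_cache(maxsize=1024)
def spss2strDate(gregorianDate, fmt="%Y-%m-%d", recodeSysmisTo=""):
    """Convert an SPSS numeric date to a string; illustrative body."""
    if gregorianDate is None:  # sysmis value
        return recodeSysmisTo
    epoch = datetime.datetime(1582, 10, 14)
    return (epoch + datetime.timedelta(seconds=gregorianDate)).strftime(fmt)
```

The decorated function also grows a `cache_info()` method reporting hits, misses, and current size, which gives you the instrumentation discussed below for free.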
<snip>
> class Test(object):
>
>     def __init__(self):
>         self._cache = {}
>
>     def clear_cache(self):
>         self._cache.clear()
>
>     def spss2strDate(self, gregorianDate, fmt="%Y-%m-%d",
>                      recodeSysmisTo=""):
>         # cached wrapper
>         t = (gregorianDate, fmt, recodeSysmisTo)
>         if t in self._cache:
>             # cache hit
>             return self._cache[t]
>         # otherwise cache miss
>         result = self._calculate_spss2strDate(*t)  # this does the real work
>         if len(self._cache) > 1000:
>             self._cache.popitem()  # discard an arbitrary (key, value) pair
>         self._cache[t] = result
>         return result
>
>     def _calculate_spss2strDate(self, gregorianDate, fmt, recodeSysmisTo):
>         # uncached version
>         return some_calculation(...)
>
>
> You can fill in the details for the uncached version :)
>
> One good improvement would be to add instrumentation[1] to the cache so you can
> tell whether or not the cache is worthwhile. E.g. record how many times each
> set of arguments are used. If you find that hardly any triple of arguments is
> used multiple times (i.e. nearly every call is unique), then the cache is a
> waste of time.
It would be cool if the memoizer turned itself off when hardly any items are retrieved from it (i.e., the calls are almost always unique), e.g.:
stop_memoizing = (n_cached_version / float(n_newly_calculated_version)) < some_threshold and len(cache) > some_value
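That self-disabling idea could be sketched as a small callable class. The threshold values and names here are illustrative, not tuned:

```python
class SelfTuningCache(object):
    """Memoizer that switches itself off when the hit rate stays too low."""

    def __init__(self, func, min_calls=1000, min_hit_rate=0.1):
        self.func = func
        self.cache = {}
        self.hits = 0
        self.misses = 0
        self.min_calls = min_calls      # don't judge before this many calls
        self.min_hit_rate = min_hit_rate
        self.enabled = True

    def __call__(self, *args):
        if not self.enabled:
            return self.func(*args)
        if args in self.cache:
            self.hits += 1
            return self.cache[args]
        self.misses += 1
        result = self.cache[args] = self.func(*args)
        calls = self.hits + self.misses
        if calls >= self.min_calls and self.hits / float(calls) < self.min_hit_rate:
            self.enabled = False   # nearly every call is unique: stop memoizing
            self.cache.clear()     # and free the memory it was using
        return result
```

Once disabled it never re-enables, which keeps the bookkeeping to two counters and one flag; a fancier version could re-check over a sliding window.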