[Tutor] memoize, lookup, or KIS?
Oscar Benjamin
oscar.j.benjamin at gmail.com
Mon Nov 19 13:18:19 CET 2012
On 19 November 2012 11:02, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> Hi,
>
> I have a function that converts a date value, expressed as the number of
> seconds sinds start of the gregorian calendar, into a human-readable format
> (typically an iso-date). So if a record contains x date values, and a data
> set contains y records, the number of function calls are x * y. Imagine a
> data set with 1M records with dob and enrollment_date in in it: the number
> of function calls is huge (well, 2M).
The number of function calls is one factor that affects whether
memoisation is worthwhile. The more important questions are: How often
do you call the function with the same input values? How expensive is
the function compared with the table lookup?
Memoisation can, for a cheap function, actually slow it down. Before
you do anything like this you need to profile your code and ensure
that the function in question is actually a bottleneck. Otherwise
you're wasting memory and coding time without getting much of a speed
up.
> I was reading about memoize decorators the other day and I realized that
> this function might benefit from memoizing, or a lookup table. On the other
> hand, it might complicate the code too much, so it might be better to Keep
> It Simple (KIS). Is the code below a sound approach? I believe that, in
> effect, it uses a memoization approach (as it is a slowly growing lookup
> table).
This function is using memoisation but why aren't you just using the
memoisation decorators that you read about?
> import datetime
>
> class Test(object):
>
> def __init__(self):
> self.isoDateLookup = {}
> self.lookupCount = 0
>
> def spss2strDate(self, gregorianDate, fmt="%Y-%m-%d",
> recodeSysmisTo=""):
> """ This function converts internal SPSS dates (number of seconds
> since midnight, Oct 14, 1582 (the beginning of the Gregorian
> calendar))
> to a human-readable format """
> MAXLOOKUP = 10**6
> try:
> if not hasattr(self, "gregorianEpoch"):
> self.gregorianEpoch = datetime.datetime(1582, 10, 14, 0, 0,
> 0)
> if fmt == "%Y-%m-%d" and len(self.isoDateLookup) <= MAXLOOKUP:
> try:
> result = self.isoDateLookup[gregorianDate]
> self.lookupCount += 1
> except KeyError:
> theDate = self.gregorianEpoch +
> datetime.timedelta(seconds=gregorianDate)
> result = datetime.datetime.strftime(theDate, fmt)
> self.isoDateLookup[gregorianDate] = result
> return result
> else:
> theDate = self.gregorianEpoch +
> datetime.timedelta(seconds=gregorianDate)
> return datetime.datetime.strftime(theDate, fmt)
> except OverflowError:
> return recodeSysmisTo
> except TypeError:
> return recodeSysmisTo
> except ValueError:
> return recodeSysmisTo
The advantage of using a decorator for this are that you don't need to
complicate the internal code of the function that is memoised and it
is easy to enable and disable memoisation. I would just use a
decorator. You don't need to write one yourself. Since Python 3.2 the
standard library contains a memoisation decorator that you can use:
http://docs.python.org/dev/library/functools.html#functools.lru_cache
Presumably the place where you read about them would have listed some
example decorators that you can use for memoisation. Here's a quick
example that works for hashable inputs:
def memo(func):
table = {}
def wrapper(inputarg):
try:
return table[inputarg]
except KeyError:
table[inputarg] = val = func(inputarg)
return val
return wrapper
@memo
def square(x):
print('Calling square()')
return x ** 2
print('2**2 is %i' % square(2)) # cache miss
print('3**2 is %i' % square(3)) # cache miss
print('2**2 is %i' % square(2)) # cache hit
Output:
$ python tmp.py
Calling square()
2**2 is 4
Calling square()
3**2 is 9
2**2 is 4
Oscar
More information about the Tutor
mailing list