[Tutor] memoize, lookup, or KIS?

Oscar Benjamin oscar.j.benjamin at gmail.com
Mon Nov 19 13:18:19 CET 2012


On 19 November 2012 11:02, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> Hi,
>
> I have a function that converts a date value, expressed as the number of
> seconds sinds start of the gregorian calendar, into a human-readable format
> (typically an iso-date). So if a record contains x date values, and a data
> set contains y records, the number of function calls are x * y. Imagine a
> data set with 1M records with dob and enrollment_date in in it: the number
> of function calls is huge (well, 2M).

The number of function calls is one factor that affects whether
memoisation is worthwhile. The more important questions are: How often
do you call the function with the same input values? How expensive is
the function compared with the table lookup?

Memoisation can, for a cheap function, actually slow it down. Before
you do anything like this you need to profile your code and ensure
that the function in question is actually a bottleneck. Otherwise
you're wasting memory and coding time without getting much of a speed
up.

> I was reading about memoize decorators the other day and I realized that
> this function might benefit from memoizing, or a lookup table. On the other
> hand, it might complicate the code too much, so it might be better to Keep
> It Simple (KIS). Is the code below a sound approach? I believe that, in
> effect, it uses a memoization approach (as it is a slowly growing lookup
> table).

This function is using memoisation but why aren't you just using the
memoisation decorators that you read about?

> import datetime
>
> class Test(object):
>
>     def __init__(self):
>         self.isoDateLookup = {}
>         self.lookupCount = 0
>
>     def spss2strDate(self, gregorianDate, fmt="%Y-%m-%d",
> recodeSysmisTo=""):
>         """ This function converts internal SPSS dates (number of seconds
>         since midnight, Oct 14, 1582 (the beginning of the Gregorian
> calendar))
>         to a human-readable format """
>         MAXLOOKUP = 10**6
>         try:
>             if not hasattr(self, "gregorianEpoch"):
>                 self.gregorianEpoch = datetime.datetime(1582, 10, 14, 0, 0,
> 0)
>             if fmt == "%Y-%m-%d" and len(self.isoDateLookup) <= MAXLOOKUP:
>                 try:
>                     result = self.isoDateLookup[gregorianDate]
>                     self.lookupCount += 1
>                 except KeyError:
>                     theDate = self.gregorianEpoch +
> datetime.timedelta(seconds=gregorianDate)
>                     result = datetime.datetime.strftime(theDate, fmt)
>                     self.isoDateLookup[gregorianDate] = result
>                 return result
>             else:
>                 theDate = self.gregorianEpoch +
> datetime.timedelta(seconds=gregorianDate)
>                 return datetime.datetime.strftime(theDate, fmt)
>         except OverflowError:
>             return recodeSysmisTo
>         except TypeError:
>             return recodeSysmisTo
>         except ValueError:
>             return recodeSysmisTo

The advantage of using a decorator for this are that you don't need to
complicate the internal code of the function that is memoised and it
is easy to enable and disable memoisation. I would just use a
decorator. You don't need to write one yourself. Since Python 3.2 the
standard library contains a memoisation decorator that you can use:

http://docs.python.org/dev/library/functools.html#functools.lru_cache

Presumably the place where you read about them would have listed some
example decorators that you can use for memoisation. Here's a quick
example that works for hashable inputs:

def memo(func):
    table = {}
    def wrapper(inputarg):
        try:
            return table[inputarg]
        except KeyError:
            table[inputarg] = val = func(inputarg)
            return val
    return wrapper

@memo
def square(x):
    print('Calling square()')
    return x ** 2

print('2**2 is %i' % square(2)) # cache miss
print('3**2 is %i' % square(3)) # cache miss
print('2**2 is %i' % square(2)) # cache hit

Output:
$ python tmp.py
Calling square()
2**2 is 4
Calling square()
3**2 is 9
2**2 is 4


Oscar


More information about the Tutor mailing list