Memoization and encapsulation

Mike Meyer mwm at mired.org
Thu Jan 5 01:59:16 EST 2006


drewp at bigasterisk.com writes:
>> Faster is:
>> class FastHashableList(list):
>>    def __hash__(self):
>>       return id(self)
> But that's wrong.

You're right - I should have fixed equality while I was at it. The
results may not be what you had in mind, but it's a perfectly
reasonable use case.
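
For completeness, here is a rough sketch of what fixing equality to
match would look like - identity-based hashing and identity-based
comparison together (my code, not the version from the earlier post):

class FastHashableList(list):
    # A list that hashes and compares by identity, not by contents.
    def __hash__(self):
        return id(self)
    def __eq__(self, other):
        return self is other
    def __ne__(self, other):
        return self is not other

With that, two FastHashableLists holding the same values are distinct
dict keys - which is exactly the "may not be what you had in mind"
part, but it's consistent.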

>> There isn't a straightforward way to make memoised work because -
>> well, it's not a straightforward thing to make work.
> Seems straightforward to me, as long as equality is defined for the
> argument types. A hash that spreads the argument values across a hash
> table is a fine optimization, but I don't think a memoizer needs to
> have optimal performance in that area to be useful. In many of the
> cases I have come across, even the dumbest list search (i.e. what you
> get if all args always hash to 0) will be much faster than actually
> rerunning the original function.
>
>> x = HL([1, 2, 3])       # Or FHL
>> print broken(x)
>> x.append(4)
>> print broken(x)
>>
>> Will your memoised handle this case correctly?
>
> For that one to work my fancy memoizer would have to make a copy of the
> arguments. The comparisons will work fine, though. [1,2,3] !=
> [1,2,3,4].

Right - you need to make a copy. There are three things to consider
here:

1) The copy doesn't have to be a list. You can create a tuple from the
list and use that, and get the same effect without having to create a
hashable list type. (There's a sketch of this after point 3 below.)

2) You need to apply this hack for *every* mutable type that you want
to handle as an argument.

3) There will still be cases where it's broken:

@memoised
def still_broken_after_all_these_years(x):
    return id(x) + 1

x = [1, 2, 3]
y = [1, 2, 3]

# With a value-based memoizer the second call returns the cached
# id(x) + 1 instead of computing id(y) + 1, so this assertion fails.
assert still_broken_after_all_these_years(x) != still_broken_after_all_these_years(y)
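
To make point 1 concrete, here is roughly what a copying memoizer
could look like (memoised_by_value is my own name for it, not code
from earlier in the thread). It handles the append example above, and
it's exactly the kind of thing the id() example breaks:

def memoised_by_value(func):
    # Point 1: snapshot list arguments as tuples so the cache key is hashable.
    # Point 2: every other mutable type (dicts, sets, ...) needs its own case,
    # and this shallow snapshot still fails for nested lists.
    cache = {}
    def wrapper(*args):
        key = tuple(tuple(a) if isinstance(a, list) else a for a in args)
        if key not in cache:
            cache[key] = func(*args)
        return cache[key]
    return wrapper

Under that decorator, x and y above produce the same cache key, so
the second call hands back the cached id(x) + 1 and the assertion
fails.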

>> Which is also why lists aren't hashable - there's no good definition
>> of the hash of a list that isn't broken for some common use case.
> Only if broken means 'slow'. My hunch is that lists weren't made
> hashable because people would naively use them heavily as dict keys and
> suffer poorer performance than if they used tuples.

You're only looking at *your* use case. The definition you used isn't
broken for that, just slow. For *other* use cases, it's broken. There
are cases where you want a list to be considered equal to itself as it
changes over time.

> I would still like to see a well-done memoizer that doesn't suffer from
> the unhashable bug and maybe has a way to avoid the mutable object bug
> that you presented in your last post.

Like I said, I don't think it's possible to do it in a way that
"works" for all possible functions. For instance, you could solve the
two bugs you mention in one fell swoop by declaring that you don't
memoize calls that have mutable arguments, and instead call the
function every time for them. That's really the only solution that's
going to work for all functions. Except that the practical test for
"mutable" is hashability, and "hashable" and "immutable" aren't
synonyms: there are hashable builtins that are mutable, so some
arguments will still be memoized incorrectly.
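
For what it's worth, a minimal sketch of that fallback approach might
look like this (memoised_if_hashable is my name for it); it treats
hashability as a stand-in for immutability, so it inherits exactly
the weakness just described:

def memoised_if_hashable(func):
    cache = {}
    def wrapper(*args):
        try:
            return cache[args]
        except TypeError:   # some argument is unhashable: don't memoize, just call
            return func(*args)
        except KeyError:    # first time we've seen these (hashable) arguments
            result = cache[args] = func(*args)
            return result
    return wrapper

A hashable-but-mutable argument still sails straight into the cache,
which is the residual breakage.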

          <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


