Best way to hash a dictionary
Alex Martelli
aleax at aleax.it
Mon Mar 3 13:34:21 EST 2003
Mike Meyer wrote:
> Alex Martelli <aleax at aleax.it> writes:
>> The obvious idea would be something like:
>>
>> class HashableDict(dict):
>> def __init__(self, *args, **kwds):
>> dict.__init__(self, *args, **kwds)
>> my_hash = 0
>> for key in self: my_hash ^= hash(key)
>> self.my_hash = my_hash
>> def __hash__(self):
>> return self.my_hash
>
> The idea that struc me as obvious - and survives in the face of
> mutating the dictionary after creating it - was:
>
> class HashableDict(dict):
> def __hash__(self): return id(self)
>
> Which guarantees unique hash keys. The downside is that two identical
> dictionaries won't have the same value.
Unfortunately, this breaks the constraint:
a==b implies hash(a)==hash(b)
which is what makes it non-obvious to me. My suggested solution
requires that a HashableDict be never changed after __init__, but
that's what the OP sort-of-implied (to my understanding, at least)
when he mentioned "immutable hashes" -- that he has no need to
mutate instances of HashableDict. For debugging purposes, one
might insert a conditional check, e.g.:
class HashableDict(dict):
def __init__(self, *args, **kwds):
dict.__init__(self, *args, **kwds)
self.my_hash = self.makehash()
def makehash(self):
my_hash = 0
for key in self: my_hash ^= hash(key)
self.my_hash = my_hash
def __hash__(self):
assert self.my_hash == self.makehash()
return self.my_hash
When this is run *without* -O, i.e., for development and debug,
the assert statement ensures (well, sort of!-) that the instance
has not erroneously been modified; when run with -O, i.e. in
production mode, the assert statement generates no code, thus,
while there are no checks, no performance price at all is paid.
Alex
More information about the Python-list
mailing list