Best way to hash a dictionary

Alex Martelli aleax at aleax.it
Mon Mar 3 13:34:21 EST 2003


Mike Meyer wrote:

> Alex Martelli <aleax at aleax.it> writes:
>> The obvious idea would be something like:
>> 
>> class HashableDict(dict):
>>     def __init__(self, *args, **kwds):
>>         dict.__init__(self, *args, **kwds)
>>         my_hash = 0
>>         for key in self: my_hash ^= hash(key)
>>         self.my_hash = my_hash
>>     def __hash__(self):
>>         return self.my_hash
> 
> The idea that struc me as obvious - and survives in the face of
> mutating the dictionary after creating it - was:
> 
> class HashableDict(dict):
>     def __hash__(self): return id(self)
> 
> Which guarantees unique hash keys. The downside is that two identical
> dictionaries won't have the same value.

Unfortunately, this breaks the constraint:
    a==b   implies   hash(a)==hash(b)
which is what makes it non-obvious to me.  My suggested solution
requires that a HashableDict be never changed after __init__, but
that's what the OP sort-of-implied (to my understanding, at least)
when he mentioned "immutable hashes" -- that he has no need to
mutate instances of HashableDict.  For debugging purposes, one
might insert a conditional check, e.g.:

class HashableDict(dict):
   def __init__(self, *args, **kwds):
       dict.__init__(self, *args, **kwds)
       self.my_hash = self.makehash()
   def makehash(self):
       my_hash = 0
       for key in self: my_hash ^= hash(key)
       self.my_hash = my_hash
   def __hash__(self):
       assert self.my_hash == self.makehash()
       return self.my_hash

When this is run *without* -O, i.e., for development and debug,
the assert statement ensures (well, sort of!-) that the instance
has not erroneously been modified; when run with -O, i.e. in
production mode, the assert statement generates no code, thus,
while there are no checks, no performance price at all is paid.


Alex





More information about the Python-list mailing list