Instances as dictionary key, __hash__ and __eq__

Tim Delaney tim.delaney at aptare.com
Mon Feb 18 22:16:09 CET 2013


On 19 February 2013 06:51, Jean-Michel Pichavant <jeanmichel at sequans.com>wrote:

> Greetings,
>
> I opened something like a month ago a thread about hash functions and how
> I could write classes which instances can be safely used as dictionary keys.
> I though I had it but when I read back my code, I think I wrote yet
> another bug.
>
> Consider the following simple (buggy) class, python 2.5
>
> class FooSet(object):
>     """Define an algorithm set, containing pdcch/pdsch (or none)."""
>     def __init__(self, pdcch, pdsch):
>         self.pdcch = bool(pdcch)
>         self.pdsch = bool(pdsch)
>     # __hash__ and __eq__ allow to use the object as a dictionary key
>     def __hash__(self):
>         return hash((self.pdcch, self.pdsch))
>     def __eq__(self, other):
>         return hash(self) == hash(other)
>
> Can you confirm that using the hash function for testing equality is a
> very bad idea ?
>

Yes - it is a *very* bad idea. A hash by definition can produce collisions,
since you are taking much larger amount of data and are trying to represent
it in a smaller amount of space. It's effectively lossy compression - you
can never reliably get the original back.


> One obvious solution would be:
>
> def __eq__(self, other):
>     return self.pdsch = other.pdsch and self.pdcch == other.pdcch
>

This is a correct and the simplest way to do it.


> But I was looking for a "standard" solution, that I could use for
> basically all my container classes
>
> So I came up with these ones:
>
> def __hash__(self):
>     return hash(tuple(vars(self).values()))
> def __eq__(self, other):
>     return vars(self) == vars(other)
>
> But I'm not sure about vars(self).values(), I don't really care about the
> order of the values, but I need to be sure that for 2 equal dictionaries,
> they will both return their values in the same order.
> And that's the point, I'm not sure at all.
>

You cannot rely on this. Dictionaries are unordered, and the order that
items are added affects the order that the elements will be iterated over.
You could sort the vars by name (thus giving the stable order you need) but
there's another flaw - vars() contains more than just the attributes you
set.

>>> class A():
...     pass
...
>>> vars(A)
mappingproxy({'__qualname__': 'A', '__dict__': <attribute '__dict__' of 'A'
objects>, '__module__':
'__main__', '__weakref__': <attribute '__weakref__' of 'A' objects>,
'__doc__': None})

So by using vars you are preventing instances of subclasses of your class
from comparing equal to each other (or to instances of the base class).

Additionally,  If I'm making things much more complicated than they need to
> be, let me know.
>

You are. There are ways to achieve what you want, but it requires a lot
more setup and discipline. The simplest way is probably to have a
_equal_fields() method that subclasses override, returning a tuple of the
attributes that should be hashed. Then in __hash__() and __eq__ you iterate
over the returned tuple, get the value for each attribute and either hash
or compare.

Of course, you have to take into account in __eq__ that the other instance
may not have the same attributes (e.g. self is a subclass that uses extra
attributes in its __hash__ and __eq__).

Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20130219/423df150/attachment.html>


More information about the Python-list mailing list