[Tutor] hash()ing a list

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Mon Mar 28 21:33:05 CEST 2005



On Sun, 27 Mar 2005, Orri Ganel wrote:

> So, any class that has 'rich comparison methods' defined is unhashable?
> What gives? (See '[Tutor] unhashable objects')


Hi Orri,

If we change what it means for two objects to be equal, hashing won't work
in a nice way until we also make hashing take equality into account.


Here's an example, using a fairly useless Pair class:

######
>>> class Pair(object):
...     def __init__(self, x, y):
...         self.x, self.y = x, y
...     def __repr__(self):
...         return 'Pair(%s, %s)' % (self.x, self.y)
...     def __eq__(self, other):
...         return (self.x, self.y) == (other.x, other.y)
...
>>> p1 = Pair(3, 4)
>>> p1
Pair(3, 4)
>>> p2 = Pair(3, 4)
>>> p2
Pair(3, 4)
>>> p1 == p2
True
######

(I know, this class is really useless since we already have tuples, but
bear with me.  *grin*)


Say that we might want to use pairs as keys in a dictionary.  Naively, we
just do it:

######
>>> d = {}
>>> d[Pair(17, 29)] = 'saturn'
######

And things look ok so far:

###
>>> d
{Pair(17, 29): 'saturn'}
###



But if we go back and try to retrieve 'saturn' through our key, we're in
for a surprise:

######
>>> d[Pair(17, 29)]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: Pair(17, 29)
######


The reason is because different Pair instances, by default, hash to
different values:

######
>>> hash(Pair(17, 29))
1076767244
>>> hash(Pair(17, 29))
1076767116
######

The default hash code of an object is its memory address: that's why the
numbers are different here.


So to make this work, to allow Pairs to be used reliably as keys, we also
have to write an appropriate __hash__() function that overrides the
default:

######
    def __hash__(self):
        return hash((self.x, self.y))
######


And the other direction is true: if we define a __hash__() function, we
have to write a comparison function for equality, or else the hash scheme
also breaks:

######
>>> class Pair2:
...     def __init__(self, x, y): self.x, self.y = x, y
...     def __repr__(self): return "Pair2(%s, %s)" % (self.x, self.y)
...     def __hash__(self): return hash((self.x, self.y))
...
>>> d = {}
>>> d[Pair2(4, 9)] = "lunchtime"
>>> d
{Pair2(4, 9): 'lunchtime'}
>>> d[Pair2(4, 9)]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: Pair2(4, 9)
######

The reason hashing breaks here depends strongly on what hashing really is
doing.  We can talk about this more if you'd like.


In general, if we have two objects, and if they are equal to each other,
they should also "hash" to the same values.  Otherwise, they're fairly
useless as dictionary keys.

(But of course, if we never intend to use an object as a key, then we
probably don't need to care about __hash__().  *grin*)


I know I'm rushing this, so please feel free to ask more questions about
this.



More information about the Tutor mailing list