[Python-ideas] incremental hashing in __hash__

Ethan Furman ethan at stoneleaf.us
Wed Dec 28 12:10:59 EST 2016


On 12/27/2016 07:13 PM, jab at math.brown.edu wrote:

> According to the docs [6]:
>
> """
> it is advised to mix together the hash values of the components of the
> object that also play a part in comparison of objects by packing them
> into a tuple and hashing the tuple. Example:
>
> def __hash__(self):
>      return hash((self.name, self.nick, self.color))
>
> """
>
>
> Applying this advice to the use cases above would require creating an
> arbitrarily large tuple in memory before passing it to hash(), which
> is then just thrown away. It would be preferable if there were a way
> to pass multiple values to hash() in a streaming fashion, such that
> the overall hash were computed incrementally, without building up a
> large object in memory first.

Part of the reason for creating __hash__ like above is that:

- it's simple
- it's reliable

However, it's not the only way to have a hash algorithm that works; in fact, the beginning of the sentence you quoted says:

> The only required property is that objects which compare equal have
>  the same hash value;

In other words, objects that do not compare equal can also have the same hash value (although too much of that will reduce the efficiency of Python's containers).

> (ii) priming the overall hash value with some class-specific initial
> value, so that if an instance of a different type of collection, which
> comprised the same items but which compared unequal, were to compute
> its hash value out of the same constituent items, we make sure our
> hash value differs.

This is unnecessary:  hashes are compared first as a way to weed out impossible matches, but when the hashes are the same an actual __eq__ test is still done [7].

--
~Ethan~


[6] https://docs.python.org/3/reference/datamodel.html#object.__hash__
[7] some test code to prove above points:

--- 8< ------------------------------------------------------------
from unittest import main, TestCase

class Eggs(object):
     def __init__(self, value):
         self.value = value
     def __hash__(self):
         return hash(self.value)
     def __eq__(self, other):
         if isinstance(other, self.__class__):
             return self.value == other.value
         return NotImplemented

class Spam(object):
     def __init__(self, value):
         self.value = value
     def __hash__(self):
         return hash(self.value)
     def __eq__(self, other):
         if isinstance(other, self.__class__):
             return self.value == other.value
         return NotImplemented

e10 = Eggs(1)
e20 = Eggs(2)
e11 = Eggs(1)
e21 = Eggs(2)

s10 = Spam(1)
s20 = Spam(2)
s11 = Spam(1)
s21 = Spam(2)

bag = {}
bag[e10] = 1
bag[s10] = 2
bag[e20] = 3
bag[s20] = 4

class TestEqualityAndHashing(TestCase):

     def test_equal(self):
         # same class, same value --> equal
         self.assertEqual(e10, e11)
         self.assertEqual(e20, e21)
         self.assertEqual(s10, s11)
         self.assertEqual(s20, s21)

     def test_not_equal(self):
         # different class, same value --> not equal
         self.assertEqual(e10.value, s10.value)
         self.assertNotEqual(e10, s10)
         self.assertEqual(e20.value, s20.value)
         self.assertNotEqual(e20, s20)

     def test_same_hash(self):
         # same class, same value, same hash
         self.assertEqual(hash(e10), hash(e11))
         self.assertEqual(hash(e20), hash(e21))
         self.assertEqual(hash(s10), hash(s11))
         self.assertEqual(hash(s20), hash(s21))

         # different class, same value, same hash
         self.assertEqual(hash(e10), hash(s10))
         self.assertEqual(hash(e11), hash(s11))
         self.assertEqual(hash(e20), hash(s20))
         self.assertEqual(hash(e21), hash(s21))

     def test_as_key(self):
         # different objects from different classes with same hash should still be distinct
         self.assertEqual(len(bag), 4)
         self.assertEqual(bag[e10], 1)
         self.assertEqual(bag[s10], 2)
         self.assertEqual(bag[e20], 3)
         self.assertEqual(bag[s20], 4)

         # different objects from same classes with same hash should not be distinct
         self.assertEqual(bag[e11], 1)
         self.assertEqual(bag[s11], 2)
         self.assertEqual(bag[e21], 3)
         self.assertEqual(bag[s21], 4)

main()
--- 8< ------------------------------------------------------------


More information about the Python-ideas mailing list