[Python-ideas] incremental hashing in __hash__
Ethan Furman
ethan at stoneleaf.us
Wed Dec 28 12:10:59 EST 2016
On 12/27/2016 07:13 PM, jab at math.brown.edu wrote:
> According to the docs [6]:
>
> """
> it is advised to mix together the hash values of the components of the
> object that also play a part in comparison of objects by packing them
> into a tuple and hashing the tuple. Example:
>
> def __hash__(self):
> return hash((self.name, self.nick, self.color))
>
> """
>
>
> Applying this advice to the use cases above would require creating an
> arbitrarily large tuple in memory before passing it to hash(), which
> is then just thrown away. It would be preferable if there were a way
> to pass multiple values to hash() in a streaming fashion, such that
> the overall hash were computed incrementally, without building up a
> large object in memory first.
Part of the reason for creating __hash__ like above is that:
- it's simple
- it's reliable
However, it's not the only way to have a hash algorithm that works; in fact, the beginning of the sentence you quoted says:
> The only required property is that objects which compare equal have
> the same hash value;
In other words, objects that do not compare equal can also have the same hash value (although too much of that will reduce the efficiency of Python's containers).
> (ii) priming the overall hash value with some class-specific initial
> value, so that if an instance of a different type of collection, which
> comprised the same items but which compared unequal, were to compute
> its hash value out of the same constituent items, we make sure our
> hash value differs.
This is unnecessary: hashes are compared first as a way to weed out impossible matches, but when the hashes are the same an actual __eq__ test is still done [7].
--
~Ethan~
[6] https://docs.python.org/3/reference/datamodel.html#object.__hash__
[7] some test code to prove above points:
--- 8< ------------------------------------------------------------
from unittest import main, TestCase
class Eggs(object):
def __init__(self, value):
self.value = value
def __hash__(self):
return hash(self.value)
def __eq__(self, other):
if isinstance(other, self.__class__):
return self.value == other.value
return NotImplemented
class Spam(object):
def __init__(self, value):
self.value = value
def __hash__(self):
return hash(self.value)
def __eq__(self, other):
if isinstance(other, self.__class__):
return self.value == other.value
return NotImplemented
e10 = Eggs(1)
e20 = Eggs(2)
e11 = Eggs(1)
e21 = Eggs(2)
s10 = Spam(1)
s20 = Spam(2)
s11 = Spam(1)
s21 = Spam(2)
bag = {}
bag[e10] = 1
bag[s10] = 2
bag[e20] = 3
bag[s20] = 4
class TestEqualityAndHashing(TestCase):
def test_equal(self):
# same class, same value --> equal
self.assertEqual(e10, e11)
self.assertEqual(e20, e21)
self.assertEqual(s10, s11)
self.assertEqual(s20, s21)
def test_not_equal(self):
# different class, same value --> not equal
self.assertEqual(e10.value, s10.value)
self.assertNotEqual(e10, s10)
self.assertEqual(e20.value, s20.value)
self.assertNotEqual(e20, s20)
def test_same_hash(self):
# same class, same value, same hash
self.assertEqual(hash(e10), hash(e11))
self.assertEqual(hash(e20), hash(e21))
self.assertEqual(hash(s10), hash(s11))
self.assertEqual(hash(s20), hash(s21))
# different class, same value, same hash
self.assertEqual(hash(e10), hash(s10))
self.assertEqual(hash(e11), hash(s11))
self.assertEqual(hash(e20), hash(s20))
self.assertEqual(hash(e21), hash(s21))
def test_as_key(self):
# different objects from different classes with same hash should still be distinct
self.assertEqual(len(bag), 4)
self.assertEqual(bag[e10], 1)
self.assertEqual(bag[s10], 2)
self.assertEqual(bag[e20], 3)
self.assertEqual(bag[s20], 4)
# different objects from same classes with same hash should not be distinct
self.assertEqual(bag[e11], 1)
self.assertEqual(bag[s11], 2)
self.assertEqual(bag[e21], 3)
self.assertEqual(bag[s21], 4)
main()
--- 8< ------------------------------------------------------------
More information about the Python-ideas
mailing list