__hash__ and ordered vs. unordered collections

Chris Angelico rosuav at gmail.com
Mon Nov 20 13:54:56 EST 2017


On Tue, Nov 21, 2017 at 4:47 AM, Josh B. <jabronson at gmail.com> wrote:
> Now for the question: Is this useful? I ask because this leads to the following behavior:
>
>>>> unordered = MyColl([1, 2, 3])
>>>> ordered = MyOrderedColl([3, 2, 1])
>>>> s = {ordered, unordered}
>>>> len(s)
> 1
>>>> s = {ordered}
>>>> unordered in s
> True
>>>> # etc.
>
> In other words, sets and mappings can't tell unordered and ordered apart; they're treated like the same values.
>
> However, I'm less confident that this kind of behavior is useful for MyColl and MyOrderedColl. Could anyone who feels more certain one way or the other please explain the rationale and possibly even give some real-world examples?
>

This isn't a consequence of __hash__, it's a consequence of __eq__.
You have declared that MyColl and MyOrderedColl are equal, therefore
only one of them stays in the set.

But what you have is the strangeness of non-transitive equality, which
is likely to cause problems.

>>> unordered = MyColl([1, 2, 3])
>>> ordered1 = MyColl([3, 2, 1])
>>> ordered2 = MyColl([2, 1, 3])

unordered is equal to each of the others, but they're not equal to
each other. So if you put them into a set, you'll get results that
depend on order. Here's a simpler form of non-transitive equality:

>>> class Oddity(int):
...     def __eq__(self, other):
...         if other - 5 <= self <= other + 5:
...             return True
...         return int(self) == other
...     def __hash__(self):
...         return 1
...
>>> x, y, z = Oddity(5), Oddity(10), Oddity(15)
>>> x == y, y == z, x == z
(True, True, False)
>>> {x, y, z}
{5, 15}
>>> {y, x, z}
{10}

Setting __hash__ to a constant value is safe (but inefficient); it's
all based on how __eq__ works. So the question is: are you willing to
accept the bizarre behaviour of non-transitive equality?

ChrisA



More information about the Python-list mailing list