[Python-ideas] checking for identity before comparing built-in objects

Thu Oct 4 18:51:23 CEST 2012

On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
> On 04/10/12 21:48, Max Moroz wrote:
>> It seems that built-in classes do not short-circuit `__eq__` method
>> when the objects are identical, at least in CPython:
>>
>>      f = frozenset(range(200000000))
>>      f1 = f
>>      f1 == f # this operation will take about 1 sec on my machine
>
> You shouldn't over-generalize. Some built-ins do short-circuit __eq__
> when the objects are identical. I believe that strings and ints both
> do. Other types might not.
>
>
>> Is there any disadvantage to checking whether the equality was called
>> with the same object, and if it was, return `True` right away?
>
> That would break floats and Decimals, both of which support NANs.
>
> The decision whether or not to optimize __eq__ should be left up to the
> type. Some types, for example, might decide to optimize x == x even if
> x contains a NAN or other objects that break reflexivity of equality.
> Other types might prefer not to.
>
> (Please do not start an argument about NANs and reflexivity. That's
> been argued to death, and there are very good reasons for the IEEE 754
> standard to define NANs the way they do.)
>
> Since frozensets containing NANs are rare (I presume), I think it is
> reasonable to optimize frozenset equality. But I do not think it is
> reasonable for Python to mandate identity checking before __eq__.
>

But set and frozenset already seem to behave like this (they use "is" before "==" when comparing their items):

 >>> frozenset([float("nan")]) == frozenset([float("nan")])
False

 >>> s = frozenset([float("nan")])
 >>> s == s
True

 >>> NaN = float("nan")
 >>> NaN == NaN
False
 >>> frozenset([NaN]) == frozenset([NaN])
True

So adding the "is" optimization to frozenset equality should not change its semantics.

(I tested this in Python 2.7.3 and 3.2.3)
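
The same thing can be shown without NaN, using a small class that counts its own __eq__ calls. This is only a sketch: the class name Noisy is made up for illustration, and the behaviour is what I would expect from CPython's set implementation rather than something the language guarantees.

```python
class Noisy:
    """Hypothetical helper: never equal to anything, and counts __eq__ calls."""
    eq_calls = 0

    def __eq__(self, other):
        Noisy.eq_calls += 1
        return False

    def __hash__(self):
        return 1  # constant hash, so set lookups always reach the comparison step


x = Noisy()

# A plain == goes through __eq__, so x is not even equal to itself.
plain = (x == x)

# Set lookups check "is" first and only fall back to == for distinct objects.
Noisy.eq_calls = 0
same_item = frozenset([x]) == frozenset([x])
calls_same_item = Noisy.eq_calls

# Two distinct instances: identity fails, __eq__ says False, the sets differ.
distinct_items = frozenset([Noisy()]) == frozenset([Noisy()])

print(plain, same_item, calls_same_item, distinct_items)
```

If the items were compared with == alone, same_item would come out False, just like the plain comparison.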

>
>
>> I noticed this when trying to memoize a function that has large
>> frozenset arguments. While hashing of a large argument is very fast
>> after it's done once (hash value is presumably cached), the equality
>> comparison is always slow even against itself. So when the same large
>> argument is provided over and over, memoization is slow.
>
> I'm not sure what you are doing here, because dicts (at least in Python
> 3.2) already short-circuit equality:
>
> py> NAN = float('nan')
> py> NAN == NAN
> False
> py> d = {NAN: 42}
> py> d[NAN]
> 42
>
> Actually, that behaviour goes back to at least 2.4, so I'm not sure how
> you are doing memoization and not seeing the same optimization.
>
>
>
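
To illustrate the dict behaviour quoted above, here is a rough sketch of dict-based memoization with a key type that counts its __eq__ calls. BigKey and memoized_len are made-up names standing in for the large frozenset argument and the memoized function; I am assuming the cache is a plain dict keyed on the argument object.

```python
class BigKey:
    """Hypothetical stand-in for a large frozenset argument."""
    eq_calls = 0

    def __init__(self, items):
        self.items = frozenset(items)

    def __hash__(self):
        return hash(self.items)

    def __eq__(self, other):
        BigKey.eq_calls += 1
        return isinstance(other, BigKey) and self.items == other.items


cache = {}


def memoized_len(key):
    # Plain dict-based memoization; the argument object itself is the dict key.
    if key not in cache:
        cache[key] = len(key.items)
    return cache[key]


k = BigKey(range(1000))
memoized_len(k)                      # first call fills the cache

BigKey.eq_calls = 0
memoized_len(k)                      # same object again: found by identity
calls_same_object = BigKey.eq_calls

memoized_len(BigKey(range(1000)))    # equal but distinct object
calls_equal_copy = BigKey.eq_calls

print(calls_same_object, calls_equal_copy)
```

When the very same object is passed again, the dict lookup never calls __eq__. The slow comparison only happens when an equal but distinct object is passed, which may be what the original memoization code is doing.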