[Python-ideas] checking for identity before comparing built-in objects

Steven D'Aprano steve at pearwood.info
Thu Oct 4 15:53:50 CEST 2012

On 04/10/12 21:48, Max Moroz wrote:
> It seems that built-in classes do not short-circuit `__eq__` method
> when the objects are identical, at least in CPython:
>      f = frozenset(range(200000000))
>      f1 = f
>      f1 == f # this operation will take about 1 sec on my machine

You shouldn't over-generalize. Some built-ins do short-circuit __eq__
when the objects are identical. I believe that strings and ints both
do. Other types might not.

> Is there any disadvantage to checking whether the equality was called
> with the same object, and if it was, return `True` right away?

That would break floats and Decimals, both of which support NANs.
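
To make that concrete, here is a minimal sketch of what a mandatory identity
check would change. The helper `identity_first_eq` is a hypothetical stand-in
for the proposed behaviour, not anything that exists in Python:

```python
from decimal import Decimal


def identity_first_eq(a, b):
    """Hypothetical stand-in for a mandated identity check before __eq__."""
    return a is b or a == b


nan = float("nan")
dnan = Decimal("NaN")  # a quiet NaN; a signalling NaN would behave differently

# Today, a NaN is not equal to itself, even when it is the very same object.
print(nan == nan)    # False
print(dnan == dnan)  # False

# With an identity check first, the same comparisons would report True.
print(identity_first_eq(nan, nan))    # True
print(identity_first_eq(dnan, dnan))  # True
```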

The decision whether or not to optimize __eq__ should be left up to the
type. Some types, for example, might decide to optimize x == x even if
x contains a NAN or other objects that break reflexivity of equality.
Other types might prefer not to.
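
As a rough illustration of a type making that choice for itself, a subclass
can opt in to the shortcut. The class name `IdentityFirstFrozenSet` is made up
for this sketch, and it deliberately accepts that `x == x` is True even if x
contains a NAN:

```python
class IdentityFirstFrozenSet(frozenset):
    """Hypothetical frozenset subclass that skips the scan when comparing to itself."""

    def __eq__(self, other):
        if self is other:
            # Same object: answer immediately instead of walking every element.
            return True
        return super().__eq__(other)

    # Defining __eq__ would otherwise set __hash__ to None, so restore it.
    __hash__ = frozenset.__hash__


f = IdentityFirstFrozenSet(range(1000))
print(f == f)                       # True, without scanning the elements
print(f == frozenset(range(1000)))  # True, via the normal comparison
```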

(Please do not start an argument about NANs and reflexivity. That's
been argued to death, and there are very good reasons for the IEEE 754
standard to define NANs the way they do.)

Since frozensets containing NANs are rare (I presume), I think it is
reasonable to optimize frozenset equality. But I do not think it is
reasonable for Python to mandate identity checking before __eq__.

> I noticed this when trying to memoize a function that has large
> frozenset arguments. While hashing of a large argument is very fast
> after it's done once (hash value is presumably cached), the equality
> comparison is always slow even against itself. So when the same large
> argument is provided over and over, memoization is slow.

I'm not sure what you are doing here, because dict lookups (at least in
Python 3.2) already check identity before falling back to equality:

py> NAN = float('nan')
py> NAN == NAN
False
py> d = {NAN: 42}
py> d[NAN]
42

Actually, that behaviour goes back to at least 2.4, so I'm not sure how
you are doing memoization and not seeing the same optimization.
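
For what it's worth, here is a small sketch of the kind of dict-based
memoization I would expect to benefit. `CountingKey` and `memoized_len` are
hypothetical names invented for the example; the counter only exists to show
when `__eq__` actually runs:

```python
class CountingKey:
    """Hypothetical key wrapper that records how often __eq__ is called."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value


cache = {}


def memoized_len(key):
    """Toy memoized function: caches len(key.value) in a plain dict."""
    if key not in cache:
        cache[key] = len(key.value)
    return cache[key]


key = CountingKey(frozenset(range(1000)))
memoized_len(key)  # miss: the key is stored
memoized_len(key)  # hit with the identical object
calls_same_object = CountingKey.eq_calls  # 0: identity matched, __eq__ never ran

twin = CountingKey(frozenset(range(1000)))  # equal to key, but a distinct object
memoized_len(twin)
calls_equal_object = CountingKey.eq_calls - calls_same_object  # at least 1
```

If the memoizing code builds a new but equal key on every call, as `twin` does
here, the identity shortcut never applies and the full comparison runs each time.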

