[Python-ideas] checking for identity before comparing built-in objects

Tue Oct 9 07:32:12 CEST 2012

On Tue, Oct 9, 2012 at 12:16 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> NANs don't quite mean "unknown result". If they did they would probably
> be called "MISSING" or "UNKNOWN" or "NA" (Not Available).
>
> NANs represent a calculation result which is Not A Number. Hence the
> name :-)

This is quite true, but in Python "Not A Number" is spelled None.  In
many respects, None is like a signaling NaN: any numerical operation on
it raises a TypeError, but None == None is True.
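
A small sketch of that contrast, for illustration only.  The helper
try_add is a hypothetical name introduced here; it is not part of any
library.

```python
def try_add(value):
    """Return value + 1, or the string "TypeError" if the addition is rejected."""
    try:
        return value + 1
    except TypeError:
        return "TypeError"

nan = float('nan')

print(try_add(None))   # None rejects arithmetic outright
print(try_add(nan))    # a quiet NaN propagates silently instead
print(None == None)    # True
print(nan == nan)      # False
```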

..
> Since neither sqrt(-1) nor sqrt(-2) exist in the reals, we cannot say
> that they are equal. If we did, we could prove anything:
>
> sqrt(-1) = sqrt(-2)
>
> Square both sides:
>
> -1 = -2

This is a typical mathematical fallacy where a progression of
seemingly equivalent equations contains an invalid operation.  See
http://en.wikipedia.org/wiki/Mathematical_fallacy#All_numbers_equal_all_other_numbers

This is not an argument to make nan == nan false.  The IEEE 754
argument goes as follows: in the domain of 2**64 bit patterns most
patterns represent real numbers, some represent infinities and some do
not represent either infinities or numbers.  Boolean comparison
operations are defined on the entire domain, but the <, =, and >
outcomes are not exhaustive if NaNs are present.  The fourth outcome is
"unordered."  In other words, for any two patterns x and y, one and only
one of the following is true: x < y, x = y, x > y, or x and y are
unordered.  If x is NaN, it compares as unordered to any other pattern
including itself.   This explains why compareQuietEqual(x, x) is false
when x is NaN.  In this case, x is unordered with itself, unordered is
different from equal, so  compareQuietEqual(x, x) cannot be true.  It
cannot raise an exception either because it has to be quiet.  Thus the
only correct result is to return false.
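
The four-way classification can be sketched in Python as follows.  The
function name relation is an assumption made for this illustration; it
only relies on the built-in float comparison operators.

```python
def relation(x, y):
    """Classify two floats into exactly one of the four relations."""
    if x < y:
        return "less"
    if x == y:
        return "equal"
    if x > y:
        return "greater"
    # None of <, ==, > held, so at least one operand is a NaN.
    return "unordered"

nan = float('nan')

print(relation(1.0, 2.0))   # less
print(relation(nan, 1.0))   # unordered
print(relation(nan, nan))   # unordered, hence nan == nan is False
```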

The problem that we have in Python is that float.__eq__ is used for
too many different things and compareQuietEqual is not always
appropriate. Here is a partial list:

1. x == y
2. x in [y]
3. {y:1}[x]
4. x in {y}
5. [y].index(x)

In Python 3, we already took a step away from using the same notion of
equality in all these cases.  Thus in #2, we use x is y or x == y
instead of plain x == y.  But that leads to some strange results:

>>> x = float('nan')
>>> x in [x]
True
>>> float('nan') in [float('nan')]
False
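
A rough model of that containment rule, together with a check of how
the other container operations from the list above behave in CPython
when the very same NaN object is reused.  The function contains is a
hypothetical stand-in written for this sketch, not the actual
implementation.

```python
def contains(item, seq):
    """Model of the rule above: identity is tried first, then ==."""
    return any(item is elem or item == elem for elem in seq)

x = float('nan')
y = float('nan')   # a second, distinct NaN object

# Same object: the identity shortcut makes every lookup succeed.
same_object = (contains(x, [x]), x in [x], x in {x}, {x: 1}.get(x), [x].index(x))

# Distinct objects: identity fails and x == y is False, so nothing is found.
distinct = (contains(y, [x]), y in [x], y in {x}, {x: 1}.get(y))

print(same_object)
print(distinct)
```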

An alternative would be to define x in l as any(isnan(x) and isnan(y)
or x == y for y in l) when x and all elements of l are floats.  Again,
I am not making a change proposal, just mentioning a possibility.
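
As a sketch only, that alternative could be written as a standalone
function.  The name float_in is hypothetical, and the code assumes that
x and every element of l are floats, as stated above.

```python
from math import isnan

def float_in(x, l):
    """Containment where any NaN matches any other NaN (floats only)."""
    return any(isnan(x) and isnan(y) or x == y for y in l)

print(float_in(float('nan'), [float('nan')]))   # True
print(float_in(float('nan'), [1.0, 2.0]))       # False
print(float_in(1.0, [float('nan'), 1.0]))       # True
```

With this definition the result no longer depends on whether the two
NaNs happen to be the same object.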