[Python-Dev] PyObject_RichCompareBool identity shortcut

Thu Apr 28 07:27:36 CEST 2011

On Wed, Apr 27, 2011 at 11:14 PM, Guido van Rossum <guido at python.org> wrote:
..
>> ISTM, the current state of affairs is reasonable.
>
> Hardly; when I picked the NaN behavior I knew the IEEE std prescribed
> it but had never seen any code that used this.
>

Same here.  The only code I've seen that depended on this NaN behavior
was either buggy (programmer did not consider NaN case) or was using x
== x as a way to detect nans.  The latter idiom is universally frowned
upon regardless of the language.  In Python one should use
math.isnan() for this purpose.
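
To make the difference concrete, here is a small sketch (the variable
names are mine, not from any real codebase) comparing the two ways of
detecting a nan.  It also shows how the self-comparison trick can quietly
stop working for a type that compares by identity, such as a ctypes
double:

```python
import math
from ctypes import c_double

nan = float('nan')

# The self-comparison idiom: relies on nan != nan.
print(nan != nan)               # True
# The explicit test says what it means.
print(math.isnan(nan))          # True

# A ctypes double holding a nan compares equal to itself, so the
# self-comparison idiom reports "not a nan" here.
boxed = c_double(nan)
print(boxed != boxed)           # False
print(math.isnan(boxed.value))  # True
```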

I would like to present a challenge to the proponents of the status
quo.  Look through your codebase and find code that will behave
differently if nan == nan were True.   Then come back and report how
many bugs you have found. :-)  Seriously, though, I bet that if you
find anything, it will fall into one of the two cases I mentioned
above.

..
> I expect that that if 15 years or so ago I had decided to ignore the
> IEEE std and declare that object identity always implies equality it
> would have seemed quite reasonable as well... The rule could be
> something like "the == operator first checks for identity and if left
> and right are the same object, the answer is True without calling the
> object's __eq__ method; similarly the != would always return False
> when an object is compared to itself".

Note that ctypes' floats already behave this way:

>>> from ctypes import c_double
>>> x = c_double(float('nan'))
>>> x == x
True

..
> Doing this in 3.3 would, alas, be a huge undertaking -- I expect that
> there are tons of unittests that depend either on the current NaN
> behavior or on x == x calling x.__eq__(x). Plus the decimal unittests
> would be affected. Perhaps somebody could try?

Before we go down this path, I would like to discuss another
peculiarity of NaNs:

>>> float('nan') < 0
False
>>> float('nan') > 0
False

This property in my experience causes much more trouble than nan ==
nan being false.  The problem is that common sorting or binary search
algorithms may degenerate into infinite loops in the presence of nans.
This may even happen when searching for a finite value in a large
array that contains a single nan.  Errors like this do happen in the
wild, and after chasing a bug like this programmers tend to avoid
nans at all costs.  Oftentimes this leads to using "magic"
placeholders such as 1e300 for missing data.
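
A minimal illustration, assuming a list that is sorted except for one
stray nan (the helper name contains() is hypothetical).  Python's bisect
always terminates, so what this shows is the milder failure mode, a
silently wrong answer rather than an infinite loop:

```python
from bisect import bisect_left

nan = float('nan')
data = [1.0, 2.0, nan, 3.0, 4.0]   # ordered, apart from the nan

def contains(sorted_list, value):
    """Binary-search membership test; assumes sorted_list is ordered."""
    i = bisect_left(sorted_list, value)
    return i < len(sorted_list) and sorted_list[i] == value

print(4.0 in data)          # True  (linear scan finds it)
print(contains(data, 4.0))  # False (the search is misdirected by the nan)
```

The first probe hits the nan, nan < 4.0 is False, and the search discards
the right half of the list, which is where 4.0 actually lives.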

Since py3k has already made None < 0 an error, it may be reasonable
for float('nan') < 0 to raise an error as well (probably ValueError
rather than TypeError).  This will not make lists with nans sortable
or searchable using binary search, but will make associated bugs
easier to find.
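
As a rough sketch of what that could feel like, here is a hypothetical
float subclass (StrictFloat is my own name, not an existing or proposed
type) whose ordering comparisons raise ValueError when a nan is involved.
Equality is deliberately left alone:

```python
import math

class StrictFloat(float):
    """Hypothetical float whose ordering comparisons reject nans."""

    def _check(self, other):
        # Raise if either operand of an ordering comparison is a nan.
        if math.isnan(self) or (isinstance(other, float) and math.isnan(other)):
            raise ValueError("ordering comparison involving nan")

    def __lt__(self, other):
        self._check(other)
        return float.__lt__(self, other)

    def __le__(self, other):
        self._check(other)
        return float.__le__(self, other)

    def __gt__(self, other):
        self._check(other)
        return float.__gt__(self, other)

    def __ge__(self, other):
        self._check(other)
        return float.__ge__(self, other)

values = [StrictFloat(1.0), StrictFloat('nan'), StrictFloat(2.0)]
try:
    sorted(values)
except ValueError as exc:
    print("sort failed loudly:", exc)
```

The list is still not sortable, but the failure now points at the nan
instead of producing an arbitrary ordering.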