[Python-Dev] PyObject_RichCompareBool identity shortcut

Thu Apr 28 05:42:03 CEST 2011

On 2011-04-27 22:16 , Guido van Rossum wrote:
> On Wed, Apr 27, 2011 at 11:48 AM, Robert Kern<robert.kern at gmail.com>  wrote:
>> On 4/27/11 12:44 PM, Terry Reedy wrote:
>>>
>>> On 4/27/2011 10:53 AM, Guido van Rossum wrote:
>>
>>>> Maybe we should just call off the odd NaN comparison behavior?
>>>
>>> Eiffel seems to have survived, though I do not know if it used for
>>> numerical
>>> work. I wonder how much code would break and what the scipy folks would
>>> think.
>>
>> I suspect most of us would oppose changing it on general
>> backwards-compatibility grounds rather than actually *liking* the current
>> behavior. If the behavior changed with Python floats, we'd have to mull over
>> whether we try to match that behavior with our scalar types (one of which
>> subclasses from float) and our arrays. We would be either incompatible with
>> Python or C, and we'd probably end up choosing Python to diverge from. It
>> would make a mess, honestly. We already have to explain why equality is
>> funky for arrays (arr1 == arr2 is a rich comparison that gives an array, not
>> a bool, so we can't do containment tests for lists of arrays), so NaN is
>> pretty easy to explain afterward.
>
> So does NumPy also follow Python's behavior about ignoring the NaN
> special-casing when doing array ops?

By "ignoring the NaN special-casing", do you mean that identity is checked 
first? When we use dtype=object arrays (arrays that contain Python objects as 
their data), yes:

[~]
|1> nan = float('nan')

[~]
|2> import numpy as np

[~]
|3> a = np.array([1, 2, nan], dtype=object)

[~]
|4> nan in a
True

[~]
|5> float('nan') in a
False

Just like lists:

[~]
|6> nan in [1, 2, nan]
True

[~]
|7> float('nan') in [1, 2, nan]
False

Actually, we go a little further by using PyObject_RichCompareBool() rather than 
PyObject_RichCompare() to implement the array-wise comparisons in addition to 
containment:

[~]
|8> a == nan
array([False, False,  True], dtype=bool)

[~]
|9> [x == nan for x in [1, 2, nan]]
[False, False, False]

But for dtype=float arrays (which contain C doubles, not Python objects) we use 
C semantics. Literally, we use whatever C's == operator gives us for the two 
double values. Since there is no concept of identity for this case, there is no 
cognate behavior of Python to match.

[~]
|10> b = np.array([1.0, 2.0, nan], dtype=float)

[~]
|11> b == nan
array([False, False, False], dtype=bool)

[~]
|12> nan in b
False

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco