[Python-Dev] Not-a-Number (was PyObject_RichCompareBool identity shortcut)

Thu Apr 28 22:13:25 CEST 2011

On 4/28/2011 4:40 AM, Mark Shannon wrote:

> NaN is *not* a number (the clue is in the name).

The problem is that the committee itself did not believe or stay
consistent with that. In the text of the draft, they apparently refer to
NaN as an indefinite, unspecified *number*. Sort of like a random
variable with a uniform pseudo* distribution over the reals (* 0
everywhere with integral 1). Or a quantum particle present but smeared
out over all space. And that apparently is their rationale for NaN !=
NaN: an unspecified number will equal another unspecified number with
probability 0. The rationale for bool(NaN)==True is that an unspecified
*number* will be 0 with probability 0. If NaN truly indicated an
*absence* (like 0 and '') then bool(NaN) should be False.
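
For concreteness, here is a small sketch of the two behaviours just
described, as current Python shows them for a float NaN. The variable
names are mine, purely for illustration.

```python
nan = float("nan")  # a single NaN object, reused below

# IEEE 754 comparison semantics: NaN is unequal to everything,
# including itself, even when both operands are the same object.
nan_equals_itself = (nan == nan)
nan_differs_from_itself = (nan != nan)

# Truthiness: among floats only zero is falsy, so NaN counts as true.
nan_is_truthy = bool(nan)

print(nan_equals_itself, nan_differs_from_itself, nan_is_truthy)
# prints: False True True
```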

I think the committee goofed -- badly. Statisticians used missing value 
indicators long before the committee existed. They had no problem
thinking that the indicator, as an object, equaled itself. So one could 
write (and I often did through the 1980s) the equivalent of

for i, x in enumerate(datavec):
    if x == XMIS:  # singleton missing value indicator for BMDP
        datavec[i] = default

(Statistics packages have no concept of identity different from equality.)

If statisticians had made XMIS != XMIS, that obvious code would not have
worked, just as the equivalent test against NaN does not work today in
Python. Instead, the special-case circumlocution of "if isXMIS(x):"
would have been required, adding one more unnecessary function to the
list of builtins.
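
A rough sketch of the contrast, assuming a made-up sentinel value
(XMIS = -999.0 here is only a stand-in, not BMDP's actual indicator) and
two hypothetical helper functions. The equality test replaces an
ordinary sentinel but silently skips NaN, so the isXMIS-style function,
math.isnan in Python's case, is needed.

```python
import math

XMIS = -999.0  # hypothetical sentinel; an ordinary value that equals itself


def replace_by_equality(datavec, missing, default):
    """Replace entries that compare equal to the missing-value marker."""
    return [default if x == missing else x for x in datavec]


def replace_nan(datavec, default):
    """Replace NaN entries, using the special-case test math.isnan."""
    return [default if math.isnan(x) else x for x in datavec]


nan = float("nan")

with_sentinel = replace_by_equality([1.0, XMIS, 3.0], XMIS, 0.0)
with_nan_equality = replace_by_equality([1.0, nan, 3.0], nan, 0.0)
with_nan_isnan = replace_nan([1.0, nan, 3.0], 0.0)

print(with_sentinel)      # the sentinel is replaced
print(with_nan_equality)  # the NaN survives: x == nan is never true
print(with_nan_isnan)     # the NaN is replaced
```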

NaN is, in its domain, the equivalent of None (== Not a Value), which
also serves as an alternative to immediately raising an exception. But
like XMIS, None==None. Also, bool(None) is correctly False for something
that indicates absence.

> Python treats it as if it were a number:

As I said, so did the committee, and that was its mistake that we are 
more or less stuck with.

> NaN does not have to be a float or a Decimal.
> Perhaps it should have its own class.

Like None.

> As pointed out by Meyer:
> NaN == NaN is False
> is no more logical than
> NaN != NaN is False

This is wrong if False/True are interpreted as probabilities 0 and 1.

> To summarise:
>
> NaN is required so that floating point operations on arrays and lists
> do not raise unwanted exceptions.

Like None.

> NaN is Not a Number (therefore should be neither a float nor a Decimal).
> Making it a new class would solve some of the problems discussed,
> but would create new problems instead.

Agreed, if we were starting fresh.

> Correct behaviour of collections is more important than IEEE conformance
> of NaN comparisons.

Also agreed.
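
To illustrate what is at stake for collections, here is a sketch of how
CPython's containers behave with NaN, as far as I understand the
identity shortcut in the subject line: containers check "is" before
"==", so a NaN object is found in a collection that holds that same
object, but a different NaN object is not. The names are illustrative
only.

```python
nan = float("nan")
other_nan = float("nan")  # a distinct NaN object

# Membership and list equality try identity before equality,
# so the very same NaN object is treated as matching itself.
same_object_in_list = nan in [nan]
other_object_in_list = other_nan in [nan]
lists_equal_same_object = [nan] == [nan]
lists_equal_distinct_objects = [nan] == [other_nan]

# Dict lookup works the same way: the identical key object is found,
# a different NaN object is not.
d = {nan: "found"}
lookup_same = d.get(nan)
lookup_other = d.get(other_nan)

print(same_object_in_list, other_object_in_list)
print(lists_equal_same_object, lists_equal_distinct_objects)
print(lookup_same, lookup_other)
```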

-- 
Terry Jan Reedy