[Numpy-discussion] Complex nan ordering

Pauli Virtanen pav at iki.fi
Sun Jul 18 19:00:26 EDT 2010


Sun, 18 Jul 2010 15:57:47 -0600, Charles R Harris wrote:
> On Sun, Jul 18, 2010 at 3:36 PM, Pauli Virtanen <pav at iki.fi> wrote:
[clip]
>> I suggest the following, aping the way the real nan works:
>>
>> - (z, nan), (nan, z), (nan, nan), where z is any fp value, are all
>>  equivalent representations of "cnan", as far as comparisons, sort
>>  order, etc are concerned.
>>
>> - The ordering between (z, nan), (nan, z), (nan, nan) is undefined. This
>>  means e.g. that maximum([cnan_1, cnan_2]) can return either cnan_1 or
>>  cnan_2 if both are some cnans.
>
> The sort and cmp order was defined in 1.4.0, see the release notes.
> (z,z), (z, nan), (nan, z), (nan, nan) are in correct order and there are
> tests to enforce this. Sort and searchsorted need to work together.

Ok, now we're diving into an obscure corner that hopefully many people 
don't care about :)

There are several issues here:

1) We should not use lexical order in comparison operations,
   since this contradicts real-valued nan arithmetic.

   Currently (and in 1.4) we do some weird sort of mixture,
   which seems inconsistent.

2) maximum/minimum should propagate nans, fmax/fmin should not

3) sort/searchsorted, and amax/argmax need to play together

4) as long as 1)-3) are valid, I don't think anybody cares what
   exactly we mean by a "complex nan", as long as

   np.isnan("complex nan") == True

   The fact that there happen to be several different representations
   of a complex nan should not be important.

    ***

1) 

Unless we want to define

	(complex(nan, 0) > complex(0, 0)) == True

we cannot strictly follow the lexical order in comparisons. And if we 
define it like this, we contradict real-valued nan arithmetic, which IMHO 
is quite bad.

Here, it would make sense to me to lump all the different complex nans 
into a single "cnan", as far as the arithmetic comparison operations are 
concerned. Then,

	z OP cnan == False

for all comparison operations.
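
As a rough sketch of that rule (not what NumPy implements), the ordering
comparisons could look like this in plain Python. The helper name
`ccompare` is made up for illustration, and lexical (real, imag) order
among non-nan values is an assumption:

```python
import cmath
import operator

def ccompare(op, a, b):
    """Apply an ordering comparison (<, <=, >, >=) with all cnans lumped.

    Hypothetical helper: a value with a nan in either part counts as
    "cnan", and every ordering comparison involving a cnan is False,
    as it is for real-valued nans.
    """
    a, b = complex(a), complex(b)
    if cmath.isnan(a) or cmath.isnan(b):
        return False
    # Assumed: non-nan values compare in lexical (real, imag) order.
    return op((a.real, a.imag), (b.real, b.imag))

nan = float("nan")
print(ccompare(operator.gt, complex(nan, 0), 0j))  # False
print(ccompare(operator.ge, complex(1, nan), 1j))  # False
print(ccompare(operator.lt, 0j, 1j))               # True
```

The sketch covers only the ordering operators; == and != are left out.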

In 1.4.1 we have

>>> import numpy as np
>>> np.__version__
'1.4.1'
>>> x = np.complex64(complex(np.nan, 1))
>>> y = np.complex64(complex(0, 1))
>>> x >= y
False
>>> x < y
False
>>> x = np.complex64(complex(1, np.nan))
>>> y = np.complex64(complex(0, 1))
>>> x >= y
True
>>> x < y
False

which seems an obscure mix of real-valued nan arithmetic and lexical 
ordering -- I don't think it's the correct choice...

Of course, the practical importance of this decision approaches zero, but 
it would be nice to be consistent.

    ***

2)

For maximum/amax, strict lexical order contradicts nan propagation:

    maximum(1+nan*j, 2+0j) == 2+0j  ???

I don't see why we should follow the lexical order when both arguments 
are nans. The implementation will be faster if we don't.

Also, this way argmax (which should be nan-propagating) can stop looking 
once it finds the first nan -- and it does not need to care if later on 
in the array there would be a "greater" nan.
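
The early exit could be sketched as follows, in plain Python rather than
the C loops NumPy actually uses. `propagating_argmax` is a hypothetical
name, and lexical order among non-nan values is assumed:

```python
import cmath

def propagating_argmax(values):
    """Return the index of the first cnan, else of the lexical maximum."""
    best_index = None
    best_key = None
    for i, z in enumerate(values):
        z = complex(z)
        if cmath.isnan(z):
            # Any cnan wins; later "greater" nans are never inspected.
            return i
        key = (z.real, z.imag)
        if best_key is None or key > best_key:
            best_index, best_key = i, key
    if best_index is None:
        raise ValueError("argmax of an empty sequence")
    return best_index

nan = float("nan")
print(propagating_argmax([1 + 0j, complex(1, nan), complex(nan, nan), 5 + 0j]))  # 1
print(propagating_argmax([1 + 0j, 3 + 1j, 3 + 0j]))                              # 1
```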

    ***

3)

For sort/searchsorted we have a technical reason to do something more, 
and there the strict lexical order seems the correct decision.

For `argmax` it is possible to stay compatible with `amax` when lumping 
cnans in maximum -- just return the first cnan.
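
To illustrate why sort and searchsorted need one shared total order, here
is a small sketch using a sort key. The grouping (z, z), (z, nan),
(nan, z), (nan, nan) follows the order quoted above; ordering within each
group by the non-nan parts is my assumption, and `sort_key`, `csort` and
`csearchsorted` are hypothetical names, not NumPy functions:

```python
import bisect
import math

def sort_key(z):
    """Total-order key: nan-free values first, (nan, nan) last."""
    z = complex(z)
    re_nan, im_nan = math.isnan(z.real), math.isnan(z.imag)
    # nan parts are replaced by 0.0 so the tuples stay comparable.
    return (re_nan, im_nan,
            0.0 if re_nan else z.real,
            0.0 if im_nan else z.imag)

def csort(values):
    return sorted(values, key=sort_key)

def csearchsorted(sorted_values, z):
    """Leftmost insertion index of z, using the same key as csort."""
    keys = [sort_key(v) for v in sorted_values]
    return bisect.bisect_left(keys, sort_key(z))

nan = float("nan")
data = [complex(nan, nan), 2 + 1j, complex(1, nan), complex(nan, 0), 1 + 5j]
s = csort(data)
print(s[:2])                                # [(1+5j), (2+1j)]
print(csearchsorted(s, complex(0, nan)))    # 2
```

Because both functions go through the same key, an insertion index
returned by the search is always consistent with the sorted order, nans
included.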

    ***

4)

As far as np.isnan is concerned,

>>> np.isnan(complex(0, np.nan))
True
>>> np.isnan(complex(np.nan, 0))
True
>>> np.isnan(complex(np.nan, np.nan))
True

So I think nobody should care which complex nan a function such as 
maximum or amax returns. 

We can of course give up some performance to look for the "greatest" nan 
in these cases, but I do not think that it would be very worthwhile.

-- 
Pauli Virtanen



