Re: [Numpy-discussion] Complex nan ordering

18 Jul 2010

Sun, 18 Jul 2010 15:57:47 -0600, Charles R Harris wrote:
> On Sun, Jul 18, 2010 at 3:36 PM, Pauli Virtanen <pav@iki.fi> wrote:
[clip]
>> I suggest the following, aping the way the real nan works:
>> - (z, nan), (nan, z), (nan, nan), where z is any fp value, are all
>>   equivalent representations of "cnan", as far as comparisons, sort
>>   order, etc are concerned.
>> - The ordering between (z, nan), (nan, z), (nan, nan) is undefined. This
>>   means e.g. that maximum([cnan_1, cnan_2]) can return either cnan_1 or
>>   cnan_2 if both are some cnans.
>
> The sort and cmp order was defined in 1.4.0, see the release notes.
> (z,z), (z, nan), (nan, z), (nan, nan) are in correct order and there are
> tests to enforce this. Sort and searchsorted need to work together.

Ok, now we're diving into an obscure corner that hopefully many people
don't care about :)

There are several issues here:

1) We should not use lexical order in comparison operations,
   since this contradicts real-valued nan arithmetic.

   Currently (and in 1.4) we do some weird sort of mixture,
   which seems inconsistent.

2) maximum/minimum should propagate nans, fmax/fmin should not

3) sort/searchsorted, and amax/argmax need to play together

4) as long as 1)-3) are valid, I don't think anybody cares
   what exactly we mean by a "complex nan", as long as

   np.isnan("complex nan") == True

   The fact that there happen to be several different representations
   of a complex nan should not be important.
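
As a rough illustration of point 2) for plain real floats, the two behaviours can be sketched in pure Python. The names `maximum_propagating` and `fmax_ignoring` are made up for this sketch and are not NumPy functions; they only mimic the documented nan handling of `np.maximum` and `np.fmax` for two scalars.

```python
import math

def maximum_propagating(a, b):
    """Hypothetical scalar analogue of maximum: any nan input yields nan."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)

def fmax_ignoring(a, b):
    """Hypothetical scalar analogue of fmax: a nan is ignored when the
    other argument is a number; nan is returned only if both are nan."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)
```

With these, `maximum_propagating(1.0, math.nan)` is nan, while `fmax_ignoring(1.0, math.nan)` is 1.0.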

    ***

1) 

Unless we want to define

	(complex(nan, 0) > complex(0, 0)) == True

we cannot strictly follow the lexical order in comparisons. And if we 
define it like this, we contradict real-valued nan arithmetic, which IMHO 
is quite bad.

Here, it would make sense to me to lump all the different complex nans 
into a single "cnan", as far as the arithmetic comparison operations are 
concerned. Then,

	z OP cnan == False

for all comparison operations.
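
A minimal pure-Python sketch of this rule, for illustration only. `is_cnan` and `cnan_compare` are hypothetical helpers, not part of NumPy, and the lexical (real part first, then imaginary part) fallback for values without nans is an assumption about what the ordered comparisons would do.

```python
import math
import operator

def is_cnan(z):
    """True if either component is nan; all such values count as one 'cnan'."""
    z = complex(z)
    return math.isnan(z.real) or math.isnan(z.imag)

def cnan_compare(op, a, b):
    """Hypothetical ordered comparison: False whenever a cnan is involved,
    lexical comparison of (real, imag) otherwise.

    `op` is one of operator.lt, operator.le, operator.gt, operator.ge.
    """
    a, b = complex(a), complex(b)
    if is_cnan(a) or is_cnan(b):
        return False
    return op((a.real, a.imag), (b.real, b.imag))
```

Under this rule both `cnan_compare(operator.ge, complex(1, math.nan), 1j)` and `cnan_compare(operator.lt, complex(1, math.nan), 1j)` are False, in the same way that `nan >= 1.0` and `nan < 1.0` are both False for real floats.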

In 1.4.1 we have

    >>> import numpy as np
    >>> np.__version__
    '1.4.1'
    >>> x = np.complex64(complex(np.nan, 1))
    >>> y = np.complex64(complex(0, 1))
    >>> x >= y
    False
    >>> x < y
    False
    >>> x = np.complex64(complex(1, np.nan))
    >>> y = np.complex64(complex(0, 1))
    >>> x >= y
    True
    >>> x < y
    False

which seems an obscure mix of real-valued nan arithmetic and lexical
ordering -- I don't think it's the correct choice...

Of course, the practical importance of this decision approaches zero, but 
it would be nice to be consistent.

    ***

2)

For maximum/amax, strict lexical order contradicts nan propagation:

    maximum(1+nan*j, 2+0j) == 2+0j  ???

I don't see why we should follow the lexical order when both arguments 
are nans. The implementation will be faster if we don't.

Also, this way argmax (which should be nan-propagating) can stop looking 
once it finds the first nan -- and it does not need to care if later on 
in the array there would be a "greater" nan.
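
The early exit could look roughly like the following pure-Python sketch. `argmax_nan_propagating` is a hypothetical name, not NumPy's implementation, and the lexical comparison used for values without nans is an assumption.

```python
import math

def argmax_nan_propagating(values):
    """Index of the first cnan if there is one; otherwise the index of the
    lexically largest value (first occurrence on ties)."""
    best_index = None
    best_key = None
    for i, v in enumerate(values):
        z = complex(v)
        if math.isnan(z.real) or math.isnan(z.imag):
            # First cnan wins: no need to scan the rest of the array,
            # and no need to rank this cnan against later ones.
            return i
        key = (z.real, z.imag)
        if best_key is None or key > best_key:
            best_index, best_key = i, key
    if best_index is None:
        raise ValueError("argmax of an empty sequence")
    return best_index
```

For example, on `[1+0j, complex(1, nan), complex(nan, nan), 5+0j]` this returns index 1, even though a strict lexical order would rank the later `(nan, nan)` higher.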

    ***

3)

For sort/searchsorted we have a technical reason to do something more, 
and there the strict lexical order seems the correct decision.

For `argmax`, by contrast, it is possible to stay compatible with `amax`
while lumping cnans together in maximum -- just return the first cnan.
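
As a sketch of what a strict order for sorting might look like, the key function below reproduces the quoted order (z,z), (z, nan), (nan, z), (nan, nan) in pure Python. `complex_sort_key` and `searchsorted_complex` are hypothetical helpers, and ordering values inside each nan group by their non-nan part is an assumption of this sketch.

```python
import bisect
import math

def complex_sort_key(z):
    """Sort key: group by nan placement, then by the non-nan parts.

    Groups: 0 = (z, z), 1 = (z, nan), 2 = (nan, z), 3 = (nan, nan).
    """
    z = complex(z)
    re_nan = math.isnan(z.real)
    im_nan = math.isnan(z.imag)
    group = 2 * re_nan + im_nan  # bools act as 0/1
    # Replace nan parts by 0.0 so that tuple comparison is well defined.
    return (group, 0.0 if re_nan else z.real, 0.0 if im_nan else z.imag)

def searchsorted_complex(sorted_values, z):
    """Leftmost insertion index for z in a list sorted with complex_sort_key."""
    keys = [complex_sort_key(v) for v in sorted_values]
    return bisect.bisect_left(keys, complex_sort_key(z))
```

Because the search uses the same key as the sort, `searchsorted_complex(sorted(data, key=complex_sort_key), z)` finds a consistent position even when `z` or the data contain nans, which is the sense in which the two need to work together.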

    ***

4)

As far as np.isnan is concerned,

    >>> np.isnan(complex(0, np.nan))
    True
    >>> np.isnan(complex(np.nan, 0))
    True
    >>> np.isnan(complex(np.nan, np.nan))
    True

So I think nobody should care which complex nan a function such as
maximum or amax returns.

We can of course give up some performance to look for the "greatest" nan 
in these cases, but I do not think that it would be very worthwhile.

-- 
Pauli Virtanen