Hi,
The way Numpy currently handles the ordering of complex nans is not very well defined. We should attempt to clarify this for 1.5.0.
For example, what should these return:
r1 = np.maximum(complex(1, nan), complex(2, 0))
r2 = np.complex64(complex(1, nan)) < np.complex64(complex(2, 0))
or, what should `r3` be after this:
r3 = np.array([complex(3, nan), complex(1, 0), complex(nan, 2)])
r3.sort()
Previously, we have defined a lexical ordering relation for complex numbers,
x < y iff x.real < y.real or (x.real == y.real and x.imag < y.imag)
but applying this to the above can cause some surprises:
amax([1, 2, 4, complex(3, nan)]) == 4
which breaks nan propagation, and moreover the result depends on the order of the items and on the precise way the algorithm is written.
***
Numpy IIRC has specified a lexical order between complex numbers for some time now, unlike Python, in which complex numbers are unordered.
So we won't change how finite numbers are handled; only the nan handling needs to be specified.
***
I suggest the following, aping the way the real nan works:
- (z, nan), (nan, z), (nan, nan), where z is any fp value, are all equivalent representations of "cnan", as far as comparisons, sort order, etc. are concerned.
- The ordering between (z, nan), (nan, z), (nan, nan) is undefined. This means e.g. that maximum([cnan_1, cnan_2]) can return either cnan_1 or cnan_2 if both are some cnans.
- Moreover, all comparisons <, >, ==, <=, >= where one or more operands is a cnan are false.
- Except that when sorting, cnans are to be placed last.
The advantages are that nan propagation is now easier to implement, and we get faster code. Moreover, complex nans start to behave more like their real counterparts in comparisons etc.; for instance in the above cases
r1 = (1, nan)
r2 = False
r3 = [complex(1, 0), complex(3, nan), complex(nan, 2)]
where in `r3` the order of the last two elements is unspecified.
This is in fact how the SVN trunk now works (final tweaks in r8508, 8509).
Comments are welcome.
--
Pauli Virtanen
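To make the surprise with the plain lexical order concrete, here is a minimal pure-Python sketch. It is not NumPy's implementation: `lexical_less` and `naive_max` are hypothetical helpers that only assume the ordering relation quoted above and a simple left-to-right scan.

```python
import cmath

def lexical_less(x, y):
    """Lexical order from the post, applied with no special nan handling."""
    x, y = complex(x), complex(y)
    return x.real < y.real or (x.real == y.real and x.imag < y.imag)

def naive_max(values):
    """Hypothetical left-to-right scan: keep the current best unless a
    later element is lexically greater."""
    best = values[0]
    for v in values[1:]:
        if lexical_less(best, v):
            best = v
    return best

nan = float("nan")
a = naive_max([1, 2, 4, complex(3, nan)])  # the nan is dropped, result is 4
b = naive_max([1, 2, 4, complex(nan, 0)])  # nan last: dropped as well
c = naive_max([complex(nan, 0), 1, 2, 4])  # nan first: it survives the scan
print(a, b, c, cmath.isnan(c))
```

With the same items in a different order, the scan returns either 4 or a nan, which is the order dependence described above.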
On Sun, Jul 18, 2010 at 3:36 PM, Pauli Virtanen <pav@iki.fi> wrote:
Hi,
The way Numpy currently handles the ordering of complex nans is not very well defined. We should attempt to clarify this for 1.5.0.
For example, what should these return:
r1 = np.maximum(complex(1, nan), complex(2, 0))
r2 = np.complex64(complex(1, nan)) < np.complex64(complex(2, 0))
or, what should `r3` be after this:
r3 = np.array([complex(3, nan), complex(1, 0), complex(nan, 2)])
r3.sort()
Previously, we have defined a lexical ordering relation for complex numbers,
x < y iff x.real < y.real or (x.real == y.real and x.imag < y.imag)
but applying this to the above can cause some surprises:
amax([1, 2, 4, complex(3, nan)]) == 4
which breaks nan propagation, and moreover the result depends on the order of the items, and the precise way the algorithm is written.
***
Numpy IIRC has specified a lexical order between complex numbers for some time now, unlike Python in which complex numbers are unordered.
So we won't change how finite numbers are handled, only the nan handling needs to be specified.
***
I suggest the following, aping the way the real nan works:
- (z, nan), (nan, z), (nan, nan), where z is any fp value, are all equivalent representations of "cnan", as far as comparisons, sort order, etc are concerned.
- The ordering between (z, nan), (nan, z), (nan, nan) is undefined. This means e.g. that maximum([cnan_1, cnan_2]) can return either cnan_1 or cnan_2 if both are some cnans.
The sort and cmp order was defined in 1.4.0, see the release notes. (z,z), (z, nan), (nan, z), (nan, nan) are in correct order and there are tests to enforce this. Sort and searchsorted need to work together.
- Moreover, all comparisons <, >, ==, <=, >= where one or more operands is a cnan are false.
- Except that when sorting, cnans are to be placed last.
And in sort order.
The advantages are that nan propagation is now easier to implement, and we get faster code. Moreover, complex nans start to behave more like their real counterparts in comparisons etc.; for instance in the above cases
r1 = (1, nan)
r2 = False
r3 = [complex(1, 0), complex(3, nan), complex(nan, 2)]
where in `r3` the order of the last two elements is unspecified.
This is in fact how the SVN trunk now works (final tweaks in r8508, 8509).
Chuck
Sun, 18 Jul 2010 15:57:47 -0600, Charles R Harris wrote:
On Sun, Jul 18, 2010 at 3:36 PM, Pauli Virtanen <pav@iki.fi> wrote: [clip]
I suggest the following, aping the way the real nan works:
- (z, nan), (nan, z), (nan, nan), where z is any fp value, are all equivalent representations of "cnan", as far as comparisons, sort order, etc are concerned.
- The ordering between (z, nan), (nan, z), (nan, nan) is undefined. This means e.g. that maximum([cnan_1, cnan_2]) can return either cnan_1 or cnan_2 if both are some cnans.
The sort and cmp order was defined in 1.4.0, see the release notes. (z,z), (z, nan), (nan, z), (nan, nan) are in correct order and there are tests to enforce this. Sort and searchsorted need to work together.
Ok, now we're diving into an obscure corner that hopefully many people don't care about :)
There are several issues here:
1) We should not use lexical order in comparison operations, since this contradicts real-valued nan arithmetic. Currently (and in 1.4) we do some weird sort of mixture, which seems inconsistent.
2) maximum/minimum should propagate nans, fmax/fmin should not
3) sort/searchsorted, and amax/argmax need to play together
4) as long as 1)-3) are valid, I don't think anybody cares what exactly we mean by a "complex nan", as long as
np.isnan("complex nan") == True
The fact that there happen to be several different representations of a complex nan should not be important.
***
1)
Unless we want to define
(complex(nan, 0) > complex(0, 0)) == True
we cannot strictly follow the lexical order in comparisons. And if we define it like this, we contradict real-valued nan arithmetic, which IMHO is quite bad.
Here, it would make sense to me to lump all the different complex nans into a single "cnan", as far as the arithmetic comparison operations are concerned. Then,
z OP cnan == False
for all comparison operations.
In 1.4.1 we have
>>> import numpy as np
>>> np.__version__
'1.4.1'
>>> x = np.complex64(complex(np.nan, 1))
>>> y = np.complex64(complex(0, 1))
>>> x >= y
False
>>> x < y
False
>>> x = np.complex64(complex(1, np.nan))
>>> y = np.complex64(complex(0, 1))
>>> x >= y
True
>>> x < y
False
which seems an obscure mix of real-valued nan arithmetic and lexical ordering -- I don't think it's the correct choice...
Of course, the practical importance of this decision approaches zero, but it would be nice to be consistent.
***
2)
For maximum/amax, strict lexical order contradicts nan propagation:
maximum(1+nan*j, 2+0j) == 2+0j ???
I don't see why we should follow the lexical order when both arguments are nans. The implementation will be faster if we don't.
Also, this way argmax (which should be nan-propagating) can stop looking once it finds the first nan -- and it does not need to care if later on in the array there would be a "greater" nan.
***
3)
For sort/searchsorted we have a technical reason to do something more, and there the strict lexical order seems the correct decision.
For `argmax` it was possible to be compatible with `amax` when lumping cnans in maximum -- just return the first cnan.
***
4)
As far as np.isnan is concerned,
>>> np.isnan(complex(0, nan))
True
>>> np.isnan(complex(nan, 0))
True
>>> np.isnan(complex(nan, nan))
True
So I think nobody should care which complex nan a function such as maximum or amax returns. We can of course give up some performance to look for the "greatest" nan in these cases, but I do not think that it would be very worthwhile. -- Pauli Virtanen
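The proposed comparison rule can be sketched in pure Python as follows. This is only an illustration under the assumptions of this message, not the code in the SVN trunk: `is_cnan`, `cnan_less` and `propagating_maximum` are hypothetical names, and every complex number with a nan in either part is treated as the same "cnan".

```python
import cmath

def is_cnan(z):
    """True if either the real or the imaginary part is nan."""
    return cmath.isnan(complex(z))

def cnan_less(x, y):
    """x < y: always False when a cnan is involved, lexical otherwise."""
    if is_cnan(x) or is_cnan(y):
        return False
    x, y = complex(x), complex(y)
    return x.real < y.real or (x.real == y.real and x.imag < y.imag)

def propagating_maximum(x, y):
    """Return a cnan if either argument is one, else the lexical maximum."""
    if is_cnan(x):
        return x
    if is_cnan(y):
        return y
    return y if cnan_less(x, y) else x

nan = float("nan")
r1 = propagating_maximum(complex(1, nan), complex(2, 0))  # the cnan (1, nan)
r2 = cnan_less(complex(1, nan), complex(2, 0))            # False
print(r1, r2)
```

Under these rules the result of the maximum no longer depends on the argument order: swapping the two arguments still yields a cnan.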
On Sun, Jul 18, 2010 at 5:00 PM, Pauli Virtanen <pav@iki.fi> wrote:
Sun, 18 Jul 2010 15:57:47 -0600, Charles R Harris wrote:
On Sun, Jul 18, 2010 at 3:36 PM, Pauli Virtanen <pav@iki.fi> wrote: [clip]
I suggest the following, aping the way the real nan works:
- (z, nan), (nan, z), (nan, nan), where z is any fp value, are all equivalent representations of "cnan", as far as comparisons, sort order, etc are concerned.
- The ordering between (z, nan), (nan, z), (nan, nan) is undefined. This means e.g. that maximum([cnan_1, cnan_2]) can return either cnan_1 or cnan_2 if both are some cnans.
The sort and cmp order was defined in 1.4.0, see the release notes. (z,z), (z, nan), (nan, z), (nan, nan) are in correct order and there are tests to enforce this. Sort and searchsorted need to work together.
Ok, now we're diving into an obscure corner that hopefully many people don't care about :)
There are several issues here:
1) We should not use lexical order in comparison operations, since this contradicts real-valued nan arithmetic.
How so? Nans sort to the end for reals and also to the end for complex. The sort order for complex isn't strictly a lexical extension of the reals; it's a bit closer to what you are talking about: *all* complex numbers containing nans sort higher than "real" complex numbers. The need was to separate the nan-containing numbers from the real numbers. But within each of the "real" and "nan" regions the numbers are sorted lexically.
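As a rough illustration of the sort order described here, the following pure-Python sketch builds a sort key that places (z,z), (z, nan), (nan, z), (nan, nan) in that order and sorts lexically on the non-nan parts within each group. The `sort_key` helper is hypothetical and is not how NumPy implements its comparison; the `bisect` call stands in for searchsorted to show why both must share one order.

```python
import bisect
import math

def sort_key(z):
    """Group by nan placement, then order lexically on the non-nan parts."""
    z = complex(z)
    re_nan, im_nan = math.isnan(z.real), math.isnan(z.imag)
    # 0: (z, z), 1: (z, nan), 2: (nan, z), 3: (nan, nan)
    group = 2 * re_nan + im_nan
    return (group, 0.0 if re_nan else z.real, 0.0 if im_nan else z.imag)

nan = float("nan")
values = [complex(3, nan), complex(1, 0), complex(nan, 2),
          complex(nan, nan), complex(0, 5)]
ordered = sorted(values, key=sort_key)
print(ordered)

# A searchsorted-like lookup has to use the same key to find the right slot.
keys = [sort_key(z) for z in ordered]
pos = bisect.bisect_left(keys, sort_key(complex(2, nan)))
print(pos)
```

Because the key is a total order, every nan-containing value has a definite position, which is what a binary search over the sorted array relies on.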
Currently (and in 1.4) we do some weird sort of mixture, which seems inconsistent.
2) maximum/minimum should propagate nans, fmax/fmin should not
So they do at this time.
3) sort/searchsorted, and amax/argmax need to play together
Then I think amax/amin should conform to the sort order. If we are going to compare nans, then they should sit somewhere in a strict order; they can't both be largest and smallest. The choice of where to put them is somewhat arbitrary, but they need to go somewhere consistent.
4) as long as 1)-3) are valid, I don't think anybody cares what exactly we mean by a "complex nan", as long as
np.isnan("complex nan") == True
But that has nothing to do with sorting order, it's just a broad classification like positive numbers. In this case it is nan-containing complex numbers.
The fact that there happen to be several different representations of a complex nan should not be important.
Why not? Suppose you want to search for certain combinations?
***
1)
Unless we want to define
(complex(nan, 0) > complex(0, 0)) == True
Looks reasonable to me. That is what the sort order does.
we cannot strictly follow the lexical order in comparisons. And if we define it like this, we contradict real-valued nan arithmetic, which IMHO is quite bad.
As mentioned above, the sorting order for complex isn't strictly lexical. Whether it is reasonable to extend the sorting order to the usual comparisons is a different question. I didn't do it for a reason, but maybe now is the time to "sort" things out.
Here, it would make sense to me to lump all the different complex nans into a single "cnan", as far as the arithmetic comparison operations are concerned. Then,
z OP cnan == False
for all comparison operations.
In 1.4.1 we have
>>> import numpy as np
>>> np.__version__
'1.4.1'
>>> x = np.complex64(complex(np.nan, 1))
>>> y = np.complex64(complex(0, 1))
>>> x >= y
False
>>> x < y
False
>>> x = np.complex64(complex(1, np.nan))
>>> y = np.complex64(complex(0, 1))
>>> x >= y
True
>>> x < y
False
which seems an obscure mix of real-valued nan arithmetic and lexical ordering -- I don't think it's the correct choice...
Of course, the practical importance of this decision approaches zero, but it would be nice to be consistent.
***
2)
For maximum/amax, strict lexical order contradicts nan propagation:
maximum(1+nan*j, 2+0j) == 2+0j ???
But that isn't what the sort order yields. A complex number containing nans in any position will always sort greater than (z,z); it is only in comparisons between two numbers containing nans that the lexical order comes back into play.
I don't see why we should follow the lexical order when both arguments are nans. The implementation will be faster if we don't.
I'm actually a bit curious about the speed.
Also, this way argmax (which should be nan-propagating) can stop looking once it finds the first nan -- and it does not need to care if later on in the array there would be a "greater" nan.
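The early-exit idea can be sketched in pure Python as below. This is an illustrative scan, not NumPy's argmax: `nan_propagating_argmax` and `lexical_less` are hypothetical helpers, and all cnans are assumed to be equivalent so the first one found is returned.

```python
import cmath

def lexical_less(x, y):
    """Lexical order on complex numbers (only used here for non-nan values)."""
    x, y = complex(x), complex(y)
    return x.real < y.real or (x.real == y.real and x.imag < y.imag)

def nan_propagating_argmax(values):
    """Index of the first cnan if there is one, else of the lexical maximum."""
    if not values:
        raise ValueError("argmax of an empty sequence")
    best = 0
    for i, v in enumerate(values):
        if cmath.isnan(complex(v)):
            return i  # stop at the first cnan; later nans are never inspected
        if lexical_less(values[best], v):
            best = i
    return best

nan = float("nan")
with_nans = [1 + 0j, complex(3, nan), 4 + 0j, complex(nan, nan)]
without_nans = [1 + 0j, 4 + 0j, 4 + 1j, 2 + 0j]
print(nan_propagating_argmax(with_nans), nan_propagating_argmax(without_nans))
```

If the "greatest" nan had to be found instead, the scan could not return early and would need to compare nans against each other for the rest of the array.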
***
3)
For sort/searchsorted we have a technical reason to do something more, and there the strict lexical order seems the correct decision.
Exactly.
For `argmax` it was possible to be compatible with `amax` when lumping cnans in maximum -- just return the first cnan.
I don't have a problem distinguishing sort order from normal comparison order, the notes explicitly label it a sorting order. I think we just need to be clear if we make a distinction and choose what is best for each.
***
4)
As far as np.isnan is concerned,
>>> np.isnan(complex(0, nan))
True
>>> np.isnan(complex(nan, 0))
True
>>> np.isnan(complex(nan, nan))
True
So I think nobody should care which complex nan a function such as maximum or amax returns.
Sure. As long as it is clear that sorting will lead to different results.
We can of course give up some performance to look for the "greatest" nan in these cases, but I do not think that it would be very worthwhile.
Well, the sort comparison function was optimized on the assumption that nans are not the common case. At least I think it was, it is rather complex ;)
Chuck
On Sun, Jul 18, 2010 at 3:36 PM, Pauli Virtanen <pav@iki.fi> wrote:
Hi,
The way Numpy currently handles the ordering of complex nans is not very well defined. We should attempt to clarify this for 1.5.0.
For example, what should these return:
r1 = np.maximum(complex(1, nan), complex(2, 0))
r2 = np.complex64(complex(1, nan)) < np.complex64(complex(2, 0))
or, what should `r3` be after this:
r3 = np.array([complex(3, nan), complex(1, 0), complex(nan, 2)])
r3.sort()
Previously, we have defined a lexical ordering relation for complex numbers,
x < y iff x.real < y.real or (x.real == y.real and x.imag < y.imag)
but applying this to the above can cause some surprises:
amax([1, 2, 4, complex(3, nan)]) == 4
which breaks nan propagation, and moreover the result depends on the order of the items, and the precise way the algorithm is written.
However, nans have been propagated by maximum and minimum since 1.4.0. There was a question, discussed on the list, as to what 'nan' complex to return in the propagation, but it was still a nan complex in your definition of such objects. The final choice was driven by using the first of the already available complex nans as it was the easiest thing to do. However, (nan, nan) would be just as easy to do now that nans are available.
I'm not sure what your modifications to the macros buy us; what do you want to achieve?
<snip>
Chuck
However, nans have been propagated by maximum and minimum since 1.4.0. There was a question, discussed on the list, as to what 'nan' complex to return in the propagation, but it was still a nan complex in your definition of such objects. The final choice was driven by using the first of the already available complex nans as it was the easiest thing to do. However, (nan, nan) would be just as easy to do now that nans are available.
Then we would be creating an inconsistency between amax/argmax. Of course if we say all cnans are equivalent, it doesn't matter.
I'm not sure what your modifications to the macros buy us; what do you want to achieve?
1) fix bugs in nan propagation: maximum(1, complex(0, nan)) used to return 1. The result also previously depended on the order of arguments.
2) make complex nans behave like a real nan in comparisons (non-sorting).
I think both of these are worthwhile.
Pauli