What is the sign of nan?
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
Hi All,

I've been cleaning up the ufunc loops and the sign function currently doesn't have a defined behavior for nans. This makes the results depend on the order/type of comparisons in the code, which looks fragile to me. So what should it return? I vote for nan but am open for suggestions.

Chuck
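To make the fragility concrete, here is a minimal Python sketch (hypothetical function names; Python floats stand in for the C doubles in the ufunc loops). The two naive versions agree on every ordinary number but disagree on NaN, because each comparison with NaN is false and the input simply falls through to whichever branch comes last. The third checks for NaN first, which is the behavior voted for above.

```python
import math


def sign_gt_first(x: float) -> float:
    """Naive sign: test x > 0, then x == 0, else -1. NaN falls through to -1."""
    if x > 0:
        return 1.0
    if x == 0:
        return 0.0
    return -1.0


def sign_lt_first(x: float) -> float:
    """Same idea with the tests reordered. NaN now falls through to +1."""
    if x < 0:
        return -1.0
    if x == 0:
        return 0.0
    return 1.0


def sign_nan_aware(x: float) -> float:
    """Check for NaN up front so the result no longer depends on test order."""
    if math.isnan(x):
        return math.nan
    return sign_gt_first(x)
```

With these definitions, sign_gt_first(float('nan')) returns -1.0 and sign_lt_first(float('nan')) returns 1.0, while sign_nan_aware returns nan.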
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 3:52 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> Hi All,
> I've been cleaning up the ufunc loops and the sign function currently doesn't have a defined behavior for nans. This makes the results depend on the order/type of comparisons in the code, which looks fragile to me. So what should it return? I vote for nan but am open for suggestions.
And while we're at it, let's decide how to treat max/min when nans are involved. Or should we just say the behavior is undefined?

Chuck
data:image/s3,"s3://crabby-images/c4c8c/c4c8c9ee578d359a3234c68c5656728c7c864441" alt=""
On Mon, Sep 29, 2008 at 17:13, Charles R Harris <charlesr.harris@gmail.com> wrote:
> And while we're at it, let's decide how to treat max/min when nans are involved. Or should we just say the behavior is undefined?
When feasible, I would like float(s)->float functions to return NaN when given a NaN as an argument. At least as the main versions of the function. Specific NaN-ignoring functions can also be introduced, but as separate functions. I don't know what exactly to do about float->int functions (e.g. argmin). I also don't know how these should interact with the current seterr() state.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
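One way to read this policy, as a minimal Python sketch: wrap each float(s)->float function so that a NaN argument short-circuits to NaN, and expose NaN-ignoring behavior only under a separate name. The names propagate_nan, maximum, minimum and nan_ignoring_max are hypothetical, not numpy API, and the seterr() interaction is deliberately left out.

```python
import functools
import math
from typing import Callable, Iterable


def propagate_nan(func: Callable[..., float]) -> Callable[..., float]:
    """Wrap a float(s)->float function so any NaN argument yields NaN."""
    @functools.wraps(func)
    def wrapper(*args: float) -> float:
        if any(math.isnan(a) for a in args):
            return math.nan
        return func(*args)
    return wrapper


# "Main versions": NaN in, NaN out.
maximum = propagate_nan(max)
minimum = propagate_nan(min)


def nan_ignoring_max(values: Iterable[float]) -> float:
    """Separate NaN-ignoring variant: drop NaNs, return NaN only if nothing is left."""
    non_nan = [v for v in values if not math.isnan(v)]
    return max(non_nan) if non_nan else math.nan
```

Keeping the two behaviors under different names means callers who want NaNs skipped have to ask for it explicitly, which is the separation described above.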
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 4:28 PM, Robert Kern <robert.kern@gmail.com> wrote:
> When feasible, I would like float(s)->float functions to return NaN when given a NaN as an argument. At least as the main versions of the function. Specific NaN-ignoring functions can also be introduced, but as separate functions. I don't know what exactly to do about float->int functions (e.g. argmin). I also don't know how these should interact with the current seterr() state.
So the proposition is, sign, max, min return nan when any of the arguments is nan.
+1

Complex numbers are more complicated because we first compare the real parts, then the imaginary. Arguably 1 > 0 + nan*1j. I propose that the sign of a complex number containing nans should be nan, but I can't decide what should happen with max/min.

Chuck
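The complex ordering described above can be sketched in Python. complex_greater is a hypothetical helper that compares real parts first and only looks at imaginary parts on a tie, so 1 > 0 + nan*1j comes out true because the real parts already decide it. complex_sign applies the proposed rule that any NaN component makes the sign NaN; for non-NaN inputs it assumes "sign of the real part, or of the imaginary part when the real part is zero", which is an assumption about the existing definition, not something stated in this thread.

```python
import math


def complex_greater(a: complex, b: complex) -> bool:
    """Lexicographic a > b: real parts first, imaginary parts only on a tie."""
    if a.real != b.real:  # also true when either real part is NaN
        return a.real > b.real
    return a.imag > b.imag


def _real_sign(x: float) -> float:
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)


def complex_sign(z: complex) -> complex:
    """Proposed rule: a NaN in either component makes the sign NaN."""
    if math.isnan(z.real) or math.isnan(z.imag):
        return complex(math.nan, math.nan)
    # Assumed definition for ordinary values: sign of the real part,
    # or of the imaginary part when the real part is zero.
    if z.real != 0:
        return complex(_real_sign(z.real), 0.0)
    return complex(_real_sign(z.imag), 0.0)
```

The asymmetry is the point: the NaN in 0 + nan*1j is never consulted when the real parts differ, so an ordering-based max/min would silently drop it.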
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 4:40 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> So the proposition is, sign, max, min return nan when any of the arguments is nan.
> +1
I also propose that all logical operators involving nan return false, i.e., ==, !=, <, <=, >, >=, and, or, xor, not.
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 4:54 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> I also propose that all logical operators involving nan return false, i.e., ==, !=, <, <=, >, >=, and, or, xor, not.
Currently this is so except for !=. On my machine nan != nan is true. Looks like it is being computed in C as !(nan == nan). Hmm, does anyone know of a C standard on this?

Chuck
data:image/s3,"s3://crabby-images/c4c8c/c4c8c9ee578d359a3234c68c5656728c7c864441" alt=""
On Mon, Sep 29, 2008 at 18:10, Charles R Harris <charlesr.harris@gmail.com> wrote:
> Currently this is so except for !=. On my machine nan != nan is true. Looks like it is being computed in C as !(nan == nan). Hmm, does anyone know of a C standard on this?
C99 Annex F:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf

In particular:

"""
F.8.3 Relational operators

x != x -> false    The statement x != x is true if x is a NaN.
"""
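F.8.3 belongs to Annex F's section on optimization, which lists code transformations that would subvert IEC 60559 behavior; "x != x -> false" is one such invalid rewrite, precisely because x != x is true when x is a NaN. Python floats follow the same IEEE-754 comparison rules as C doubles on the usual platforms, so a quick Python check (used only as a stand-in for the C relational operators) reproduces the pattern Chuck saw: == and every ordered comparison are false, while != is true. is_nan_by_self_compare is a hypothetical helper.

```python
import math

nan = math.nan

# Each comparison involving a NaN, using ordinary Python floats.
comparisons = {
    "nan == nan": nan == nan,
    "nan != nan": nan != nan,
    "nan < 1.0": nan < 1.0,
    "nan <= 1.0": nan <= 1.0,
    "nan > 1.0": nan > 1.0,
    "nan >= 1.0": nan >= 1.0,
}


def is_nan_by_self_compare(x: float) -> bool:
    """True only for NaN. In C this depends on the compiler honoring F.8.3,
    i.e. not rewriting x != x as false."""
    return x != x
```

So != is consistent with being the negation of ==, which is what the standard requires, and a proposal that every operator return false for NaN would have to special-case != against C99.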
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 5:16 PM, Robert Kern <robert.kern@gmail.com> wrote:
> C99 Annex F:
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
>
> In particular:
>
> """
> F.8.3 Relational operators
>
> x != x -> false    The statement x != x is true if x is a NaN.
> """
Thanks, that was very helpful. I wonder how widespread the isless, islessequal, etc. macros are?

Chuck
data:image/s3,"s3://crabby-images/1a93b/1a93b5e9663e7b8870fb78b607a558d9cd4b94cb" alt=""
Charles R Harris wrote:
> Thanks, that was very helpful. I wonder how widespread the isless, islessequal, etc. macros are?
If it is C99, count on some platforms (MS in particular) not to support it. Also, when doing things in C, beware that some compilers break most reasonable expectations about floating point. In particular, x - x where x is a NaN or inf returns 0 with MS compilers under the compiler flags currently used for Python (/Ox), which breaks almost any code out there relying on proper NaN/Inf handling. So if possible, special-case with isnan/isinf/isfinite.

cheers,
David
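The x - x example matters because a common idiom relies on it: under IEEE-754 rules, x - x is 0 for every finite x but NaN when x is an infinity or a NaN, so "x - x == 0" works as a finiteness test only as long as the compiler does not fold x - x to 0. A minimal Python sketch (hypothetical function names; Python performs no such folding, so both versions agree here) contrasting that idiom with the explicit special-casing suggested above:

```python
import math


def is_finite_by_subtraction(x: float) -> bool:
    """Relies on IEEE-754: inf - inf and nan - nan are NaN, and NaN == 0 is false.
    A compiler that rewrites x - x as 0 silently turns this into 'always True'."""
    return x - x == 0.0


def is_finite_explicit(x: float) -> bool:
    """Special-case explicitly instead of relying on an arithmetic identity."""
    return not (math.isnan(x) or math.isinf(x))


samples = [0.0, -3.5, 1e308, math.inf, -math.inf, math.nan]
```

The explicit version costs a couple of classification checks, but its result cannot be changed by an optimizer that assumes x - x == 0.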
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
Hi David,

On Mon, Sep 29, 2008 at 9:07 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
> If it is C99, count on some platforms (MS in particular) not to support it. Also, when doing things in C, beware that some compilers break most reasonable expectations about floating point. In particular, x - x where x is a NaN or inf returns 0 with MS compilers under the compiler flags currently used for Python (/Ox), which breaks almost any code out there relying on proper NaN/Inf handling. So if possible, special-case with isnan/isinf/isfinite.
Do any of the MS compilers handle these things correctly? I am loath to load up the code with expensive macros just to deal with bass ackwards compilers, especially if I have to special-case all the infs too. However, if I can use ifdefs to compile various versions, I could be talked into it. OTOH, we could just say the results are undefined when non-conforming compilers are used.

Chuck
data:image/s3,"s3://crabby-images/1a93b/1a93b5e9663e7b8870fb78b607a558d9cd4b94cb" alt=""
Charles R Harris wrote:
> Do any of the MS compilers handle these things correctly?
Don't know. To be 100% honest, one of the problems with MS compilers is the /Ox flag (as far as IEEE-754 rules are concerned). It should not be used for numpy, period (I am sure you could break numpy with gcc and -ffast-math and co.; the difference is that gcc is compliant by default and properly documented). Also, the MS compilers (even the recent ones) are not C99 compliant, and Microsoft says that is because there is no customer need for it: http://blogs.msdn.com/vcblog/archive/2007/11/05/iso-c-standard-update.aspx

cheers,
David
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 9:53 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
> Don't know. To be 100% honest, one of the problems with MS compilers is the /Ox flag (as far as IEEE-754 rules are concerned). It should not be used for numpy, period (I am sure you could break numpy with gcc and -ffast-math and co.; the difference is that gcc is compliant by default and properly documented). Also, the MS compilers (even the recent ones) are not C99 compliant, and Microsoft says that is because there is no customer need for it:
> http://blogs.msdn.com/vcblog/archive/2007/11/05/iso-c-standard-update.aspx
That's why they don't support long doubles either. I wonder how they plan on moving into the HPC area with that attitude?

Chuck
data:image/s3,"s3://crabby-images/1a93b/1a93b5e9663e7b8870fb78b607a558d9cd4b94cb" alt=""
Charles R Harris wrote:
> So the proposition is, sign, max, min return nan when any of the arguments is nan.

Note that internally, signbit (the C function) returns an integer.

> Complex numbers are more complicated because we first compare the real parts, then the imaginary. Arguably 1 > 0 + nan*1j.

Really? Without thinking about the consequences, returning a NaN complex would be what I expect, should we go the route where comparison with NaN returns a NaN.

> I propose that the sign of a complex number containing nans should be nan, but I can't decide what should happen with max/min.

Did you take a look at: http://projects.scipy.org/scipy/numpy/wiki/ProperNanHandling

Anne and I did this, with several approaches. We did not consider interactions with the FPU error state, though, which is something that still needs to be added.

cheers,
David
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 9:02 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
> Charles R Harris wrote:
>> So the proposition is, sign, max, min return nan when any of the arguments is nan.
>
> Note that internally, signbit (the C function) returns an integer.

That is the signature of the ufunc. It could be changed... I believe the actual signbit of nan is undefined but I suppose we could return -1 in the nan case. That would be a fairly typical error signal for integers.

>> Complex numbers are more complicated because we first compare the real parts, then the imaginary. Arguably 1 > 0 + nan*1j.
>
> Really? Without thinking about the consequences, returning a NaN complex would be what I expect, should we go the route where comparison with NaN returns a NaN.

Yeah, I'm headed that way also.

>> I propose that the sign of a complex number containing nans should be nan, but I can't decide what should happen with max/min.
>
> Did you take a look at:
> http://projects.scipy.org/scipy/numpy/wiki/ProperNanHandling
Note that in my branch the current behavior is

In [11]: a = np.array([0, np.nan, -1])

In [12]: np.max(a)
Out[12]: nan

In [13]: np.min(a)
Out[13]: nan

This is consistent with regarding nans as propagating errors. I can merge my branch back into yours if you want to play with these things.

Chuck
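The branch behavior above can be mimicked with a minimal Python sketch of a reduction built on a NaN-propagating two-argument maximum, roughly what a maximum.reduce over the array does (the helper names are hypothetical, not numpy internals). For contrast, Python's built-in max() only replaces its current candidate when "item > candidate" is true, so max([0.0, nan, -1.0]) returns 0.0 while max([nan, 0.0, -1.0]) returns nan: the order-of-comparison fragility this thread started from.

```python
import functools
import math
from typing import Sequence


def maximum_propagating(a: float, b: float) -> float:
    """Two-argument maximum that returns NaN if either input is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a >= b else b


def reduce_max(values: Sequence[float]) -> float:
    """Fold the propagating maximum over the values, left to right."""
    return functools.reduce(maximum_propagating, values)


data = [0.0, math.nan, -1.0]
reordered = [math.nan, 0.0, -1.0]
```

Because NaN is absorbing for maximum_propagating, the reduced result is NaN wherever the NaN sits, which is what makes the propagating semantics well defined for reduce.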
data:image/s3,"s3://crabby-images/1a93b/1a93b5e9663e7b8870fb78b607a558d9cd4b94cb" alt=""
Charles R Harris wrote:
> On Mon, Sep 29, 2008 at 9:02 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
>> Note that internally, signbit (the C function) returns an integer.
>
> That is the signature of the ufunc. It could be changed...
Nope, I am talking about the C99 signbit macro. man signbit tells me:

NAME
    signbit - test sign of a real floating point number

SYNOPSIS
    #include <math.h>

    int signbit(x);

    Compile with -std=c99; link with -lm.

cheers,
David
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 9:54 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
> Charles R Harris wrote:
>> That is the signature of the ufunc. It could be changed...
>
> Nope, I am talking about the C99 signbit macro. man signbit tells me:
>
> NAME
>     signbit - test sign of a real floating point number
>
> SYNOPSIS
>     #include <math.h>
>
>     int signbit(x);
>
>     Compile with -std=c99; link with -lm.
Yes, that too. But I was thinking of the ufunc returning nan when needed. However, for signbit I think -1 is the way to go, to get minimal breakage.

Chuck
data:image/s3,"s3://crabby-images/1a93b/1a93b5e9663e7b8870fb78b607a558d9cd4b94cb" alt=""
Charles R Harris wrote:
> Yes, that too. But I was thinking of the ufunc returning nan when needed
I think this is better for consistency, yes. NaN is not a number, so it has no sign :)

More seriously, I think those features should be clearly listed and thought out before being implemented, particularly with respect to error handling and testing. I would rather see the umathmodule.c cleanup done first, without any functionality change, and then, once it is integrated in the trunk, do the NaN changes in a branch, in particular because of broken implementations which will not do what you expect. Personally, I also have changes for NaN handling, but in my private git import of numpy.

Also, if we fix this, we should fix it once and for all, and with a lot of tests, if only to test on buggy implementations.

cheers,
David
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 10:09 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
> Charles R Harris wrote:
>> Yes, that too. But I was thinking of the ufunc returning nan when needed
>
> I think this is better for consistency, yes. NaN is not a number, so it has no sign :) More seriously, I think those features should be clearly listed and thought out before being implemented, particularly with respect to error handling and testing. I would rather see the umathmodule.c cleanup done first,
umathmodule.c is cleaned/done. Or was until I put in some of the nan handling. But some of the nan comparisons were just odd, in particular the sign function, which was effectively undefined. And since the current behavior is an accident of comparison choices and order, I consider it undefined also. So at a minimum I would fix up sign/maximum/minimum. That would be consistent with both masking and error propagation in common usage. The curiosities arise when using the reduce method. Also returning -1 for the signbit seems like the right thing to do. If the nan comparisons can't be relied on, then using the isnan macro is an option. However, I am not inclined to go on and use isfinite and all those.
> without any functionality change, and then, once it is integrated in the trunk, do the NaN changes in a branch, in particular because of broken implementations which will not do what you expect. Personally, I also have changes for NaN handling, but in my private git import of numpy.
I am currently working off your branch (at your suggestion). Is it safe to use NAN?
> Also, if we fix this, we should fix it once and for all, and with a lot of tests, if only to test on buggy implementations.
Yes, the tests will define the standard. There is one test for sign of complex numbers that currently breaks, but I don't consider that important as anyone currently relying on sign when the array contains nans is out of their mind. An option for making the code more portable is to make the various comparison operators macros, then worry about fixing the macros for various compilers. I would be happy to use macros such as isless, which, among other things, are not supposed to raise exceptions. Of course, if MS compilers don't even get the arithmetic right...

Chuck
data:image/s3,"s3://crabby-images/1a93b/1a93b5e9663e7b8870fb78b607a558d9cd4b94cb" alt=""
Charles R Harris wrote:
> umathmodule.c is cleaned/done. Or was until I put in some of the nan handling. But some of the nan comparisons were just odd, in particular the sign function, which was effectively undefined. And since the current behavior is an accident of comparison choices and order, I consider it undefined also. So at a minimum I would fix up sign/maximum/minimum.
Yes, I was not arguing against making the changes, but against making them together in the same branch. I bet you only build on Linux, and if something is broken on one platform, it will be difficult to tell pure code changes apart from feature changes. Whereas if you have two branches (the second on top of the first), we can test the two separately. I believe this makes the testers' job much easier; at least, it makes my life easier when testing with ICC, MS compilers and co.
> I am currently working off your branch (at your suggestion). Is it safe to use NAN?
I've just answered your private email about that point :)
> Yes, the tests will define the standard. There is one test for sign of complex numbers that currently breaks, but I don't consider that important as anyone currently relying on sign when the array contains nans is out of their mind. An option for making the code more portable is to make the various comparison operators macros, then worry about fixing the macros for various compilers.
Yes, that's by far the best method for compatibility: implement a compatibility layer, and then proceed as if you were on a common platform. That's the whole point of my clean_math_branch, BTW. This makes tracking bugs much easier.

cheers,
David
data:image/s3,"s3://crabby-images/c4c8c/c4c8c9ee578d359a3234c68c5656728c7c864441" alt=""
On Mon, Sep 29, 2008 at 23:02, Charles R Harris <charlesr.harris@gmail.com> wrote:
> On Mon, Sep 29, 2008 at 9:02 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
>> Note that internally, signbit (the C function) returns an integer.
>
> That is the signature of the ufunc. It could be changed... I believe the actual signbit of nan is undefined but I suppose we could return -1 in the nan case. That would be a fairly typical error signal for integers.
numpy.signbit() should work like C99 signbit() (where possible), IMO. It can only return (integer) 0 or 1, and it does differentiate between NAN and -NAN. I don't think we should invent new semantics if we can avoid it. I think we can change what the platform provides, but only in the direction of C99, IMO. I see signbit() as more along the lines of functions like isnan() than log().

There is no C99 cognate for numpy.sign(), and it is a float->float function, so I think we could make it return NAN. C99's copysign(x,y) is almost a cognate (e.g. numpy.sign(y) == copysign(1.0,y) except for y==+/-0.0), but since it does fall down on y==0, I don't think it's determinative for y==NAN.

[~]$ man copysign
COPYSIGN(3)          BSD Library Functions Manual          COPYSIGN(3)

NAME
     copysign -- changes the sign of x to that of y

SYNOPSIS
     #include <math.h>

     double
     copysign(double x, double y);
...
[~]$ gcc --version
i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[~]$ cat foo.c
#include <stdio.h>
#include <math.h>

int main(int argc, char **argv)
{
    printf("signbit(NAN) = %d\n", signbit(NAN));
    printf("signbit(-NAN) = %d\n", signbit(-NAN));
    printf("copysign(1.0, NAN) = %g\n", copysign(1.0, NAN));
    printf("copysign(1.0, -NAN) = %g\n", copysign(1.0, -NAN));
    return 0;
}
[~]$ gcc -std=c99 -o foo foo.c -lm
[~]$ ./foo
signbit(NAN) = 0
signbit(-NAN) = 1
copysign(1.0, NAN) = 1
copysign(1.0, -NAN) = -1
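Python's math.copysign exposes the C99 copysign semantics, so the behavior shown by foo.c can be sketched without a C compiler. signbit below is a hypothetical Python helper (not numpy.signbit) that reports the sign bit as 0 or 1 via copysign; it distinguishes NAN from -NAN and 0.0 from -0.0 without ever comparing a NaN. The NaNs are built with copysign so their sign bits are explicit rather than depending on how the platform produces a default NaN.

```python
import math


def signbit(x: float) -> int:
    """Return 1 if x has its sign bit set, else 0; works for zeros and NaNs."""
    # copysign(1.0, x) is -1.0 exactly when x's sign bit is set; no NaN is compared.
    return 1 if math.copysign(1.0, x) < 0 else 0


# Build NaNs with explicit sign bits instead of relying on the platform default.
pos_nan = math.copysign(math.nan, 1.0)
neg_nan = math.copysign(math.nan, -1.0)

for label, value in [("NAN", pos_nan), ("-NAN", neg_nan), ("0.0", 0.0), ("-0.0", -0.0)]:
    print(f"signbit({label}) = {signbit(value)}, "
          f"copysign(1.0, {label}) = {math.copysign(1.0, value):g}")
```

Normalizing to 0/1 in the helper is a design choice of this sketch; C's own signbit is only required to return some nonzero value for a set sign bit.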
data:image/s3,"s3://crabby-images/1a93b/1a93b5e9663e7b8870fb78b607a558d9cd4b94cb" alt=""
Robert Kern wrote:
> numpy.signbit() should work like C99 signbit() (where possible), IMO. It can only return (integer) 0 or 1, and it does differentiate between NAN and -NAN. I don't think we should invent new semantics if we can avoid it.
Agreed, but for the signbit case, I can't find what the semantics should be for signbit and NaNs; do you know of any reference? For example, extending your example:

#include <stdio.h>
#include <math.h>

int main(int argc, char **argv)
{
    printf("signbit(NAN) = %d\n", signbit(NAN));
    printf("signbit(-NAN) = %d\n", signbit(-NAN));
    printf("signbit((-1) * NAN) = %d\n", signbit((-1) * NAN));
    printf("signbit(- NAN + NAN) = %d\n", signbit(-NAN + NAN));
    printf("signbit(NAN - NAN) = %d\n", signbit(NAN - NAN));
    return 0;
}

when run tells me:

signbit(NAN) = 0
signbit(-NAN) = -2147483648
signbit((-1) * NAN) = 0
signbit(- NAN + NAN) = -2147483648
signbit(NAN - NAN) = 0

Doesn't this indicate that signbit(NAN) is undefined? I guess I am afraid that the glibc NAN is just one type of NAN, and its behavior is not that of every NAN.

cheers,
David
data:image/s3,"s3://crabby-images/1a93b/1a93b5e9663e7b8870fb78b607a558d9cd4b94cb" alt=""
David Cournapeau wrote:
> when run tells me:
>
> signbit(NAN) = 0
> signbit(-NAN) = -2147483648
> signbit((-1) * NAN) = 0
> signbit(- NAN + NAN) = -2147483648
> signbit(NAN - NAN) = 0
>
> Doesn't this indicate that signbit(NAN) is undefined? I guess I am afraid that the glibc NAN is just one type of NAN, and its behavior is not that of every NAN.
Bah, this is gibberish. signbit tests the sign bit (duh), nothing else, so it is always valid and never raises FPE_INVALID. glibc says that copysign is always valid; I tested with signbit as well, and the exception does not seem to be raised for that macro either. http://www.gnu.org/software/libc/manual/html_node/FP-Bit-Twiddling.html#FP-B...

Funny how far a bug in median can take us :)

cheers,
David
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 10:50 PM, Robert Kern <robert.kern@gmail.com> wrote:
> numpy.signbit() should work like C99 signbit() (where possible), IMO. It can only return (integer) 0 or 1, and it does differentiate between NAN and -NAN. I don't think we should invent new semantics if we can avoid it. I think we can change what the platform provides, but only in the direction of C99, IMO. I see signbit() as more along the lines of functions like isnan() than log().
Sounds reasonable.
> There is no C99 cognate for numpy.sign(), and it is a float->float function, so I think we could make it return NAN. C99's copysign(x,y) is almost a cognate (e.g. numpy.sign(y) == copysign(1.0,y) except for y==+/-0.0), but since it does fall down on y==0, I don't think it's determinative for y==NAN.
Sign doesn't distinguish +/-0. The sign bit of 0 is explicitly cleared in the current (and former) code by adding +0 to the result.
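A minimal Python sketch of the sign semantics being settled on here (NaN in, NaN out; zero maps to 0 whatever its sign bit), checked against math.copysign(1.0, y) to show that the two agree everywhere except at the zeros and at NaN. The +0.0 trick relies on IEEE-754 round-to-nearest, where -0.0 + 0.0 is +0.0. np_style_sign is a hypothetical name, not numpy's actual loop.

```python
import math


def np_style_sign(y: float) -> float:
    """Sign with NaN propagation and no distinction between +0.0 and -0.0."""
    if math.isnan(y):
        return math.nan
    if y > 0:
        return 1.0
    if y < 0:
        return -1.0
    # y is +0.0 or -0.0 here; adding +0.0 clears the sign bit of -0.0.
    return y + 0.0
```

For example, copysign(1.0, -0.0) is -1.0, whereas np_style_sign(-0.0) is a positive zero.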
> [~]$ ./foo
> signbit(NAN) = 0
> signbit(-NAN) = 1
> copysign(1.0, NAN) = 1
> copysign(1.0, -NAN) = -1
Hmm,

signbit(NAN) = 0
signbit(-NAN) = -2147483648
copysign(1.0, NAN) = 1
copysign(1.0, -NAN) = -1
signbit(0.0) = 0
signbit(-0.0) = -2147483648
copysign(1.0, 0.0) = 1
copysign(1.0, -0.0) = -1

Looking at the standard, signbit is only required to return a nonzero value for negatives. I think we need to be more explicit for numpy. How about 1?

Chuck
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 4:28 PM, Robert Kern <robert.kern@gmail.com> wrote:
> When feasible, I would like float(s)->float functions to return NaN when given a NaN as an argument. At least as the main versions of the function. Specific NaN-ignoring functions can also be introduced, but as separate functions. I don't know what exactly to do about float->int functions (e.g. argmin). I also don't know how these should interact with the current seterr() state.
OK, maximum, minimum, and sign now return NaN. I still don't know what to do for the complex cases, although I suspect they should do the same, on the principle that if either the real or the imaginary part is NaN, then the number is undefined.

Chuck
data:image/s3,"s3://crabby-images/c4c8c/c4c8c9ee578d359a3234c68c5656728c7c864441" alt=""
On Mon, Sep 29, 2008 at 20:22, Charles R Harris <charlesr.harris@gmail.com> wrote:
OK, maximum, minimum, and sign now return NaN.
Oops!

F.9.9.2 The fmax functions
1 If just one argument is a NaN, the fmax functions return the other argument (if both arguments are NaNs, the functions return a NaN).
2 The body of the fmax function might be
    { return (isgreaterequal(x, y) || isnan(y)) ? x : y; }

If we want to follow C99 semantics rather than our own NaN-always-propagates semantics, then we should do this instead.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
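The quoted reference body compiles as-is. A sketch with hypothetical names; fmin_ref is the mirror image written by analogy (it is not quoted from the standard). Both should agree with libm's fmax and fmin.

```c
#include <math.h>

/* The reference body from C99 F.9.9.2. isgreaterequal() is a quiet
   comparison that yields 0 when either argument is NaN, so a NaN in x
   falls through to y, and a NaN in y is caught by isnan(y). */
static double fmax_ref(double x, double y)
{
    return (isgreaterequal(x, y) || isnan(y)) ? x : y;
}

/* Mirror image for the minimum (hypothetical, by analogy). */
static double fmin_ref(double x, double y)
{
    return (islessequal(x, y) || isnan(y)) ? x : y;
}
```

Only when both arguments are NaN does the result become NaN; a single NaN is simply skipped.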
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 11:20 PM, Robert Kern <robert.kern@gmail.com> wrote:
Oops!
F.9.9.2 The fmax functions
1 If just one argument is a NaN, the fmax functions return the other argument (if both arguments are NaNs, the functions return a NaN).
2 The body of the fmax function might be
    { return (isgreaterequal(x, y) || isnan(y)) ? x : y; }
If we want to follow C99 semantics rather than our own NaN-always-propagates semantics, then we should do this instead.
That would have the virtue that amax/amin would return the actual max and min of the non-nan numbers, unless all of them were nans, in which case we would get the starting value, which should probably be nan in this case ;) Hey, that would work nicely. So, I think I should use the comparison macros for the floats. However, if I read correctly, they don't raise exceptions. Would that be a problem?

Chuck
data:image/s3,"s3://crabby-images/bec8e/bec8ecae7d7b04c697ee6e83a98da565062b6105" alt=""
On Tue, Sep 30, 2008 at 1:20 AM, Robert Kern <robert.kern@gmail.com> wrote:
F.9.9.2 The fmax functions
1 If just one argument is a NaN, the fmax functions return the other argument (if both arguments are NaNs, the functions return a NaN).
2 The body of the fmax function might be
    { return (isgreaterequal(x, y) || isnan(y)) ? x : y; }
If we want to follow C99 semantics rather than our own NaN-always-propagates semantics, then we should do this instead.
+1 for NaN-always-propagates since we have explicit variants for the alternative semantics. Users are more likely to remember that "NaNs always propagate" than "as stated in the C99 standard...". -- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/
data:image/s3,"s3://crabby-images/1a93b/1a93b5e9663e7b8870fb78b607a558d9cd4b94cb" alt=""
Nathan Bell wrote:
+1 for NaN-always-propagates since we have explicit variants for the alternative semantics.
Users are more likely to remember that "NaNs always propagate" than "as stated in the C99 standard...".
I don't know. I would like to agree with you, but OTOH, starting to go against the C99 standard may bring us quite far. FWIW, matlab has the same behavior as mandated by C99 (but not R, by default).

The problem I have with fmax is that:
- isgreaterequal may be slow? That may well be a red herring.
- I guess isgreaterequal is not available on Windows with MS compilers.
- we can't detect NaN with FPE_INVALID (since by definition isgreaterequal never raises it), hence we can't detect it with seterr.

This is starting to get mind-blowing...

cheers,

David
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Mon, Sep 29, 2008 at 11:56 PM, David Cournapeau < david@ar.media.kyoto-u.ac.jp> wrote:
I don't know. I would like to agree with you, but OTOH, starting to go against the C99 standard may bring us quite far. FWIW, matlab has the same behavior as mandated by C99 (but not R, by default).
The problem I have with fmax is that:
- isgreaterequal may be slow? That may well be a red herring.
For gcc, isgreaterequal should be the same as >= the way numpy is set up now, i.e., no errors raised. Looking at the assembly, it all looked pretty clean, with both numbers stored in the FPU. What I couldn't figure out is why every compare needed to be done afresh, i.e. a > b and a == b had to run two compares, even on variables declared constant. I'm kind of inclined to follow the C standard, as I figure there is already a lot of discussion behind it and it would be kind of a waste to hash it all out again on this list.

Chuck
data:image/s3,"s3://crabby-images/c4c8c/c4c8c9ee578d359a3234c68c5656728c7c864441" alt=""
On Tue, Sep 30, 2008 at 00:46, Nathan Bell <wnbell@gmail.com> wrote:
+1 for NaN-always-propagates since we have explicit variants for the alternative semantics.
Users are more likely to remember that "NaNs always propagate" than "as stated in the C99 standard...".
OTOH, Python 2.6 and up will be following the C99 standard as closely as possible. I would prefer to keep up with them. It's true that "as stated in the C99 standard" is more difficult to remember, but "NaNs always propagate" is probably not going to be consistent with everything we actually implement, no matter how hard we try. Regardless of what policy we use, the best thing is to have a consistent implementation on every platform and a comprehensive test suite to ensure that. Then a user doesn't have to remember; they just try it out in the interpreter and see what it does.

--
Robert Kern
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Tue, Sep 30, 2008 at 12:24 AM, Robert Kern <robert.kern@gmail.com> wrote:
OTOH, Python 2.6 and up will be following the C99 standard as closely as possible. I would prefer to keep up with them. It's true that "as stated in the C99 standard" is more difficult to remember, but "NaNs always propagate" is probably not going to be consistent with everything we actually implement, no matter how hard we try.
I wonder how much of the Python stuff we can steal^W borrow. I assume the Python license is compatible with numpy? Chuck
data:image/s3,"s3://crabby-images/c4c8c/c4c8c9ee578d359a3234c68c5656728c7c864441" alt=""
On Tue, Sep 30, 2008 at 01:34, Charles R Harris <charlesr.harris@gmail.com> wrote:
I wonder how much of the Python stuff we can steal^W borrow. I assume the Python license is compatible with numpy?
Yeah.

--
Robert Kern
data:image/s3,"s3://crabby-images/8a52f/8a52f622fca521cbf6fbeca2ff158cee3a90c638" alt=""
I've seen no mention in this thread of IEEE Std 754-2008, which was published last month. minNum(x, y) and maxNum(x, y) return the floating-point number when the other argument is NaN.

The OP's question is about the sign of NaN. In 754r it can be tested, copied, etc. Operations involving NaNs propagate the payload but not the sign bit AFAIK.

--
Pete Forman
WesternGeco
pete.forman@westerngeco.com
http://petef.22web.net

Disclaimer: This post is originated by myself and does not represent the opinion of Schlumberger or WesternGeco.
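In C99 terms, testing and copying the sign of a NaN looks like the sketch below. It relies only on copysign(), fabs() and signbit() acting on the sign bit, so they also work on NaNs. The helper names are hypothetical, and nothing here depends on the payload-propagation behavior mentioned above.

```c
#include <math.h>

/* A quiet NaN with its sign bit set. */
static double negative_nan(void)
{
    return copysign(NAN, -1.0);
}

/* True only for a NaN whose sign bit is set. */
static int is_negative_nan(double x)
{
    return isnan(x) && signbit(x) != 0;
}
```

For example, fabs(negative_nan()) is still a NaN but no longer negative, and copysign(1.0, negative_nan()) is -1.0.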
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Tue, Sep 30, 2008 at 5:10 AM, Pete Forman <pete.forman@westerngeco.com>wrote:
I've seen no mention in this thread of IEEE Std 754-2008 which was published last month. minNum(x, y) and maxNum(x, y) return a floating-point number if the other argument is NaN.
The OP's question is about the sign of NaN. In 754r it can be tested, copied, etc. Operations involving NaNs propagate the payload but not the sign bit AFAIK.
OK, here is what it looks like to me at the moment, given that numpy requires an IEEE 754 machine:

- We need a reliable value for NAN. This is perhaps best done by using a union and explicitly twiddling the bits depending on the endianness of the architecture. For architectures with foobar extended precision we don't worry about supporting the extended precision. The result can be tested, perhaps when numpy is loaded. We could possibly get the value from Python. What happens on PPC?
- Max/min follow the IEEE standard: given a choice of nan/non-nan, return non-nan. This can be extended to complex numbers, where the choice is based on the real parts unless they are equal or both nans, in which case the decision is made on the imaginary parts.
- Signbit returns the value of the signbit function, but nonzero values are set to 1.
- I am unsure of sign. Should it return signed zeros? Should it return nan for nan or return the sign of the nan? I am inclined towards returning nan.

Chuck
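A sketch of the first item, with one substitution: instead of a union with per-endian byte layouts, it copies a 64-bit integer bit pattern into a double with memcpy. That assumes doubles are IEEE 754 binary64 and share the byte order of uint64_t, which holds on common platforms but is an assumption. The names and the startup check are hypothetical, not numpy code.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* binary64 quiet NaN: exponent all ones, top fraction bit set, sign clear. */
static const uint64_t QNAN_BITS = UINT64_C(0x7FF8000000000000);

static double make_quiet_nan(void)
{
    double d;
    memcpy(&d, &QNAN_BITS, sizeof d);   /* avoids union type-punning questions */
    return d;
}

/* The "test it when numpy is loaded" idea: a one-time sanity check. */
static void check_nan_at_startup(void)
{
    double d = make_quiet_nan();
    uint64_t back;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&back, &d, sizeof back);
    assert(isnan(d));
    assert(!signbit(d));
    assert(back == QNAN_BITS);          /* bit pattern survives the round trip */
}
```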
data:image/s3,"s3://crabby-images/8a52f/8a52f622fca521cbf6fbeca2ff158cee3a90c638" alt=""
"Charles R Harris" <charlesr.harris@gmail.com> writes:
OK, here is what it looks like to me at the moment given that numpy requires an IEEE754 machine:
o We need a reliable value for NAN. [...]
o Max/min follow the IEEE standard. Given a choice of nan/non-nan, return non-nan. [...]
Yes, that follows 754r and C99.
o Signbit returns the value of the signbit function, but nonzero values are set to 1.
Looks okay to me.
o I am unsure of sign. Should it return signed zeros? Should it return nan for nan or return the sign of the nan? I am inclined towards returning nan.
How is sign used? If it is in x * sign(y) then it might be better to use copysign(x, y), which is well defined even with signed zeros and NaNs. It depends on whether you want special behavior when y is zero. In copysign, y being 0 or +0 is considered positive, so |x| is returned.

So you could use this as a specification.

    from math import copysign, isnan

    def sign(y):
        if y == 0:  # True for -0 and +0 too
            return 0  # or perhaps return y
        else:
            return copysign(1, y)

Your inclination leads to this.

    def sign(y):
        if y == 0 or isnan(y):
            return y
        else:
            return copysign(1, y)

The better choice will be governed by how sign is used in practice.

--
Pete Forman
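For the C ufunc loops, the two candidate specifications translate directly. This sketch uses hypothetical names and C99 copysign()/isnan(), mirroring the Python above. Note that the first specification gives +1 or -1 for a NaN, following the NaN's sign bit, while the second passes NaNs and signed zeros through unchanged.

```c
#include <math.h>

/* First spec: zeros of either sign give an unsigned 0; everything else,
   including NaN, gives copysign(1, y). */
static double sign_copysign(double y)
{
    if (y == 0)            /* true for +0.0 and -0.0, false for NaN */
        return 0.0;
    return copysign(1.0, y);
}

/* Second spec: zeros and NaNs are returned unchanged. */
static double sign_passthrough(double y)
{
    if (y == 0 || isnan(y))
        return y;
    return copysign(1.0, y);
}
```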
data:image/s3,"s3://crabby-images/e4aa6/e4aa6e420ae6ff6dcb338785e846cb1efd9d677a" alt=""
On Tue, Sep 30, 2008 at 7:42 AM, Pete Forman <pete.forman@westerngeco.com>wrote:
"Charles R Harris" <charlesr.harris@gmail.com> writes:
OK, here is what is looks like to me at the moment given that numpy requires an IEEE754 machine:
o We need a reliable value for NAN. [...]
o Max/min follow the IEEE standard. Given a choice of nan/non-nan, return non-nan. [...]
Yes, that follows 754r and C99.
o Signbit returns the value of the signbit function, but nonzero values are set to 1.
Looks okay to me.
o I am unsure of sign. Should it return signed zeros? Should it return nan for nan or return the sign of the nan? I am inclined towards returning nan.
How is sign used? If it is in x * sign(y) then it might be better to use copysign(x, y) which is well defined even with signed zeros and NaNs. It depends on whether you want special behavior when y is zero. In copysign y being 0 or +0 is considered positive, so x is returned.
So you could use this as a specification.
def sign(y): if y == 0: # True for -0 and +0 too return 0 # or perhaps return y else return copysign(1, y)
Your inclination leads to this.
def sign(y): if y == 0 or isnan(y): return y else return copysign(1, y)
I'm leaning towards the first at the moment. I would prefer the signed zero also, but that might actually break some code, so probably the safe near-term choice is the unsigned zero. For max/min I am going to introduce new ufuncs, fmax/fmin, which return numbers unless both arguments are nan. The current maximum/minimum functions will return nan if either argument is a nan. How these might be integrated into the max/min ndarray methods can be left to the future.

Chuck
participants (5)
- Charles R Harris
- David Cournapeau
- Nathan Bell
- Pete Forman
- Robert Kern