[Numpy-discussion] comparing arrays with NaN in them.
Christopher Barker
Chris.Barker at noaa.gov
Fri Aug 24 12:08:05 EDT 2007
Matthieu Brucher wrote:
> 2007/8/24, mark <markbak at gmail.com>:
> There may be multiple nan-s, but what Chris did is simply create one
> with the same nan's
>
> >>> a = N.array((1,2,3,N.nan))
> >>> b = N.array((1,2,3,N.nan))
>
> I think these should be the same.
I'm the OP, but it depends what you mean by "the same". Yes, these two
arrays are the same, and that's what I want to test for in this case.
However, in the mathematical sense, I do understand why NaN == NaN
should be false: if you're doing math, those NaNs could have been
arrived at by very different calculations, so you really wouldn't want
them to compare equal. The IEEE rule that NaN does not compare
equal to anything therefore makes sense to me.
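To make the behaviour concrete, here is a small sketch of how the IEEE rule shows up in practice. It assumes a present-day NumPy imported under the `N` alias used in this thread; the variable name `elementwise` is only for illustration.

```python
import numpy as N

a = N.array((1, 2, 3, N.nan))
b = N.array((1, 2, 3, N.nan))

# A NaN never compares equal to anything, including another NaN.
print(N.nan == N.nan)          # False

# The elementwise comparison is therefore False wherever a NaN sits,
# even though both arrays were built from identical input.
elementwise = (a == b)
print(elementwise.tolist())    # [True, True, True, False]

# So a plain "are these arrays equal?" test reports a mismatch.
print(N.array_equal(a, b))     # False
```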
However, what I'm doing is testing to make sure I got the result I
expected, so I want to know if two arrays are the same, including NaNs
in the same places. If I weren't working with an array package, I guess
I'd be testing for NaN specifically where I expect it, so the solution I
came up with before makes the most sense:
N.alltrue(a[~N.isnan(a)] == b[~N.isnan(b)])
It's not likely, but that could give a true result if the NaNs were in
different places, as long as there were the same number of them and the
remaining values happened to line up. So maybe there is a need for a:
nanequal, to go with:
nanargmax
nanargmin
nanmax
nanmin
nansum
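A minimal sketch of what such a function might look like, purely as an illustration: `nanequal` here is a hypothetical helper, not an existing NumPy function, and the choice to convert both inputs to float arrays is my assumption. It checks that the NaNs sit in the same positions before comparing the remaining values, which closes the loophole described above.

```python
import numpy as N

def nanequal(a, b):
    """Hypothetical helper: True if a and b have the same shape, NaNs in
    the same positions, and equal values everywhere else."""
    a = N.asarray(a, dtype=float)
    b = N.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    nan_a = N.isnan(a)
    nan_b = N.isnan(b)
    # The NaN positions must match exactly, not just in number.
    if not N.array_equal(nan_a, nan_b):
        return False
    # With identical NaN positions, compare only the non-NaN values.
    return bool(N.all(a[~nan_a] == b[~nan_b]))

a = N.array((1, 2, 3, N.nan))
b = N.array((1, 2, 3, N.nan))
c = N.array((1, 2, N.nan, 3))   # same values, NaN in a different place

print(nanequal(a, b))   # True
print(nanequal(a, c))   # False

# The mask-and-compare test reports True for a and c, because
# the values left after dropping the NaNs happen to line up.
print(bool(N.all(a[~N.isnan(a)] == c[~N.isnan(c)])))   # True
```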
> You can have several different NaN,
You can? I thought NaN was defined by IEEE 754 as a particular bit
pattern (one for each precision, anyway).
Warren Focke wrote:
> Maybe something with masked arrays?
In this case, I'm using NaN to mean: "no valid data", so masked arrays
are probably a better solution anyway. However, I like the simplicity of
storing a non-value in the same binary array.
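For comparison, a rough sketch of the masked-array route, assuming the `numpy.ma` module as shipped with present-day NumPy (it does not speak to which of the two implementations is meant here). The names `raw` and `masked` are illustrative only.

```python
import numpy as N

# NaN stored in-band to mean "no valid data".
raw = N.array([1.0, 2.0, N.nan, 4.0])

# masked_invalid masks every NaN (and inf) entry.
masked = N.ma.masked_invalid(raw)

# The mask records where the invalid entries are.
print(N.ma.getmaskarray(masked).tolist())   # [False, False, True, False]

# Reductions skip the masked entries.
print(masked.sum())                          # 7.0

# compressed() returns only the valid data as a plain array.
print(masked.compressed().tolist())          # [1.0, 2.0, 4.0]
```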
However, if I do go with masked arrays:
What's the status of the two masked array implementations? Which should
I use? Unless there are huge feature differences (which I don't think
there are), then I want to use the one that's going to get maintained
into the future -- do we know yet which that will be?
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov