[Numpy-discussion] [Cdat-discussion] Arrays containing NaNs
Charles Doutriaux
doutriaux1 at llnl.gov
Fri Jul 25 11:53:41 EDT 2008
Hi Bruce,
Thanks for the reply. We're aware of this; the question was basically: why
not mask NaNs automatically when creating a numpy.ma array?
C.
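
For reference, here is a minimal sketch of the behaviour in question. It assumes a recent NumPy in which numpy.ma.masked_invalid is available (it may not exist in the version discussed in this thread), and the array values are made up for illustration. numpy.ma.array leaves NaN and inf unmasked unless a mask is passed explicitly, while masked_invalid masks both:

```python
import numpy as np

# Illustrative data only: one NaN and one inf among ordinary values.
raw = np.array([1.0, np.nan, 3.0, np.inf])

# Plain construction does not inspect the data, so nothing is masked.
plain = np.ma.array(raw)
n_masked_plain = np.ma.count_masked(plain)

# masked_invalid masks every NaN and +/-inf entry.
auto = np.ma.masked_invalid(raw)
auto_mask = np.ma.getmaskarray(auto).tolist()

# Statistics then skip the masked entries.
auto_mean = float(auto.mean())

print(n_masked_plain, auto_mask, auto_mean)
```

The original values stay in the underlying data; only the mask changes.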
Bruce Southey wrote:
> Charles Doutriaux wrote:
>
>> Hi Stephane,
>>
>> This is a good suggestion; I'm cc'ing the numpy list on this, because I'm
>> wondering whether it wouldn't be a better fit to do it directly at the
>> numpy.ma level.
>>
>> I'm sure they have already thought about this (and 'inf' values as well),
>> and if they don't do it, there's probably some good reason we haven't
>> thought of yet.
>> So before I go ahead and do it in MV2, I'd like to know the reasons why
>> it's not in numpy.ma; they are probably valid for MVs too.
>>
>> C.
>>
>> Stephane Raynaud wrote:
>>
>>
>>> Hi,
>>>
>>> how about automatically (or at least optionally) masking all NaN
>>> values when creating a MV array?
>>>
>>> On Thu, Jul 24, 2008 at 11:43 PM, Arthur M. Greene
>>> <amg at iri.columbia.edu> wrote:
>>>
>>> Yup, this works. Thanks!
>>>
>>> I guess it's time for me to dig deeper into numpy syntax and
>>> functions, now that CDAT is using the numpy core for array
>>> management...
>>>
>>> Best,
>>>
>>> Arthur
>>>
>>>
>>> Charles Doutriaux wrote:
>>>
>>> Seems right to me,
>>>
>>> Except that the syntax might scare new users a bit :)
>>>
>>> C.
>>>
>>> Andrew.Dawson at uea.ac.uk wrote:
>>>
>>> Hi,
>>>
>>> I'm not sure if what I am about to suggest is a good idea
>>> or not, perhaps Charles will correct me if this is a bad
>>> idea for any reason.
>>>
>>> Let's say you have a cdms variable called U with NaNs as the
>>> missing value. First we can replace the NaNs with 1e20:
>>>
>>> U.data[numpy.where(numpy.isnan(U.data))] = 1e20
>>>
>>> And remember to set the missing value of the variable
>>> appropriately:
>>>
>>> U.setMissing(1e20)
>>>
>>> I hope that helps, Andrew
>>>
>>>
>>>
>>> Hi Arthur,
>>>
>>> If I remember correctly, the way I used to do it was:
>>>
>>> a = MV2.greater(data, 1.)
>>> b = MV2.less_equal(data, 1)
>>> c = MV2.logical_and(a, b)  # NaNs are the only ones left
>>> data = MV2.masked_where(c, data)
>>>
>>> BUT I believe numpy now has a way to deal with NaN; I believe it
>>> is numpy.nan_to_num. But it replaces NaN with 0, so it may not be
>>> what you want.
>>>
>>> C.
>>>
>>>
>>> Arthur M. Greene wrote:
>>>
>>> A typical netcdf file is opened, and the single
>>> variable extracted:
>>>
>>>
>>> fpr=cdms.open('prTS2p1_SEA_allmos.cdf')
>>> pr0=fpr('prcp')
>>> type(pr0)
>>>
>>> <class 'cdms2.tvariable.TransientVariable'>
>>>
>>> Masked values (indicating ocean in this case) show
>>> up here as NaNs.
>>>
>>>
>>> pr0[0,-15:-5,0]
>>>
>>> prcp array([NaN NaN NaN NaN NaN NaN 0.37745094
>>> 0.3460784 0.21960783 0.19117641])
>>>
>>> So far this is all consistent. A map of the first
>>> time step shows the proper land-ocean boundaries,
>>> reasonable-looking values, and so on. But there
>>> doesn't seem to be any way to mask this array, so,
>>> e.g., an 'xy' average can be computed (it comes out
>>> all nans). NaN is not equal to anything -- even
>>> itself -- so there does not seem to be any condition,
>>> among the MV.masked_xxx options, that can be applied
>>> as a test. Also, it does not seem possible to compute
>>> seasonal averages, anomalies, etc. -- they also
>>> produce just NaNs.
>>>
>>> The workaround I've come up with -- for now -- is
>>> to first generate a new array of identical shape,
>>> filled with 1.0E+20. One test I've found that can
>>> detect NaNs is numpy.isnan:
>>>
>>>
>>> isnan(pr0[0,0,0])
>>>
>>> True
>>>
>>> So it is _possible_ to tediously loop through
>>> every value in the old array, testing with isnan,
>>> then copying to the new array if the test fails.
>>> Then the axes have to be reset...
>>>
>>> isnan does not accept array arguments, so one
>>> cannot do, e.g.,
>>>
>>> prmasked=MV.masked_where(isnan(pr0),pr0)
>>>
>>> The element-by-element conversion is quite slow.
>>> (I'm still waiting for it to complete, in fact).
>>> Any suggestions for dealing with NaN-infested data
>>> objects?
>>>
>>> Thanks!
>>>
>>> AMG
>>>
>>> P.S. This is 5.0.0.beta, RHEL4.
>>>
>>>
>>> *^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*
>>> Arthur M. Greene, Ph.D.
>>> The International Research Institute for Climate and Society
>>> The Earth Institute, Columbia University, Lamont Campus
>>> Monell Building, 61 Route 9W, Palisades, NY 10964-8000 USA
>>> amg*at*iri-dot-columbia\dot\edu | http:// iri.columbia.edu
>>> *^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*
>>>
>>>
>>> -------------------------------------------------------------------------
>>> This SF.Net email is sponsored by the Moblin Your Move Developer's
>>> challenge
>>> Build the coolest Linux based applications with Moblin SDK & win
>>> great prizes
>>> Grand prize is a trip for two to an Open Source event anywhere in
>>> the world
>>> http:// moblin-contest.org/redirect.php?banner_id=100&url=/
>>> _______________________________________________
>>> Cdat-discussion mailing list
>>> Cdat-discussion at lists.sourceforge.net
>>> https:// lists.sourceforge.net/lists/listinfo/cdat-discussion
>>>
>>>
>>>
>>>
>>> --
>>> Stephane Raynaud
>>> ------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at scipy.org
>> http:// projects.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>>
> Please look at the various NumPy functions that ignore NaN, like nansum().
> See the NumPy example list
> (http://www.scipy.org/Numpy_Example_List_With_Doc) for examples under
> nan or the individual functions.
>
> To get the mean you can do something like:
>
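> import numpy
> x = numpy.array([2, numpy.nan, 1])
> # mean over the non-NaN entries: NaN-ignoring sum / count of valid values
> numpy.nansum(x) / (x.shape[0] - numpy.isnan(x).sum())
> # same mean via a masked array
> x_masked = numpy.ma.masked_where(numpy.isnan(x), x)
> x_masked.mean()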
>
> The real advantage of masked arrays is that you have greater control
> over the filtering so you can also filter extreme values:
>
> y = numpy.array([2, numpy.nan, 1, 1000])
> y_masked = numpy.ma.masked_where(numpy.isnan(y), y)
> y_masked = numpy.ma.masked_where(y_masked > 100, y_masked)
> y_masked.mean()
>
> Regards
> Bruce
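
The approaches in Bruce's message can be compared side by side. The sketch below reuses his arrays x and y; the combined single-condition mask and the use of numpy.nanmean are additions for illustration and assume a NumPy version that provides nanmean (1.8 or later), which postdates this thread.

```python
import numpy as np

x = np.array([2, np.nan, 1])

# Mean that ignores NaNs, computed by hand as in the message above.
manual_mean = np.nansum(x) / (x.shape[0] - np.isnan(x).sum())

# Same result through a masked array.
masked_mean = np.ma.masked_where(np.isnan(x), x).mean()

# Same result through nanmean (assumes NumPy >= 1.8).
nan_mean = np.nanmean(x)

# Filtering NaNs and extreme values with one combined condition.
y = np.array([2, np.nan, 1, 1000])
with np.errstate(invalid="ignore"):  # comparing NaN may warn on some versions
    bad = np.isnan(y) | (y > 100)
y_masked = np.ma.masked_where(bad, y)

print(manual_mean, masked_mean, nan_mean, y_masked.mean())
```

Building the condition first keeps the filtering rule in one place, so further criteria can be added with another "|" term.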