Re: [Numpy-discussion] [Cdat-discussion] Arrays containing NaNs
Hi Stephane,

This is a good suggestion. I'm cc'ing the numpy list on this, because I'm wondering if it wouldn't be a better fit to do it directly at the numpy.ma level.

I'm sure they have already thought about this (and about 'inf' values as well), and if they don't do it, there's probably some good reason we haven't thought of yet. So before I go ahead and do it in MV2, I'd like to know the reasons why it's not in numpy.ma; they are probably valid for MVs too.

C.

Stephane Raynaud wrote:
Hi,
how about automatically (or at least optionally) masking all NaN values when creating a MV array?
On Thu, Jul 24, 2008 at 11:43 PM, Arthur M. Greene <amg@iri.columbia.edu> wrote:
Yup, this works. Thanks!
I guess it's time for me to dig deeper into numpy syntax and functions, now that CDAT is using the numpy core for array management...
Best,
Arthur
Charles Doutriaux wrote:
Seems right to me,
Except that the syntax might scare new users a bit :)
C.
Andrew.Dawson@uea.ac.uk wrote:
Hi,
I'm not sure if what I am about to suggest is a good idea or not; perhaps Charles will correct me if this is a bad idea for any reason.
Let's say you have a cdms variable called U with NaNs as the missing value. First we can replace the NaNs with 1e20:
    U.data[numpy.where(numpy.isnan(U.data))] = 1e20
And remember to set the missing value of the variable appropriately:
    U.setMissing(1e20)
I hope that helps, Andrew
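
A rough plain-NumPy analogue of this approach, for readers without cdms2 at hand: replace the NaNs with a fill value, then mask that value. The array `data` and the names `fill` and `masked` are made up for illustration, and using `numpy.ma.masked_values` in place of `U.setMissing` is an assumption about equivalent intent, not a statement about how cdms2 works internally.

```python
import numpy

# Hypothetical stand-in for U.data: a float array with NaN as "missing".
data = numpy.array([numpy.nan, 0.5, numpy.nan, 2.0])
fill = 1e20

# Step 1: overwrite every NaN with the fill value (boolean-mask indexing).
data[numpy.isnan(data)] = fill

# Step 2: mask everything equal to the fill value; the result also
# records `fill` as its fill_value.
masked = numpy.ma.masked_values(data, fill)

# Statistics now skip the masked (formerly NaN) entries.
mean = masked.mean()
```

Indexing with the boolean array directly does the same job as wrapping it in `numpy.where`, and is a little easier to read.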
Hi Arthur,
If I remember correctly, the way I used to do it was:
    a = MV2.greater(data, 1.)
    b = MV2.less_equal(data, 1.)
    c = MV2.logical_not(MV2.logical_or(a, b))  # NaNs are the only ones left
    data = MV2.masked_where(c, data)
BUT I believe numpy now has a way to deal with NaN; I believe it is numpy.nan_to_num. But it replaces NaN with 0, so it may not be what you want.
C.
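
The comparison idea can be sketched in plain NumPy, under the assumption that the data is an ordinary float array: NaN fails both `x > 1` and `x <= 1`, so the elements for which neither test holds are exactly the NaNs. The sample array and variable names are illustrative only. The last line shows what `numpy.nan_to_num` does by default, which is why it may not be the right tool when zero is a legitimate data value.

```python
import numpy

data = numpy.array([numpy.nan, 0.5, 3.0])

# Every finite value satisfies exactly one of these two tests;
# NaN satisfies neither, because comparisons with NaN are False.
above = numpy.greater(data, 1.0)
at_or_below = numpy.less_equal(data, 1.0)
is_nan = numpy.logical_not(numpy.logical_or(above, at_or_below))

# Mask the NaNs found by the comparison trick.
masked = numpy.ma.masked_where(is_nan, data)

# nan_to_num replaces NaN with 0.0 by default instead of masking it.
zeroed = numpy.nan_to_num(data)
```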
Arthur M. Greene wrote:
A typical netcdf file is opened, and the single variable extracted:
    fpr=cdms.open('prTS2p1_SEA_allmos.cdf')
    pr0=fpr('prcp')
    type(pr0)
    <class 'cdms2.tvariable.TransientVariable'>
Masked values (indicating ocean in this case) show up here as NaNs.
    pr0[0,-15:-5,0]
    prcp array([NaN NaN NaN NaN NaN NaN 0.37745094 0.3460784 0.21960783 0.19117641])
So far this is all consistent. A map of the first time step shows the proper land-ocean boundaries, reasonable-looking values, and so on. But there doesn't seem to be any way to mask this array so that, e.g., an 'xy' average can be computed (it comes out all NaNs). NaN is not equal to anything, not even itself, so there does not seem to be any condition, among the MV.masked_xxx options, that can be applied as a test. Also, it does not seem possible to compute seasonal averages, anomalies, etc.; they also produce just NaNs.
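
The behaviour described here can be reproduced with a few lines of plain NumPy (a small sketch with made-up values, not the actual precipitation data): NaN compares unequal to itself, an unmasked mean propagates NaN, and an equality-based mask catches nothing.

```python
import numpy

values = numpy.array([numpy.nan, 0.377, 0.346])

# NaN is not equal to anything, including itself.
nan_equals_itself = bool(values[0] == values[0])

# Without a mask, a single NaN poisons the average.
plain_mean = values.mean()

# An equality test cannot find the NaNs, so nothing gets masked.
masked_by_equality = numpy.ma.masked_equal(values, numpy.nan)
n_masked = int(numpy.ma.count_masked(masked_by_equality))
```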
The workaround I've come up with, for now, is to first generate a new array of identical shape, filled with 1.0E+20. One test I've found that can detect NaNs is numpy.isnan:
    isnan(pr0[0,0,0])
    True
So it is _possible_ to tediously loop through every value in the old array, testing with isnan, then copying to the new array if the test fails. Then the axes have to be reset...
isnan does not accept array arguments, so one cannot do, e.g.,
    prmasked=MV.masked_where(isnan(pr0),pr0)
The element-by-element conversion is quite slow. (I'm still waiting for it to complete, in fact.) Any suggestions for dealing with NaN-infested data objects?
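
For plain NumPy arrays, at least in current NumPy releases, `numpy.isnan` is applied element-wise to a whole array, so the loop can be avoided. The sketch below assumes a small (time, y, x) array invented for illustration; whether the same call works on a cdms2 TransientVariable in 5.0.0.beta is not something this example establishes.

```python
import numpy

nan = numpy.nan
# Hypothetical field with shape (time, y, x); NaN marks ocean points.
field = numpy.array([[[nan, 1.0], [2.0, 3.0]],
                     [[nan, nan], [4.0, 6.0]]])

# One vectorized call builds the mask for every element at once.
masked_field = numpy.ma.masked_where(numpy.isnan(field), field)

# 'xy' average per time step: flatten the two spatial axes and
# average over them, skipping masked points.
n_time = masked_field.shape[0]
xy_mean = masked_field.reshape(n_time, -1).mean(axis=1)
```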
Thanks!
AMG
P.S. This is 5.0.0.beta, RHEL4.
*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*
Arthur M. Greene, Ph.D.
The International Research Institute for Climate and Society
The Earth Institute, Columbia University, Lamont Campus
Monell Building, 61 Route 9W, Palisades, NY 10964-8000 USA
amg*at*iridotcolumbia\dot\edu  http://iri.columbia.edu
*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*
_______________________________________________
Cdat-discussion mailing list
Cdat-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdat-discussion
 Stephane Raynaud 
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Please look at the various NumPy functions that ignore NaN, like nansum(). See the NumPy example list (http://www.scipy.org/Numpy_Example_List_With_Doc) for examples under nan or the individual functions.

To get the mean you can do something like:

    import numpy
    x = numpy.array([2, numpy.nan, 1])
    numpy.nansum(x)/(x.shape[0]-numpy.isnan(x).sum())
    x_masked = numpy.ma.masked_where(numpy.isnan(x), x)
    x_masked.mean()

The real advantage of masked arrays is that you have greater control over the filtering, so you can also filter extreme values:

    y = numpy.array([2, numpy.nan, 1, 1000])
    y_masked = numpy.ma.masked_where(numpy.isnan(y), y)
    y_masked = numpy.ma.masked_where(y_masked > 100, y_masked)
    y_masked.mean()

Regards
Bruce
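
As a small variation on the same two ideas, and assuming a NumPy release that provides `numpy.nanmean`, the manual NaN-aware mean can be checked against the built-in one, and the two filtering steps can be folded into a single condition. The variable names are illustrative.

```python
import numpy

x = numpy.array([2, numpy.nan, 1])

# Manual NaN-aware mean: sum of non-NaN values over their count.
manual_mean = numpy.nansum(x) / (x.shape[0] - numpy.isnan(x).sum())

# Built-in equivalent (assumes numpy.nanmean is available).
builtin_mean = numpy.nanmean(x)

y = numpy.array([2, numpy.nan, 1, 1000])

# Mask NaNs and extreme values in one pass; the comparison y > 100
# is simply False at the NaN position.
y_masked = numpy.ma.masked_where(numpy.isnan(y) | (y > 100), y)
y_mean = y_masked.mean()
```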
Hi Bruce,

Thanks for the reply. We're aware of this; basically the question was: why not mask NaN automatically when creating a numpy.ma array?

C.
You mean like doing:

    import numpy
    y = numpy.array([2., numpy.nan, 1., 1000.])
    y = numpy.ma.MaskedArray(y, mask=numpy.isnan(y))

?

Bruce
I mean not having to do it myself. If data is a numpy array with NaNs in it, then

    masked_data = numpy.ma.array(data)

would return a masked array with a mask where the NaNs were in data.

C.
Checking for NaNs is an expensive operation, so it makes sense to make it optional rather than impose the cost on all masked array creations. If you want the same effect, you can do this:

    masked_data = numpy.ma.masked_invalid(data)

Eric
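
A short sketch of what this looks like in practice, with made-up data: `numpy.ma.masked_invalid` masks both NaN and inf. The helper `as_masked` and its `mask_invalid` flag are hypothetical, shown only to illustrate the "optional rather than automatic" design; they are not part of numpy.ma or MV2.

```python
import numpy

data = numpy.array([2.0, numpy.nan, 1.0, numpy.inf])

# Masks every non-finite entry (NaN and +/-inf) in one call.
masked_data = numpy.ma.masked_invalid(data)
mean = masked_data.mean()


def as_masked(values, mask_invalid=False):
    """Hypothetical helper: build a masked array, scanning for
    NaN/inf only when the caller asks for it."""
    values = numpy.asarray(values, dtype=float)
    if mask_invalid:
        return numpy.ma.masked_invalid(values)
    # Default path: no scan of the data, so no extra cost.
    return numpy.ma.array(values)


cheap = as_masked(data)                        # nothing masked
checked = as_masked(data, mask_invalid=True)   # NaN and inf masked
```

Keeping the check behind a flag means callers who know their data is clean do not pay for a full pass over the array.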
C.
Bruce Southey wrote:
Charles Doutriaux wrote:
Hi Bruce,
Thx for the reply, we're aware of this, basically the question was why not mask NaN automatically when creating a nump.ma array?
C.
Bruce Southey wrote:
Charles Doutriaux wrote:
Hi Stephane,
This is a good suggestion, I'm ccing the numpy list on this. Because I'm wondering if it wouldn't be a better fit to do it directly at the numpy.ma level.
I'm sure they already thought about this (and 'inf' values as well) and if they don't do it , there's probably some good reason we didn't think of yet. So before i go ahead and do it in MV2 I'd like to know the reason why it's not in numpy.ma, they are probably valid for MVs too.
C.
Stephane Raynaud wrote:
Hi,
how about automatically (or at least optionally) masking all NaN values when creating a MV array?
On Thu, Jul 24, 2008 at 11:43 PM, Arthur M. Greene <amg@iri.columbia.edu <mailto:amg@iri.columbia.edu>> wrote:
Yup, this works. Thanks!
I guess it's time for me to dig deeper into numpy syntax and functions, now that CDAT is using the numpy core for array management...
Best,
Arthur
Charles Doutriaux wrote:
Seems right to me,
Except that the syntax might scare a bit the new users :)
C.
Andrew.Dawson@uea.ac.uk <mailto:Andrew.Dawson@uea.ac.uk> wrote:
Hi,
I'm not sure if what I am about to suggest is a good idea or not, perhaps Charles will correct me if this is a bad idea for any reason.
Lets say you have a cdms variable called U with NaNs as the missing value. First we can replace the NaNs with 1e20:
U.data[numpy.where(numpy.isnan(U.data))] = 1e20
And remember to set the missing value of the variable appropriately:
U.setMissing(1e20)
I hope that helps, Andrew
Hi Arthur,
If i remember correctly the way i used to do it was: a= MV2.greater(data,1.) b=MV2.less_equal(data,1) c=MV2.logical_and(a,b) # Nan are the only one left data=MV2.masked_where(c,data)
BUT I believe numpy now has way to deal with nan I believe it is numpy.nan_to_num But it replaces with 0 so it may not be what you want
C.
Arthur M. Greene wrote:
A typical netcdf file is opened, and the single variable extracted:
fpr=cdms.open('prTS2p1_SEA_allmos.cdf')
pr0=fpr('prcp')
type(pr0)
<class 'cdms2.tvariable.TransientVariable'>
Masked values (indicating ocean in this case) show up here as NaNs.
pr0[0,-15:-5,0]
prcp array([NaN NaN NaN NaN NaN NaN 0.37745094 0.3460784 0.21960783 0.19117641])
So far this is all consistent. A map of the first time step shows the proper land-ocean boundaries, reasonable-looking values, and so on. But there doesn't seem to be any way to mask this array so that, e.g., an 'xy' average can be computed (it comes out all NaNs). NaN is not equal to anything, not even itself, so there does not seem to be any condition, among the MV.masked_xxx options, that can be applied as a test. Also, it does not seem possible to compute seasonal averages, anomalies, etc.; they also produce just NaNs.
The workaround I've come up with, for now, is to first generate a new array of identical shape, filled with 1.0E+20. One test I've found that can detect NaNs is numpy.isnan:
isnan(pr0[0,0,0])
True
So it is _possible_ to tediously loop through every value in the old array, testing with isnan, then copying to the new array if the test fails. Then the axes have to be reset...
isnan does not accept array arguments, so one cannot do, e.g.,
prmasked=MV.masked_where(isnan(pr0),pr0)
The element-by-element conversion is quite slow. (I'm still waiting for it to complete, in fact.) Any suggestions for dealing with NaN-infested data objects?
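A note for readers following along: in plain NumPy, numpy.isnan is a ufunc and works elementwise on whole arrays, so no Python loop is needed there. The sketch below uses a small made-up (time, lat, lon) array in place of pr0; whether the same call works directly on a cdms TransientVariable in this CDAT version is not verified here.

```python
import numpy as np

# Hypothetical (time, lat, lon) stand-in for the precipitation variable
pr = np.array([[[np.nan, 0.4], [0.2, np.nan]],
               [[np.nan, 0.6], [0.4, np.nan]]])

# isnan is applied to the whole array at once and returns a boolean array
nan_mask = np.isnan(pr)

# Mask the NaN cells in a single vectorized call
pr_masked = np.ma.masked_where(nan_mask, pr)

# 'xy' average per time step: flatten the two spatial axes, then average
xy_mean = pr_masked.reshape(pr.shape[0], -1).mean(axis=1)

print(xy_mean)
```

Masked cells are skipped in the average, so each time step yields a finite value as long as at least one cell is valid.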
Thanks!
AMG
P.S. This is 5.0.0.beta, RHEL4.
*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~* Arthur M. Greene, Ph.D. The International Research Institute for Climate and Society The Earth Institute, Columbia University, Lamont Campus Monell Building, 61 Route 9W, Palisades, NY 109648000 USA amg*at*iridotcolumbia\dot\edu  http:// iri.columbia.edu *^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*
_______________________________________________ Cdatdiscussion mailing list Cdatdiscussion@lists.sourceforge.net https:// lists.sourceforge.net/lists/listinfo/cdatdiscussion
 Stephane Raynaud 
_______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http:// projects.scipy.org/mailman/listinfo/numpydiscussion
Please look at the various NumPy functions that ignore NaN, such as nansum(). See the NumPy example list (http:// www. scipy.org/Numpy_Example_List_With_Doc) for examples, either under nan or under the individual functions.
To get the mean you can do something like:
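import numpy
x = numpy.array([2, numpy.nan, 1])
# mean by hand: sum of the non-NaN values divided by their count
numpy.nansum(x)/(x.shape[0]-numpy.isnan(x).sum())
# same result using a masked array
x_masked = numpy.ma.masked_where(numpy.isnan(x), x)
x_masked.mean()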
The real advantage of masked arrays is that you have greater control over the filtering so you can also filter extreme values:
y = numpy.array([2, numpy.nan, 1, 1000])
y_masked = numpy.ma.masked_where(numpy.isnan(y), y)
y_masked = numpy.ma.masked_where(y_masked > 100, y_masked)
y_masked.mean()
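The two filtering steps can also be folded into one condition. This is a sketch only; the threshold of 100 comes from the example above and the name THRESHOLD is introduced here for illustration.

```python
import numpy as np

y = np.array([2.0, np.nan, 1.0, 1000.0])
THRESHOLD = 100.0  # illustrative cutoff, taken from the example above

# Comparisons with NaN evaluate to False; silence any "invalid value" warning
with np.errstate(invalid="ignore"):
    bad = np.isnan(y) | (y > THRESHOLD)

# One masked_where call covers both the NaN and the extreme value
y_masked = np.ma.masked_where(bad, y)

print(y_masked.mean())  # mean of the remaining values, 2.0 and 1.0
```

Building the boolean condition first keeps each filter visible and makes it easy to add further criteria with `|`.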
Regards Bruce
You mean like doing:
import numpy
y = numpy.array([2., numpy.nan, 1., 1000.])
y = numpy.ma.MaskedArray(y, mask=numpy.isnan(y))
?
Bruce
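For reference, current NumPy also offers numpy.ma.masked_invalid, which builds the masked array in one call and masks both NaN and inf. The values below are made up, and this is a sketch of that helper rather than of automatic masking in the numpy.ma constructor, which is the behaviour being asked about.

```python
import numpy as np

# Made-up sample containing both a NaN and an inf
values = np.array([2.0, np.nan, 1.0, np.inf])

# masked_invalid masks every non-finite entry (NaN as well as +/-inf)
y = np.ma.masked_invalid(values)

print(y.mask)    # True where the input was NaN or inf
print(y.mean())  # mean over the finite entries only
```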
participants (3)

Bruce Southey

Charles Doutriaux

Eric Firing