Dear experts,

I am encountering a strange behaviour of a Python data array, as shown below. I have been trying to use the data from a netCDF file (attached herewith) to do a certain calculation using the code below. If I take the absolute value of the array and look for values < .5, I get a different count than for the original array. This particular file does not have any negative values in the array (but there are other files which can have negative values, so the condition is needed). I do not see any reason for getting a different count of values < .5 in the case of bt, and expected it to be the same as for r2010. If anyone has a guess about what is behind this behaviour, please help.

In [14]: from netCDF4 import Dataset as nc
In [15]: nf=nc('r2010.nc')
In [16]: r2010=nf.variables['R2010'][:]
In [17]: bt=abs(r2010)
In [18]: bt[bt<=.5].shape
Out[18]: (2872,)
In [19]: r2010[r2010<.5].shape
Out[19]: (36738,)

In [20]: bt.min()
Out[20]: 0.0027588337040836768
In [21]: bt.max()
Out[21]: 3.5078965479057089
In [22]: r2010.max()
Out[22]: 3.5078965479057089
In [23]: r2010.min()
Out[23]: 0.0027588337040836768

***************************************************************
Sudheer Joseph
Indian National Centre for Ocean Information Services
Ministry of Earth Sciences, Govt. of India
POST BOX NO: 21, IDA Jeedeemetla P.O.
Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55
Tel:+91-40-23886047(O),Fax:+91-40-23895011(O),
Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile)
E-mail:sjo.India@gmail.com;sudheer.joseph@yahoo.com
Web- http://oppamthadathil.tripod.com
***************************************************************
Seems to be related to the masked values:

print r2010[:3,:3]
[[-- -- --]
 [-- -- --]
 [-- -- --]]

print abs(r2010)[:3,:3]
[[-- -- --]
 [-- -- --]
 [-- -- --]]

print r2010[ r2010[:3,:3] <0 ]
[-- -- -- -- -- -- -- -- --]

print r2010[ abs(r2010)[:3,:3] < 0]
[]

Nicolas

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Mar 13, 2014, at 9:39 AM, Nicolas Rougier <Nicolas.Rougier@inria.fr> wrote:

> Seems to be related to the masked values:

Good hint -- a masked array keeps the "junk" values in the main array.

What "abs" are you using -- it may not be mask-aware. (You want a numpy abs anyway.)

Also -- I'm not sure I know what happens with Boolean operators on masked arrays when you use them to index. I'd investigate that. (Sorry, not at a machine I can play with now.)

Chris
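A minimal sketch of the point about "junk" values, assuming a small hypothetical array in place of the R2010 variable (the -999.0 fill value and the numbers are made up for illustration, not taken from the attached file). It only shows that the underlying buffer still holds the fill values, while mask-aware methods skip them:

```python
import numpy as np
import numpy.ma as ma

# Hypothetical stand-in for a netCDF variable; -999.0 plays the role of the fill value.
raw = np.array([[-999.0, 0.2, 0.7],
                [0.4, -999.0, 1.5]])
r = ma.masked_values(raw, -999.0)

print(r.data)          # underlying buffer: the -999.0 entries are still there
print(r.mask)          # True where an entry is masked
print(r.data.min())    # -999.0, because .data ignores the mask
print(r.min())         # 0.2, because MaskedArray.min() skips masked entries
print(abs(r).min())    # 0.2, abs() keeps the mask on the result
print(r.count())       # 4 unmasked entries
print(r.compressed())  # 1-D plain ndarray holding only the unmasked values
```

Working on r.compressed() is one way to be sure that no masked value takes part in a later selection.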
Thank you very much Nicolas and Chris,

The hint was helpful, and from it I tried the steps below (a crude way, I would say) and am now getting the same result. I have been using the abs available by default, and it gives the same as numpy.absolute (I checked).

nr= ((r2010>r2010.min()) & (r2010<r2010.max()))
nr[nr<.5].shape
Out[25]: (33868,)
anr=numpy.absolute(nr)
anr[anr<.5].shape
Out[27]: (33868,)

This approach may have a problem when the masked points hold values that can affect the min/max operation.

So I would like to know if there is a standard, formal (Python/NumPy) way to handle masked arrays when they need to be subjected to boolean operations.

with best regards,
Sudheer
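On the question of a standard way to combine masked arrays with boolean conditions: one possible approach, sketched here under assumptions, is to turn the condition into a plain boolean array in which masked points count as False, and to index the raw data with it. The helper name select_valid and the sample values are hypothetical; ma.filled, ma.getmaskarray and ma.getdata are existing numpy.ma functions.

```python
import numpy as np
import numpy.ma as ma

def select_valid(arr, condition):
    """Return the unmasked elements of arr where condition holds, as a plain ndarray."""
    # Masked entries of the condition are treated as "does not match".
    cond = np.asarray(ma.filled(condition, False), dtype=bool)
    # Also exclude every point that is masked in arr itself.
    keep = cond & ~ma.getmaskarray(arr)
    return ma.getdata(arr)[keep]

# Hypothetical data; -999.0 stands in for the file's fill value.
raw = np.array([-999.0, 0.2, 0.7, 0.4, -999.0, 1.5])
r = ma.masked_values(raw, -999.0)
bt = abs(r)

low = select_valid(r, r < 0.5)
low_abs = select_valid(bt, bt < 0.5)
print(low.shape, low_abs.shape)   # both selections have the same size
```

Because the result is an ordinary ndarray with no masked entries, its shape is the number of valid points that satisfy the condition, whatever values sit under the mask.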
Sorry, the solution in my previous message, which I thought was working, was not working; it was just giving the array size.
The difference appears to be that the boolean selection pulls out all data values <= 0.5 whether or not they are masked, and then carries over the appropriate masks to the new array. So r2010 and bt contain identical unmasked values but different numbers of masked values. Because the initial fill value for your masked values was a large negative number, in r2010 those masked values are carried over. In bt, you've taken the absolute value of the data array, so those fill values are now positive and they are no longer carried over into the indexed array.

Because the final arrays are still masked, you are observing no difference in the statistical properties of the arrays, only their sizes, because one contains many more masked values than the other. I don't think this should be a problem for your computations. If you're concerned, you could always explicitly demask them before your computations. See the example problem below.

~Brett

In [61]: import numpy as np
In [62]: import numpy.ma as ma
In [65]: a = np.arange(-8, 8).reshape((4, 4))
In [66]: a
Out[66]:
array([[-8, -7, -6, -5],
       [-4, -3, -2, -1],
       [ 0,  1,  2,  3],
       [ 4,  5,  6,  7]])

In [68]: b = ma.masked_array(a, mask=a < 0)
In [69]: b
Out[69]:
masked_array(data =
 [[-- -- -- --]
 [-- -- -- --]
 [0 1 2 3]
 [4 5 6 7]],
             mask =
 [[ True  True  True  True]
 [ True  True  True  True]
 [False False False False]
 [False False False False]],
       fill_value = 999999)

In [70]: b.data
Out[70]:
array([[-8, -7, -6, -5],
       [-4, -3, -2, -1],
       [ 0,  1,  2,  3],
       [ 4,  5,  6,  7]])

In [71]: c = abs(b)
In [72]: c[c <= 4].shape
Out[72]: (9L,)
In [73]: b[b <= 4].shape
Out[73]: (13L,)

In [74]: b[b <= 4]
Out[74]:
masked_array(data = [-- -- -- -- -- -- -- -- 0 1 2 3 4],
             mask = [ True  True  True  True  True  True  True  True False False False False False],
       fill_value = 999999)

In [75]: c[c <= 4]
Out[75]:
masked_array(data = [-- -- -- -- 0 1 2 3 4],
             mask = [ True  True  True  True False False False False False],
       fill_value = 999999)
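The mechanism Brett describes can be reproduced without relying on how a particular NumPy version evaluates comparisons at masked points. The sketch below builds the boolean index explicitly from the underlying data with ma.getdata, using the same 4x4 array as in his example; the variable names picked_raw and picked_abs are made up for illustration.

```python
import numpy as np
import numpy.ma as ma

a = np.arange(-8, 8).reshape((4, 4))
b = ma.masked_array(a, mask=a < 0)

# Plain boolean indices computed on the raw data, masked points included.
picked_raw = b[ma.getdata(b) <= 4]          # -8 .. 4: eight masked + five valid
picked_abs = b[np.abs(ma.getdata(b)) <= 4]  # -4 .. 4: four masked + five valid

print(picked_raw.shape, picked_raw.count())  # (13,) 5
print(picked_abs.shape, picked_abs.count())  # (9,) 5

# "Demasking" before comparing removes the discrepancy.
print(picked_raw.compressed())  # [0 1 2 3 4]
print(picked_abs.compressed())  # [0 1 2 3 4]
```

The two selections differ only in how many masked entries they carry along; count() and compressed() see the same five valid values in both.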
Thank you Olsen,

My objective was to find out how many values fall in different ranges, i.e. find RMS < .5, then RMS between .5 and .8, etc. If there is a specific Python way of handling the mask and doing boolean operations without any doubt, that is what I was looking for. The data I am using has a mask, and I want to tell Python not to consider the masked values and masked grid points while doing the calculation (a percentage is calculated afterwards using the number of grid points). I will try in detail the example you sent and see how Python handles this.

with best regards,
Sudheer
Dear Oslen, I had a detailed look at the example you send and points I got were below a = np.arange(-8, 8).reshape((4, 4)) b = ma.masked_array(a, mask=a < 0) Out[33]: b[b<4] masked_array(data = [-- -- -- -- -- -- -- -- 0 1 2 3], mask = [ True True True True True True True True False False False False], fill_value = 999999) In [34]: b[b<4].shape Out[34]: (12,) In [35]: b[b<4].data Out[35]: array([-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3]) This shows while numpy can do the bolean operation and list the data meeting the criteria( by masking the data further), it do not actually allow us get the count of data that meets the crieteria. I was interested in count. Because my objective was to find out how many numbers in the grid fall under different catagory.( <=4 , >4 & <=8 , >8<=10) etc. and find the percentage of them. Is there a way to get the counts correctly ? that is my botheration now !! with best regards, Sudheer On Fri, 14/3/14, Brett Olsen <brett.olsen@gmail.com> wrote: Subject: Re: [Numpy-discussion] python array To: "Discussion of Numerical Python" <numpy-discussion@scipy.org> Date: Friday, 14 March, 2014, 2:07 AM The difference appears to be that the boolean selection pulls out all data values <= 0.5 whether or not they are masked, and then carries over the appropriate masks to the new array. So r2010 and bt contain identical unmasked values but different numbers of masked values. Because the initial fill value for your masked values was a large negative number, in r2010 those masked values are carried over. In bt, you've taken the absolute value of the data array, so those fill values are now positive and they are no longer carried over into the indexed array. Because the final arrays are still masked, you are observing no difference in the statistical properties of the arrays, only their sizes, because one contains many more masked values than the other. I don't think this should be a problem for your computations. 
If you're concerned, you could always explicitly demask them before your computations. See the example problem below.

~Brett

    In [61]: import numpy as np
    In [62]: import numpy.ma as ma
    In [65]: a = np.arange(-8, 8).reshape((4, 4))
    In [66]: a
    Out[66]:
    array([[-8, -7, -6, -5],
           [-4, -3, -2, -1],
           [ 0,  1,  2,  3],
           [ 4,  5,  6,  7]])

    In [68]: b = ma.masked_array(a, mask=a < 0)
    In [69]: b
    Out[69]:
    masked_array(data =
     [[-- -- -- --]
      [-- -- -- --]
      [0 1 2 3]
      [4 5 6 7]],
                 mask =
     [[ True  True  True  True]
      [ True  True  True  True]
      [False False False False]
      [False False False False]],
                 fill_value = 999999)

    In [70]: b.data
    Out[70]:
    array([[-8, -7, -6, -5],
           [-4, -3, -2, -1],
           [ 0,  1,  2,  3],
           [ 4,  5,  6,  7]])

    In [71]: c = abs(b)
    In [72]: c[c <= 4].shape
    Out[72]: (9L,)

    In [73]: b[b <= 4].shape
    Out[73]: (13L,)

    In [74]: b[b <= 4]
    Out[74]:
    masked_array(data = [-- -- -- -- -- -- -- -- 0 1 2 3 4],
                 mask = [ True True True True True True True True False False False False False],
                 fill_value = 999999)

    In [75]: c[c <= 4]
    Out[75]:
    masked_array(data = [-- -- -- -- 0 1 2 3 4],
                 mask = [ True True True True False False False False False],
                 fill_value = 999999)

On Thu, Mar 13, 2014 at 8:14 PM, Sudheer Joseph <sudheer.joseph@yahoo.com> wrote:

Sorry, the solution below, which I thought was working, was not working; it was just giving the array size.

--------------------------------------------
On Fri, 14/3/14, Sudheer Joseph <sudheer.joseph@yahoo.com> wrote:
Subject: Re: [Numpy-discussion] python array
To: "Discussion of Numerical Python" <numpy-discussion@scipy.org>
Date: Friday, 14 March, 2014, 1:09 AM

Thank you very much Nicolas and Chris,

The hint was helpful, and from that I tried the steps below (a crude way, I would say) and am getting the same result now. I have been using the abs available by default, and it is the same as numpy.absolute (I checked).
    nr = ((r2010>r2010.min()) & (r2010<r2010.max()))
    nr[nr<.5].shape
    Out[25]: (33868,)
    anr = numpy.absolute(nr)
    anr[anr<.5].shape
    Out[27]: (33868,)

The approach I used may have a problem when the masked values can affect the min/max operation. So I would like to know if there is a standard, formal (python/numpy) way to handle masked arrays when they need to be subjected to boolean operations.

with best regards,
Sudheer

--------------------------------------------
On Thu, 13/3/14, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Subject: Re: [Numpy-discussion] python array
To: "Discussion of Numerical Python" <numpy-discussion@scipy.org>
Date: Thursday, 13 March, 2014, 11:53 PM

On Mar 13, 2014, at 9:39 AM, Nicolas Rougier <Nicolas.Rougier@inria.fr> wrote:
> Seems to be related to the masked values:

Good hint -- a masked array keeps the "junk" values in the main array.

What "abs" are you using -- it may not be mask-aware. (You want a numpy abs anyway.)

Also -- I'm not sure I know what happens with Boolean operators on masked arrays when you use them to index. I'd investigate that.
(sorry, not at a machine I can play with now)

Chris
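A minimal sketch of a mask-aware count, using the same toy array as Brett's example. It assumes only numpy and numpy.ma; the names `cond`, `n_hits`, `n_hits_abs` and `n_valid` are illustrative and do not come from the thread. The idea is to fill the masked entries of the comparison with False before summing, so that whatever junk values sit under the mask cannot contribute to the count:

```python
import numpy as np
import numpy.ma as ma

a = np.arange(-8, 8).reshape((4, 4))
b = ma.masked_array(a, mask=a < 0)   # the eight negative entries are masked

# The comparison is itself a masked boolean array carrying b's mask.
cond = b <= 4

# Replace masked entries with False, then count the remaining True values.
n_hits = int(cond.filled(False).sum())              # 5  (the values 0, 1, 2, 3, 4)

# Number of unmasked elements, for turning the count into a percentage.
n_valid = int(b.count())                            # 8
percent = 100.0 * n_hits / n_valid                  # 62.5

# Same count after abs(), whatever abs() did to the data under the mask.
n_hits_abs = int((abs(b) <= 4).filled(False).sum())  # 5
```

Because the count is taken from the filled comparison rather than from the size of `b[b <= 4]`, it comes out the same for `b` and `abs(b)` here.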
On 2014/03/13 9:09 PM, Sudheer Joseph wrote:
> Dear Olsen,
>
> I had a detailed look at the example you sent, and the points I got are below.
>
> a = np.arange(-8, 8).reshape((4, 4))
> b = ma.masked_array(a, mask=a < 0)
>
> In [33]: b[b<4]
> Out[33]:
> masked_array(data = [-- -- -- -- -- -- -- -- 0 1 2 3],
>              mask = [ True True True True True True True True False False False False],
>              fill_value = 999999)
> In [34]: b[b<4].shape
> Out[34]: (12,)
> In [35]: b[b<4].data
> Out[35]: array([-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3])
>
> This shows that while numpy can do the boolean operation and list the data meeting the criterion (by masking the data further), it does not actually let us get the count of the data that meet the criterion. I was interested in the count, because my objective was to find out how many numbers in the grid fall under different categories (<=4, >4 & <=8, >8 & <=10, etc.) and find the percentage of them.
>
> Is there a way to get the counts correctly? That is what is bothering me now!
Certainly. If all you need are statistics of the type you describe, where you are working with a 1-D array, then extract the unmasked values into an ordinary ndarray, and work with that:

    import numpy as np

    a = np.random.randn(100)
    am = np.ma.masked_less(a, -0.2)
    print(am.count())  # number of unmasked values
    a_nomask = am.compressed()
    print(type(a_nomask))
    print(a_nomask.shape)
    # number of points with value less than 0.5:
    print((a_nomask < 0.5).sum())
    # (Boolean True is 1)
    # Or if you want the actual array of values, not just the count:
    a_nomask[a_nomask < 0.5]

Eric
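Building on the compressed() approach, here is a hedged sketch of the category counts and percentages asked about above (<=4, >4 & <=8, >8 & <=10). The helper name `category_percentages` and its `edges` argument are hypothetical, introduced only for illustration; the sketch assumes the categories are half-open intervals closed on the right, plus one extra bin for anything above the last edge:

```python
import numpy as np
import numpy.ma as ma


def category_percentages(values, edges):
    """Count unmasked values in (-inf, e0], (e0, e1], ..., (e_last, inf).

    `edges` must be sorted in increasing order. Returns (counts, percentages),
    where the percentages are relative to the number of unmasked values.
    """
    valid = ma.asarray(values).compressed()   # 1-D ndarray of unmasked values
    if valid.size == 0:
        raise ValueError("no unmasked values to count")
    # side="left" gives index i such that edges[i-1] < v <= edges[i]
    idx = np.searchsorted(edges, valid, side="left")
    counts = np.bincount(idx, minlength=len(edges) + 1)
    return counts, 100.0 * counts / valid.size


a = np.arange(-8, 8).reshape((4, 4))
b = ma.masked_array(a, mask=a < 0)            # unmasked values are 0..7

counts, pct = category_percentages(b, [4, 8, 10])
# counts -> [5, 3, 0, 0]; pct -> [62.5, 37.5, 0.0, 0.0]
```

Taking the percentages relative to the unmasked count, rather than the full grid size, is a design choice in this sketch; divide by `b.size` instead if masked cells should count toward the total.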
Thank you Eric,

compressed() is the option that gets the correct numbers.

    a = np.arange(-8, 8).reshape((4, 4))
    In [67]: b = ma.masked_array(a, mask=a < 0)
    In [68]: bb=b.compressed()
    In [69]: b[b<4].size
    Out[69]: 12
    In [70]: bb=b.compressed()
    In [71]: bb[bb<=4].size
    Out[71]: 5

with best regards,
Sudheer
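To tie this back to the original r2010 question, the following sketch uses a synthetic stand-in for the netCDF variable, since the attached file is not reproduced here. The array shape, the 40% mask fraction, the fill value -1.0e34 and the helper name `count_below` are all assumptions made for illustration. It shows that once the masked points are dropped with compressed(), the count of values below 0.5 is the same with or without abs():

```python
import numpy as np
import numpy.ma as ma

rng = np.random.default_rng(0)

# Synthetic stand-in for r2010: positive data, with some cells masked.
data = rng.uniform(0.0, 3.5, size=(20, 30))
land = rng.random((20, 30)) < 0.4        # hypothetical mask (e.g. land points)
data[land] = -1.0e34                     # assumed large negative fill value
r = ma.masked_array(data, mask=land)


def count_below(arr, threshold):
    """Return (count of unmasked |values| < threshold, number of unmasked values)."""
    valid = ma.asarray(arr).compressed()  # drops masked cells, returns an ndarray
    return int((np.abs(valid) < threshold).sum()), int(valid.size)


raw = count_below(r, 0.5)
absd = count_below(abs(r), 0.5)

# Percentage of valid grid points below the threshold.
percent_below = 100.0 * raw[0] / raw[1]
```

Here `raw` and `absd` agree, because the fill values never reach the comparison.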
participants (5)
- Brett Olsen
- Chris Barker - NOAA Federal
- Eric Firing
- Nicolas Rougier
- Sudheer Joseph