[Numpy-discussion] python array

Sudheer Joseph sudheer.joseph at yahoo.com
Fri Mar 14 03:09:33 EDT 2014


Dear Olsen,

I had a detailed look at the example you sent, and the points I noted are below.

a = np.arange(-8, 8).reshape((4, 4))
b = ma.masked_array(a, mask=a < 0)


In [33]: b[b<4]
Out[33]:
masked_array(data = [-- -- -- -- -- -- -- -- 0 1 2 3],
             mask = [ True  True  True  True  True  True  True  True False False False False],
       fill_value = 999999)
In [34]: b[b<4].shape
Out[34]: (12,)
In [35]: b[b<4].data
Out[35]: array([-8, -7, -6, -5, -4, -3, -2, -1,  0,  1,  2,  3])

This shows that while numpy can do the boolean operation and list the data meeting the criterion (by masking the data further), indexing this way does not directly give the count of the data that meet the criterion. I am interested in the count, because my objective is to find out how many numbers in the grid fall into different categories (<=4, >4 & <=8, >8 & <=10, etc.) and to find their percentages.

Is there a way to get the counts correctly? That is my concern now!
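
One possible way to get such counts is sketched below, reusing the small array from the example rather than the real grid. It relies on the sum of a masked boolean array skipping the masked entries; the category labels and variable names are only illustrative assumptions.

```python
import numpy as np
import numpy.ma as ma

a = np.arange(-8, 8).reshape((4, 4))
b = ma.masked_array(a, mask=a < 0)

# A comparison on a masked array gives a masked boolean array;
# .sum() skips masked entries, so only unmasked True values are counted.
n_below_4 = int((b < 4).sum())

# Alternative: index first, then count only the unmasked elements.
n_below_4_alt = int(b[b < 4].count())

# Number of unmasked cells, used as the base for percentages.
n_valid = int(b.count())
pct_below_4 = 100.0 * n_below_4 / n_valid

# Illustrative categories (labels are hypothetical).
categories = {
    "<=4": b <= 4,
    ">4 & <=8": (b > 4) & (b <= 8),
}
counts = {label: int(cond.sum()) for label, cond in categories.items()}

print(n_below_4, n_below_4_alt, n_valid, pct_below_4)
print(counts)
```

Note that `int(cond.sum())` assumes at least one unmasked cell; if every cell is masked, the sum is itself masked and needs separate handling.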

with best regards,
Sudheer









On Fri, 14/3/14, Brett Olsen <brett.olsen at gmail.com> wrote:

 Subject: Re: [Numpy-discussion] python array
 To: "Discussion of Numerical Python" <numpy-discussion at scipy.org>
 Date: Friday, 14 March, 2014, 2:07 AM
 
 The difference appears
 to be that the boolean selection pulls out all data values
 <= 0.5 whether or not they are masked, and then carries
 over the appropriate masks to the new array.  So r2010 and
 bt contain identical unmasked values but different numbers
 of masked values.  Because the initial fill value for your
 masked values was a large negative number, in r2010 those
 masked values are carried over.  In bt, you've taken
 the absolute value of the data array, so those fill values
 are now positive and they are no longer carried over into
 the indexed array.
 
 Because the final arrays are still masked, you
 are observing no difference in the statistical properties of
 the arrays, only their sizes, because one contains many more
 masked values than the other.  I don't think this
 should be a problem for your computations. If you're
 concerned, you could always explicitly demask them before
 your computations.  See the example problem below.
 
 ~Brett
 In [61]: import numpy as np
 In [62]: import numpy.ma as ma
 In [65]: a = np.arange(-8, 8).reshape((4, 4))

 In [66]: a
 Out[66]:
 array([[-8, -7, -6, -5],
        [-4, -3, -2, -1],
        [ 0,  1,  2,  3],
        [ 4,  5,  6,  7]])

 In [68]: b = ma.masked_array(a, mask=a < 0)

 In [69]: b
 Out[69]:
 masked_array(data =
  [[-- -- -- --]
  [-- -- -- --]
  [0 1 2 3]
  [4 5 6 7]],
              mask =
  [[ True  True  True  True]
  [ True  True  True  True]
  [False False False False]
  [False False False False]],
        fill_value = 999999)

 In [70]: b.data
 Out[70]:
 array([[-8, -7, -6, -5],
        [-4, -3, -2, -1],
        [ 0,  1,  2,  3],
        [ 4,  5,  6,  7]])

 In [71]: c = abs(b)

 In [72]: c[c <= 4].shape
 Out[72]: (9L,)

 In [73]: b[b <= 4].shape
 Out[73]: (13L,)

 In [74]: b[b <= 4]
 Out[74]:
 masked_array(data = [-- -- -- -- -- -- -- -- 0 1 2 3 4],
              mask = [ True  True  True  True  True  True  True  True False False False False False],
        fill_value = 999999)

 In [75]: c[c <= 4]
 Out[75]:
 masked_array(data = [-- -- -- -- 0 1 2 3 4],
              mask = [ True  True  True  True False False False False False],
        fill_value = 999999)
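
The "explicitly demask" step mentioned above could look roughly like the sketch below. It assumes that `compressed()` is an acceptable way to drop the masked entries for the computation at hand; the variable names are illustrative and not part of the session above.

```python
import numpy as np
import numpy.ma as ma

a = np.arange(-8, 8).reshape((4, 4))
b = ma.masked_array(a, mask=a < 0)

# Boolean indexing keeps masked entries whose underlying data pass the test.
selected = b[b <= 4]

# compressed() returns a plain 1-D ndarray holding only the unmasked values.
unmasked = selected.compressed()

print(selected.count(), unmasked)
```

After this step the length of `unmasked` no longer depends on what values happen to sit underneath the mask.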
 
 
 On Thu, Mar 13, 2014 at 8:14 PM, Sudheer Joseph <sudheer.joseph at yahoo.com> wrote:
 
 Sorry,
 
 The solution below, which I thought was working, is not working; it was just giving the array size.
 
 
 
 --------------------------------------------
 On Fri, 14/3/14, Sudheer Joseph <sudheer.joseph at yahoo.com> wrote:
 
  Subject: Re: [Numpy-discussion] python array
  To: "Discussion of Numerical Python" <numpy-discussion at scipy.org>
  Date: Friday, 14 March, 2014, 1:09 AM
 
 
 
  Thank you very much Nicolas and Chris,
 
  The hint was helpful, and from it I tried the steps below (a crude way, I would say) and am now getting the same result.
 
  I have been using the abs available by default, and it gives the same result as numpy.absolute (I checked).
 
 
 
  nr= ((r2010>r2010.min()) & (r2010<r2010.max()))
  nr[nr<.5].shape
  Out[25]: (33868,)
  anr=numpy.absolute(nr)
  anr[anr<.5].shape
  Out[27]: (33868,)
 
 
 
  The approach I used may have a problem when the masked cells hold values that can affect the min/max operations.
 
  So I would like to know if there is a standard, formal (python/numpy) way to handle masked arrays when they need to be subjected to boolean operations.
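
One option for this, offered as a sketch rather than an official recommendation, is to convert the masked comparison into a plain boolean array with `.filled(False)` before indexing, so that masked cells are never selected. The small array below stands in for r2010, and the names `cond` and `picked` are illustrative.

```python
import numpy as np
import numpy.ma as ma

a = np.arange(-8, 8).reshape((4, 4))
b = ma.masked_array(a, mask=a < 0)

# The comparison result is itself masked; filled(False) turns it into a
# plain boolean ndarray in which every masked cell counts as "not selected".
cond = (b < 4).filled(False)

# Indexing with the plain boolean array picks unmasked matches only.
picked = b[cond]

print(int(cond.sum()), picked.shape, picked)
```

With this form the shape of `picked` equals the number of unmasked matches, regardless of the data stored under the mask.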
 
 
 
  with best regards,
 
  Sudheer
 
 
 
 
 
  ***************************************************************
 
  Sudheer Joseph         
 
  Indian National Centre for Ocean Information Services
 
  Ministry of Earth Sciences, Govt. of India
 
  POST BOX NO: 21, IDA Jeedeemetla P.O.
 
  Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55
 
  Tel:+91-40-23886047(O),Fax:+91-40-23895011(O),
 
  Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile)
 
  E-mail:sjo.India at gmail.com;sudheer.joseph at yahoo.com
 
  Web- http://oppamthadathil.tripod.com
 
  ***************************************************************
 
 
 
  --------------------------------------------
  On Thu, 13/3/14, Chris Barker - NOAA Federal <chris.barker at noaa.gov> wrote:
 
   Subject: Re: [Numpy-discussion] python array
   To: "Discussion of Numerical Python" <numpy-discussion at scipy.org>
   Date: Thursday, 13 March, 2014, 11:53 PM
 
   On Mar 13, 2014, at 9:39 AM, Nicolas Rougier <Nicolas.Rougier at inria.fr> wrote:
 
 
 
   >
   > Seems to be related to the masked values:
 
   Good hint -- a masked array keeps the "junk" values in the main array.
 
   What "abs" are you using -- it may not be mask-aware. (you want a numpy abs anyway)
 
   Also -- I'm not sure I know what happens with Boolean operators on masked arrays when you use them to index. I'd investigate that.
   (sorry, not at a machine I can play with now)
 
   Chris
 
 
 
 
 
   > print r2010[:3,:3]
   > [[-- -- --]
   > [-- -- --]
   > [-- -- --]]
   >
   > print abs(r2010)[:3,:3]
   > [[-- -- --]
   > [-- -- --]
   > [-- -- --]]
   >
   > print r2010[ r2010[:3,:3] <0 ]
   > [-- -- -- -- -- -- -- -- --]
   >
   > print r2010[ abs(r2010)[:3,:3] < 0]
   > []
   >
   > Nicolas
 
   >
 
   >
 
   >
 
   > On 13 Mar 2014, at 16:52, Sudheer Joseph <sudheer.joseph at yahoo.com> wrote:
   >
   >> Dear experts,
   >>
   >> I am encountering a strange behaviour of a python data array, as below. I have been trying to use the data from a netcdf file (attached herewith) to do a certain calculation using the code below. If I take the absolute value of the array and look for values <.5, I get a different count than for the original array. But the fact is that this particular case does not have any negative values in the array (there are other files where it can have negative values, so the condition is put in). I do not see any reason for getting different numbers for values <.5 in the case of bt, and expected it to be the same as that of r2010. If anyone has a guess about what is behind this behaviour, please help.
 
   >>
 
   >>
 
   >> In [14]: from netCDF4 import Dataset as nc
   >>
   >> In [15]: nf=nc('r2010.nc')
   >> In [16]: r2010=nf.variables['R2010'][:]
   >> In [17]: bt=abs(r2010)
   >> In [18]: bt[bt<=.5].shape
   >> Out[18]: (2872,)
   >> In [19]: r2010[r2010<.5].shape
   >> Out[19]: (36738,)
   >>
   >> In [20]: bt.min()
   >> Out[20]: 0.0027588337040836768
   >>
   >> In [21]: bt.max()
   >> Out[21]: 3.5078965479057089
   >> In [22]: r2010.max()
   >> Out[22]: 3.5078965479057089
   >> In [23]: r2010.min()
   >> Out[23]: 0.0027588337040836768
 
   >>
 
   >>
 
   >>
 
   >>
 
  
   >> <r2010.nc>
   >> _______________________________________________
 
   >> NumPy-Discussion mailing list
 
   >> NumPy-Discussion at scipy.org
 
   >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
 