[Numpy-discussion] Histogram does not preserve subclasses of ndarray (e.g. masked arrays)

Bruce Southey bsouthey at gmail.com
Thu Sep 2 16:47:57 EDT 2010


  On 09/02/2010 02:50 PM, Joe Kington wrote:
> Hi all,
>
> I just wanted to check if this would be considered a bug.
>
> numpy.histogram does not appear to preserve subclasses of ndarrays 
> (e.g. masked arrays).  This leads to considerable problems when 
> working with masked arrays. (As per this Stack Overflow question 
> <http://stackoverflow.com/questions/3610040/how-to-create-the-histogram-of-an-array-with-masked-values-in-numpy>)
>
> E.g.
>
> import numpy as np
> x = np.arange(100)
> x = np.ma.masked_where(x > 30, x)
>
> counts, bin_edges = np.histogram(x)
>
> yields:
> counts --> array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
> bin_edges --> array([  0. ,   9.9,  19.8,  29.7,  39.6,  49.5,  59.4,  69.3,  79.2,  89.1,  99. ])
>
> I would have expected histogram to ignore the masked portion of the 
> data.  Is this a bug, or expected behavior?  I'll open a bug report, 
> if it's not expected behavior...
>
> This would appear to be easily fixed by using asanyarray rather than 
> asarray within histogram.  E.g. this diff for numpy/lib/function_base.py
> Index: function_base.py
> ===================================================================
> --- function_base.py    (revision 8604)
> +++ function_base.py    (working copy)
> @@ -132,9 +132,9 @@
>
>      """
>
> -    a = asarray(a)
> +    a = asanyarray(a)
>      if weights is not None:
> -        weights = asarray(weights)
> +        weights = asanyarray(weights)
>          if np.any(weights.shape != a.shape):
>              raise ValueError(
>                      'weights should have the same shape as a.')
> @@ -156,7 +156,7 @@
>              mx += 0.5
>          bins = linspace(mn, mx, bins+1, endpoint=True)
>      else:
> -        bins = asarray(bins)
> +        bins = asanyarray(bins)
>          if (np.diff(bins) < 0).any():
>              raise AttributeError(
>                      'bins must increase monotonically.')
>
> Thanks!
> -Joe
>
I would not call it a bug, as this is a known 'feature' of functions that 
use np.asarray().  You are welcome to file an enhancement request, but there 
are some issues that need to be addressed.
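
To illustrate the 'feature' in question, here is a minimal sketch (using the same array as Joe's example; the variable names are mine) of how np.asarray() and np.asanyarray() treat a masked array:

```python
import numpy as np

values = np.arange(100)
x = np.ma.masked_where(values > 30, values)

# asarray() returns a base-class ndarray, so the mask is discarded
plain = np.asarray(x)
# asanyarray() passes ndarray subclasses through, so the mask survives
kept = np.asanyarray(x)

print(type(plain).__name__, plain.max())
print(type(kept).__name__, kept.max())
```

With asarray() the masked entries become ordinary data again, which is why histogram counts all 100 values.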

Typical questions that come to mind are:
1) Should a user be warned that the input is a masked array?
2) Should histogram count the number of masked values?
3) What is the expected output when normed=True?
4) What type of array should the weights and bins arguments be?
5) What are the dimensions of the weights and bins arguments, since bins 
only needs to have the number of bins?
6) If the input array is masked, should the weights and bins arguments 
also be masked arrays when applicable? If so, what happens if the masks 
are in different locations between arrays?
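
As a rough sketch of one possible set of answers to questions 2 and 6 (not a proposal for what numpy.histogram itself should do), the helper below drops every entry that is masked in either the data or the weights and reports how many entries were dropped. The name masked_histogram and its return signature are hypothetical.

```python
import numpy as np

def masked_histogram(a, bins=10, weights=None):
    """Hypothetical helper: histogram over the unmasked entries only."""
    a = np.ma.asanyarray(a)
    # True where the data value is usable
    keep = ~np.ma.getmaskarray(a)
    w = None
    if weights is not None:
        weights = np.ma.asanyarray(weights)
        if weights.shape != a.shape:
            raise ValueError('weights should have the same shape as a.')
        # Assumed policy: an entry masked in either array is dropped
        keep &= ~np.ma.getmaskarray(weights)
        w = np.ma.getdata(weights)[keep]
    data = np.ma.getdata(a)[keep]
    counts, bin_edges = np.histogram(data, bins=bins, weights=w)
    n_dropped = a.size - data.size
    return counts, bin_edges, n_dropped

values = np.arange(100)
x = np.ma.masked_where(values > 30, values)
counts, bin_edges, n_dropped = masked_histogram(x)
print(counts.sum(), n_dropped, bin_edges[0], bin_edges[-1])
```

Here the bin range comes from the unmasked values only (0 to 30). Other policies, such as raising an error when the masks disagree, are equally defensible, which is why these questions need an answer before any change to histogram.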

Regards
Bruce
