[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

josef.pktd at gmail.com
Fri Aug 6 12:37:45 EDT 2010


On Fri, Aug 6, 2010 at 11:46 AM, Nils Becker <n.becker at amolf.nl> wrote:
> Hi,
>
> I found what looks like a bug in histogram, when the option normed=True
> is used together with non-uniform bins.
>
> Consider this example:
>
> import numpy as np
> data = np.array([1, 2, 3, 4])
> bins = np.array([.5, 1.5, 4.5])
> bin_widths = np.diff(bins)
> (counts, dummy) = np.histogram(data, bins)
> (densities, dummy) = np.histogram(data, bins, normed=True)
>
> What this gives is:
>
> bin_widths
> array([ 1.,  3.])
>
> counts
> array([1, 3])
>
> densities
> array([ 0.1,  0.3])
>
> The documentation claims that histogram with normed=True gives a
> density, which integrates to 1. In this example, it is true that
> (densities * bin_widths).sum() is 1. However, clearly the data are
> equally spaced, so their density should be uniform and equal to 0.25.
> Note that (0.25 * bin_widths).sum() is also 1.
>
> I believe np.histogram(data, bins, normed=True) effectively does:
> counts / (counts * bin_widths).sum()
>
> However, it _should_ do
> counts / (counts.sum() * bin_widths)
>
> to get a true density over the data coordinate as a result. It's easy to
> fix by hand, but I think the documentation is at least misleading?!
>
> Sorry if this has been discussed before; I could not find it (numpy
> 1.3).
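
The by-hand fix mentioned above could look roughly like the following. This is only a sketch: it assumes "density" means the fraction of samples in a bin divided by that bin's width, and it reuses the variable names from the example.

```python
import numpy as np

data = np.array([1, 2, 3, 4])
bins = np.array([0.5, 1.5, 4.5])

# Plain counts per bin, without any normalisation.
counts, edges = np.histogram(data, bins)
bin_widths = np.diff(edges)

# Fraction of samples in each bin, divided by that bin's own width.
densities = counts / (counts.sum() * bin_widths)

print(densities)                       # uniform: 0.25 in both bins
print((densities * bin_widths).sum())  # integrates to 1
```

Dividing by each bin's own width is what makes the result independent of how the bin edges are chosen.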

Either I also don't understand histogram or this is a bug.

>>> data = np.arange(1,10)
>>> bins = np.array([.5, 1.5, 4.5, 7.5, 8.5, 9.5])
>>> np.histogram(data, bins, normed=True)
(array([ 0.04761905,  0.14285714,  0.14285714,  0.04761905,
0.04761905]), array([ 0.5,  1.5,  4.5,  7.5,  8.5,  9.5]))
>>> np.histogram(data, bins)
(array([1, 3, 3, 1, 1]), array([ 0.5,  1.5,  4.5,  7.5,  8.5,  9.5]))
>>> np.diff(bins)
array([ 1.,  3.,  3.,  1.,  1.])

I don't see what the normed=True numbers represent in this case.

>>> np.array([ 1.,  3.,  3.,  1.,  1.])/7
array([ 0.14285714,  0.42857143,  0.42857143,  0.14285714,  0.14285714])
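
One possible reading of the normed=True output, inferred only from the numbers above and not checked against the NumPy source: the values match counts / (counts * widths).sum(), so every count is divided by the same scalar rather than by its own bin width. The sketch below compares that candidate formula with a per-bin density computed by hand; the names `reported` and `density` are introduced here for illustration.

```python
import numpy as np

data = np.arange(1, 10)
bins = np.array([0.5, 1.5, 4.5, 7.5, 8.5, 9.5])

counts, edges = np.histogram(data, bins)
widths = np.diff(edges)

# Candidate formula that reproduces the normed=True output shown above:
# every count is divided by the same scalar, sum(counts * widths) = 21.
reported = counts / (counts * widths).sum()

# Per-bin density: fraction of samples in the bin divided by the bin width.
density = counts / (counts.sum() * widths)

print(reported)  # 1/21 and 3/21, matching 0.0476... and 0.1428...
print(density)   # 1/9 in every bin, since the data are evenly spaced
```

Both arrays satisfy (values * widths).sum() == 1, which would explain why the result still "integrates to 1" without being a density.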

Josef