[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)
Nils Becker
n.becker at amolf.nl
Fri Aug 6 16:53:58 EDT 2010
Hi again,
first a correction: I posted
> I believe np.histogram(data, bins, normed=True) effectively does:
> np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]).
>
> However, it _should_ do
> np.histogram(data, bins, normed=False) / bins_widths
but there is a normalization missing; it should read

I believe np.histogram(data, bins, normed=True) effectively does
    np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]) / len(data)
However, it _should_ do
    np.histogram(data, bins, normed=False) / bins_widths / len(data)
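
To make the difference concrete, here is a minimal sketch of the two normalizations on unequal (but finite) bins. The helper names per_bin_density and total_range_density are made up for this illustration and are not part of NumPy; the "total range" variant is only my guess at the effective behavior, not taken from the NumPy source. I normalize by counts.sum(), which equals the number of data points when every point falls inside the bins.

```python
import numpy as np

def per_bin_density(data, bins):
    # Proposed semantics (hypothetical helper): divide each count by the
    # width of its own bin, then by the total number of counted points.
    counts, edges = np.histogram(data, bins)
    widths = np.diff(edges)
    return counts / widths / counts.sum()

def total_range_density(data, bins):
    # Believed current behavior (an assumption): divide every count by
    # the full span of the bins instead of the individual bin widths.
    counts, edges = np.histogram(data, bins)
    return counts / (edges[-1] - edges[0]) / counts.sum()

data = [1, 2, 3, 4]
bins = [0.5, 1.5, 4.5]          # widths 1 and 3
widths = np.diff(bins)

proposed = per_bin_density(data, bins)
believed = total_range_density(data, bins)

# A density should integrate to 1 over the binned range.
integral_proposed = (proposed * widths).sum()
integral_believed = (believed * widths).sum()
```

With these bins only the per-bin version integrates to 1; the total-range version agrees with it only when all bins have equal width.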
Bruce Southey replied:
> As I recall, there were issues with this aspect.
> Please search the discussion regarding histogram especially David
> Huard's reply in this thread:
> http://thread.gmane.org/gmane.comp.python.numeric.general/22445
I think this discussion pertains to a switch in calling conventions
which happened at the time. The last reply of D. Huard (to me) seems to
say that they did not fix anything in the _old_ semantics, but that the
new semantics is expected to work properly.
I tried with an infinite bin:
    >>> import numpy as np
    >>> counts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf])
    >>> counts
    array([1, 3])
    >>> ncounts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf], normed=1)
    >>> ncounts
    array([0., 0.])
This also does not make a lot of sense to me. A better result would be
array([0.25, 0.]), since 25% of the points fall in the first bin; 75%
fall in the second but are spread out over an infinite interval, giving
0. This is what my second proposal would give. I cannot find anything
wrong with it so far...
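
For reference, a short sketch of both formulas on the infinite-bin example. It only uses np.histogram without the normed argument plus np.diff; the variable names are mine, and the "believed" line is my reading of the current behavior, not a quote of the NumPy implementation.

```python
import numpy as np

data = [1, 2, 3, 4]
edges = np.array([0.5, 1.5, np.inf])

counts, _ = np.histogram(data, edges)   # raw counts per bin
widths = np.diff(edges)                 # bin widths: 1 and inf
n = counts.sum()                        # all 4 points fall inside the bins

# Believed current behavior (assumption): divide by the total range,
# which is infinite here, so every entry becomes 0.
believed = counts / (edges[-1] - edges[0]) / n

# Second proposal: divide by each bin's own width.
# First bin: 1 / 1 / 4; second bin: 3 / inf / 4, which is 0.
proposed = counts / widths / n
```

The first bin keeps its 25% of the mass, and only the infinitely wide bin is flattened to zero.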
Cheers, Nils