Histogram bin definition
Hi all,

I am busy documenting `histogram`, and the definition of a "bin" eludes me. Here is the behaviour that troubles me:
>>> np.histogram([1,2,1], bins=[0, 1, 2, 3], new=True)
(array([0, 2, 1]), array([0, 1, 2, 3]))
From this result, it seems as if a bin is defined as the half-open interval [left_edge, right_edge).
Now, look what happens in the following case:
>>> np.histogram([1,2,3], bins=[0,1,2,3], new=True)
(array([0, 1, 2]), array([0, 1, 2, 3]))
Here, the last bin is defined by the closed interval [left_edge, right_edge]!

Is this a bug, or a design consideration?

Regards,
Stéfan
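The rule Stéfan observed can be reproduced with a few lines of plain Python. This is a minimal sketch, not NumPy's implementation: the helper name `count_in_bins` is hypothetical, and it assumes the convention that every bin is half-open [left_edge, right_edge) except the last, which also includes its right edge. It uses the standard-library `bisect` module to find each value's bin.

```python
from bisect import bisect_right


def count_in_bins(data, edges):
    """Hypothetical helper: count values per bin, where bin i is
    [edges[i], edges[i+1]) except the last bin, which is closed:
    [edges[-2], edges[-1]]."""
    nbins = len(edges) - 1
    if nbins < 1:
        raise ValueError("need at least two edges")
    counts = [0] * nbins
    for x in data:
        if x == edges[-1]:
            # Special case: the rightmost edge belongs to the last bin.
            counts[-1] += 1
            continue
        # bisect_right places x to the right of any equal edge, so a value
        # equal to edges[i] lands in bin i (bins are closed on the left).
        i = bisect_right(edges, x) - 1
        if 0 <= i < nbins:
            counts[i] += 1
        # Values below edges[0] or above edges[-1] are not counted.
    return counts


print(count_in_bins([1, 2, 1], [0, 1, 2, 3]))  # [0, 2, 1]
print(count_in_bins([1, 2, 3], [0, 1, 2, 3]))  # [0, 1, 2]
```

Both outputs match the two sessions above. Current NumPy documents the same convention for `np.histogram`: all but the last (right-most) bin are half-open, and the last bin includes its right edge.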
In reply to Stéfan van der Walt <stefan@sun.ac.za>, 2008/7/16:

Hi Stefan,

It's designed this way. The main reason is that the default bin edges are generated using linspace(a.min(), a.max(), bin) when bin is an integer. If we leave the rightmost edge open, then the histogram of a 100-item array will typically count only 99 values, because the maximum value falls exactly on the rightmost edge and is treated as an outlier. I thought the least surprising behavior was to make sure that all items are counted.

The other reason has to do with backward compatibility: I tried to avoid breakage for the simplest use case. `histogram(r, bins=10)` yields the same thing as `histogram(r, bins=10, new=True)`.

We could avoid this special case for the last bin by defining the edges by linspace(a.min(), a.max()+delta, bin), but people will wonder why the right edge is 3.000001 instead of 3.

Cheers,
David
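David's first argument can be checked with a small experiment. This is a sketch in plain Python under stated assumptions, not NumPy's code: `linspace_edges` and `count_in_bins` are hypothetical helpers, the edges are built the way David describes (evenly spaced from the minimum to the maximum, with the last edge exactly equal to the maximum), and the 100-item data set is made up for illustration.

```python
from bisect import bisect_right


def linspace_edges(lo, hi, nbins):
    """Hypothetical helper: nbins + 1 evenly spaced edges from lo to hi,
    with the last edge pinned exactly to hi so the maximum sits on it."""
    step = (hi - lo) / nbins
    edges = [lo + i * step for i in range(nbins)]
    edges.append(hi)
    return edges


def count_in_bins(data, edges, closed_last=True):
    """Hypothetical helper: per-bin counts for half-open bins
    [edges[i], edges[i+1]). If closed_last is True, the last bin
    also includes edges[-1]."""
    nbins = len(edges) - 1
    counts = [0] * nbins
    for x in data:
        if closed_last and x == edges[-1]:
            counts[-1] += 1
            continue
        i = bisect_right(edges, x) - 1
        if 0 <= i < nbins:
            counts[i] += 1
    return counts


data = list(range(100))  # made-up data: 100 items, min 0, max 99
edges = linspace_edges(min(data), max(data), 10)

print(sum(count_in_bins(data, edges, closed_last=False)))  # 99: the maximum is lost
print(sum(count_in_bins(data, edges, closed_last=True)))   # 100: every item counted
```

Shifting the last edge up by a small `delta`, the alternative David mentions, would also count the maximum under a strictly half-open rule, at the cost of a right edge that no longer equals the data maximum.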
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Participants (2):
- David Huard
- Stéfan van der Walt