[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

Bruce Southey bsouthey at gmail.com
Sat Apr 5 14:01:45 EDT 2008


Hi,
I have been investigating Ticket #605 'Incorrect behavior of
numpy.histogram' (http://scipy.org/scipy/numpy/ticket/605 ).

The fix for this ticket really depends on how the bin limits are
expected to behave, and different applications behave differently.
Consequently, I think that feedback from the community is important.

I have attached a modified histogram function where I use a very
simple and obvious example:
r= numpy.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
dbin=[2,3,4]

The current ('Default') behavior gives the counts array([2, 3, 9]).
Here the values less than 2 are ignored, and the last bin contains
all values greater than or equal to 4.
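
The 'Default' counts above can be reproduced with the following minimal
sketch. It is not the attached histo.py and does not call
numpy.histogram itself; it only assumes that each limit is treated as
the left edge of a bin and that the last bin is open-ended. The names
idx and default_counts are made up for this illustration.

```python
import numpy as np

r = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])
dbin = np.array([2, 3, 4])

# For each value, find the index of the last limit that is <= the value.
# Values below the first limit get index -1.
idx = np.searchsorted(dbin, r, side="right") - 1

# Assumption: values below the first limit are dropped, and everything
# at or above the last limit falls into the last bin.
default_counts = np.bincount(idx[idx >= 0], minlength=len(dbin))
print(default_counts)  # [2 3 9]
```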

1) Should the first bin contain all values less than or equal to the
value of the first limit and the last bin contain all values greater
than the value of the last limit?
This produces the counts array([3, 3, 9]) (I termed this
'Accumulate' in the output).

2) Should any values outside the range of the bins be excluded?
That is, remove any value that is smaller than the lowest bin limit
or larger than the highest bin limit.
This produces the counts array([2, 3, 4]) (I termed this 'Exclude'
in the output).

3) Should there be extra bins for these values?
While I did not implement this option, it would provide the counts as:
array([1,2,3,4,5])

4) Is there some other expectation?
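
For comparison, options 1 to 3 can be sketched as follows. Again, this
is only an illustration that reproduces the counts quoted above, not the
attached function. The variable names are hypothetical, and the
treatment of option 3 (a last regular bin holding only values equal to
the last limit, plus one extra bin on each side) is an assumption
inferred from the quoted counts.

```python
import numpy as np

r = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])
dbin = np.array([2, 3, 4])
n = len(dbin)

# Index of the last limit <= value; -1 means "below the first limit".
idx = np.searchsorted(dbin, r, side="right") - 1

# 1) 'Accumulate': fold values below the first limit into the first bin;
#    values above the last limit already land in the last bin.
accumulate = np.bincount(np.clip(idx, 0, n - 1), minlength=n)
print(accumulate)  # [3 3 9]

# 2) 'Exclude': keep only values within [dbin[0], dbin[-1]].
inside = (r >= dbin[0]) & (r <= dbin[-1])
exclude = np.bincount(idx[inside], minlength=n)
print(exclude)  # [2 3 4]

# 3) Extra bins (assumed layout): one bin for values below the first
#    limit and one for values above the last limit.
extra_idx = idx + 1                 # shift so "below" becomes bin 0
extra_idx[r > dbin[-1]] = n + 1     # move "above" into its own bin
extra = np.bincount(extra_idx, minlength=n + 2)
print(extra)  # [1 2 3 4 5]
```

The three variants differ only in how the out-of-range values 1 and 5
are handled; the counts for the interior bins are the same in each case.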

Thanks for any input,
Bruce
-------------- next part --------------
A non-text attachment was scrubbed...
Name: histo.py
Type: text/x-python
Size: 2944 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080405/5bdc3e89/attachment.py>

