[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

Anne Archibald peridot.faceted at gmail.com
Sat Apr 5 15:54:27 EDT 2008


On 05/04/2008, Bruce Southey <bsouthey at gmail.com> wrote:

>  1) Should the first bin contain all values less than or equal to the
>  value of the first limit and the last bin contain all values greater
>  than the value of the last limit?
>  This produced the counts as: array([3, 3, 9]) (I termed this
>  'Accumulate' in the output).
>
>  2) Should any values outside the range of the bins be excluded?
>  That is, remove any value that is smaller than the lowest value of the
>  bin or higher than the highest value of the bin.
>  This produced the counts as: array([2, 3, 4]) (I termed this 'Exclude'
>  in the output)
>
>  3) Should there be extra bins for these values?
>  While I did not implement this option, it would provide the counts as:
>  array([1,2,3,4,5])

There's also a fourth option - raise an exception if any points are
outside the range.

I hope this is a question about defaults - really what I would most
want is to have the choice, as a keyword option. For the default, I
would be tempted to go with option 4, raising an exception. This seems
pretty much guaranteed not to produce surprising results: if there's
any question about what the results should be it produces an error,
and the user can run it again with specific instructions. Plus in many
contexts having any points that don't belong in any bin is the result
of a programming error.
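
To make the choice concrete, here is a rough sketch of how such a keyword might look. Everything in it is an assumption for illustration: the function name histogram_with_policy, the keyword outliers, and its four values are hypothetical and not part of numpy. It also assumes the bins are given as edges (n+1 edges for n bins, last bin closed on the right), which is how np.histogram treats a sequence of bins in current releases and may not match the bin convention used in the ticket.

```python
import numpy as np


def histogram_with_policy(data, edges, outliers="raise"):
    """Count data into bins given by edges, with a selectable outlier policy.

    Hypothetical helper, not a numpy function. `outliers` may be:
      "raise"      - raise ValueError if any point lies outside the edges
      "exclude"    - silently drop points outside the edges
      "accumulate" - add low points to the first bin, high points to the last
      "extra"      - return two extra bins, one below and one above the range
    """
    policies = ("raise", "exclude", "accumulate", "extra")
    if outliers not in policies:
        raise ValueError("outliers must be one of %r" % (policies,))

    data = np.asarray(data)
    edges = np.asarray(edges)

    # Boolean masks for points below the first edge and above the last edge.
    low = data < edges[0]
    high = data > edges[-1]

    if outliers == "raise" and (low.any() or high.any()):
        raise ValueError("some points fall outside the bin range")

    # Count only the in-range points; the outliers are handled explicitly below.
    counts, _ = np.histogram(data[~(low | high)], bins=edges)

    if outliers == "accumulate":
        counts[0] += low.sum()
        counts[-1] += high.sum()
    elif outliers == "extra":
        counts = np.concatenate(([low.sum()], counts, [high.sum()]))
    return counts
```

With data = [0, 1, 2, 3, 4, 5, 6] and edges = [1, 3, 5], this sketch gives [2, 3] for "exclude", [3, 4] for "accumulate", [1, 2, 3, 1] for "extra", and a ValueError for the default "raise".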

Anne


