Hello,
I'm trying to calculate a 1-D histogram of a distribution that contains mostly zeros,
and I'm having problems when the values to be histogrammed fall
exactly on the bin boundaries.
For example, this gives me the expected results (entering the exact bin values):
>>> data
array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0.05, -0.05])
>>> bins_list = numpy.array([-0.1, -0.05, 0.0, 0.05, 0.1])
>>> (counts, edges) = numpy.histogram(data, bins=bins_list)
>>> counts
array([ 0, 1, 10, 1])
>>> edges
array([-0.1 , -0.05, 0. , 0.05, 0.1 ])
But this does not (generating the bin values via numpy.arange):
>>> bins_arange = numpy.arange(-0.1, 0.101, 0.05)
>>> data
array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0.05, -0.05])
>>> bins_arange
array([-0.1 , -0.05, 0. , 0.05, 0.1 ])
>>> (counts, edges) = numpy.histogram(data, bins=bins_arange)
>>> counts
array([ 0, 1, 11, 0])
I'm assuming this is due to slight floating-point rounding in the calculation of
bins_arange, as compared to the manually entered values in bins_list, even though
both arrays print identically.
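A quick way to test that assumption is to compare the two edge arrays at full precision. This is only a sketch; the variable names follow the session above, and nothing is assumed beyond numpy itself.

```python
import numpy

# Bin edges typed in by hand, and the same edges generated with arange.
bins_list = numpy.array([-0.1, -0.05, 0.0, 0.05, 0.1])
bins_arange = numpy.arange(-0.1, 0.101, 0.05)

# The default repr rounds to a few digits and can hide tiny differences,
# so print each pair of edges with 20 decimals, plus an exact-equality flag.
for typed, generated in zip(bins_list, bins_arange):
    print("%.20f  %.20f  %s" % (typed, generated, typed == generated))

# Element-wise difference; any nonzero entry marks an edge that moved.
edge_diff = bins_arange - bins_list
print(edge_diff)
```

If any pair compares unequal, a data value sitting exactly on the typed-in edge can land in the neighbouring bin.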
What is the recommended way of getting the first set of results, without
having to manually enter all the values in the "bins" argument?
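One possibility, offered only as a sketch and not as a recommended numpy idiom: build the edges from integers and divide once, so that each edge is rounded a single time, the same way a typed-in decimal is. The scale factor 100 and the integer range are my own choices for this example.

```python
import numpy

# Same data as in the sessions above: ten zeros, then 0.05 and -0.05.
data = numpy.array([0.0] * 10 + [0.05, -0.05])

# Integer edges -10, -5, 0, 5, 10 divided by 100 (both values are
# assumptions chosen to reproduce the edges -0.1 ... 0.1 in steps of 0.05).
bins_scaled = numpy.arange(-10, 11, 5) / 100.0

counts, edges = numpy.histogram(data, bins=bins_scaled)
print(counts)
print(edges)
```

With these edges I would expect the same counts as with bins_list, but it only works when the bin width is a short decimal that can be scaled to an integer.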
The following also gives me unexpected results:
>>> data
array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0.05, -0.05])
>>> (counts, edges) = numpy.histogram(data, range=(-0.1, 0.1), bins=4)
>>> counts
array([ 0, 1, 11, 0])
Thank you for any advice,
Catherine