Hi Catherine,

I can't reproduce your issue with bins_list vs. bins_arange, but passing both range and number of bins to np.histogram does give the same strange behavior for me:

In [16]: data = np.array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.05, -0.05])

In [17]: bins_list = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])

In [18]: np.histogram(data, bins=bins_list)
Out[18]: (array([ 0,  1, 10,  1]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ]))

In [19]: bins_arange = np.arange(-0.1, 0.101, 0.05)

In [20]: np.histogram(data, bins=bins_arange)
Out[20]: (array([ 0,  1, 10,  1]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ]))

In [21]: np.histogram(data, range=(-0.1, 0.1), bins=4)
Out[21]: (array([ 0,  1, 11,  0]), array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ]))

In [22]: np.version.version
Out[22]: '1.8.1'

Looks like the 0.05 value of data is being binned differently in the last case, but I'm not sure why either...
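
One way to check this is to print the edges at full precision and compare them with the hand-typed values. This is only a sketch: it assumes the integer-bins path builds its edges in a linspace-like way, which I have not verified in the 1.8.1 source, and the exact digits may differ between platforms and NumPy versions. The names literal and candidates are made up for the illustration.

```python
import numpy as np

literal = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])

# Three ways of arriving at "the same" five edges.
candidates = {
    "arange": np.arange(-0.1, 0.101, 0.05),
    "linspace": np.linspace(-0.1, 0.1, 5),
    "histogram": np.histogram([0.0], range=(-0.1, 0.1), bins=4)[1],
}

for name, edges in candidates.items():
    # repr() shows enough digits to tell apart doubles that look
    # identical in the default array display.
    print(name, [repr(float(e)) for e in edges])
    # An edge that lands slightly above 0.05 would push a data value of
    # exactly 0.05 into the bin to its left.
    print("  differs from literal:", (edges != literal).tolist())
    print("  edge - literal:", (edges - literal).tolist())
```

If the fourth edge comes out a hair larger than the literal 0.05, that would account for the 0.05 sample being counted in the third bin.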

Mark


On Wed, Jul 2, 2014 at 2:05 AM, Chris Barker <chris.barker@noaa.gov> wrote:
A few thoughts:

1) Don't use arange() for floating point numbers; use linspace().

2) histogram() is a floating point function, and you shouldn't expect exact results for floating point -- in particular, values exactly at the bin boundaries are likely to be "uncertain" -- not quite the right word, but you get the idea.

3) If you expect to have a lot of certain specific values, say, integers or zeros, then you don't want your bin boundaries to be exactly at those values -- they should be between the expected values.

4) Remember that histogramming is inherently sensitive to bin position anyway -- if these small bin-boundary differences matter, then you may not be using the best approach.
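
Points 1) and 3) can be sketched as follows, assuming the data only ever takes values on a 0.05 grid, as in the example from this thread. The names half_step_edges and exact_edges are made up for the illustration. The first set of edges sits halfway between the expected values, so no data point lies on a boundary; the second builds the edges by dividing integers, which should give the same doubles as typing the decimals by hand.

```python
import numpy as np

data = np.array([0.0] * 10 + [0.05, -0.05])

# Edges halfway between the expected values -0.1, -0.05, 0, 0.05, 0.1.
# Every data point is then 0.025 away from the nearest edge, far more
# than any rounding error in linspace().
half_step_edges = np.linspace(-0.125, 0.125, 6)
counts, _ = np.histogram(data, bins=half_step_edges)
print(counts)  # [ 0  1 10  1  0]

# If the edges must sit on the values themselves, dividing integers
# gives correctly rounded doubles, i.e. the same ones as the literals.
exact_edges = np.arange(-2, 3) / 20.0
print(np.array_equal(exact_edges, [-0.1, -0.05, 0.0, 0.05, 0.1]))  # True
counts_exact, _ = np.histogram(data, bins=exact_edges)
print(counts_exact)  # [ 0  1 10  1]
```

The half-step version adds a fifth bin but does not depend on how any particular edge happens to round, so it is probably the more robust of the two.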

-HTH,
  -Chris






>>> data
array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.05, -0.05])
>>> bins_list = numpy.array([-0.1, -0.05, 0.0, 0.05, 0.1])
>>> (counts, edges) = numpy.histogram(data, bins=bins_list)
>>> counts
array([ 0,  1, 10,  1])
>>> edges
array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ])



but this does not (generating the bin values via numpy.arange):

>>> bins_arange = numpy.arange(-0.1, 0.101, 0.05)
>>> data
array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.05, -0.05])
>>> bins_arange
array([-0.1 , -0.05,  0.  ,  0.05,  0.1 ])
>>> (counts, edges) = numpy.histogram(data, bins=bins_arange)
>>> counts
array([ 0,  1, 11,  0])

I'm assuming this is due to slight rounding in the calculation of bins_arange,
as compared to the manually entered values in bins_list.

What is the recommended way of getting the first set of results, without
having to manually enter all the values in the "bins" argument?

The following also gives me unexpected results:

>>> data
array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.05, -0.05])
>>> (counts, edges) = numpy.histogram(data, range=(-0.1, 0.1), bins=4)
>>> counts
array([ 0,  1, 11,  0])



Thank you for any advice,

Catherine
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion



--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
