Looks like this could be a float32 vs float64 problem:

In [19]: data32 = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.05, -0.05], dtype=np.float32)
In [20]: data64 = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.05, -0.05], dtype=np.float64)
In [21]: bins32 = np.arange(-0.1, 0.101, 0.05, dtype=np.float32)
In [22]: bins64 = np.arange(-0.1, 0.101, 0.05, dtype=np.float64)
In [23]: np.histogram(data32, bins32)
Out[23]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ], dtype=float32))
In [24]: np.histogram(data32, bins64)
Out[24]: (array([ 1, 0, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ]))
In [25]: np.histogram(data64, bins32)
Out[25]: (array([ 0, 1, 11, 0]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ], dtype=float32))
In [26]: np.histogram(data64, bins64)
Out[26]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ]))

I guess users should always be very careful when mixing floating point types, but should numpy prevent the user from doing so (or at least warn) in this case?

On Wed, Jul 2, 2014 at 10:07 AM, Mark Szepieniec <mszepien@gmail.com> wrote:
Hi Catherine,
I can't reproduce your issue with bins_list vs. bins_arange, but passing both range and number of bins to np.histogram does give the same strange behavior for me:
In [16]: data = np.array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.05, -0.05])
In [17]: bins_list = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])
In [18]: np.histogram(data, bins=bins_list)
Out[18]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ]))
In [19]: bins_arange = np.arange(-0.1, 0.101, 0.05)
In [20]: np.histogram(data, bins=bins_arange)
Out[20]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ]))
In [21]: np.histogram(data, range=(-0.1, 0.1), bins=4)
Out[21]: (array([ 0, 1, 11, 0]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ]))
In [22]: np.version.version
Out[22]: '1.8.1'
Looks like the 0.05 value of data is being binned differently in the last case, but I'm not sure why either...
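One way to investigate is to look at the edges that np.histogram actually returns, at full precision, and compare them with the hand-typed values. A minimal sketch, reusing the data from the session above (the loop and variable names are only for illustration, and the printed digits may well differ between NumPy versions and platforms):

```python
import numpy as np

data = np.array([0.] * 10 + [0.05, -0.05])
bins_list = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])

counts, edges = np.histogram(data, range=(-0.1, 0.1), bins=4)

# repr() of a Python float shows enough digits to round-trip the value,
# so edges that look identical in the default array printout can be
# told apart here. The last column is an exact equality check.
for typed, computed in zip(bins_list, edges):
    print(repr(float(typed)), repr(float(computed)), typed == computed)
```

If any row prints False, the computed edge is not the same float64 as the typed one, and a data value sitting exactly on that edge can land in either neighbouring bin.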
Mark
On Wed, Jul 2, 2014 at 2:05 AM, Chris Barker <chris.barker@noaa.gov> wrote:
A few thoughts:
1) don't use arange() for floating point numbers, use linspace().
2) histogram1d is a floating point function, and you shouldn't expect exact results for floating point -- in particular, values exactly at the bin boundaries are likely to be "uncertain" -- not quite the right word, but you get the idea.
3) if you expect to have a lot of certain specific values, say, integers, or zeros -- then you don't want your bin boundaries to be exactly at the value -- they should be between the expected values.
4) remember that histogramming is inherently sensitive to bin position anyway -- if these small bin-boundary differences matter, then you may not be using the best approach.
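Points 1) and 3) can be sketched together as follows. This is only an illustration under the assumption that the expected values are spaced 0.05 apart. Note that it produces five bins centred on the expected values, rather than the four bins used in the original question:

```python
import numpy as np

data = np.array([0.] * 10 + [0.05, -0.05])

step = 0.05  # assumed spacing between the values we expect to see

# Six edges built with linspace, each half a step away from the nearest
# expected value, so no data point sits on a bin boundary.
edges = np.linspace(-0.1 - step / 2, 0.1 + step / 2, 6)

counts, _ = np.histogram(data, bins=edges)
print(counts)  # [ 0  1 10  1  0]
```

Because every data value is about 0.025 away from the nearest edge, tiny rounding differences in the edges cannot move a value into a neighbouring bin.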
-HTH, -Chris
data
array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.05, -0.05])
bins_list = numpy.array([-0.1, -0.05, 0.0, 0.05, 0.1])
(counts, edges) = numpy.histogram(data, bins=bins_list)
counts
array([ 0, 1, 10, 1])
edges
array([-0.1 , -0.05, 0. , 0.05, 0.1 ])
but this does not (generating the bin values via numpy.arange):
bins_arange = numpy.arange(-0.1, 0.101, 0.05)
data
array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.05, -0.05])
bins_arange
array([-0.1 , -0.05, 0. , 0.05, 0.1 ])
(counts, edges) = numpy.histogram(data, bins=bins_arange)
counts
array([ 0, 1, 11, 0])
I'm assuming this is due to slight rounding in the calculation of bins_arange, as compared to the manually entered values in bins_list.
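That assumption can be tested without relying on what arange produces on any particular machine. The sketch below takes the hand-typed edges and moves the 0.05 edge up by a single representable float64 step using np.nextafter. The "nudged" array is a hypothetical stand-in for a slightly rounded result, not the actual output of arange:

```python
import numpy as np

data = np.array([0.] * 10 + [0.05, -0.05])
edges_typed = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])

# Hypothetical "slightly rounded" edges: identical to the typed ones,
# except that the 0.05 edge is the next float64 above 0.05.
edges_nudged = edges_typed.copy()
edges_nudged[3] = np.nextafter(0.05, 1.0)

counts_typed, _ = np.histogram(data, bins=edges_typed)
counts_nudged, _ = np.histogram(data, bins=edges_nudged)

print(counts_typed)   # [ 0  1 10  1]
print(counts_nudged)  # [ 0  1 11  0]
```

A difference in the last bit of one edge is enough to move the 0.05 data value from the last bin into the third one, which matches the counts reported above.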
What is the recommended way of getting the first set of results, without having to manually enter all the values in the "bins" argument?
The following also gives me unexpected results:
data
array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.05, -0.05])
(counts, edges) = numpy.histogram(data, range=(-0.1, 0.1), bins=4)
counts
array([ 0, 1, 11, 0])
Thank you for any advice,
Catherine
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov