[Numpy-discussion] Initial implementation of histogram_discrete()

Sat Nov 14 07:10:38 EST 2009

On Sat, Nov 14, 2009 at 6:53 AM, Priit Laes <plaes at plaes.org> wrote:
> One fine day, Fri, 2009-11-13 at 13:36, Ernest Adrogué wrote:
>> 13/11/09 @ 09:41 (+0200), thus spake Priit Laes:
>> > Does anyone have a scenario where one would actually have both negative
>> > and positive numbers (integers) in the list?
>>
>> Yes: when you have a random variable that is the difference
>> of two (discrete) random variables. For example, if you measure
>> the difference in number of days off per week because of sickness
>> between two groups of people, you would end up with a discrete
>> variable with both positive and negative integers.
>>
>> > So, how about a numpy.histogram_discrete() that returns data the way
>> > histogram() does: a list containing the histogram values (i.e. counts)
>> > and a list of sorted items from min(input) to max(input)?
>>
>> In my humble opinion, it would be nice.
> \o/
>
> I have pushed the preliminary version to:
> http://github.com/plaes/numpy/commits/histogram_discrete
>
> It can currently handle datasets with negative items and weights. I'm
> also planning to add an optional range argument to the function, but I
> first need to figure out how to parse range=(min, max) using the C
> API... ;)
>
> numpy.histogram_discrete() returns a list containing the histogram values
> and the bins (hopefully this is the right definition):
>
> hist, bins = numpy.histogram_discrete(data)
>
> Example:
> In [1]: import numpy
> In [2]: data = numpy.random.poisson(3, 300)
> In [3]: numpy.histogram_discrete(data)
> Out[3]:
> [array([15, 50, 72, 59, 52, 34,  8,  7,  3]),
>  array([0, 1, 2, 3, 4, 5, 6, 7, 8])]
> In [4]:
> In [5]: data = [-1, 5]
> In [6]: numpy.histogram_discrete(data, weights=[2, 0])
> Out[6]:
> [array([ 2.,  0.,  0.,  0.,  0.,  0.,  0.]),
>  array([-1,  0,  1,  2,  3,  4,  5])]


Sorry, I still don't see much reason to do this in C. np.bincount plus an
offset gives the same result:

>>> import numpy as np
>>> data = [-1, 5]
>>> c=np.bincount(data-np.min(data),weights=[2,0])
>>> x=np.arange(np.min(data),np.min(data)+len(c))
>>> c,x
(array([ 2.,  0.,  0.,  0.,  0.,  0.,  0.]), array([-1,  0,  1,  2,  3,  4,  5]))
>>> data = [11,5]
>>> np.bincount(data,weights=[2,0])
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  2.])
>>> np.arange(max(data)+1)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> c=np.bincount(data-np.min(data),weights=[2,0])
>>> x=np.arange(np.min(data),np.min(data)+len(c))
>>> c,x
(array([ 0.,  0.,  0.,  0.,  0.,  0.,  2.]), array([ 5,  6,  7,  8,  9, 10, 11]))
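
The two steps in the session above (shift by the minimum, then rebuild the
bin values with arange) can be wrapped into a small helper. This is only a
minimal sketch: the name histogram_discrete_py is hypothetical, it is not the
C function from Priit's branch, and it assumes 1-D integer input.

```python
import numpy as np


def histogram_discrete_py(data, weights=None):
    """Return (counts, bins) with one bin per integer from min(data) to max(data).

    Hypothetical pure-NumPy helper built on np.bincount; assumes 1-D
    integer data and, if given, weights of the same length.
    """
    data = np.asarray(data)
    if data.size == 0:
        raise ValueError("data must not be empty")
    if not np.issubdtype(data.dtype, np.integer):
        raise TypeError("data must contain integers")

    lo = data.min()
    # np.bincount rejects negative values, so shift the data so that the
    # smallest item maps to index 0.
    counts = np.bincount(data - lo, weights=weights)
    # Undo the shift to recover the actual integer value of each bin.
    bins = np.arange(lo, lo + len(counts))
    return counts, bins


counts, bins = histogram_discrete_py([-1, 5], weights=[2, 0])
print(counts)
print(bins)
```

Shifting by the minimum also keeps the result short when all values are far
from zero, as in the [11, 5] case above, where the plain bincount call
produces leading empty bins.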

Josef


>
> Priit :)