[Numpy-discussion] bincount limitations

Thu Jun 2 13:39:08 EDT 2011

On Thu, Jun 2, 2011 at 1:11 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Thu, Jun 2, 2011 at 12:08, Skipper Seabold <jsseabold at gmail.com> wrote:
>> On Wed, Jun 1, 2011 at 10:10 PM, Alan G Isaac <alan.isaac at gmail.com> wrote:
>>>> On Thu, Jun 2, 2011 at 1:49 AM, Mark Miller <markperrymiller at gmail.com> wrote:
>>>>> Not quite. Bincount is fine if you have a set of approximately
>>>>> sequential numbers. But if you don't....
>>>
>>>
>>> On 6/1/2011 9:35 PM, David Cournapeau wrote:
>>>> Even worse, it fails miserably if you have sequential numbers but with a high shift.
>>>> np.bincount([100000001, 100000002]) # will take a lot of memory
>>>> Doing bincount with dict is faster in those cases.
>>>
>>>
>>> Since this discussion has turned to shortcomings of bincount,
>>> may I ask why np.bincount([]) is not an empty array?
>>> Even more puzzling, why is np.bincount([],minlength=6)
>>> not a 6-array of zeros?
>>>
>>
>> Just looks like it wasn't coded that way, but it's low-hanging fruit.
>> Any objections to adding this behavior? This commit should take care
>> of it. Tests pass. Comments welcome, as I'm just getting my feet wet
>> here.
>>
>> https://github.com/jseabold/numpy/commit/133148880bba5fa3a11dfbb95cefb3da4f7970d5
>
> I would use np.zeros(5, dtype=int) in test_empty_with_minlength(), but
> otherwise, it looks good.
>

Ok, thanks. Made the change and removed the old test that checked that
it fails on empty input. Pull request:

https://github.com/numpy/numpy/pull/84
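
For anyone following along, here is a minimal sketch of the behavior
the change is meant to provide. It assumes a NumPy build that includes
the change; the variable names are only illustrative, and the explicit
integer dtype is my own choice to keep casting out of the picture.

```python
import numpy as np

# Assumption: this NumPy accepts empty input to bincount instead of raising.
# An explicit integer dtype sidesteps any float-to-int casting question.
empty = np.array([], dtype=np.intp)

# With no minlength, an empty input should give an empty count array.
counts = np.bincount(empty)

# With minlength=6, an empty input should give six zero counts.
padded = np.bincount(empty, minlength=6)

print(counts.shape)
print(padded)
```

On a build without the change, both calls would be expected to raise
instead.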

Skipper