[Numpy-discussion] Vectorizing array updates

Anne Archibald peridot.faceted at gmail.com
Wed Apr 29 18:45:21 EDT 2009


2009/4/29 Dan Goodman <dg.gmane at thesamovar.net>:
> Robert Kern wrote:
>> On Wed, Apr 29, 2009 at 16:19, Dan Goodman <dg.gmane at thesamovar.net> wrote:
>>> Robert Kern wrote:
>>>> On Wed, Apr 29, 2009 at 08:03, Daniel Yarlett <daniel.yarlett at gmail.com> wrote:
>>>>
>>>>> As you can see, Current is different in the two cases. Any ideas how I
>>>>> can recreate the behavior of the iterative process in a more numpy-
>>>>> friendly, vectorized (and hopefully quicker) way?
>>>> Use bincount().
>>> Neat. Is there a memory efficient way of doing it if the indices are
>>> very large but there aren't many of them? e.g. if the indices were
>>> I=[10000, 20000] then bincount would create a gigantic array of 20000
>>> elements for just two addition operations!
>>
>> indices -= indices.min()
>>
>
> Ah OK, but bincount is still going to make an array of 10000 elements in
> that case...
>
> I came up with this trick, but I wonder whether it's overkill:
>
> Supposing you want to do y+=x[I] where x is big and indices in I are
> large but sparse (although potentially including repeats).
>
> J=unique(I)
> K=digitize(I,J)-1
> b=bincount(K)
> y+=b*x[J]
>
> Well it seems to work, but surely there must be a nicer way?

Well, a few lines serve to rearrange the indices array so it contains
each number only once, and then to add together (with bincount) all
values with the same index:

ix = np.unique1d(indices)
reverse_index = np.searchsorted(ix, indices)
r = np.bincount(reverse_index, weights=values)

You can then do:

y[ix] += r

All this could be done away with if bincount supported input and
output arguments. Then

y[indices] += values

could become

np.bincount(indices, weights=values, input=y, output=y)

implemented y the same nice tight loop one would use in C. (I'm not
sure bincount is the right name for such a general function, though.)

This question comes up fairly often on the mailing list, so it would
be nice to have a single function to point to.


Anne



More information about the NumPy-Discussion mailing list