[Numpy-discussion] Vectorizing array updates

Wed Apr 29 16:52:09 EDT 2009

On Wed, Apr 29, 2009 at 08:03, Daniel Yarlett <daniel.yarlett at gmail.com> wrote:

> As you can see, Current is different in the two cases. Any ideas how I
> can recreate the behavior of the iterative process in a more numpy-
> friendly, vectorized (and hopefully quicker) way?

Use bincount() with its weights argument to sum the increments that
share an index.
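
As a rough sketch of what that could look like (the array names and
values below are made up for illustration and are not from the original
post):

```python
import numpy as np

# Hypothetical data: index 1 appears three times.
current = np.zeros(4)
indices = np.array([0, 1, 1, 3, 1])
increments = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Reference: the explicit loop, which accumulates over duplicate indices.
looped = current.copy()
for i, inc in zip(indices, increments):
    looped[i] += inc

# Vectorized: bincount sums the weights that share an index.
# minlength pads the result so it lines up with `current`.
vectorized = current + np.bincount(
    indices, weights=increments, minlength=len(current)
)
```

Here both looped and vectorized should hold the accumulated sums, with
all three increments for index 1 added together.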

> And possibly also
> about why my intuitions concerning the semantics of the vectorized
> code are wrong?

In Python, the statement

  x[indices] += y

turns into

  xi = x.__getitem__(indices)
  tmp = xi.__iadd__(y)
  x.__setitem__(indices, tmp)

Because indices is an array, the first statement uses fancy indexing,
which necessarily copies the data. Then the __iadd__() method modifies
that copy in-place (tmp is xi after the operation for numpy arrays, but
not necessarily for other objects). Then the final statement assigns
the result back into the original array using fancy indexing. Since
there are duplicate indices, the last entry in tmp for each duplicate
index wins, and the earlier increments for that index are discarded.
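
The three steps can be spelled out by hand to see where the increments
go missing. This is only a small sketch with made-up values; x, indices
and y are placeholder names:

```python
import numpy as np

x = np.zeros(3)
indices = np.array([0, 0, 2])   # index 0 is duplicated
y = np.array([1.0, 2.0, 3.0])

xi = x[indices]          # fancy indexing returns a copy: three zeros
xi += y                  # only the copy changes: it now holds 1, 2, 3
untouched = x.copy()     # x itself is still all zeros at this point
x[indices] = xi          # x[0] is written twice; only one write survives

# What the loop would have produced instead: x[0] == 1 + 2 == 3.
loop_result = np.zeros(3)
for i, inc in zip(indices, y):
    loop_result[i] += inc
```

After the assignment, x[0] holds a single increment rather than the sum
of both, while loop_result[0] holds the sum.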

Because Python breaks up the syntax "x[indices] += y" into those three
discrete steps, information is lost on the way, and we cannot use that
syntax with the semantics of the loop.
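
For readers on a NumPy release newer than this post: later versions
added the ufunc method np.add.at(), which performs the update unbuffered
so that duplicate indices accumulate. It is not mentioned in the
original message; the sketch below uses made-up values:

```python
import numpy as np

x = np.zeros(3)
indices = np.array([0, 0, 2])   # index 0 is duplicated
y = np.array([1.0, 2.0, 3.0])

# In-place, unbuffered addition: each (index, value) pair is applied,
# so both increments for index 0 are added.
np.add.at(x, indices, y)
```

Unlike bincount(), this modifies x in place and is not limited to
non-negative integer bins in a 1-d result.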

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco