nonuniform scatter operations
Hello,

Is there an efficient way to implement a nonuniform scatter operation in numpy? Specifically, I want to do something like

    n,m = 100,1000
    X = random.uniform(size=n)
    K = random.randint(n, size=m)
    Y = random.uniform(size=m)
    for k,y in zip(K,Y):
        X[k] += y

but I want it to be fast. The naive attempt "X[K] += Y" does not work, since the indexed assignment assumes the indices don't repeat.

Thanks,
Geoffrey
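The failure of the naive form can be checked directly. The following is a minimal sketch, not part of the original post: the seed and the use of `np.random.default_rng` are choices made here. With an index array, `X[K] += Y` computes `X[K] + Y` into a temporary and writes it back, so a repeated index ends up with only one of its contributions.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen here, only for reproducibility
n, m = 100, 1000
X0 = rng.uniform(size=n)
K = rng.integers(n, size=m)      # m > n, so some indices must repeat
Y = rng.uniform(size=m)

# Reference result: the explicit Python loop from the question.
X_loop = X0.copy()
for k, y in zip(K, Y):
    X_loop[k] += y

# Naive attempt: a repeated index keeps only one of its contributions.
X_naive = X0.copy()
X_naive[K] += Y

loop_matches_naive = np.allclose(X_loop, X_naive)
```

Here `loop_matches_naive` comes out false, while the loop result accounts for every element of `Y`.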
On Sun, Sep 28, 2008 at 12:34 AM, Geoffrey Irving <irving@naml.us> wrote:
> Is there an efficient way to implement a nonuniform scatter operation in numpy? Specifically, I want to do something like
>
>     n,m = 100,1000
>     X = random.uniform(size=n)
>     K = random.randint(n, size=m)
>     Y = random.uniform(size=m)
>     for k,y in zip(K,Y):
>         X[k] += y
>
> but I want it to be fast. The naive attempt "X[K] += Y" does not work, since the indexed assignment assumes the indices don't repeat.
I don't know of a numpy solution, but in scipy you could use a sparse matrix to perform the operation. I think the following does what you want.

    from scipy.sparse import coo_matrix
    X += coo_matrix((Y, (K, zeros(m, dtype=int))), shape=(n, 1)).sum(axis=1)

This reduces to a simple C++ loop, so speed should be good:
http://projects.scipy.org/scipy/scipy/browser/trunk/scipy/sparse/sparsetools...

--
Nathan Bell wnbell@gmail.com
http://graphics.cs.uiuc.edu/~wnbell/
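A self-contained version of the sparse-matrix approach might look like the sketch below. The setup data and seed are assumptions made here. One detail differs from the one-liner in the message: `sum(axis=1)` returns a two-dimensional n-by-1 result, so this sketch flattens it before adding it to the one-dimensional `X`.

```python
import numpy as np
from scipy.sparse import coo_matrix

rng = np.random.default_rng(0)   # seed chosen here, only for reproducibility
n, m = 100, 1000
X = rng.uniform(size=n)
K = rng.integers(n, size=m)
Y = rng.uniform(size=m)

# Reference result from the explicit loop, for comparison.
expected = X.copy()
for k, y in zip(K, Y):
    expected[k] += y

# n-by-1 sparse column: entry i collects every Y[j] with K[j] == i.
# Repeated (row, col) pairs all contribute to the row sum.
A = coo_matrix((Y, (K, np.zeros(m, dtype=int))), shape=(n, 1))

# The row sum has shape (n, 1), so flatten it to match the 1-D X.
X += np.asarray(A.sum(axis=1)).ravel()
```

Passing `shape=(n, 1)` explicitly keeps the result at length `n` even if the largest index never occurs in `K`.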
On Sat, Sep 27, 2008 at 10:01 PM, Nathan Bell <wnbell@gmail.com> wrote:
> On Sun, Sep 28, 2008 at 12:34 AM, Geoffrey Irving <irving@naml.us> wrote:
>> Is there an efficient way to implement a nonuniform scatter operation in numpy? Specifically, I want to do something like
>>
>>     n,m = 100,1000
>>     X = random.uniform(size=n)
>>     K = random.randint(n, size=m)
>>     Y = random.uniform(size=m)
>>     for k,y in zip(K,Y):
>>         X[k] += y
>>
>> but I want it to be fast. The naive attempt "X[K] += Y" does not work, since the indexed assignment assumes the indices don't repeat.
>
> I don't know of a numpy solution, but in scipy you could use a sparse matrix to perform the operation. I think the following does what you want.
>
>     from scipy.sparse import coo_matrix
>     X += coo_matrix((Y, (K, zeros(m, dtype=int))), shape=(n, 1)).sum(axis=1)
>
> This reduces to a simple C++ loop, so speed should be good: http://projects.scipy.org/scipy/scipy/browser/trunk/scipy/sparse/sparsetools...
Thanks. That works great. A slightly cleaner version is

    X += coo_matrix((Y, (K, zeros_like(K)))).sum(axis=1)

The next question is: is there a similar way that generalizes to the case where X is n by 3 and Y is m by 3 (besides the obvious loop over range(3), that is)?

Geoffrey
On Sun, Sep 28, 2008 at 4:15 PM, Geoffrey Irving <irving@naml.us> wrote:
> Thanks. That works great. A slightly cleaner version is
>
>     X += coo_matrix((Y, (K, zeros_like(K)))).sum(axis=1)
>
> The next question is: is there a similar way that generalizes to the case where X is n by 3 and Y is m by 3 (besides the obvious loop over range(3), that is)?
You could flatten the arrays and make a single matrix that implements the operation. I'd stick with the loop over range(3) though; it's more readable and likely to be as fast as, or faster than, flattening the arrays yourself.

--
Nathan Bell wnbell@gmail.com
http://graphics.cs.uiuc.edu/~wnbell/
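The two options for the n-by-3 case can be compared side by side. This sketch substitutes `np.bincount` with `weights` for the sparse matrix, purely to keep the example NumPy-only; that substitution, the variable names, and the seed are assumptions made here, not something proposed in the thread.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen here, only for reproducibility
n, m, d = 100, 1000, 3
X0 = rng.uniform(size=(n, d))
K = rng.integers(n, size=m)
Y = rng.uniform(size=(m, d))

# Option 1: loop over the columns, one weighted count per column.
X_cols = X0.copy()
for j in range(d):
    X_cols[:, j] += np.bincount(K, weights=Y[:, j], minlength=n)

# Option 2: flatten. In C order, element (k, j) of an n-by-d array
# sits at flat index k*d + j, so one weighted count covers all columns.
flat_index = (K[:, None] * d + np.arange(d)).ravel()
sums = np.bincount(flat_index, weights=Y.ravel(), minlength=n * d)
X_flat = X0 + sums.reshape(n, d)
```

Both forms give the same result. The column loop is shorter to read, which is in line with the recommendation above.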
2008/9/28 Geoffrey Irving <irving@naml.us>:
> Is there an efficient way to implement a nonuniform scatter operation in numpy? Specifically, I want to do something like
>
>     n,m = 100,1000
>     X = random.uniform(size=n)
>     K = random.randint(n, size=m)
>     Y = random.uniform(size=m)
>     for k,y in zip(K,Y):
>         X[k] += y
>
> but I want it to be fast. The naive attempt "X[K] += Y" does not work, since the indexed assignment assumes the indices don't repeat.
I believe histogram can be persuaded to do this.

Anne
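The message does not say how, so the following is only one possible reading: it assumes the suggestion refers to `numpy.histogram` with its `weights` argument and unit-width integer bins. `np.bincount` with `weights` is shown alongside as an equivalent that is not mentioned in the thread.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen here, only for reproducibility
n, m = 100, 1000
X0 = rng.uniform(size=n)
K = rng.integers(n, size=m)
Y = rng.uniform(size=m)

# Reference result from the explicit loop.
X_loop = X0.copy()
for k, y in zip(K, Y):
    X_loop[k] += y

# Bin edges 0, 1, ..., n put integer index k into bin k; with weights,
# each bin holds the sum of the Y values whose index falls in it.
sums, _ = np.histogram(K, bins=np.arange(n + 1), weights=Y)
X_hist = X0 + sums

# Equivalent weighted count, without constructing bin edges.
X_binc = X0 + np.bincount(K, weights=Y, minlength=n)
```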