[Numpy-discussion] Broadcasting and indexing
Thomas Robitaille
thomas.robitaille at gmail.com
Thu Jan 21 11:37:09 EST 2010
Hello,
I'm trying to understand how array broadcasting can be used for indexing. In the following, I use the term 'row' to refer to the first dimension of a 2D array, and 'column' to the second, just because that's how numpy prints them out.
If I consider the following example:
>>> a = np.random.random((4,5))
>>> b = np.random.random((5,))
>>> a + b
array([[ 1.45499556, 0.60633959, 0.48236157, 1.55357393, 1.4339261 ],
       [ 1.28614593, 1.11265001, 0.63308615, 1.28904227, 1.34070499],
       [ 1.26988279, 0.84683018, 0.98959466, 0.76388223, 0.79273084],
       [ 1.27859505, 0.9721984 , 1.02725009, 1.38852061, 1.56065028]])
I understand this result, because it behaves as described in
http://docs.scipy.org/doc/numpy/reference/ufuncs.html#broadcasting
So b gets broadcast to shape (1,5), and because its first dimension is then 1, the operation is applied to every row.
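To make the (1,5) step explicit, here is a minimal sketch of the same addition with the extra axis written out by hand. It assumes a NumPy version recent enough to provide np.broadcast_to, which is used only to display the stretched operand; the variable names are illustrative.

```python
import numpy as np

a = np.random.random((4, 5))
b = np.random.random((5,))

# Prepend a length-1 axis by hand; broadcasting does this implicitly.
b_row = b[np.newaxis, :]              # shape (1, 5)

implicit = a + b                      # b is broadcast automatically
explicit = a + b_row                  # same operation with the axis spelled out

# Read-only view of b stretched over all four rows of a.
stretched = np.broadcast_to(b, a.shape)

print(b_row.shape, stretched.shape)
print(np.array_equal(implicit, explicit))
```

Both sums should agree element for element, since the only difference is whether the leading axis is added explicitly or by the broadcasting rules.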
Now I am trying to apply this to array indexing. For example, I want to set specific columns, selected by a boolean array, to zero, but the following fails:
>>> c = np.array([1,0,1,0,1], dtype=bool)
>>> a[c] = 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index (4) out of range (0<=index<3) in dimension 0
However, if I reduce the length of c to 4, it works, but it sets rows, not columns, to zero:
>>> c = np.array([1,0,1,0], dtype=bool)
>>> a[c] = 0
>>> a
array([[ 0. , 0. , 0. , 0. , 0. ],
       [ 0.41526315, 0.7425491 , 0.39872546, 0.56141914, 0.69795153],
       [ 0. , 0. , 0. , 0. , 0. ],
       [ 0.40771227, 0.60209749, 0.7928894 , 0.66089748, 0.91789682]])
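As a sketch of what appears to be happening here, the length-4 boolean array seems to be matched against the first dimension only, so it acts like a list of row numbers. The example below uses a small deterministic array instead of random values so the effect is easy to check; the names a1, a2 and rows are illustrative.

```python
import numpy as np

# Deterministic 4x5 array with no zeros, so zeroed rows are easy to spot.
a = np.arange(20, dtype=float).reshape(4, 5) + 1.0
c = np.array([1, 0, 1, 0], dtype=bool)

# Positions along the first axis where c is True.
rows = np.nonzero(c)[0]

a1 = a.copy()
a1[c] = 0            # boolean index applied to the first dimension

a2 = a.copy()
a2[rows, :] = 0      # the same rows, selected by integer position

print(rows)
print(np.array_equal(a1, a2))
```

If the two results match, the boolean index is selecting along dimension 0 rather than being aligned with the last dimension as in a sum.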
I would have expected the indexing array to be broadcast in the same way as for a sum, i.e. c would be broadcast to shape (1,5) and would then set the selected columns to zero in every row.
Why does broadcasting for indexing seem to work differently from broadcasting for operations such as addition or multiplication? For background, I'm trying to write a routine that performs a set of operations on an n-d array, where n is not known in advance, together with a 1D array. For most operations I can rely on the broadcasting rules without knowing the dimensionality of the n-d array. Now that I need to perform indexing, though, the convention seems to change, and this is a real issue.
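To make the goal concrete, here is a sketch of the behaviour being asked for, under two assumptions: that an Ellipsis can be combined with a boolean array to target the last dimension, and that np.where broadcasts its condition against the array in the same way a sum would. The helper names zero_last_axis and zero_last_axis_where are hypothetical, made up for this illustration.

```python
import numpy as np

def zero_last_axis(arr, mask):
    """Hypothetical helper: return a copy of arr with entries zeroed
    along the last axis wherever the 1D boolean mask is True."""
    out = arr.copy()
    out[..., mask] = 0        # Ellipsis stands for all leading dimensions
    return out

def zero_last_axis_where(arr, mask):
    """Hypothetical alternative: np.where broadcasts mask against arr."""
    return np.where(mask, 0, arr)

c = np.array([1, 0, 1, 0, 1], dtype=bool)

# The same calls for 1D, 2D and 3D inputs whose last dimension has length 5.
for shape in [(5,), (4, 5), (2, 3, 5)]:
    arr = np.ones(shape)
    r1 = zero_last_axis(arr, c)
    r2 = zero_last_axis_where(arr, c)
    print(shape, np.array_equal(r1, r2))
```

Neither helper needs to know the number of dimensions in advance, which is the property the routine described above requires.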
Thanks in advance for any advice,
Thomas