
Hello, I'm trying to understand how array broadcasting can be used for indexing. In the following, I use the term 'row' to refer to the first dimension of a 2D array, and 'column' to the second, just because that's how numpy prints them out. If I consider the following example:
>>> a = np.random.random((4,5))
>>> b = np.random.random((5,))
>>> a + b
array([[ 1.45499556, 0.60633959, 0.48236157, 1.55357393, 1.4339261 ],
       [ 1.28614593, 1.11265001, 0.63308615, 1.28904227, 1.34070499],
       [ 1.26988279, 0.84683018, 0.98959466, 0.76388223, 0.79273084],
       [ 1.27859505, 0.9721984 , 1.02725009, 1.38852061, 1.56065028]])
I understand how this works, because it behaves as described in
http://docs.scipy.org/doc/numpy/reference/ufuncs.html#broadcasting
So b gets broadcast to shape (1,5), and because the first dimension is then 1, the operation is applied to all rows.

Now I am trying to apply this to array indexing. For example, I want to set specific columns, indicated by a boolean array, to zero, but the following fails:
>>> c = np.array([1,0,1,0,1], dtype=bool)
>>> a[c] = 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index (4) out of range (0<=index<3) in dimension 0
However, if I reduce the size of c to 4, then it works, but it sets rows, not columns, to zero:
>>> c = np.array([1,0,1,0], dtype=bool)
>>> a[c] = 0
>>> a
array([[ 0. , 0. , 0. , 0. , 0. ],
       [ 0.41526315, 0.7425491 , 0.39872546, 0.56141914, 0.69795153],
       [ 0. , 0. , 0. , 0. , 0. ],
       [ 0.40771227, 0.60209749, 0.7928894 , 0.66089748, 0.91789682]])
But I would have thought that the indexing array would have been broadcast in the same way as for a sum, i.e. c would be broadcast to have dimensions (1,5) and then would have been able to set certain columns in all rows to zero.

Why is it that for indexing, the broadcasting seems to happen in a different way than when performing operations like additions or multiplications? For background info, I'm trying to write a routine which performs a set of operations on an n-d array, where n is not known in advance, with a 1D array, so I can use broadcasting rules for most operations without knowing the dimensionality of the n-d array, but now that I need to perform indexing, and the convention seems to change, this is a real issue.

Thanks in advance for any advice,

Thomas

Hi Thomas,

broadcasting rules apply only to ufuncs (and, by extension, to some numpy functions that use ufuncs). Indexing obeys different rules and always starts with the first dimension. However, you don't have to use broadcasting for such indexing operations:
>>> a[:, c] = 0

zeroes the columns indexed by c.
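As a minimal sketch of that column assignment, and of how it might carry over to an array whose dimensionality is not known in advance: the shapes and the name `nd` below are illustrative assumptions, not taken from the original messages. Using `...` (Ellipsis) in place of the leading slices makes the mask line up with the last axis, which is the same axis a 1D array is aligned with under broadcasting.

```python
import numpy as np

a = np.ones((4, 5))
c = np.array([1, 0, 1, 0, 1], dtype=bool)

# A 1D boolean mask in the second index slot selects columns.
a[:, c] = 0

# For an n-d array (n unknown), Ellipsis stands in for all leading axes,
# so the mask is applied along the last axis.
nd = np.ones((3, 4, 5))  # illustrative shape, an assumption
nd[..., c] = 0
```

This only covers the case where the 1D array matches the last axis; other axes need an explicit position in the index.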
If you want to index along the 3rd dimension, you can use a[:, :, c], etc. If the dimension along which you index is a variable, you can also use the function np.rollaxis, which lets you change the order of the dimensions of an array. You can then index along the first dimension (a[c]) and afterwards change the order of the dimensions back. Here is an example:
>>> a = np.ones((3,4,5,6))
>>> c = np.array([1,0,1,0,1], dtype=bool)
>>> tmp_a = np.rollaxis(a, 2, 0)
>>> tmp_a.shape
(5, 3, 4, 6)
>>> tmp_a[c] = 0
>>> a = np.rollaxis(tmp_a, 0, 3)
>>> a.shape
(3, 4, 5, 6)
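When the axis is only known at run time, another option is to build the index tuple programmatically instead of reordering the axes. The following is a sketch under that assumption; the helper name `zero_along_axis` is hypothetical and not part of numpy. For comparison, the second array uses np.moveaxis, which, like np.rollaxis, returns a view, so assigning through it modifies the original array.

```python
import numpy as np

def zero_along_axis(arr, mask, axis):
    """Zero, in place, the slices of `arr` selected by the 1D boolean
    `mask` along `axis`. Hypothetical helper for illustration only."""
    index = [slice(None)] * arr.ndim   # equivalent to ':' on every axis
    index[axis] = mask                 # put the mask on the chosen axis
    arr[tuple(index)] = 0
    return arr

a = np.ones((3, 4, 5, 6))
c = np.array([1, 0, 1, 0, 1], dtype=bool)
zero_along_axis(a, c, axis=2)

# Same result through a reordered view of a second array.
b = np.ones((3, 4, 5, 6))
np.moveaxis(b, 2, 0)[c] = 0
```

Both variants leave the shape of the array unchanged and avoid having to restore the axis order afterwards.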
Hope this helps.

Cheers,

Emmanuelle

On Thu, Jan 21, 2010 at 1:03 PM, Emmanuelle Gouillart <emmanuelle.gouillart@normalesup.org> wrote:
Hi Thomas,
broadcasting rules are only for ufuncs (and by extension, some numpy functions using ufuncs). Indexing obeys different rules and always starts by the first dimension.
Just a clarification: if there are several index arrays, then standard broadcasting rules apply to them. It's a bit messier when arrays and slice objects are mixed. An informative explanation was given in the March 2009 thread "Is this a bug?", and there are lots of examples on the mailing list.

Josef
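The point about several index arrays can be sketched as follows. The arrays and names (`rows`, `cols`, `x`) are illustrative assumptions; the example only shows that integer index arrays are broadcast against each other, and how the result shape changes when a slice sits between two index arrays.

```python
import numpy as np

a = np.arange(20).reshape(4, 5)
rows = np.array([0, 2])       # shape (2,)
cols = np.array([1, 3, 4])    # shape (3,)

# Index arrays of shape (2, 1) and (3,) broadcast to (2, 3),
# selecting every combination of the given rows and columns.
block = a[rows[:, np.newaxis], cols]

# np.ix_ builds the same broadcastable index arrays.
same = a[np.ix_(rows, cols)]

# Mixing a slice with an index array keeps the sliced axis in place.
mixed = a[1:3, cols]

# With a slice between two index arrays, the broadcast dimension
# of the index arrays comes first in the result.
x = np.zeros((3, 4, 5))
separated = x[[0, 1], :, [0, 1]]   # shape (2, 4)
adjacent = x[:, [0, 1], [0, 1]]    # shape (3, 2)
```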
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
participants (3)
- Emmanuelle Gouillart
- josef.pktd@gmail.com
- Thomas Robitaille