Re: [Numpy-discussion] Indexing bug

Message: 2
Date: Sat, 30 Mar 2013 11:13:35 -0700
From: Jaime Fernández del Río jaime.frio@gmail.com
Subject: Re: [Numpy-discussion] Indexing bug?
To: Discussion of Numerical Python numpy-discussion@scipy.org
Message-ID: CAPOWHWk+mL6KN6F2FHTPn5HTiU0UEQPj6KdxjNK_+T1E-YRiBg@mail.gmail.com
Content-Type: text/plain; charset="iso-8859-1"
On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets ivan.oseledets@gmail.com wrote:
I am using numpy 1.6.1, and encountered a weird fancy indexing bug:
import numpy as np
c = np.random.randn(10,200,10);
In [29]: print c[[0,1],:200,:2].shape
(2, 200, 2)
In [30]: print c[[0,1],:200,[0,1]].shape
(2, 200)
It means that fancy indexing is not working right here for a 3d array.
--> It is working fine, review the docs:
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-inde...
In your result, item [0, :] is c[0, :, 0] and item [1, :] is c[1, :, 1].
If you want a return of shape (2, 200, 2) where item [i, :, j] is c[i, :, j] you could use slicing:
c[:2, :200, :2]
or something more elaborate like:
c[np.arange(2)[:, None, None], np.arange(200)[:, None], np.arange(2)]
Jaime --->
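Jaime's description can be checked directly. The following is only a sketch: it assumes a current NumPy under Python 3 rather than the 1.6.1 release from the thread, and it swaps the random array for a deterministic `np.arange` array so the comparisons are reproducible. The names `paired`, `by_slice` and `by_broadcast` are illustrative.

```python
import numpy as np

# Deterministic stand-in for the random (10, 200, 10) array in the thread.
c = np.arange(10 * 200 * 10).reshape(10, 200, 10)

# Two index lists separated by a slice are paired element by element.
paired = c[[0, 1], :200, [0, 1]]
print(paired.shape)  # (2, 200)

# Row k of the result is c[k, :, k] for k = 0, 1.
assert np.array_equal(paired[0], c[0, :, 0])
assert np.array_equal(paired[1], c[1, :, 1])

# Both alternatives from the reply give the full (2, 200, 2) block.
by_slice = c[:2, :200, :2]
by_broadcast = c[np.arange(2)[:, None, None], np.arange(200)[:, None], np.arange(2)]
print(by_slice.shape, by_broadcast.shape)  # (2, 200, 2) (2, 200, 2)
assert np.array_equal(by_slice, by_broadcast)
```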
Oh! So it is not a bug, it is a feature, and one that is completely incompatible with other array-based languages (MATLAB and Fortran). I cannot find a single explanation of why it is so in numpy. Taking submatrices from a matrix is a common operation, and the syntax above is a very natural way to take submatrices, not some weird diagonal stuff. i.e.,
c = np.random.randn(100,100)
d = c[[0,3],[2,3]]
should NOT produce two numbers! (And you cannot do it using slices!)
In MATLAB and Fortran, c(indi,indj) will produce a 2 x 2 matrix. How can it be done in numpy (and why the complications)?
So, please consider this message as a feature request.
Ivan

On Sun, Mar 31, 2013 at 6:14 AM, Ivan Oseledets ivan.oseledets@gmail.com wrote:
Oh! So it is not a bug, it is a feature, and one that is completely incompatible with other array-based languages (MATLAB and Fortran). I cannot find a single explanation of why it is so in numpy. Taking submatrices from a matrix is a common operation, and the syntax above is a very natural way to take submatrices, not some weird diagonal stuff.
It is not weird diagonal stuff, but a well-defined operation: when you use fancy indexing, the index numbers become coordinates (
i.e.,
c = np.random.randn(100,100)
d = c[[0,3],[2,3]]
should NOT produce two numbers! (And you cannot do it using slices!)
In MATLAB and Fortran, c(indi,indj) will produce a 2 x 2 matrix. How can it be done in numpy (and why the complications)?
in your example, it is simple enough:
c[[0, 3], 2:4] (returns the first row limited to columns 3 and 4, and the 4th row limited to columns 3 and 4, counting from 1).
Numpy's syntax is 'biased' toward fancy indexing, and you need more typing if you want to extract 'irregular' submatrices. Matlab has a different tradeoff (extracting irregular sub-matrices is slightly easier, but selecting a few points is harder, as you need sub2ind to use linear indexing).
David
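As a hedged illustration of David's point, the sketch below contrasts the point-wise result with the list-plus-slice form. It assumes a current NumPy under Python 3 and replaces the random matrix with an `np.arange` array so that the printed values are predictable; `points` and `block` are illustrative names.

```python
import numpy as np

# Deterministic stand-in: element (i, j) holds the value 100 * i + j.
c = np.arange(100 * 100).reshape(100, 100)

points = c[[0, 3], [2, 3]]   # two elements: c[0, 2] and c[3, 3]
block = c[[0, 3], 2:4]       # rows 0 and 3, columns 2 and 3

print(points)  # [  2 303]
print(block)
# [[  2   3]
#  [302 303]]
```

The slice form only helps when the wanted columns form a regular range; an arbitrary list of columns needs one of the other approaches discussed in the thread.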

On Sun, Mar 31, 2013 at 6:14 AM, Ivan Oseledets ivan.oseledets@gmail.com wrote:
Oh! So it is not a bug, it is a feature, and one that is completely incompatible with other array-based languages (MATLAB and Fortran). I cannot find a single explanation of why it is so in numpy. Taking submatrices from a matrix is a common operation, and the syntax above is a very natural way to take submatrices, not some weird diagonal stuff. i.e.,
c = np.random.randn(100,100)
d = c[[0,3],[2,3]]
should NOT produce two numbers! (And you cannot do it using slices!)
In MATLAB and Fortran, c(indi,indj) will produce a 2 x 2 matrix. How can it be done in numpy (and why the complications)?
So, please consider this message as a feature request.
Numpy's handling of such things is strictly more general than MATLAB and Fortran's (AFAIK), and fits in with the rest of the system quite nicely, I'd say.
The logic is: if you do a[row_coords, col_coords] then it treats row_coords and col_coords as parallel arrays, where each corresponding pair of entries in the two arrays gives the coordinates of one entry in the result -- so you get [a[row_coords[0], col_coords[0]], a[row_coords[1], col_coords[1]], ...].
This follows numpy's usual rules for arrays that are supposed to align: just like if you did 'row_coords + col_coords', say, which would give you [row_coords[0] + col_coords[0], row_coords[1] + col_coords[1], ...].
AND, all of this works just the same if row_coords and col_coords are arbitrary-dimensional arrays: the output of indexing (just like the output of '+') will have the same dimensionality as its input. So if they're both 2x2 arrays, then a[row_coords, col_coords] will be
[[a[row_coords[0, 0], col_coords[0, 0]], a[row_coords[0, 1], col_coords[0, 1]]],
 [a[row_coords[1, 0], col_coords[1, 0]], a[row_coords[1, 1], col_coords[1, 1]]]]
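The pairing rule can be sketched on a small array. This is an illustration only, assuming a current NumPy under Python 3; the array `a` and the index values are made up for the example, and `by_hand` spells out the pairing with an explicit loop.

```python
import numpy as np

a = np.arange(20).reshape(4, 5)   # element (i, j) holds 5 * i + j

row_coords = np.array([0, 3])
col_coords = np.array([2, 3])

# Fancy indexing pairs the coordinates up, like an elementwise operation.
picked = a[row_coords, col_coords]
by_hand = np.array([a[r, c] for r, c in zip(row_coords, col_coords)])
assert np.array_equal(picked, by_hand)
print(picked)  # [ 2 18]

# With 2x2 index arrays the result is 2x2 as well.
rows2 = np.array([[0, 0], [3, 3]])
cols2 = np.array([[2, 3], [2, 3]])
grid = a[rows2, cols2]
print(grid)
# [[ 2  3]
#  [17 18]]
```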
AND (here's the solution to your problem), this "aligning" uses the same rules as "+" does, i.e., broadcasting applies: http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
So for your example, write c[[[0], [1]], :200, [[0, 1]]] which by broadcasting is equivalent to c[[[0, 0], [1, 1]], :200, [[0, 1], [0, 1]]] which is what you want.
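To see what broadcasting does to the two index arrays, one can expand them explicitly. The sketch below assumes a current NumPy under Python 3 and uses `np.broadcast_arrays` purely for inspection; the names `rows`, `cols`, `compact` and `expanded` are illustrative, and the deterministic `np.arange` array stands in for the random one.

```python
import numpy as np

c = np.arange(10 * 200 * 10).reshape(10, 200, 10)

rows = np.array([[0], [1]])   # shape (2, 1)
cols = np.array([[0, 1]])     # shape (1, 2)

# Broadcasting expands both index arrays to shape (2, 2).
rows_b, cols_b = np.broadcast_arrays(rows, cols)
print(rows_b.tolist())  # [[0, 0], [1, 1]]
print(cols_b.tolist())  # [[0, 1], [0, 1]]

# The compact and the expanded index arrays select the same data.
compact = c[rows, :200, cols]
expanded = c[rows_b, :200, cols_b]
assert np.array_equal(compact, expanded)

# Entry [i, j] of the result holds the length-200 vector c[i, :, j].
assert np.array_equal(compact[1, 0], c[1, :, 0])
```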
The only problem then is that slice indexing happens "inside" fancy indexing ("fancy indexing" is the name for this line-up-the-arrays-and-extract-coordinates thing). When you use both in a single indexing operation, each set of corresponding fancy indexing coordinates does not extract just a single element for the final array; it extracts an entire sub-array. The practical result here, where a slice sits between the two fancy indexes, is that the shape of your output array is the shape of your aligned fancy indexing arrays (after broadcasting), with the sliced axes stuck on the end. So you might expect that the above expression would give you an array of shape (2, 200, 2), but in fact it will be of shape (2, 2, 200), because the (2, 2) part is the shape of the fancy index arrays and the (200,) is the shape of the slice. (Notice that there is no relation at all between the shape of the input array and the shape of the fancy indexes!) You can fix this up with a call to np.rollaxis(), or by getting even fancier: use fancy indexes for all three axes and make sure they each broadcast to the shape (2, 200, 2):
coord1 = np.asarray([0, 1]).reshape((2, 1, 1))
coord2 = np.arange(200).reshape((1, 200, 1))
coord3 = np.asarray([0, 1]).reshape((1, 1, 2))
c[coord1, coord2, coord3]
-n
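Both fixes can be compared in a short sketch. It assumes a current NumPy under Python 3 and uses `np.moveaxis`, which is only available in NumPy releases newer than the 1.6.1 from the thread; `np.rollaxis(mixed, 2, 1)` is the older spelling of the same move. The names `mixed`, `moved` and `full` are illustrative.

```python
import numpy as np

c = np.arange(10 * 200 * 10).reshape(10, 200, 10)

# Fancy indexes on axes 0 and 2, separated by a slice on axis 1.
mixed = c[[[0], [1]], :200, [[0, 1]]]
print(mixed.shape)  # (2, 2, 200)

# Option 1: move the sliced axis back between the two fancy axes.
moved = np.moveaxis(mixed, -1, 1)
print(moved.shape)  # (2, 200, 2)

# Option 2: fancy-index all three axes with broadcastable shapes.
coord1 = np.asarray([0, 1]).reshape((2, 1, 1))
coord2 = np.arange(200).reshape((1, 200, 1))
coord3 = np.asarray([0, 1]).reshape((1, 1, 2))
full = c[coord1, coord2, coord3]

# Both options reproduce the plain-slice submatrix.
assert np.array_equal(moved, full)
assert np.array_equal(full, c[:2, :200, :2])
```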

On Sun, Mar 31, 2013 at 12:14 AM, Ivan Oseledets ivan.oseledets@gmail.com wrote:
<snip>
Oh! So it is not a bug, it is a feature, and one that is completely incompatible with other array-based languages (MATLAB and Fortran). I cannot find a single explanation of why it is so in numpy. Taking submatrices from a matrix is a common operation, and the syntax above is a very natural way to take submatrices, not some weird diagonal stuff. i.e.,
c = np.random.randn(100,100)
d = c[[0,3],[2,3]]
should NOT produce two numbers! (And you cannot do it using slices!)
In MATLAB and Fortran, c(indi,indj) will produce a 2 x 2 matrix. How can it be done in numpy (and why the complications)?
So, please consider this message as a feature request.
There is already a function, np.ix_, in index_tricks that does this. (I think it essentially implements the broadcasting trick that Nathaniel mentions. For me the index trick is easier, as I often forget the broadcasting details.) Example:
In [14]: c = np.random.randn(100,100)
In [15]: c[[0,3],[2,3]]
Out[15]: array([ 0.99141998, -0.88928917])
In [16]: c[np.ix_([0,3],[2,3])]
Out[16]:
array([[ 0.99141998, -1.98406295],
       [ 0.0729076 , -0.88928917]])
So for me, I think this is superior to MATLAB, as I have often had the case of wanting the result from the second option. In MATLAB you would need to extract the 2x2 matrix, and then take its diagonal. This can be wasteful when the index arrays become large.
Cheers, Aronne
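What `np.ix_` builds can be inspected directly. The sketch below assumes a current NumPy under Python 3 and a deterministic `np.arange` matrix in place of the random one; `rows_ix`, `cols_ix`, `block` and `points` are illustrative names.

```python
import numpy as np

# Deterministic stand-in: element (i, j) holds the value 100 * i + j.
c = np.arange(100 * 100).reshape(100, 100)

# np.ix_ returns index arrays shaped so that they broadcast to a grid.
rows_ix, cols_ix = np.ix_([0, 3], [2, 3])
print(rows_ix.shape, cols_ix.shape)  # (2, 1) (1, 2)

block = c[np.ix_([0, 3], [2, 3])]
print(block)
# [[  2   3]
#  [302 303]]

# The point-wise result is the diagonal of that block.
points = c[[0, 3], [2, 3]]
assert np.array_equal(points, np.diag(block))
```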
participants (4)
- Aronne Merrelli
- David Cournapeau
- Ivan Oseledets
- Nathaniel Smith