[Numpy-discussion] use index array of len n to select columns of n x m array

Keith Goodman kwgoodman at gmail.com
Thu Aug 5 16:51:39 EDT 2010


On Thu, Aug 5, 2010 at 1:32 PM,  <josef.pktd at gmail.com> wrote:
> On Thu, Aug 5, 2010 at 4:07 PM, Martin Spacek <numpy at mspacek.mm.st> wrote:
>> josef.pktd wrote:
>> >>> a = np.array([[0, 1],
>>                   [2, 3],
>>                   [4, 5],
>>                   [6, 7],
>>                   [8, 9]])
>> >>> i = np.array([0, 1, 1, 0, 1])
>> >>> a[range(a.shape[0]), i]
>> array([0, 3, 5, 6, 9])
>> >>> a[np.arange(a.shape[0]), i]
>> array([0, 3, 5, 6, 9])
>>
>>
>> Thanks for all the tips. I guess I was hoping for something that could avoid
>> having to generate np.arange(a.shape[0]), but
>>
>>  >>> a[np.arange(a.shape[0]), i]
>>
>> sure is easy to understand. Is there maybe a more CPU and/or memory efficient
>> way? I kind of like John Salvatier's idea:
>>
>>  >>> np.choose(i, (a[:,0], a[:,1]))
>>
>> but that would need to be generalized to "a" of arbitrary columns.
>
> seems to work:
>
>>>> np.choose(i, a.T)
> array([0, 3, 5, 6, 9])
>
> but wouldn't get around the 31 limit that you found.
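
For arrays with more columns than np.choose accepts as choices, one possible alternative is np.take_along_axis. It was added in NumPy 1.15, so it was not available when this thread was written. The sketch below uses a made-up 5 x 40 array and made-up column indices, and checks the result against the fancy-indexing form quoted above:

```python
import numpy as np

# Hypothetical wide array: 5 rows, 40 columns (illustrative values only).
a = np.arange(5 * 40).reshape(5, 40)
# One column index per row (also illustrative).
i = np.array([0, 39, 17, 3, 32])

# take_along_axis needs the index array to have the same number of
# dimensions as a, so add a trailing axis and drop it again afterwards.
picked = np.take_along_axis(a, i[:, None], axis=1)[:, 0]

# Reference result from the fancy-indexing form used in the thread.
expected = a[np.arange(a.shape[0]), i]

print(picked)  # [  0  79  97 123 192]
```

No timing claim is made here; whether this beats the other approaches would need to be measured.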

Choose is fast:

>> N = 1000
>> a = np.random.randint(0, 9, (N,2))
>> i = np.random.randint(0, 2, N)
>> timeit a[range(N), i]
10000 loops, best of 3: 108 us per loop
>> timeit a[np.arange(N), i]
10000 loops, best of 3: 39 us per loop
>> timeit np.choose(i, a.T)
10000 loops, best of 3: 32.3 us per loop

But flat is faster:

>> timeit idx = np.arange(N); idx *= 2; idx += i; a.flat[idx]
100000 loops, best of 3: 16.8 us per loop
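
The flat-index one-liner above hard-codes two columns (idx *= 2). A minimal sketch of the same idea for any number of columns follows. The helper name select_per_row is made up for illustration, and i is assumed to be an integer array with one valid column index per row:

```python
import numpy as np

def select_per_row(a, i):
    """Return a[r, i[r]] for every row r, using a flat (row-major) index.

    Hypothetical helper, not part of NumPy. Assumes a is 2-D and i is an
    integer array of length a.shape[0] with values in range(a.shape[1]).
    """
    n_rows, n_cols = a.shape
    idx = np.arange(n_rows)  # row numbers 0..n_rows-1
    idx *= n_cols            # flat offset of the first element of each row
    idx += i                 # add the chosen column within each row
    # a.flat indexes in row-major order regardless of memory layout.
    return a.flat[idx]

# The small example from earlier in the thread.
a = np.array([[0, 1],
              [2, 3],
              [4, 5],
              [6, 7],
              [8, 9]])
i = np.array([0, 1, 1, 0, 1])
print(select_per_row(a, i))  # [0 3 5 6 9]

# Same helper on a made-up 4 x 3 array.
a3 = np.arange(12).reshape(4, 3)
i3 = np.array([2, 0, 1, 2])
print(select_per_row(a3, i3))  # [ 2  3  7 11]
```

The timings quoted above were measured for the two-column case only, so the generalized version should be timed separately before drawing conclusions.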


