multidimensional take

Wed Mar 19 05:43:36 EST 2003

I have a task which may be of general interest (for number-crunchers;
the rest may crunch whatever they like).

I am working with the Numeric module (aka Numerical Python)
on large statistical datasets.

What I want to do is a _fast_ multidimensional "take" operation.

Numeric.take is defined so that

>>> a
array([[140, 455, 325, 360, 498],
       [372, 647, 636, 365, 462],
       [893, 141, 776, 238, 259]])
>>> b = [0, 3, 4]
>>> take(a, b, 1)
array([[140, 360, 498],
       [372, 365, 462],
       [893, 238, 259]])

>>> c = [[0, 3, 4], [0, 1, 4]]
>>> take(a, c, 1)
array([[[140, 360, 498],
        [140, 455, 498]],
       [[372, 365, 462],
        [372, 647, 462]],
       [[893, 238, 259],
        [893, 141, 259]]])

That is, for each element i in c, a[:,i] is returned, whether a[:,i]
is a single element or a submatrix. The lookup is done on the
second axis because of the axis parameter, but I'll neglect this in
the following.
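
The shape rule behind these two calls can be checked with a short
sketch. It assumes NumPy as a stand-in for Numeric (numpy.take accepts
the same array, indices and axis arguments); the names tb and tc are
only illustrative.

```python
import numpy as np

a = np.array([[140, 455, 325, 360, 498],
              [372, 647, 636, 365, 462],
              [893, 141, 776, 238, 259]])

b = [0, 3, 4]
c = [[0, 3, 4], [0, 1, 4]]

# 1-D index list: the indexed axis (length 5) is replaced by len(b) == 3.
tb = np.take(a, b, axis=1)

# 2-D index list: the indexed axis is replaced by the whole shape of c, (2, 3).
tc = np.take(a, c, axis=1)

# Result shape is a.shape[:axis] + shape(indices) + a.shape[axis+1:].
print(tb.shape)   # (3, 3)
print(tc.shape)   # (3, 2, 3)

# tc[r, k] is row r of a, picked at the columns listed in c[k].
print(tc[1, 1])   # [372 647 462]
```

So the index array never changes which axis is searched; it only
decides the shape that replaces that axis in the result.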

Now I want to have a function defined as follows:

def mtake(a, ind):
    ind = array(ind)
    if len(ind.shape) <= 2:
        # rank <= 2: look up each entry of ind directly
        return array([a[i] for i in ind])   # (1)
    else:
        # higher rank: recurse over the leading axis of ind
        return array([mtake(a, i) for i in ind])

That means that the result of

mtake(a, ind[..., NewAxis])

would be equal to 

take(a, ind)

If the last dimension has more than one element, it would perform a
coordinate lookup with the coordinates being the elements along the
last dimension of ind, as in

>>> mtake(a, [[1, 0], [1, 1], [2, 4]])
array([372, 647, 259])

Now I want to have this function defined by using take
for the lookup done in (1), because it's much faster:
take is implemented in C.
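
One possible direction, given only as a sketch: turn each coordinate
tuple into a single offset into the flattened array, then do one take
on that. The code below assumes NumPy in place of Numeric, and
mtake_flat is a hypothetical name. It covers only the case where the
last dimension of ind holds a full coordinate (one entry per axis of
a) and all indices are non-negative; the partial lookup that returns
submatrices is not handled.

```python
import numpy as np

def mtake_flat(a, ind):
    """Coordinate lookup via one take() on the flattened array.

    Hypothetical helper: the last axis of ind holds full coordinates
    into a (one entry per axis of a). Assumes non-negative indices.
    """
    a = np.asarray(a)
    ind = np.asarray(ind)
    if ind.shape[-1] != a.ndim:
        raise ValueError("last axis of ind must have one entry per axis of a")
    # Row-major strides counted in elements: for shape (3, 5) this is [5, 1].
    strides = np.ones(a.ndim, dtype=int)
    for k in range(a.ndim - 2, -1, -1):
        strides[k] = strides[k + 1] * a.shape[k + 1]
    # Collapse each coordinate tuple into a single flat offset.
    flat = np.sum(ind * strides, axis=-1)
    # One C-level lookup; the result has the shape of ind minus its last axis.
    return np.take(a.ravel(), flat)

a = np.array([[140, 455, 325, 360, 498],
              [372, 647, 636, 365, 462],
              [893, 141, 776, 238, 259]])

print(mtake_flat(a, [[1, 0], [1, 1], [2, 4]]))   # [372 647 259]
```

The Python-level loop runs only over the axes of a, not over the
entries of ind, so the per-element work stays inside the array
routines.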

Any suggestions?

Johannes