[Numpy-discussion] numpythonically getting elements with the minimum sum

Tue Jan 29 09:11:55 EST 2013

On Tue, 2013-01-29 at 14:53 +0100, Lluís wrote:
> Gregor Thalhammer writes:
> 
> > On 28.1.2013 at 23:15, Lluís wrote:
> 
> >> Hi,
> >> 
> >> I have a somewhat convoluted N-dimensional array that contains information about a
> >> set of experiments.
> >> 
> >> The last dimension has as many entries as iterations in the experiment (an
> >> iterative application), and the penultimate dimension has as many entries as
> >> times I have run that experiment; the remaining dimensions describe the features
> >> of the experiment:
> >> 
> >> data.shape == (... arbitrary number of dimensions ..., NUM_RUNS, NUM_ITERATIONS)
> >> 
> >> So, what I want is to get the data for the best run of each experiment:
> >> 
> >> best.shape == (... arbitrary number of dimensions ..., NUM_ITERATIONS)
> >> 
> >> by selecting, for each experiment, the run with the lowest total time (the sum of
> >> the times of all iterations of that run).
> >> 
> >> 
> >> So far I've got the trivial part, but not the final indexing into "data":
> >> 
> >> dsum = data.sum(axis = -1)
> >> dmin = dsum.min(axis = -1)
> >> best = data[???]
> >> 
> >> 
> >> I'm sure there must be some numpythonic and generic way to get what I want, but
> >> fancy indexing is beating me here :)
> 
> > Did you have a look at the argmin function? It returns the indices of the minimum values along an axis. Untested guess:
> 
> > dmin_idx = argmin(dsum, axis = -1)
> > best = data[..., dmin_idx, :]
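Gregor's untested guess can be checked on small random data. This is a sketch with made-up shapes (2 experiments, 5 runs, 10 iterations); the values are illustrative only. With the Ellipsis, axis 0 is kept whole and the index array selects along axis 1 independently of it, so the result pairs every experiment with every selected run instead of one run per experiment.

```python
import numpy as np

# Illustrative data: 2 experiments, 5 runs each, 10 iterations per run.
rng = np.random.default_rng(0)
data = rng.random((2, 5, 10))

dsum = data.sum(axis=-1)             # total time per run, shape (2, 5)
dmin_idx = np.argmin(dsum, axis=-1)  # best run per experiment, shape (2,)

# Ellipsis keeps axis 0 whole; dmin_idx indexes axis 1 on its own,
# so guess[i, j] is data[i, dmin_idx[j], :] for every (i, j) pair.
guess = data[..., dmin_idx, :]
print(guess.shape)  # (2, 2, 10)
```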
> 
> Ah, sorry, my example is incorrect. I was actually using 'argmin', but indexing
> with it does not exactly work as I expected:
> 
>   >>> d1.shape
>   (2, 5, 10)
>   >>> dsum = d1.sum(axis = -1)
>   >>> dmin = dsum.argmin(axis = -1)
>   >>> dmin.shape
>   (2,)
>   >>> d1_best = d1[...,dmin,:]

You need to use fancy indexing. Something like:

    >>> d1_best = d1[np.arange(2), dmin, :]

With the Ellipsis, the first axis is taken whole and dmin indexes the run
axis independently of it, so you get every experiment combined with every
selected run. You want to pick from both axes at the same time, pairing
experiment i with run dmin[i]; that is what fancy indexing (indexing with
arrays) does, because np.arange(2) and dmin are broadcast together. From
another perspective, you want to get rid of two axes in favor of a new
one, but a slice/Ellipsis always preserves the axis it works on.
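The fancy-indexing fix for the 3-D case can be checked on toy data. A minimal sketch, assuming d1 has shape (experiments, runs, iterations) filled with random values, and that dmin is computed from the summed times (dsum); it verifies that row i of the result is exactly d1[i, dmin[i], :].

```python
import numpy as np

# Illustrative data: (experiments, runs, iterations).
rng = np.random.default_rng(0)
d1 = rng.random((2, 5, 10))

dsum = d1.sum(axis=-1)        # shape (2, 5)
dmin = dsum.argmin(axis=-1)   # shape (2,)

# np.arange(2) and dmin broadcast together, so element i of the result
# is d1[i, dmin[i], :]; the two indexed axes collapse into one.
d1_best = d1[np.arange(d1.shape[0]), dmin, :]
print(d1_best.shape)  # (2, 10)

for i in range(d1.shape[0]):
    assert np.array_equal(d1_best[i], d1[i, dmin[i], :])
```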

>   >>> d1_best.shape
>   (2, 2, 10)
> 
> 
> Assuming the 1st dimension is the test, the 2nd the run and the 3rd the
> iterations, and running the previous code with some example values:
> 
>   >>> dmin
>   [4 3]
>   >>> d1_best
>   [[[ ... contents of d1[0,4,:] ...]
>     [ ... contents of d1[0,3,:] ...]]
>    [[ ... contents of d1[1,4,:] ...]
>     [ ... contents of d1[1,3,:] ...]]]
> 
> 
> What I actually want is this:
> 
>   [[ ... contents of d1[0,4,:] ...]
>    [ ... contents of d1[1,3,:] ...]]
> 
> 
> Thanks,
>   Lluis
>
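The original question asks for a version that works with an arbitrary number of leading feature dimensions. Below is a sketch of two hedged options, assuming data.shape == (*features, NUM_RUNS, NUM_ITERATIONS); the helper names best_runs and best_runs_ix are hypothetical. The first uses np.take_along_axis, which was added in NumPy 1.15 and therefore postdates this 2013 thread; the second uses np.ix_ to build one broadcastable index array per leading axis, generalising np.arange(2) from the 3-D case.

```python
import numpy as np

def best_runs(data):
    """Hypothetical helper: keep, per experiment, the run with the lowest summed time.

    Assumes data.shape == (*features, NUM_RUNS, NUM_ITERATIONS) and returns
    shape (*features, NUM_ITERATIONS). Needs NumPy >= 1.15 (take_along_axis).
    """
    idx = data.sum(axis=-1).argmin(axis=-1)  # (*features,)
    # take_along_axis wants indices with the same ndim as data; the trailing
    # length-1 axis broadcasts over NUM_ITERATIONS.
    idx = idx[..., np.newaxis, np.newaxis]   # (*features, 1, 1)
    return np.take_along_axis(data, idx, axis=-2)[..., 0, :]

def best_runs_ix(data):
    """Hypothetical helper: same result with plain fancy indexing."""
    idx = data.sum(axis=-1).argmin(axis=-1)  # (*features,)
    # np.ix_ returns one open-mesh index array per leading axis; they
    # broadcast against idx, pairing each feature position with its best run.
    lead = np.ix_(*[np.arange(n) for n in idx.shape])
    return data[lead + (idx,)]

# Illustrative data: 3 x 4 experiment settings, 5 runs, 10 iterations.
rng = np.random.default_rng(0)
data = rng.random((3, 4, 5, 10))
best = best_runs(data)
print(best.shape)  # (3, 4, 10)
```

Both helpers index only the run axis and leave the iteration axis whole, so the output drops exactly the NUM_RUNS dimension.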