[Numpy-discussion] numpythonically getting elements with the minimum sum
Lluís
xscript at gmx.net
Tue Jan 29 13:07:03 EST 2013
Lluís writes:
> Sebastian Berg writes:
>> On Tue, 2013-01-29 at 14:53 +0100, Lluís wrote:
>>> Gregor Thalhammer writes:
>>>
>>> > On 28.1.2013 at 23:15, Lluís wrote:
>>>
>>> >> Hi,
>>> >>
>>> >> I have a somewhat convoluted N-dimensional array that contains information about
>>> >> a set of experiments.
>>> >>
>>> >> The last dimension has as many entries as there are iterations in the experiment
>>> >> (an iterative application), and the penultimate dimension has as many entries as
>>> >> the number of times I have run that experiment; the remaining dimensions describe
>>> >> the features of the experiment:
>>> >>
>>> >> data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS)
>>> >>
>>> >> So, what I want is to get the data for the best run of each experiment:
>>> >>
>>> >> best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS)
>>> >>
>>> >> by selecting, for each experiment, the run with the lowest total time (sum of
>>> >> the time of all iterations for that experiment).
>>> >>
>>> >>
>>> >> So far I've got the trivial part, but not the final indexing into "data":
>>> >>
>>> >> dsum = data.sum(axis = -1)
>>> >> dmin = dsum.min(axis = -1)
>>> >> best = data[???]
>>> >>
>>> >>
>>> >> I'm sure there must be some numpythonic and generic way to get what I want, but
>>> >> fancy indexing is beating me here :)
>>>
>>> > Did you have a look at the argmin function? It delivers the indices of the minimum values along an axis. Untested guess:
>>>
>>> > dmin_idx = argmin(dsum, axis = -1)
>>> > best = data[..., dmin_idx, :]
>>>
>>> Ah, sorry, my example is incorrect. I was actually using 'argmin', but indexing
>>> with it does not exactly work as I expected:
>>>
>>> >>> d1.shape
>>> (2, 5, 10)
>>> >>> dsum = d1.sum(axis = -1)
>>> >>> dmin = dsum.argmin(axis = -1)
>>> >>> dmin.shape
>>> (2,)
>>> >>> d1_best = d1[...,dmin,:]
>> You need to use fancy indexing. Something like:
>> >>> d1_best = d1[np.arange(2), dmin, :]
>> Because the Ellipsis takes everything from the axis, while you want to
>> pick from multiple axes at the same time. That can be achieved with
>> fancy indexing (indexing with arrays). From another perspective, you
>> want to get rid of two axes in favor of a new one, but a slice/Ellipsis
>> always preserves the axis it works on.
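
To make the difference concrete, here is a minimal sketch using made-up random data. It assumes a plain float array of shape (2, 5, 10) instead of the structured array used elsewhere in this thread, and the names `outer` and `paired` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d1 = rng.random((2, 5, 10))      # (experiments, runs, iterations), made-up data

dsum = d1.sum(axis=-1)           # total time per run, shape (2, 5)
dmin = dsum.argmin(axis=-1)      # best run per experiment, shape (2,)

# Ellipsis keeps the whole first axis, so every experiment is combined
# with every entry of dmin: outer[i, j] is d1[i, dmin[j]].
outer = d1[..., dmin, :]

# Two index arrays are paired element by element instead:
# paired[i] is d1[i, dmin[i]].
paired = d1[np.arange(2), dmin, :]

print(outer.shape)    # (2, 2, 10)
print(paired.shape)   # (2, 10)
```

The wanted result is the "diagonal" of the first one: `paired[i]` equals `outer[i, i]`.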
> Nice, thanks. That works for this specific example, but I couldn't get it to
> work with "d1.shape == (1, 2, 16, 5, 10)" (thus "dmin.shape == (1, 2, 16)"):
> >>> def get_best_run(data, field):
> ...     """Returns the best run."""
> ...     data = data.view(np.ndarray)
> ...     assert data.ndim >= 2
> ...     dsum = data[field].sum(axis=-1)
> ...     dmin = dsum.argmin(axis=-1)
> ...     idxs = [ np.arange(dlen) for dlen in data.shape[:-2] ]
> ...     idxs += [ dmin ]
> ...     idxs += [ slice(None) ]
> ...     return data[tuple(idxs)]
> >>> d1.shape
> (2, 5, 10)
> >>> get_best_run(d1, "time").shape
> (2, 10)
> >>> d2.shape
> (1, 2, 16, 5, 10)
> >>> get_best_run(d2, "time").shape
> Traceback (most recent call last):
>   ...
>   File "./plot-user.py", line 89, in get_best_run
>     res = data.view(np.ndarray)[tuple(idxs)]
> ValueError: shape mismatch: objects cannot be broadcast to a single shape
> After reading the "Advanced indexing" section, my understanding is that the
> elements in "idxs" are not broadcastable to the same shape, but I'm not sure how
> I should build them so that they broadcast, or to which shape.
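
For reference, one possible way to build index arrays that do broadcast is sketched below. In the failing call, `np.arange(2)` has shape (2,) and `np.arange(16)` has shape (16,), and those two cannot be broadcast together; each leading index needs the shape of `dmin` (or a shape that broadcasts to it). `np.indices` produces exactly such arrays. The function name `best_run_by_indices` is hypothetical, and the sketch assumes a plain numeric array, not the structured array with a "time" field used above.

```python
import numpy as np

def best_run_by_indices(data):
    """Hypothetical helper: select, per experiment, the run with the lowest sum."""
    assert data.ndim >= 2
    dsum = data.sum(axis=-1)         # shape (..., NUM_RUNS)
    dmin = dsum.argmin(axis=-1)      # shape (...)

    # One index array per leading axis, each with the same shape as dmin,
    # so all of them broadcast together with dmin.
    lead = np.indices(dmin.shape)

    # Leading index arrays, then the best run, then every iteration.
    return data[tuple(lead) + (dmin, slice(None))]

rng = np.random.default_rng(0)
d2 = rng.random((1, 2, 16, 5, 10))   # made-up data
best = best_run_by_indices(d2)
print(best.shape)                    # (1, 2, 16, 10)
```

`np.indices` builds dense index arrays; for large arrays, open grids such as those returned by `np.ogrid` or `np.ix_` would use less memory, at the cost of slightly more bookkeeping.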
BTW, here's an equivalent that seems to work in all cases, although I would
prefer to avoid control code that manually fills in the result:
>>> def get_best_run(data, field):
...     """Returns the best run."""
...     data = data.view(np.ndarray)
...     assert data.ndim >= 2
...     dsum = data[field].sum(axis=-1)
...     dmin = dsum.argmin(axis=-1)
...
...     # The result drops the "runs" axis (the penultimate one).
...     res_shape = list(data.shape)
...     del res_shape[-2]
...     res = np.empty(res_shape, dtype=data.dtype)
...
...     # Visit every experiment and copy its best run into the result.
...     idxs = np.unravel_index(np.arange(dmin.size), dmin.shape)
...     for idx in zip(*idxs):
...         res[idx] = data[idx + (dmin[idx],)]
...
...     return res
>>> d1.shape
(2, 5, 10)
>>> get_best_run(d1, "time").shape
(2, 10)
>>> d2.shape
(1, 2, 16, 5, 10)
>>> get_best_run(d2, "time").shape
(1, 2, 16, 10)
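
A loop-free variant is possible with `np.take_along_axis`, which was added in NumPy 1.15 and so was not available when this thread was written. The sketch below is only one way to do it; the name `best_run_take_along_axis` is hypothetical, and a plain numeric array is assumed instead of the structured array with a "time" field.

```python
import numpy as np

def best_run_take_along_axis(data):
    """Hypothetical helper: select, per experiment, the run with the lowest sum."""
    assert data.ndim >= 2
    dmin = data.sum(axis=-1).argmin(axis=-1)        # shape (...)

    # take_along_axis needs indices with as many dimensions as data;
    # shape (..., 1, 1) broadcasts over the iteration axis.
    idx = dmin[..., np.newaxis, np.newaxis]

    best = np.take_along_axis(data, idx, axis=-2)   # shape (..., 1, NUM_ITERATIONS)
    return best.squeeze(axis=-2)                    # drop the length-1 runs axis

rng = np.random.default_rng(0)
d2 = rng.random((1, 2, 16, 5, 10))                  # made-up data
best = best_run_take_along_axis(d2)
print(best.shape)                                   # (1, 2, 16, 10)
```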
Thanks,
Lluis
>>> >>> d1_best.shape
>>> (2, 2, 10)
>>>
>>>
>>> Assuming the 1st dimension is the test, the 2nd the run and the 3rd the
>>> iterations, using this previous code with some example values:
>>>
>>> >>> dmin
>>> [4 3]
>>> >>> d1_best
>>> [[[ ... contents of d1[0,4,:] ...]
>>> [ ... contents of d1[0,3,:] ...]]
>>> [[ ... contents of d1[1,4,:] ...]
>>> [ ... contents of d1[1,3,:] ...]]]
>>>
>>>
>>> While I actually want this:
>>>
>>> [[ ... contents of d1[0,4,:] ...]
>>> [ ... contents of d1[1,3,:] ...]]
--
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Tollbooth