[Numpy-discussion] tie breaking for max, min, argmax, argmin
Julian Taylor
jtaylor.debian at googlemail.com
Thu Mar 12 09:49:10 EDT 2015
On 03/12/2015 02:42 PM, Robert Kern wrote:
> On Thu, Mar 12, 2015 at 1:31 PM, Johannes Kulick
> <johannes.kulick at ipvs.uni-stuttgart.de> wrote:
>>
>> Hello,
>>
>> I wonder if it would be worth to enhance max, min, argmax and argmin (more?)
>> with a tie breaking parameter: If multiple entries have the same value the first
>> value is returned by now. It would be useful to have a parameter to alter this
>> behavior to an arbitrary tie-breaking. I would propose, that the tie-breaking
>> function gets a list with all indices of the max/mins.
>>
>> Example:
>> >>> a = np.array([ 1, 2, 5, 5, 2, 1])
>> >>> np.argmax(a, tie_breaking=random.choice)
>> 3
>>
>> >>> np.argmax(a, tie_breaking=random.choice)
>> 2
>>
>> >>> np.argmax(a, tie_breaking=random.choice)
>> 2
>>
>> >>> np.argmax(a, tie_breaking=random.choice)
>> 2
>>
>> >>> np.argmax(a, tie_breaking=random.choice)
>> 3
>>
>> Especially for some randomized experiments it is necessary that not always the
>> first maximum is returned, but a random optimum. Thus I end up writing these
>> things over and over again.
>>
>> I understand, that max and min are crucial functions, which shouldn't be slowed
>> down by the proposed changes. Adding new functions instead of altering the
>> existing ones would be a good option.
>>
>> Are there any concerns against me implementing these things and sending a pull
>> request? Should such a function better be included in scipy for example?
>
> On the whole, I think I would prefer new functions for this. I assume
> you only need variants for argmin() and argmax() and not min() and
> max(), since all of the tied values for the latter two would be
> identical, so returning the first one is just as good as any other.
>
Is this such a common use case that it's worth a NumPy function to replace
one-liners like this?

np.random.choice(np.where(a == a.max())[0])

It's also not that inefficient if the number of equal elements is not too
large.
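
For illustration, the one-liner can be wrapped in a small helper that follows
the shape of the proposal quoted above. This is only a sketch: the name
argmax_tie_break and its tie_breaking parameter are made up here and are not
part of NumPy. It assumes a non-empty array without NaNs and works on the
flattened input.

```python
import random

import numpy as np


def argmax_tie_break(a, tie_breaking=None):
    """Index of a maximum of `a` (flattened), with a pluggable tie-breaker.

    Hypothetical helper, not a NumPy function. `tie_breaking` is any callable
    that receives a list of all indices holding the maximum and returns one
    of them. With tie_breaking=None the first maximum is returned.
    """
    a = np.asarray(a).ravel()
    # Indices of every element equal to the maximum.
    candidates = np.flatnonzero(a == a.max())
    if tie_breaking is None:
        return int(candidates[0])
    # Hand over a plain Python list, as the proposal describes.
    return int(tie_breaking(candidates.tolist()))


a = np.array([1, 2, 5, 5, 2, 1])
first = argmax_tie_break(a)                    # first maximum
picked = argmax_tie_break(a, random.choice)    # one of the tied indices, at random
```

This costs one pass for a.max(), one for the comparison and one to collect the
indices, so for most inputs the overhead over a plain argmax is a small
constant factor.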