Behaviour of max() and min() with equal keys
In CPython, the builtin max() and min() have the property that if there are items with equal keys, the first item is returned. From a quick look at their source, I think this is true for Jython and IronPython too. However, this isn't currently a documented guarantee. Could it be made so? (As with the decision to declare sort() stable, it seems likely that by now there's code out there relying on it anyway.) -M-
On Tue, Sep 7, 2010 at 8:34 PM, Matthew Woodcraft <matthew@woodcraft.me.uk> wrote:
In CPython, the builtin max() and min() have the property that if there are items with equal keys, the first item is returned. From a quick look at their source, I think this is true for Jython and IronPython too.
It's actually not clear to me that this behaviour is ideal; it might make more sense to have max() return the first item among equal largest elements, and min() return the last item. That way, the special case of two operands has the nice property that (max(x, y), min(x, y)) is always either (x, y) or (y, x), rather than being possibly (x, x) or (y, y). (That is, id(max(x, y)) and id(min(x, y)) are id(x) and id(y) in one order or the other.) The max and min methods for the Decimal module take care to work this way, for example:
x, y = Decimal(2), Decimal('2.0') x.max(y), x.min(y) (Decimal('2'), Decimal('2.0'))
But:
max(x, y), min(x, y) (Decimal('2'), Decimal('2'))
Can you give examples of code that relies on max and min returning the first among equals? Mark
FWIW: I think Mark is right. I never quite understood why that was, but never cared enough to complain. lvh
Mark Dickinson wrote:
Matthew Woodcraft wrote:
In CPython, the builtin max() and min() have the property that if there are items with equal keys, the first item is returned. From a quick look at their source, I think this is true for Jython and IronPython too.
It's actually not clear to me that this behaviour is ideal; it might make more sense to have max() return the first item among equal largest elements, and min() return the last item.
I don't care a great deal what the behaviour is; I would like it to be consistent across Python versions, and I think the pragmatic way to achieve that is to document the current behaviour.
Can you give examples of code that relies on max and min returning the first among equals?
An existing regression test whose output depends on which choice is made. (I was writing some code today which had to duplicate the behaviour of a non-Python program which uses first-among-equals, which is what brought this question up. In that case I wouldn't think it unreasonable to have to hand-code the loop rather than using max(), but if in practice Python is always going to be first-among-equals, it seems to me we might as well be 'allowed' to take advantage of it.) -M-
On Tue, Sep 7, 2010 at 1:44 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
On Tue, Sep 7, 2010 at 8:34 PM, Matthew Woodcraft <matthew@woodcraft.me.uk> wrote:
In CPython, the builtin max() and min() have the property that if there are items with equal keys, the first item is returned. From a quick look at their source, I think this is true for Jython and IronPython too.
It's actually not clear to me that this behaviour is ideal; it might make more sense to have max() return the first item among equal largest elements, and min() return the last item. That way, the special case of two operands has the nice property that (max(x, y), min(x, y)) is always either (x, y) or (y, x), rather than being possibly (x, x) or (y, y). (That is, id(max(x, y)) and id(min(x, y)) are id(x) and id(y) in one order or the other.)
The max and min methods for the Decimal module take care to work this way, for example:
x, y = Decimal(2), Decimal('2.0') x.max(y), x.min(y) (Decimal('2'), Decimal('2.0'))
But:
max(x, y), min(x, y) (Decimal('2'), Decimal('2'))
Decimal may actually have this backwards. The idea would be that min(*lst) == sorted(lst)[0], and max(*lst) == sorted(lst)[-1]. Given a stable sort, then, max of equivalent elements would return the last element, and min the first. According to Alex Stepanov, this helps some with composing these small order statistics into larger stably-ordered functions. I do think Decimal.max's answer is better than the builtin's answer, and that the incremental benefit from returning the last instead of first is pretty small.
On Tue, Sep 7, 2010 at 2:40 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
On Tue, Sep 7, 2010 at 1:44 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
On Tue, Sep 7, 2010 at 8:34 PM, Matthew Woodcraft <matthew@woodcraft.me.uk> wrote:
In CPython, the builtin max() and min() have the property that if there are items with equal keys, the first item is returned. From a quick look at their source, I think this is true for Jython and IronPython too.
It's actually not clear to me that this behaviour is ideal; it might make more sense to have max() return the first item among equal largest elements, and min() return the last item. That way, the special case of two operands has the nice property that (max(x, y), min(x, y)) is always either (x, y) or (y, x), rather than being possibly (x, x) or (y, y). (That is, id(max(x, y)) and id(min(x, y)) are id(x) and id(y) in one order or the other.)
The max and min methods for the Decimal module take care to work this way, for example:
x, y = Decimal(2), Decimal('2.0') x.max(y), x.min(y) (Decimal('2'), Decimal('2.0'))
But:
max(x, y), min(x, y) (Decimal('2'), Decimal('2'))
Decimal may actually have this backwards. The idea would be that min(*lst) == sorted(lst)[0], and max(*lst) == sorted(lst)[-1]. Given a stable sort, then, max of equivalent elements would return the last element, and min the first. According to Alex Stepanov, this helps some with composing these small order statistics into larger stably-ordered functions.
I do think Decimal.max's answer is better than the builtin's answer, and that the incremental benefit from returning the last instead of first is pretty small.
Actually, Decimal isn't doing anything along these lines. At least in Python 2.6, I get:
Decimal('2').max(Decimal('2.0')) Decimal('2') Decimal('2.0').max(Decimal('2')) Decimal('2') Decimal('2.0').min(Decimal('2')) Decimal('2.0') Decimal('2').min(Decimal('2.0')) Decimal('2.0')
indicating that '2' is considered larger than '2.0' in the min/max comparison. It's ignoring the order of the arguments. It also creates a new Decimal object for the return value, so I can't use id() to check which one of identical elements it returns.
On Tue, Sep 7, 2010 at 10:47 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
Actually, Decimal isn't doing anything along these lines. At least in Python 2.6, I get:
Decimal('2').max(Decimal('2.0')) Decimal('2') Decimal('2.0').max(Decimal('2')) Decimal('2') Decimal('2.0').min(Decimal('2')) Decimal('2.0') Decimal('2').min(Decimal('2.0')) Decimal('2.0')
indicating that '2' is considered larger than '2.0' in the min/max comparison.
Right; there's a strict specification about how equal values with different exponents (or different signs in the case of comparing zeros) should be treated.
It's ignoring the order of the arguments. It also creates a new Decimal object for the return value, so I can't use id() to check which one of identical elements it returns.
This bit surprises me. I honestly thought I'd fixed it up so that max(x, y) actually returned one of x and y (and min(x, y) returned the other). Oh well. Mark
On Tue, Sep 7, 2010 at 10:51 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
On Tue, Sep 7, 2010 at 10:47 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
It's ignoring the order of the arguments. It also creates a new Decimal object for the return value, so I can't use id() to check which one of identical elements it returns.
This bit surprises me. I honestly thought I'd fixed it up so that max(x, y) actually returned one of x and y (and min(x, y) returned the other). Oh well.
Ah. I'd forgotten that the Decimal max and min methods are context aware, so that max(x, y) is rounded to the current context, and hence can actually be different from both x and y. So that was a bad example from me. Sorry.
from decimal import * getcontext().Emin = -500 x, y = Decimal('-1e-100'), Decimal('-1e-1000') x.max(y) Decimal('-0E-527')
Mark
On Tue, Sep 7, 2010 at 11:00 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
On Tue, Sep 7, 2010 at 10:51 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
On Tue, Sep 7, 2010 at 10:47 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
It's ignoring the order of the arguments. It also creates a new Decimal object for the return value, so I can't use id() to check which one of identical elements it returns.
This bit surprises me. I honestly thought I'd fixed it up so that max(x, y) actually returned one of x and y (and min(x, y) returned the other). Oh well.
Ah. I'd forgotten that the Decimal max and min methods are context aware, so that max(x, y) is rounded to the current context, and hence can actually be different from both x and y. So that was a bad example from me. Sorry.
Grr. s/max(x, y)/x.max(y)/g Mark
On Tue, Sep 7, 2010 at 10:40 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
Decimal may actually have this backwards. The idea would be that min(*lst) == sorted(lst)[0], and max(*lst) == sorted(lst)[-1]. Given a stable sort, then, max of equivalent elements would return the last element, and min the first.
Yes, you're right; that would make more sense than the other way around. Mark
On 09/07/2010 11:40 PM, Jeffrey Yasskin wrote:
Decimal may actually have this backwards. The idea would be that min(*lst) == sorted(lst)[0], and max(*lst) == sorted(lst)[-1].
Here you mean "is" rather than "==", right? The relations you spelled are guaranteed regardless of stability. (This doesn't apply to Decimal.max and Decimal.min, which return new objects.)
On Sep 7, 2010, at 12:34 PM, Matthew Woodcraft wrote:
In CPython, the builtin max() and min() have the property that if there are items with equal keys, the first item is returned. From a quick look at their source, I think this is true for Jython and IronPython too.
However, this isn't currently a documented guarantee. Could it be made so? (As with the decision to declare sort() stable, it seems likely that by now there's code out there relying on it anyway.)
That seems like a reasonable request. This behavior has been around for a very long time is unlikely to change. Elsewhere, we've made efforts to document sort stability (i.e. sorted(), heapq.nlargest(), heapq.nsmallest, merge(), etc). It is nice that min(it) parallels sorted(it)[0] and nsmallest(1, it). The same goes for max(it) paralleling sorted(it,reverse=True)[0] and nlargest(1, it). Raymond
Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Matthew Woodcraft wrote:
In CPython, the builtin max() and min() have the property that if there are items with equal keys, the first item is returned. From a quick look at their source, I think this is true for Jython and IronPython too.
However, this isn't currently a documented guarantee. Could it be made so? (As with the decision to declare sort() stable, it seems likely that by now there's code out there relying on it anyway.)
That seems like a reasonable request. This behavior has been around for a very long time is unlikely to change. Elsewhere, we've made efforts to document sort stability (i.e. sorted(), heapq.nlargest(), heapq.nsmallest, merge(), etc).
I've submitted issue 9802 as a feature request with a concrete suggestion for the docs. -M-
participants (6)
-
Hrvoje Niksic
-
Jeffrey Yasskin
-
Laurens Van Houtven
-
Mark Dickinson
-
Matthew Woodcraft
-
Raymond Hettinger