[issue9802] Document 'stability' of builtin min() and max()
New submission from Matthew Woodcraft <matthew@woodcraft.me.uk>: In CPython, the builtin max() and min() have the property that if there are items with equal keys, the first item is returned. From a quick look at their source, I think this is true for Jython and IronPython too. I propose making this a documented guarantee. On Python-dev, Raymond Hettinger said: << That seems like a reasonable request. This behavior has been around for a very long time is unlikely to change. Elsewhere, we've made efforts to document sort stability (i.e. sorted(), heapq.nlargest(), heapq.nsmallest, merge(), etc).
(<http://mail.python.org/pipermail/python-dev/2010-September/103543.html>)
I'm attaching a patch with a concrete suggestion for a change to functions.rst, modelled on the documentation of heapq.nlargest(). ---------- assignee: docs@python components: Documentation files: maxmin.patch keywords: patch messages: 115892 nosy: docs@python, mattheww priority: normal severity: normal status: open title: Document 'stability' of builtin min() and max() type: feature request Added file: http://bugs.python.org/file18802/maxmin.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9802> _______________________________________
Mark Dickinson <dickinsm@gmail.com> added the comment: Thanks for the patch! Comments: (1) Shouldn't 'reverse=True' be omitted in the second doc addition? (2) I'd also suggest adding a brief comment about what this means for distinct, but equal, objects; otherwise it's not really obvious what the point of the doc addition is. (3) As a matter of clarity, perhaps replace "this is" with "max(iterable, key=key) is", and similarly for min. As an aside, I still like Jeffrey Yasskin's suggestion on the python-dev mailing list that the sensible definition for max would maintain the invariant that max(iterable) be equivalent to sorted(iterable)[-1]; see Alexander Stepanov's writings in e.g., http://www.cs.rpi.edu/~musser/gsd/notes-on-programming-2006-10-13.pdf for more. But that's (a) another issue, and (b) perhaps not a significant enough benefit to be worth changing Python's semantics for. ---------- nosy: +jyasskin, mark.dickinson, rhettinger _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9802> _______________________________________
Changes by Raymond Hettinger <rhettinger@users.sourceforge.net>: ---------- assignee: docs@python -> rhettinger _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9802> _______________________________________
Matthew Woodcraft <matthew@woodcraft.me.uk> added the comment:
(1) Shouldn't 'reverse=True' be omitted in the second doc addition?
Yes, of course, sorry.
(2) I'd also suggest adding a brief comment about what this means for distinct, but equal, objects; otherwise it's not really obvious what the point of the doc addition is.
(3) As a matter of clarity, perhaps replace "this is" with "max(iterable, key=key) is", and similarly for min.
I've attached a new patch incorporating these suggestions. ---------- Added file: http://bugs.python.org/file18858/functions.rst.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9802> _______________________________________
Antoine Pitrou <pitrou@free.fr> added the comment:
As an aside, I still like Jeffrey Yasskin's suggestion on the python-dev mailing list that the sensible definition for max would maintain the invariant that max(iterable) be equivalent to sorted(iterable)[-1]
What's interesting is the practical consequence that: x, y = min(x, y), max(x, y) cannot give you twice the same object. Of course, there are subtle implications of how it will be implemented (especially with objects which have a partial order relationship to each other). Since max() is supposed to work on any iterator, we probably don't want to build an intermediate sequence and fetch elements in reverse order; instead perhaps use (not Py_LT) instead of Py_GT. ---------- nosy: +pitrou versions: +Python 3.2 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9802> _______________________________________
Mark Dickinson <dickinsm@gmail.com> added the comment:
Of course, there are subtle implications of how it will be implemented
Indeed. Ideally, as you mention, the implementation would only use __lt__ (as with sort and bisect). I think that constraint only leaves one reasonable choice: namely, max and min for multiple args would be functionally equivalent to max_list and min_list below: def min2(x, y): return y if y < x else x def max2(x, y): return x if y < x else y def min_list(xs): return reduce(min2, xs) def max_list(xs): return reduce(max2, xs) ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9802> _______________________________________
Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment: Documented the current behavior in r84822. Mark, if you're free on IRC at some point, I would like to discuss further. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9802> _______________________________________
Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment: As discussed with Mark, am closing this one after having applied documentation changes. ---------- resolution: -> rejected status: open -> closed _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9802> _______________________________________
Stephen Evans <stephen@recombinant.co.uk> added the comment: As suggested by Mark following my post on comp.lang.python I am adding further comments to the discussion on this (closed) issue. For a more mathematical consideration of the issue: Stepanov, Alexander and Paul McJones. 2009. Elements of Programming. Addison Wesley. Pages 52-53 The problem with the builtin max() is with weak comparisons. Consider two python objects a and b that are equivalent and where the following are True: a is not b repr([a, b]) == repr(sorted([a, b])) repr([a, b]) == repr(sorted([a, b], reverse=True)) repr([b, a]) == repr(sorted([b, a])) repr([b, a]) == repr(sorted([b, a], reverse=True)) Assuming repr() implemented correctly for a and b. The only Python rich comparison required is (weak) __lt__. If (weak) __eq__ is implemented then the following are True: a == b b == a In bltinmodule.c builtin_max() uses Py_GT. For correctness this should use the converse of builtin_min() i.e. the boolean negation of PyObject_RichCompare using Py_LT (for valid results). If using Python rich comparisions then only __lt__ would be required for both min() and max() as with list.sort(). The following will then be True: min([a, b]) is a max([a, b]) is b min([b, a]) is b max([b, a]) is a min([a, b]) is max([b, a]) min([a, b]) is not min([b, a]) max([a, b]) is min([b, a]) max([a, b]) is not max([b, a]) The above will work if Py_GE is subtituted for Py_GT in builtin_max(), though this will require the implementation of __ge__ which is inconsistent with list.sort() and is a point of potential failure if the implementation of __ge__ is not the converse of the implementation __lt__. To reiterate - builtin max() should be the converse of builtin min(). ---------- components: +None -Documentation nosy: +Stephen.Evans versions: +Python 2.7, Python 3.3 Added file: http://bugs.python.org/file20996/min_max.py _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9802> _______________________________________
participants (5)
-
Antoine Pitrou
-
Mark Dickinson
-
Matthew Woodcraft
-
Raymond Hettinger
-
Stephen Evans