adding key argument to min and max
This is my first post to Python dev, so I figured I should introduce myself. My name's Steven Bethard and I'm a computer science Ph.D. student at the University of Colorado at Boulder working primarily in the areas of natural language processing and machine learning. During my undergrad at the University of Arizona, I worked as a teaching assistant teaching Java for 2 1/2 years, though now that I'm at CU Boulder I pretty much only work in Python. I started getting active on the Python list about 6 months ago, and I've been watching python-dev for the last few months.

On to the real question... I posted a few notes about this on the python-list and didn't hear much of a response, so I thought that maybe python-dev is the more appropriate place (since it involves a change to some of the builtin functions).

For Python 2.5, I'd like to add a keyword argument 'key' to min and max like we have now for list.sort and sorted. I've needed this a couple of times now, specifically when I have something like a dict of word counts, and I want the most frequent word. I'd like to do something like:
>>> d = dict(aaa=3000, bbb=2000, ccc=1000)
>>> max(d, key=d.__getitem__)
'aaa'
I've implemented a patch that provides this functionality, but there are a few concerns about how it works. Here's some examples of what it does now:
>>> d = dict(aaa=3000, bbb=2000, ccc=1000)
>>> max(d)
'ccc'
>>> max(d, key=d.__getitem__)
'aaa'
>>> max(d, d.__getitem__)
{'aaa': 3000, 'bbb': 2000, 'ccc': 1000}

>>> import operator
>>> max(('aaa', 3000), ('bbb', 2000), ('ccc', 1000))
('ccc', 1000)
>>> max(('aaa', 3000), ('bbb', 2000), ('ccc', 1000), key=operator.itemgetter(1))
('aaa', 3000)
>>> max(('aaa', 3000), ('bbb', 2000), ('ccc', 1000), operator.itemgetter(1))
('ccc', 1000)
Note the difference between the 2nd and 3rd use of max in each example. For backwards compatibility reasons, the 'key' argument cannot be specified as a positional argument or it will look like max is being used in the max(a, b, c, ...) form. This means that a 'key' argument can *only* be specified as a keyword parameter, thus giving us the asymmetry we see in these examples.

My real question then is, is this asymmetry a problem? Is it okay to have a parameter that is *only* accessible as a keyword parameter?

Thanks,

Steve
--
You can wordify anything if you just verb it.
--- Bucky Katt, Get Fuzzy
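The constraint described above can be sketched in pure Python. This is only an illustration under stated assumptions, not the patch itself (which changes the builtins): the name max_with_key is hypothetical, and the keyword-only parameter syntax used here comes from Python 3. Because every positional argument is collected as a value to compare, a key function can only arrive by keyword.

```python
import operator


def max_with_key(*args, key=None):
    """Hypothetical pure-Python stand-in for a max() that accepts key=."""
    # One positional argument is treated as an iterable, as in max(iterable).
    # Two or more are treated as the values themselves, as in max(a, b, c).
    if len(args) == 1:
        items = list(args[0])
    else:
        items = list(args)
    if not items:
        raise ValueError("max_with_key() arg is an empty sequence")
    if key is None:
        key = lambda item: item
    best = items[0]
    best_key = key(best)
    for item in items[1:]:
        item_key = key(item)
        if item_key > best_key:  # strict '>' keeps the first of equal items
            best, best_key = item, item_key
    return best


d = dict(aaa=3000, bbb=2000, ccc=1000)
by_name = max_with_key(d)                      # compares the keys as strings
by_count = max_with_key(d, key=d.__getitem__)  # compares the counts

pairs = [('aaa', 3000), ('bbb', 2000), ('ccc', 1000)]
by_second = max_with_key(*pairs, key=operator.itemgetter(1))
```

Passing the callable positionally would simply add it to the values being compared, which is the asymmetry the post asks about.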
[Steven Bethard]
For Python 2.5, I'd like to add a keyword argument 'key' to min and max like we have now for list.sort and sorted. . . . This means that a 'key' argument can *only* be specified as a keyword parameter, thus giving us the asymmetry we see in these examples.
FWIW, in Py2.5 I plan on adding a key= argument to heapq.nsmallest() and heapq.nlargest(). There is no "asymmetry" issue with those functions, so it can be implemented cleanly. And, since min/max are essentially the same as nsmallest/nlargest with n==1, your use case is covered and there is no need to mess with the min/max builtins.
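As a rough sketch of the n==1 equivalence mentioned here, assuming a heapq whose nsmallest() and nlargest() accept key= as proposed in this message, the word-count case could look like this:

```python
import heapq

d = dict(aaa=3000, bbb=2000, ccc=1000)

# nlargest/nsmallest return a list, so the single result is indexed out.
most_frequent = heapq.nlargest(1, d, key=d.__getitem__)[0]
least_frequent = heapq.nsmallest(1, d, key=d.__getitem__)[0]

# The same call generalizes to more than one result.
top_two = heapq.nlargest(2, d, key=d.__getitem__)
```

One difference from max worth keeping in mind: on an empty input the heapq functions return an empty list, so the [0] lookup raises IndexError rather than the ValueError that max raises.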
I've needed this a couple of times now, specifically when I have something like a dict of word counts, and I want the most frequent word
For Py2.4, you can cover your use cases readily by adding the recipe for mostcommon() to a module of favorite utilities:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/347615

Alternatively, the recipe for a bag class is more flexible and still reasonably efficient:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/259174

Raymond Hettinger
Raymond Hettinger <python@rcn.com> wrote:
[Steven Bethard]
For Python 2.5, I'd like to add a keyword argument 'key' to min and max like we have now for list.sort and sorted. . . . This means that a 'key' argument can *only* be specified as a keyword parameter, thus giving us the asymmetry we see in these examples.
FWIW, in Py2.5 I plan on adding a key= argument to heapq.nsmallest() and heapq.nlargest(). There is no "asymmetry" issue with those functions, so it can be implemented cleanly. And, since min/max are essentially the same as nsmallest/nlargest with n==1, your use case is covered and there is no need to mess with the min/max builtins.
I don't want to put words into your mouth, so is this a vote against a key= argument for min and max?

If nsmallest/nlargest get key= arguments, this would definitely cover the same cases. If a key= argument gets vetoed for min and max, I'd at least like to add a bit of documentation pointing users of min/max to nsmallest/nlargest if they want a key= argument...

Steve
--
You can wordify anything if you just verb it.
--- Bucky Katt, Get Fuzzy
I don't want to put words into your mouth, so is this a vote against a key= argument for min and max?
Right. I don't think there is any need.
If nsmallest/nlargest get key= arguments, this would definitely cover the same cases.
Right.
If a key= argument gets vetoed for min and max, I'd at least like to add a bit of documentation pointing users of min/max to nsmallest/nlargest if they want a key= argument...
Sounds reasonable.

Raymond

P.S. In case you're interested, here is the patch:

Index: heapq.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/heapq.py,v
retrieving revision 1.27
diff -u -r1.27 heapq.py
--- heapq.py 29 Nov 2004 05:54:47 -0000 1.27
+++ heapq.py 2 Dec 2004 01:32:44 -0000
@@ -307,6 +307,31 @@
 except ImportError:
     pass
 
+# Extend the implementations of nsmallest and nlargest to use a key= argument
+_nsmallest = nsmallest
+def nsmallest(n, iterable, key=None):
+    """Find the n smallest elements in a dataset.
+
+    Equivalent to: sorted(iterable, key=key)[:n]
+    """
+    if key is None:
+        return _nsmallest(n, iterable)
+    it = ((key(r), i, r) for i, r in enumerate(iterable))    # decorate
+    result = _nsmallest(n, it)
+    return [r for k, i, r in result]                         # undecorate
+
+_nlargest = nlargest
+def nlargest(n, iterable, key=None):
+    """Find the n largest elements in a dataset.
+
+    Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
+    """
+    if key is None:
+        return _nlargest(n, iterable)
+    it = ((key(r), i, r) for i, r in enumerate(iterable))    # decorate
+    result = _nlargest(n, it)
+    return [r for k, i, r in result]                         # undecorate
+
 if __name__ == "__main__":
     # Simple sanity test
     heap = []
Index: test/test_heapq.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_heapq.py,v
retrieving revision 1.16
diff -u -r1.16 test_heapq.py
--- test/test_heapq.py 29 Nov 2004 05:54:48 -0000 1.16
+++ test/test_heapq.py 2 Dec 2004 01:32:44 -0000
@@ -105,13 +105,19 @@
 
     def test_nsmallest(self):
         data = [random.randrange(2000) for i in range(1000)]
+        f = lambda x: x * 547 % 2000
         for n in (0, 1, 2, 10, 100, 400, 999, 1000, 1100):
             self.assertEqual(nsmallest(n, data), sorted(data)[:n])
+            self.assertEqual(nsmallest(n, data, key=f),
+                             sorted(data, key=f)[:n])
 
     def test_largest(self):
         data = [random.randrange(2000) for i in range(1000)]
+        f = lambda x: x * 547 % 2000
         for n in (0, 1, 2, 10, 100, 400, 999, 1000, 1100):
             self.assertEqual(nlargest(n, data), sorted(data, reverse=True)[:n])
+            self.assertEqual(nlargest(n, data, key=f),
+                             sorted(data, key=f, reverse=True)[:n])
 
 
 #==============================================================================
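The decorate/undecorate step in the patch can be tried in isolation. The following is a minimal sketch, not part of the patch: the records list and the rank function are made up for illustration, and it calls the standard heapq.nsmallest directly on decorated tuples. It shows what the enumerate() index contributes: when two keys tie, tuple comparison is settled by the unique index and never reaches the record itself, which matters for records that cannot be ordered (dicts in Python 3, for example).

```python
from heapq import nsmallest

# Hypothetical records; two of them share the same rank.
records = [
    {"name": "b", "rank": 1},
    {"name": "a", "rank": 1},
    {"name": "c", "rank": 0},
]


def rank(record):
    """Hypothetical key function: order records by their 'rank' field."""
    return record["rank"]


# Decorate: (key, position, record). The position is unique, so comparing
# two tuples is always decided by the key or the position, never the dict.
decorated = ((rank(r), i, r) for i, r in enumerate(records))

# Select on the decorated tuples, then undecorate to recover the records.
smallest = [r for k, i, r in nsmallest(2, decorated)]
names = [r["name"] for r in smallest]
```

With the index in place, ties come back in their original order, which matches what sorted(records, key=rank)[:2] gives.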
I don't want to put words into your mouth, so is this a vote against a key= argument for min and max?
Right. I don't think there is any need.
Hm, min and max are probably needed 2-3 orders of magnitude more frequently than nsmallest/nlargest. So I think it's reasonable to add the key= argument to min and max as well. (We didn't leave it out of sorted() because you can already do it with list.sort().)
def test_largest(self):
Shouldn't that be test_nlargest?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
-----Original Message-----
From: python-dev-bounces+python=rcn.com@python.org [mailto:python-dev-bounces+python=rcn.com@python.org] On Behalf Of Steven Bethard
Sent: Wednesday, December 01, 2004 4:04 PM
To: python-dev@python.org
Subject: [Python-Dev] adding key argument to min and max
For Python 2.5, I'd like to add a keyword argument 'key' to min and max like we have now for list.sort and sorted. . . . I've implemented a patch that provides this functionality, but there are a few concerns about how it works.
Guido says yes. So, load the patch to SF and assign to me for review, testing, and documentation.

Raymond
participants (4)
- Guido van Rossum
- Phillip J. Eby
- Raymond Hettinger
- Steven Bethard