[New-bugs-announce] [issue6614] heapq.nsmallest and nlargest should be smarter/more usable/more consistent
report at bugs.python.org
Fri Jul 31 18:56:02 CEST 2009
New submission from Joshua Bronson <jabronson at gmail.com>:
> The latter two functions (nlargest and nsmallest) perform best for
> smaller values of n. For larger values, it is more efficient to use
> the sorted() function. Also, when n==1, it is more efficient to use
> the builtin min() and max() functions.
A quick usability win for the heapq module: Be more specific than "smaller" and
"larger", e.g. "when n is O(...(len(iterable)))".
> I find it interesting that the heapq functions tell you in the
> documentation that they aren't suitable for use where n==1 or where
> n is near the total size of the sequence whereas random.sample()
> chooses what it thinks is the best algorithm based on the input. At
> the very least I would have thought the heapq functions could check
> for n==1 and call min/max when it is.
+1. It looks like the pure Python implementation of nsmallest is actually already
choosing either an insort algorithm or a minheap algorithm based on whether n * 10 <=
len(iterable), but the C implementation of nsmallest unconditionally uses a minheap
algorithm. Neither the pure Python nor the C implementation of nlargest chooses its
algorithm conditionally. For the sake of consistency and usability, all
implementations of nsmallest and nlargest should choose the most efficient algorithm
from among min/max, insort, and minheap, and the docs should be updated to describe
Also, in looking through the heapq.py that came with my Python 2.6.2 distribution, I
noticed that line 196 seems to have no reason to be there:
_heappushpop = heappushpop
This is the only occurrence of "_heappushpop" in the file.
I made a patch for heapq.py, but since I'm not a CPython hacker, I thought it might
be better to leave the changes up to someone who could do both the pure Python and
the C implementations in tandem. I'd be happy to contribute documentation when the
time comes if that would help.
title: heapq.nsmallest and nlargest should be smarter/more usable/more consistent
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.0, Python 3.1
Python tracker <report at bugs.python.org>
More information about the New-bugs-announce