[New-bugs-announce] [issue6614] heapq.nsmallest and nlargest should be smarter/more usable/more consistent

Joshua Bronson report at bugs.python.org
Fri Jul 31 18:56:02 CEST 2009

New submission from Joshua Bronson <jabronson at gmail.com>:

>From http://docs.python.org/library/heapq.html:
> The latter two functions (nlargest and nsmallest) perform best for
> smaller values of n. For larger values, it is more efficient to use
> the sorted() function. Also, when n==1, it is more efficient to use
> the builtin min() and max() functions.

A quick usability win for the heapq module: Be more specific than "smaller" and 
"larger", e.g. "when n is O(...(len(iterable)))".

>From http://groups.google.com/group/comp.lang.python/msg/9556f56ae25ecf3b:
> I find it interesting that the heapq functions tell you in the
> documentation that they aren't suitable for use where n==1 or where
> n is near the total size of the sequence whereas random.sample()
> chooses what it thinks is the best algorithm based on the input. At
> the very least I would have thought the heapq functions could check
> for n==1 and call min/max when it is.

+1. It looks like the pure Python implementation of nsmallest is actually already 
choosing either an insort algorithm or a minheap algorithm based on whether n * 10 <= 
len(iterable), but the C implementation of nsmallest unconditionally uses a minheap 
algorithm. Neither the pure Python nor the C implementation of nlargest chooses its 
algorithm conditionally. For the sake of consistency and usability, all 
implementations of nsmallest and nlargest should choose the most efficient algorithm 
from among min/max, insort, and minheap, and the docs should be updated to describe 
this behavior.

Also, in looking through the heapq.py that came with my Python 2.6.2 distribution, I 
noticed that line 196 seems to have no reason to be there:
    _heappushpop = heappushpop                                      
This is the only occurrence of "_heappushpop" in the file.

I made a patch for heapq.py, but since I'm not a CPython hacker, I thought it might 
be better to leave the changes up to someone who could do both the pure Python and 
the C implementations in tandem. I'd be happy to contribute documentation when the 
time comes if that would help.

messages: 91134
nosy: jab
severity: normal
status: open
title: heapq.nsmallest and nlargest should be smarter/more usable/more consistent
type: behavior
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.0, Python 3.1

Python tracker <report at bugs.python.org>

More information about the New-bugs-announce mailing list