[issue11180] More efficient nlargest()/nsmallest()

Raymond Hettinger report at bugs.python.org
Fri Feb 11 09:51:34 CET 2011


Raymond Hettinger <rhettinger at users.sourceforge.net> added the comment:

The current routines are designed to consume only k elements of memory.  That is the way it should remain.  Manifesting the entire input iterable into memory is unnecessary and is not cache friendly.

Also, I question your assertions about running time.  In the worst case where the entire input in reverse sorted, it does take O(log k) for each insertion.  For other cases, like random orderings, it takes much less because many of the inputs only need to compare to the very top element of the heap.  In practice for small k and large n, running time close to O(n) is not uncommon.

-----------------
class Int(int):
    def __lt__(self, other):
        global cmps
        cmps += 1
        return int.__lt__(self, other)

from random import shuffle
from heapq import nsmallest

n = 10000
k = 16
data = list(map(Int, range(n)))

for i in range(10):
    shuffle(data)
    cmps = 0
    result = nsmallest(k, data)
    print(cmps, result)

-- test run ------------------

10485 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
10560 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
10584 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
10518 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
10582 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
10650 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
10409 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
10577 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
10499 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
10603 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

----------
assignee:  -> rhettinger
components: +Library (Lib)
resolution:  -> rejected
status: open -> closed
versions: +Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11180>
_______________________________________


More information about the Python-bugs-list mailing list