Hi, friends. I wanna ask if there is a function which is able to take a list as argument and then return its top-k maximums?

Bryan bryanjugglercryptographer at yahoo.com
Fri Apr 23 04:24:58 EDT 2010


Steven D'Aprano wrote:
> John Nagle wrote:
> >    Is "nlargest" smart enough to decide when it's cheaper to track the
> > N largest entries on a linear pass through the list than to sort?

It *always* does a linear pass through the list (linear, that is in
the length of the entire list). It tracks the n largest elements seen
so far in a heap.

> Doesn't appear to do so. From Python 3.1:

I think you're mis-reading it.

> def nlargest(n, iterable):
>     """Find the n largest elements in a dataset.
>
>     Equivalent to:  sorted(iterable, reverse=True)[:n]
>     """
>     it = iter(iterable)
>     result = list(islice(it, n))
>     if not result:
>         return result
>     heapify(result)
>     _heappushpop = heappushpop
>     for elem in it:
>         _heappushpop(result, elem)
>     result.sort(reverse=True)
>     return result

That doesn't sort, or even heapify, the whole list. It keeps the
largest n elements seen so far in the 'result' heap, smallest on top.

Note the doc for heappushpop: "Push item on the heap, then pop and
return the smallest item from the heap." Thus the 'result' heap stays
of size n.

The final result.sort() is just so the returned list is sorted, and
assuming that's a requirement, I think the algorithm is asymptotically
the best a comparison-based method can do, regardless of whether the
length of the entire sequence dominates n. I figure the worst-case run
time is Theta(s lg(n)) where s in the length of the sequence.

> Interestingly, nsmallest does use two different algorithms,
> depending on how many items you ask for. See the source code.

That is interesting. The above algorithm for nlargest is better, but
to use it for nsmallest requires a largest-on-top heap, which the
module does not bother to implement.


--
--Bryan



More information about the Python-list mailing list