Help understanding the decisions *behind* python?
Duncan Booth
duncan.booth at invalid.invalid
Wed Jul 22 07:55:13 EDT 2009
Steven D'Aprano <steven at REMOVE.THIS.cybersource.com.au> wrote:
> But that's the wrong solution to the problem. The OP wants the largest
> (or smallest) item, which he expects to get by sorting, then grabbing
> the first element:
>
> sorted(alist)[0]
>
> That requires sorting the entire list, only to throw away everything
> except the first item. A better solution is to use min() and max(),
> neither of which sort the list, so they are much faster.
>
And if they want the n smallest or largest items (where n is significantly
less than the length of the list[*]) they might consider using
heapq.nsmallest() and heapq.nlargest()
I find it interesting that the heapq functions tell you in the
documentation that they aren't suitable for use where n==1 or where n is
near the total size of the sequence whereas random.sample() chooses what it
thinks is the best algorithm based on the input. At the very least I would
have thought the heapq functions could check for n==1 and call min/max when
it is.
[*] Some quick playing with timeit seems to indicate that with a list of
200 integers nsmallest is fastest (by a very small margin) when n==2 and
n==3 but is beaten by sorted[:n] when n==4, so I guess you need a
reasonably long list to make it worth not sorting it: with list of 20,000
integers it isn't worth sorting unless you want more than about 700 items.
--
Duncan Booth http://kupuguy.blogspot.com
More information about the Python-list
mailing list