Selecting k smallest or largest elements from a large list in python; (benchmarking)
Peter Otten
__peter__ at web.de
Thu Sep 2 02:59:59 EDT 2010
Dmitry Chichkov wrote:
> Given: a large list (10,000,000) of floating point numbers;
> Task: fastest python code that finds k (small, e.g. 10) smallest
> items, preferably with item indexes;
> Limitations: in python, using only standard libraries (numpy & scipy
> is Ok);
>
> I've tried several methods. With N = 10,000,000, K = 10 The fastest so
> far (without item indexes) was pure python implementation
> nsmallest_slott_bisect (using bisect/insert). And with indexes
> nargsmallest_numpy_argmin (argmin() in the numpy array k times).
>
> Anyone up to the challenge beating my code with some clever selection
> algorithm?
If you don't care about stability, i. e. whether nlargest(2, [1, 2, 2.0])
returns [1, 2] or [1, 2.0], use
_heapq.nlargest() directly
$ python nsmallest_perf2.py
nsmallest --> 0.142784833908
nsmallest_slott_bisect --> 0.19291305542
$ cat nsmallest_perf2.py
from random import randint, random
import time
from bisect import insort
from itertools import islice
import _heapq
_funcs = []
def timeit(f):
_funcs.append(f)
def time_all(*args):
funcs = _funcs
width = max(len(f.__name__) for f in funcs)
prev = None
for f in funcs:
start = time.time()
result = f(*args)
end = time.time()
print "%-*s --> %10s" % (width, f.__name__, end - start)
if prev is None:
prev = result
else:
assert prev == result
timeit(_heapq.nsmallest)
@timeit
def nsmallest_slott_bisect(n, iterable, insort=insort):
it = iter(iterable)
mins = sorted(islice(it, n))
for el in it:
if el <= mins[-1]: #NOTE: equal sign is to preserve duplicates
insort(mins, el)
mins.pop()
return mins
test_data = [randint(10, 50) + random() for i in range(10**6)]
K = 10
time_all(K, test_data)
Peter
More information about the Python-list
mailing list