Selecting k smallest or largest elements from a large list in python; (benchmarking)
Terry Reedy
tjreedy at udel.edu
Thu Sep 2 18:14:42 EDT 2010
On 9/1/2010 9:08 PM, Dmitry Chichkov wrote:
> Given: a large list (10,000,000) of floating point numbers;
> Task: fastest python code that finds k (small, e.g. 10) smallest
> items, preferably with item indexes;
> Limitations: in python, using only standard libraries (numpy& scipy
> is Ok);
>
> I've tried several methods. With N = 10,000,000, K = 10 The fastest so
> far (without item indexes) was pure python implementation
> nsmallest_slott_bisect (using bisect/insert). And with indexes
> nargsmallest_numpy_argmin (argmin() in the numpy array k times).
>
> Anyone up to the challenge beating my code with some clever selection
> algorithm?
>
> Current Table:
> 1.66864395142 mins_heapq(items, n):
> 0.946580886841 nsmallest_slott_bisect(items, n):
> 1.38014793396 nargsmallest(items, n):
> 10.0732769966 sorted(items)[:n]:
> 3.17916202545 nargsmallest_numpy_argsort(items, n):
> 1.31794500351 nargsmallest_numpy_argmin(items, n):
> 2.37499308586 nargsmallest_numpy_array_argsort(items, n):
> 0.524670124054 nargsmallest_numpy_array_argmin(items, n):
>
> 0.0525538921356 numpy argmin(items): 1892997
> 0.364673852921 min(items): 10.0000026786
Your problem is underspecified;-).
Detailed timing comparisons are only valid for a particular Python
version running under a particular OS on particular hardware. So, to
actually run a contest, you would have to specify a version and OS and
offer to run entries on your machine, with as much else as possible
turned off, or else enlist a neutral judge to do so. And the timing
method should also be specified.
--
Terry Jan Reedy
More information about the Python-list
mailing list