On 01/30/2015 02:57 AM, Neal Becker wrote:
It sounds like np.partition could be used to answer the question: give me the highest K elements in a vector.
Is this a correct interpretation? Something like partial sort, but returned elements are unsorted.
I could really make some use of this, but in my case it is a list of objects I need to sort on a particular key. Is this algorithm available in general python code (not specific to numpy arrays)?
This is a correct interpretation, it is a selection algorithm, (np.select was already taken and I don't like c++ name nth_element). It guarantees all elements after the kth element are larger/equal to the kth and everything before smaller/equal. It can also select multiple orders in one go, e.g. the 10 largest and the 10 smallest (used for e.g. np.percentile) Note you can create a partial sort by running selection first and then sorting only the sections you want sorted. Its not perfectly efficient as it won't share selection pivots with quickselect but its faster than a full sort. Unfortunately python does not have something similar for objects. There is an issue for it though: http://bugs.python.org/issue21592 As a workaround you can generate a metric ndarray of your objects e.g. by selecting only your keys from the object that can be sorted with less than (ideally into a native numpy type like int/float) and then use argpartition to get the indices that would sort the original object array/list. If that is actually going to be faster than just using pythons full sort would need to be tested. Also be aware of https://github.com/numpy/numpy/issues/5524 jaime has just found likely after when looking at your mail :)