[Numpy-discussion] question np.partition

Jaime Fernández del Río jaime.frio at gmail.com
Fri Jan 30 10:34:03 EST 2015


On Fri, Jan 30, 2015 at 5:58 AM, Julian Taylor <
jtaylor.debian at googlemail.com> wrote:

> On 01/30/2015 02:57 AM, Neal Becker wrote:
> > It sounds like np.partition could be used to answer the question:
> > give me the highest K elements in a vector.
> >
> > Is this a correct interpretation?  Something like partial sort, but
> returned
> > elements are unsorted.
> >
> > I could really make some use of this, but in my case it is a list of
> objects I
> > need to sort on a particular key.  Is this algorithm available in
> general python
> > code (not specific to numpy arrays)?
> >
>
> This is a correct interpretation, it is a selection algorithm,
> (np.select was already taken and I don't like c++ name nth_element).
> It guarantees all elements after the kth element are larger/equal to the
> kth and everything before smaller/equal.
> It can also select multiple orders in one go, e.g. the 10 largest and
> the 10 smallest (used for e.g. np.percentile)
> Note you can create a partial sort by running selection first and then
> sorting only the sections you want sorted. Its not perfectly efficient
> as it won't share selection pivots with quickselect but its faster than
> a full sort.
>
> Unfortunately python does not have something similar for objects. There
> is an issue for it though:
> http://bugs.python.org/issue21592
>
> As a workaround you can generate a metric ndarray of your objects e.g.
> by selecting only your keys from the object that can be sorted with less
> than (ideally into a native numpy type like int/float) and then use
> argpartition to get the indices that would sort the original object
> array/list.
>

As of right now, np.partition ends up calling np.sort unless the dtype is
one of the numeric types. But it is smoking fast compared to sorting when
the real thing kicks in!

Jaime


> If that is actually going to be faster than just using pythons full sort
> would need to be tested.
> Also be aware of https://github.com/numpy/numpy/issues/5524 jaime has
> just found likely after when looking at your mail :)
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20150130/fc531dbe/attachment.html>


More information about the NumPy-Discussion mailing list