On 09.06.2013 12:10, josef.pktd@gmail.com wrote:
On Wed, May 29, 2013 at 3:19 PM, <josef.pktd@gmail.com> wrote:
On Wed, May 29, 2013 at 12:25 PM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
On 05/29/2013 06:12 AM, josef.pktd@gmail.com wrote:
On Tue, May 28, 2013 at 6:31 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Hi All,
There is a PR adding quickselect to numpy as a function `partition`. Comments on name and exposure in the numpy API are welcome.
here a a quick status report on the PR https://github.com/numpy/numpy/pull/3360 I now implemented partitioning via the introselect algorithm which is a quickselect median of 3 pivot with a cutoff on recursion depth to a median of median of 5 pivot for O(N) worst case complexity. Additionally it can stores its pivots for reuse in other partitions on the same data to reduce the space required to be partitioned next time, this is useful e.g. for multiple percentiles. It is functionally ready, but there are still some API/ABI issues to be solved. Mainly deciding if we put the selection algorithms in the ABI for 1.8 or not. Currently the ABI is unchanged so user types cannot make use of the algorithms (they will fall back to quicksort). The python api is now: np.partition(array, kth) where kth is an integer or array of integers it will move each index in kth into its final sorted position, so np.partition(a, range(a.size)) results in a (inefficient) sort. e.g.: d = np.array([66, 81, 21, 75, 46, -6, 66, 86, 242, 47, 88, 79]) np.partition(d, (2, -2)) # (2, 8) array([ -6, 21, 46, 47, 75, 66, 66, 79, 81, 86, 88, 242]) Multidimensional arrays will use the same array of kth, you cannot partition each axis by different values, you would have to explicitly loop to do that. Median is implemented in terms of partitioning already, but percentile is not. I would suggest someone else than me gives a try at implementing percentile in terms of partition to see if the documentation and api make sense to others.