[New-bugs-announce] [issue21592] Make statistics.median run in linear time
Thomas Dybdahl Ahle
report at bugs.python.org
Wed May 28 14:11:38 CEST 2014
New submission from Thomas Dybdahl Ahle:
The statistics module currently contains the following comment:
"FIXME: investigate ways to calculate medians without sorting? Quickselect?"
This is important, because users expect standard library functions to use state of the art implementations, and median by sorting has never been that.
There are many good linear time alternatives, the classical median-of-k algorithm was posted to the mailing list in a nice version by David Eppstein in 2002 . The fastest method in practice is probably Quick Select in the Floyd-Rivest version  which is similar to quick sort.
These algorithms also have the feature of letting us select any k-th order statistic, not just the median. This seems conceptually a lot simpler than the current median/median_low/median_high split.
However, sticking with the current api, a new implementation would have to support calculating a median as the average of n//2 and (n+1)//2'th order statistics. This could be implemented either by calling the select function twice, or by implementing a multi-select algorithm, which is also a well studied problem .
I'll be happy to contribute code, or help out in any other way.
components: Library (Lib)
title: Make statistics.median run in linear time
versions: Python 3.4, Python 3.5
Python tracker <report at bugs.python.org>
More information about the New-bugs-announce