[issue35775] Add a general selection function to statistics

Fri May 24 11:35:28 EDT 2019

Rémi Lapeyre <remi.lapeyre at henki.fr> added the comment:

Hi Steven, thanks for taking the time to reviewing my patch.

Regarding the relevance of add select(), I was looking for work to do in the bug tracker and found some references to it (https://bugs.python.org/issue21592#msg219934 for example).

I knew that there is multiples definition of the percentiles but got sloppy in my previous response by wanting to answer quickly. I will try not to do this again.

Regarding the use of sorting, I thought that sorting would be quicker than doing the other linear-time algorithm in Python given the general performance of Tim sort, some tests in https://bugs.python.org/issue21592 agreed with that.

For the iterator, I was thinking about how to implement percentiles when writing select() and thought that by writing:

def _select(data, i, key=None):
    if not len(data):
        raise StatisticsError("select requires at least one data point")
    if not (1 <= i <= len(data)):
        raise StatisticsError(f"The index looked for must be between 1 and {len(data)}")
    data = sorted(data, key=key)
    return islice(data, i-1, None)

def select(data, i, key=None):
    return next(_select(data, y, key=key))

and then doing some variant of:

    it = _select(data, i, key=key)
    left, right = next(it), next(it)
    # compute percentile with left and right

to implement the quantiles without sorting multiple time the list. Now that quantiles() has been implement by Raymond Hettinger, this is moot anyway.    

Since its probably not useful, feel free to disregard my PR.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35775>
_______________________________________