Percentile/Quantile "interpolation" refactor - NumPy-Discussion

Oct. 13, 2021

Hi all,

After a long time, Abel has helped us out and refactored the
`interpolation` keyword of the quantile and percentile functions.

This was long overdue since NumPy implements four (the non-default)
interpolation methods that appear to be very much non-standard.  On the
other hand, NumPy currently has no unbiased methods (i.e. population
estimates).

There are two main questions right now with respect to the API: first,
which names to use for the methods, and second, how to deal with
"outliers".

The PR https://github.com/numpy/numpy/pull/19857#issuecomment-939852134
adds the methods and gives them (currently) the following names (sorted
by the R methods) – the names will be used as string identifiers:

1. inverted cdf
2. averaged inverted cdf
3. closest observation
4. interpolated inverted cdf
5. hazen  (name from Wolfram)
6. weibull  (name from Wolfram)
7. linear  (default!  Better name deferred)
8. median unbiased
9. normal unbiased

In addition, there are the four we currently have:

* lower
* higher
* nearest
* midpoint

Numbers 6 and 7 are named "exclusive" and "inclusive" by Python in
their `method` keyword argument.  While I like the name `method=` and
may want to move to it, I am not sure I like "inclusive" and
"exclusive".
The current plan is to defer the kwarg rename to a follow-up,
although it should be discussed before the next release.
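
For reference, this is roughly what the Python spelling looks like.  The
sketch only uses `statistics.quantiles` from the standard library (Python
3.8+) on a made-up four-point sample; per the Python documentation,
"exclusive" places the i-th of m sorted points at i / (m + 1) and
"inclusive" at (i - 1) / (m - 1).

```python
import statistics

sample = [1, 2, 3, 4]

# Quartile cut points under Python's two `method` choices.
exclusive = statistics.quantiles(sample, n=4, method="exclusive")
inclusive = statistics.quantiles(sample, n=4, method="inclusive")

print("exclusive:", exclusive)  # [1.25, 2.5, 3.75]
print("inclusive:", inclusive)  # [1.75, 2.5, 3.25]
```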

The second main question is how to deal with outliers (this does not
affect the default method 7, which finds the sample quantiles and not a
population estimate).  Wikipedia says this:

    Packages differ in how they estimate quantiles beyond the lowest
    and highest values in the sample, i.e. p < 1/N and p > (N − 1)/N.
    Choices include returning an error value, computing linear
    extrapolation, or assuming a constant value.

The current choice is clipping (assuming a constant value), but this
could be modified.
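
To illustrate the three choices from the quote, here is a small sketch for
the weibull method only, again in pure Python and not taken from the PR.
The function name `weibull_quantile` and the `out_of_range` option are
hypothetical; NumPy has no such option.  For this particular method the
estimate leaves the sample range when p < 1/(N + 1) or p > N/(N + 1).

```python
import math


def weibull_quantile(data, p, out_of_range="clip"):
    """Weibull estimate; `out_of_range` is a hypothetical knob.

    Needs at least two data points.
    """
    xs = sorted(data)
    n = len(xs)
    h = (n + 1) * p - 1.0  # zero-based fractional index, may leave [0, n - 1]
    if 0.0 <= h <= n - 1:
        lo = min(math.floor(h), n - 2)
        return xs[lo] + (h - lo) * (xs[lo + 1] - xs[lo])
    if out_of_range == "clip":
        # Assume a constant value beyond the lowest/highest observation.
        return xs[0] if h < 0.0 else xs[-1]
    if out_of_range == "extrapolate":
        # Extend the line through the two outermost observations.
        lo = 0 if h < 0.0 else n - 2
        return xs[lo] + (h - lo) * (xs[lo + 1] - xs[lo])
    if out_of_range == "nan":
        # Return an error value.
        return math.nan
    raise ValueError(f"unknown out_of_range policy: {out_of_range!r}")


sample = [10, 20, 30, 40]
for policy in ("clip", "extrapolate", "nan"):
    print(policy, weibull_quantile(sample, 0.1, policy))
```

With this sample, p = 0.1 falls below 1/(N + 1) = 0.2, so clipping gives
the smallest observation while extrapolation goes below it.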

Any feedback is appreciated!  Otherwise, this will probably move
forward in the current state for the next release.

Cheers,

Sebastian
