[Python-ideas] Adding quantile to the statistics module

Mahmoud Hashemi mahmoud at hatnote.com
Sat Mar 17 13:20:53 EDT 2018


Hahaha, that Hyndman story will never get old.

FWIW, based on much informal polling, the most common intuition on the
topic stems from elementary education: a median of an even-numbered set is
the mean of the two central values. So, linear-weighted average on
discontinuities seems to be least surprising.
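To make that intuition concrete, here is a minimal sketch of linear interpolation between the two neighboring order statistics (what Hyndman & Fan call type 7). The name `quantile_linear` is made up for illustration; it is not an existing or proposed statistics-module API, and it assumes the input is already sorted.

```python
import math


def quantile_linear(sorted_data, q):
    """Hypothetical helper: quantile via linear interpolation (type 7).

    Assumes sorted_data is already sorted ascending and 0 <= q <= 1.
    """
    if not sorted_data:
        raise ValueError("quantile of empty data")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    # Fractional index into the sorted data.
    h = (len(sorted_data) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_data) - 1)
    # Weight the two neighbors by how far h sits between them.
    return sorted_data[lo] + (h - lo) * (sorted_data[hi] - sorted_data[lo])


# Even-sized set: the median is the mean of the two central values.
print(quantile_linear([1, 2, 3, 4], 0.5))  # 2.5
```

With q=0.5 on an even-sized set this reduces to the schoolbook "mean of the two middle values" rule, which is the least-surprise behavior described above.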

Whichever type is chosen, quantiles are often computed in sets. For
instance, min/max/median, quartiles (+ interquartile range), and
percentiles. Quantiles were one of the main reasons statsutils uses a
class[1] to wrap datasets. Otherwise, there's a lot of work in re-sorting.
All the galloping in the world isn't going to beat sorting once. :)
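
As a rough sketch of the sort-once idea (not the actual boltons Stats implementation; the class name `SortedStats` and its methods are hypothetical, and the interpolation choice is an assumption), a wrapper can sort in the constructor and answer any number of quantile queries from the cached copy:

```python
import math


class SortedStats:
    """Hypothetical wrapper: sort the data once, then reuse it."""

    def __init__(self, data):
        # The only O(n log n) step; every query below reuses this list.
        self._sorted = sorted(data)
        if not self._sorted:
            raise ValueError("no data")

    def quantile(self, q):
        # Linear interpolation between neighbors (assumed, type 7 style).
        if not 0 <= q <= 1:
            raise ValueError("q must be between 0 and 1")
        h = (len(self._sorted) - 1) * q
        lo = math.floor(h)
        hi = min(lo + 1, len(self._sorted) - 1)
        a, b = self._sorted[lo], self._sorted[hi]
        return a + (h - lo) * (b - a)

    @property
    def median(self):
        return self.quantile(0.5)

    def quartiles(self):
        return tuple(self.quantile(q) for q in (0.25, 0.5, 0.75))

    @property
    def iqr(self):
        q1, _, q3 = self.quartiles()
        return q3 - q1


stats = SortedStats([7, 1, 5, 3, 9])
print(stats.median, stats.quartiles(), stats.iqr)
```

Each query after construction is O(1), whereas a standalone function that accepts unsorted data has to sort (or at least select) on every call.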

Other calculations benefit from this cached approach, too. Stddev is
faster to calculate after calculating variance, for instance, but if memory
serves, quantiles are the most expensive for mid-sized datasets that don't
call for pandas/numpy.

[1]:
http://boltons.readthedocs.io/en/latest/statsutils.html#boltons.statsutils.Stats

On Sat, Mar 17, 2018 at 9:28 AM, Tim Peters <tim.peters at gmail.com> wrote:

> [Guido]
> > Since Python is not held to backwards compatibility with S, and for most
> > datasets (and users) it doesn't matter much, why not go with the default
> > recommended by Hyndman & Fan?
>
> Here's Hyndman in 2016[1]:
>
> """
> The main point of our paper was that statistical software should
> standardize the definition of a sample quantile for consistency. We
> listed 9 different methods that we found in various software packages,
> and argued for one of them (type 8). In that sense, the paper was a
> complete failure. No major software uses type 8 by default, and the
> diversity of definitions continues 20 years later. In fact, the paper
> may have had the opposite effect to what was intended. We drew
> attention to the many approaches to computing sample quantiles and
> several software products added them all as options. Our own quantile
> function for R allows all 9 to be computed, and has type 7 as default
> (for backwards consistency – the price we had to pay to get R core to
> agree to include our function).
> """
>
> Familiar & hilarious ;-)
>
> [1] https://robjhyndman.com/hyndsight/sample-quantiles-20-years-later/