On Sat, Apr 27, 2019 at 6:10 AM Steven D'Aprano <steve@pearwood.info> wrote:
The statistics module is soon to get a quantile function.

For those of you who use statistics software (whether in Python, or
using other languages) and specifically use quantiles, what sort of
functionality would be useful to you?

For example:

- evenly-spaced quantiles (say, at 0.25, 0.5, 0.75)?
- unevenly-spaced quantiles (0.25, 0.8, 0.9, 0.995)?

If I'm interested in multiple quantiles, they are usually unevenly spaced. Something like [0.8, 0.9, 0.95, 0.99, 0.999] would be pretty typical if I'm not sure what the right threshold for an "outlier" is.
 
- one quantile at a time?

Yes, this is also quite common, once I know what threshold I care about.
 
- any specific definition?

NumPy's quantile function has an "interpolation" option for controlling the quantile definition. But in years of calculating quantiles for data analysis, I've never used it.
 
- quantiles of a distribution?

Yes, but rarely. The only example that comes to mind is quantiles of a Normal distribution. (scipy.stats supports this use-case well.)
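
For what it's worth, here is a rough sketch of that use-case without SciPy. It assumes Python 3.8 or later, where the standard library provides statistics.NormalDist with an inv_cdf() method. The probabilities below are just illustrative values, not anything taken from real data.

```python
from statistics import NormalDist

# Standard Normal distribution (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Illustrative probabilities; inv_cdf() maps each one to the
# corresponding quantile of the distribution.
probabilities = [0.8, 0.9, 0.95, 0.99, 0.999]
normal_quantiles = [standard_normal.inv_cdf(p) for p in probabilities]

# The median of a symmetric distribution centred on 0 is 0.
median = standard_normal.inv_cdf(0.5)
```

The same call works for a single probability or, via a comprehension, for several unevenly-spaced ones.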
 
- anything else?

The flexibility of calculating either one or multiple quantiles with np.quantile() is pretty convenient. But this might make for a more dynamic type signature than you'd like for the standard library, e.g.,

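from typing import Iterable, List, Sequence, TypeVar, overload

T = TypeVar('T')

# A single threshold returns a single value.
@overload
def quantile(data: Iterable[T], threshold: float) -> T: ...

# A sequence of thresholds returns a list of values.
@overload
def quantile(data: Iterable[T], threshold: Sequence[float]) -> List[T]: ...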
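
To make the scalar-or-sequence dispatch concrete, here is a minimal sketch of what an implementation behind those overloads might look like. It is only an illustration, not a proposal for the actual statistics API: it assumes float data, and it picks linear interpolation between the two nearest sorted values as the quantile definition, which is one choice among several.

```python
from typing import Iterable, List, Sequence, overload


@overload
def quantile(data: Iterable[float], threshold: float) -> float: ...
@overload
def quantile(data: Iterable[float], threshold: Sequence[float]) -> List[float]: ...


def quantile(data, threshold):
    """Hypothetical sketch: one threshold in, one value out; many in, a list out."""
    values = sorted(data)
    if not values:
        raise ValueError("quantile() requires at least one data point")

    def one(q: float) -> float:
        if not 0.0 <= q <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        # Fractional position in the sorted data (assumed definition).
        pos = q * (len(values) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        # Interpolate linearly between the two neighbouring values.
        return values[lo] + (values[hi] - values[lo]) * (pos - lo)

    # Dispatch on whether a single number or a sequence was passed.
    if isinstance(threshold, (int, float)):
        return one(threshold)
    return [one(q) for q in threshold]
```

The runtime isinstance() check is exactly the "dynamic" part: the return type depends on the type of the argument, which is convenient to call but harder to describe and document than two separate functions would be.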
 
Thanks in advance.


--
Steven