On Sat, Apr 27, 2019 at 6:10 AM Steven D'Aprano <steve@pearwood.info> wrote:
The statistics module is soon to get a quantile function.
For those of you who use statistics software (whether in Python, or using other languages) and specifically use quantiles, what sort of functionality would be useful to you?
For example:
- evenly-spaced quantiles (say, at 0.25, 0.5, 0.75)?
- unevenly-spaced quantiles (0.25, 0.8, 0.9, 0.995)?
If I'm interested in multiple quantiles, they are usually unevenly spaced. Something like [0.8, 0.9, 0.95, 0.99, 0.999] would be pretty typical if I'm not sure what the right threshold for an "outlier" is.
- one quantile at a time?
Yes, this is also quite common, once I know what threshold I care about.
- any specific definition?
NumPy's quantile function has an "interpolation" option for controlling the quantile definition. But in years of calculating quantiles for data analysis, I've never used it.
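For anyone wondering what the choice of definition actually changes, here is a minimal pure-Python sketch of two of them. The helper names quantile_linear and quantile_lower are made up for illustration. They are meant to mirror what NumPy calls "linear" and "lower", not to reproduce NumPy's implementation, and they assume the data is already sorted and non-empty.

```python
import math
from typing import Sequence


def quantile_linear(sorted_data: Sequence[float], q: float) -> float:
    """Interpolate linearly between the two nearest order statistics."""
    pos = q * (len(sorted_data) - 1)  # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_data[lo] + (pos - lo) * (sorted_data[hi] - sorted_data[lo])


def quantile_lower(sorted_data: Sequence[float], q: float) -> float:
    """Take the order statistic at or just below the fractional index."""
    pos = q * (len(sorted_data) - 1)
    return sorted_data[math.floor(pos)]


data = [1.0, 2.0, 3.0, 10.0]
print(quantile_linear(data, 0.5))  # halfway between 2.0 and 3.0
print(quantile_lower(data, 0.5))   # the lower neighbour, 2.0
```

On large samples the definitions rarely differ by enough to matter, which is consistent with never needing the option in practice.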
- quantiles of a distribution?
Yes, rarely -- though the only example that comes to mind is quantiles for a Normal distribution. (scipy.stats supports this use-case well.)
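As a rough illustration of the distribution case, the sketch below computes quantiles of a standard Normal distribution. It assumes Python 3.8 or later, where the standard library provides statistics.NormalDist. With SciPy the corresponding call would be scipy.stats.norm.ppf. The threshold list is just the example from earlier in this message.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

thresholds = [0.8, 0.9, 0.95, 0.99, 0.999]
for q in thresholds:
    # inv_cdf(q) returns the value x such that P(X <= x) == q.
    print(q, round(std_normal.inv_cdf(q), 3))
```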
- anything else?
The flexibility of calculating either one or multiple quantiles with np.quantile() is pretty convenient. But this might make for a more dynamic type signature than you'd like for the standard library, e.g.,

    from typing import Iterable, List, Sequence, TypeVar, overload

    T = TypeVar('T')

    @overload
    def quantile(data: Iterable[T], threshold: float) -> T: ...

    @overload
    def quantile(data: Iterable[T], threshold: Sequence[float]) -> List[T]: ...
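To make the overloaded signature concrete, here is one way such a function could be written. This is only a sketch under stated assumptions, not a proposal for the actual statistics API: it uses float instead of a generic T because it interpolates linearly, and the scalar-versus-sequence dispatch via isinstance is just one possible choice.

```python
import math
from typing import Iterable, List, Sequence, overload


@overload
def quantile(data: Iterable[float], threshold: float) -> float: ...
@overload
def quantile(data: Iterable[float], threshold: Sequence[float]) -> List[float]: ...


def quantile(data, threshold):
    """Return one quantile for a scalar threshold, or a list for a sequence."""
    ordered = sorted(data)
    if not ordered:
        raise ValueError("quantile() requires at least one data point")

    def one(q: float) -> float:
        if not 0.0 <= q <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        pos = q * (len(ordered) - 1)  # fractional index into the sorted data
        lo, hi = math.floor(pos), math.ceil(pos)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    # Assumption: a plain number means "one quantile"; anything else is
    # treated as a sequence of thresholds.
    if isinstance(threshold, (int, float)):
        return one(threshold)
    return [one(q) for q in threshold]


print(quantile([5, 1, 4, 2, 3], 0.5))
print(quantile([5, 1, 4, 2, 3], [0.0, 0.5, 1.0]))
```

The return type depends on the argument type, which is exactly the dynamic behaviour that may be unwelcome in the standard library. Always returning a list would keep the signature simpler.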
Thanks in advance.
-- Steven
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/