Users of statistics software, what quantile functionality would be useful for you?
The statistics module is soon to get a quantile function.

For those of you who use statistics software (whether in Python, or using other languages) and specifically use quantiles, what sort of functionality would be useful to you?

For example:

- evenly-spaced quantiles (say, at 0.25, 0.5, 0.75)?
- unevenly-spaced quantiles (0.25, 0.8, 0.9, 0.995)?
- one quantile at a time?
- any specific definition?
- quantiles of a distribution?
- anything else?

Thanks in advance.

-- 
Steven
On Sat, Apr 27, 2019 at 6:10 AM Steven D'Aprano <steve@pearwood.info> wrote:
The statistics module is soon to get a quantile function.
For those of you who use statistics software (whether in Python, or using other languages) and specifically use quantiles, what sort of functionality would be useful to you?
For example:
- evenly-spaced quantiles (say, at 0.25, 0.5, 0.75)?
- unevenly-spaced quantiles (0.25, 0.8, 0.9, 0.995)?
If I'm interested in multiple quantiles, they are usually unevenly spaced. Something like [0.8, 0.9, 0.95, 0.99, 0.999] would be pretty typical if I'm not sure what the right threshold for an "outlier" is.
- one quantile at a time?
Yes, this is also quite common, once I know what threshold I care about.
- any specific definition?
NumPy's quantile function has an "interpolation" option for controlling the quantile definition. But in years of calculating quantiles for data analysis, I've never used it.
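To illustrate what a choice of quantile definition changes in practice, here is a minimal sketch using `statistics.quantiles`, which was added to the standard library in Python 3.8, after this thread. Its `method` argument selects between two definitions. The eight-value sample is hypothetical and chosen only to make the difference visible.

```python
from statistics import quantiles

# Hypothetical sample, used only for illustration.
sample = [1, 2, 3, 4, 5, 6, 7, 8]

# 'exclusive' (the default) treats the data as a sample from a larger
# population that may contain more extreme values.
exclusive_quartiles = quantiles(sample, n=4, method="exclusive")

# 'inclusive' treats the minimum and maximum of the data as the
# 0th and 100th percentiles.
inclusive_quartiles = quantiles(sample, n=4, method="inclusive")

print(exclusive_quartiles)  # [2.25, 4.5, 6.75]
print(inclusive_quartiles)  # [2.75, 4.5, 6.25]
```

The median agrees under both definitions; only the outer quartiles move, and the gap shrinks as the sample grows, which may be why the option is so rarely needed.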
- quantiles of a distribution?
Yes, rarely -- though the only example that comes to mind is quantiles for a Normal distribution. (scipy.stats supports this use-case well.)
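For the Normal-distribution case, a sketch that needs only the standard library is possible with `statistics.NormalDist` (added in Python 3.8, after this thread). Its `inv_cdf` method returns the quantile for a given probability. The threshold list below is reused from the earlier answer purely as an example.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Example probabilities; any values strictly between 0 and 1 work.
thresholds = [0.8, 0.9, 0.95, 0.99, 0.999]

# inv_cdf(p) is the value x such that P(X <= x) == p.
normal_quantiles = {p: standard_normal.inv_cdf(p) for p in thresholds}

for p, x in normal_quantiles.items():
    print(f"p={p}: x={x:.4f}")
```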
- anything else?
The flexibility of calculating either one or multiple quantiles with np.quantile() is pretty convenient. But this might make for a more dynamic type signature than you'd like for the standard library, e.g.,

    T = TypeVar('T')

    @overload
    def quantile(data: Iterable[T], threshold: float) -> T: ...

    @overload
    def quantile(data: Iterable[T], threshold: Sequence[float]) -> List[T]: ...
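As a rough sketch of how such an overloaded signature could be backed by one implementation, the following accepts either a single threshold or a sequence of them. It is not NumPy's implementation or any proposed standard-library one: the linear interpolation between neighbouring order statistics is an assumption, and the types are narrowed from `T` to `float` because interpolation needs arithmetic.

```python
from typing import Iterable, List, Sequence, overload


@overload
def quantile(data: Iterable[float], threshold: float) -> float: ...
@overload
def quantile(data: Iterable[float], threshold: Sequence[float]) -> List[float]: ...


def quantile(data, threshold):
    """Return one quantile for a scalar threshold, or a list for a sequence."""
    ordered = sorted(data)
    if not ordered:
        raise ValueError("quantile() requires at least one data point")

    def one(q: float) -> float:
        if not 0.0 <= q <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        # Fractional position in the sorted data (assumed definition).
        position = q * (len(ordered) - 1)
        lower = int(position)
        upper = min(lower + 1, len(ordered) - 1)
        weight = position - lower
        return ordered[lower] + (ordered[upper] - ordered[lower]) * weight

    # A scalar threshold yields a scalar; anything else is treated as a sequence.
    if isinstance(threshold, (int, float)):
        return one(threshold)
    return [one(q) for q in threshold]


single = quantile([10, 20, 30, 40, 50], 0.5)
several = quantile([10, 20, 30, 40, 50], [0.0, 0.25, 1.0])
```

The return type depends on the argument type, which is the "dynamic" behaviour mentioned above; a stricter design would split this into two functions.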
Thanks in advance.
-- 
Steven
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Steven D'Aprano writes:
For those of you who use statistics software (whether in Python, or using other languages) and specifically use quantiles, what sort of functionality would be useful to you?
For example:
- evenly-spaced quantiles (say, at 0.25, 0.5, 0.75)?
Yes.
- unevenly-spaced quantiles (0.25, 0.8, 0.9, 0.995)?
Rarely.
- one quantile at a time?
Yes.
- any specific definition?
My students' data is either qualitative survey data (Likert scale or categorical) or government-issue, so "any" is "good enough for gov't work".
- quantiles of a distribution?
You mean the inverse of the cumulative distribution function? Very rarely.
- anything else?
Not that I can think of. Thank you for this addition!

Steve
On Sat, Apr 27, 2019 at 7:51 AM Steven D'Aprano <steve@pearwood.info> wrote:
The statistics module is soon to get a quantile function.
For those of you who use statistics software (whether in Python, or using other languages) and specifically use quantiles, what sort of functionality would be useful to you?
For example:
- evenly-spaced quantiles (say, at 0.25, 0.5, 0.75)?
- unevenly-spaced quantiles (0.25, 0.8, 0.9, 0.995)?
- one quantile at a time?
- any specific definition?
- quantiles of a distribution?
- anything else?
The stats that are pored over by my team every week are running times of mypy in various configurations. We currently show p25, p50, p75, p90, p95 and p99.

We currently use the following definition:

    def pick(data: List[float], fraction: float) -> float:
        index = int(len(data) * fraction)
        before = data[max(0, index - 1)]
        after = data[min(len(data) - 1, index)]
        return (before + after) / 2.0

where `data` is a sorted array. Essentially we use the average of the two values nearest the cutoff point, except for edge cases. (I think we could do better, but this is the code I found in our repo. :-)

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
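To make the behaviour of the `pick` definition above concrete, here it is reproduced with the import it needs and run on a small input. The ten-value list is a hypothetical stand-in for sorted running times, not real mypy data.

```python
from typing import List


def pick(data: List[float], fraction: float) -> float:
    # `data` must already be sorted in ascending order.
    index = int(len(data) * fraction)
    # Clamp both neighbours so fraction 0.0 and 1.0 stay in range.
    before = data[max(0, index - 1)]
    after = data[min(len(data) - 1, index)]
    return (before + after) / 2.0


# Hypothetical sorted sample standing in for running times.
timings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

p25 = pick(timings, 0.25)  # averages timings[1] and timings[2]
p50 = pick(timings, 0.5)   # averages timings[4] and timings[5]
low = pick(timings, 0.0)   # both neighbours clamp to the first element
high = pick(timings, 1.0)  # both neighbours clamp to the last element

print(p25, p50, low, high)  # 2.5 5.5 1.0 10.0
```

The result is always the midpoint of two adjacent values (or a single value at the edges), so there is no interpolation weight that depends on the fraction.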
participants (4)
- Guido van Rossum
- Stephan Hoyer
- Stephen J. Turnbull
- Steven D'Aprano