Percentile/Quantile "interpolation" refactor

Hi all,

after a long time, Abel has helped us refactor the `interpolation` keyword of the quantile and percentile functions. This was long overdue, since NumPy implements three (the non-default) interpolation methods that appear to be very much non-standard. On the other hand, NumPy currently has no unbiased methods (i.e. population estimates).

There are two main questions right now with respect to the API: first, which names to use for the methods, and second, how to deal with "outliers".

The PR https://github.com/numpy/numpy/pull/19857#issuecomment-939852134 adds the methods and (currently) gives them the following names, sorted by the R method numbers. The names will be used as string identifiers:

1. inverted cdf
2. averaged inverted cdf
3. closest observation
4. interpolated inverted cdf
5. hazen (name from Wolfram)
6. weibull (name from Wolfram)
7. linear (default! Better name deferred)
8. median unbiased
9. normal unbiased

And additionally the four we currently have:

* lower
* higher
* nearest
* midpoint

Numbers 5 and 6 are named "exclusive" and "inclusive" by Python in its `method` keyword argument. While I like the name `method=` and may want to move to it, I am not sure I like "inclusive" and "exclusive". The current plan is to defer the kwarg rename to a follow-up, although it should be discussed before the next release.

The second main question is how to deal with outliers (this does not affect the default method 7, which finds the sample quantiles and not a population estimate). Wikipedia says this:

"Packages differ in how they estimate quantiles beyond the lowest and highest values in the sample, i.e. p < 1/N and p > (N − 1)/N. Choices include returning an error value, computing linear extrapolation, or assuming a constant value."

The current choice is clipping (assuming a constant value), but this could be modified.

Any feedback is appreciated! Otherwise, this will probably move forward in its current state for the next release.

Cheers,

Sebastian
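The continuous methods 4 to 9 in the list above differ only in where they place the sorted sample points on the probability axis, so they can be sketched with one formula. What follows is a minimal pure-Python sketch, assuming the (alpha, beta) plotting-position parameterization used in R's `quantile()` documentation; `PLOTTING_POSITIONS` and `sample_quantile` are hypothetical names for illustration, not NumPy's implementation. Positions that fall outside the sample are clipped, mirroring the "assuming a constant value" choice described above.

```python
import math

# (name, alpha, beta) for the continuous methods, keyed by R "type" number.
# The parameter values follow R's quantile() documentation; this table and
# the function below are hypothetical illustrations, not NumPy code.
PLOTTING_POSITIONS = {
    4: ("interpolated inverted cdf", 0.0, 1.0),
    5: ("hazen", 0.5, 0.5),
    6: ("weibull", 0.0, 0.0),
    7: ("linear", 1.0, 1.0),
    8: ("median unbiased", 1.0 / 3.0, 1.0 / 3.0),
    9: ("normal unbiased", 3.0 / 8.0, 3.0 / 8.0),
}


def sample_quantile(data, p, method=7):
    """Estimate the p-th quantile of `data` with continuous method 4..9.

    Positions below the first or above the last order statistic are
    clipped, i.e. a constant value is assumed beyond the sample range.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    _, alpha, beta = PLOTTING_POSITIONS[method]
    # 1-based fractional position of the quantile among the order statistics
    h = (n + 1 - alpha - beta) * p + alpha
    # Clipping: anything left of x[1] or right of x[n] maps to that extreme.
    h = min(max(h, 1.0), float(n))
    lo = math.floor(h)
    if lo >= n:  # h == n: the largest observation, nothing to interpolate
        return float(x[-1])
    # Linear interpolation between the order statistics x[lo] and x[lo + 1]
    # (converted to 0-based list indices).
    return x[lo - 1] + (h - lo) * (x[lo] - x[lo - 1])


data = list(range(1, 11))
print(sample_quantile(data, 0.25, method=7))  # 3.25 (linear, the default)
print(sample_quantile(data, 0.25, method=6))  # 2.75 (weibull)
print(sample_quantile(data, 0.05, method=6))  # 1.0  (clipped to the minimum)
```

With NumPy 1.22 or later, the corresponding calls take the form `np.quantile(a, 0.25, method="weibull")`, with underscores in the multi-word names (for example `"median_unbiased"`).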

Hello,

Thanks for the summary of the PR, Sebastian.

About the default value of Python's `statistics.quantiles` being "exclusive": they give some explanation of why it is their default, but only in a comment block above the code of `quantiles`, not in the actual documentation. You can see it here: https://github.com/python/cpython/blob/main/Lib/statistics.py#L615
For reference, the documentation is here: https://docs.python.org/3/library/statistics.html#statistics.quantiles

By the way, it is method 6 which is named "exclusive" and method 7 which is named "inclusive" by Python. The names "inclusive" and "exclusive" seem to come from Excel.

Sincerely,
Abel

_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: aoun@cerfacs.fr
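Abel's mapping (Python's "exclusive" is method 6, "inclusive" is method 7) can be checked with the standard library alone. This is a minimal sketch, assuming Python 3.8 or later for the `method` argument of `statistics.quantiles`; `hf_quantile` is a hypothetical helper written for this illustration, not NumPy's implementation. The data are chosen so that all quartiles fall inside the sample, where the outlier handling does not matter.

```python
import math
import statistics


def hf_quantile(data, p, alpha, beta):
    """Continuous sample quantile with (alpha, beta) plotting positions.

    Positions outside the sample are clipped to the extreme values.
    """
    x = sorted(data)
    n = len(x)
    h = (n + 1 - alpha - beta) * p + alpha  # 1-based fractional position
    h = min(max(h, 1.0), float(n))
    lo = math.floor(h)
    if lo >= n:
        return float(x[-1])
    return x[lo - 1] + (h - lo) * (x[lo] - x[lo - 1])


data = list(range(1, 11))
probs = [0.25, 0.5, 0.75]

# statistics.quantiles with n=4 returns the three quartile cut points.
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

# Method 6 ("weibull"): alpha = beta = 0; method 7 ("linear"): alpha = beta = 1.
weibull = [hf_quantile(data, p, 0.0, 0.0) for p in probs]
linear = [hf_quantile(data, p, 1.0, 1.0) for p in probs]

print(exclusive, weibull)  # [2.75, 5.5, 8.25] for both
print(inclusive, linear)   # [3.25, 5.5, 7.75] for both
```

With NumPy 1.22 or later, the same comparison could be made against `np.quantile(data, probs, method="weibull")` and `method="linear"`.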

On Thu, 21 Oct 2021, 11:15 am, <joeevansjoe6@gmail.com> wrote:
keep sharing such informative article . https://bit.ly/3C551OO
Is this Spam?

Very much so, I'm afraid. There's been a bit of a spam problem as of late.

Regards,
Bas

________________________________
From: A.S. Khangura <arsh840@gmail.com>
Sent: 21 October 2021 10:24
To: Discussion of Numerical Python <numpy-discussion@python.org>
Subject: [Numpy-discussion] Re: Percentile/Quantile "interpolation" refactor


On Wed, 2021-10-13 at 10:25 -0500, Sebastian Berg wrote:
This PR is now merged and will be included in the upcoming 1.22 release. Please don't hesitate to speak up if you have any concerns about it; all notes from the original email still apply. There is a good chance that the documentation could use some revising, so input would be greatly appreciated!

The one thing that I definitely plan to do before the next release is to rename the `interpolation` keyword argument to `method`. `method` seems a much clearer name, and the rename forces users who do not use the default to consider switching to a more standard method. (Only the default version is really described in the literature.)

Cheers,

Sebastian
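Renaming `interpolation` to `method` is a classic keyword deprecation: the old name keeps working for a while but warns. A minimal sketch of one way to do this, assuming a hypothetical function `quantile_sketch` (not NumPy's actual code) that only implements the legacy "linear", "lower", "higher" and "midpoint" modes on the 0-based position `q * (n - 1)`:

```python
import math
import warnings


def quantile_sketch(data, q, method=None, *, interpolation=None):
    """Hypothetical quantile function illustrating an `interpolation=` ->
    `method=` keyword rename with a deprecation period.

    Only the legacy modes "linear", "lower", "higher" and "midpoint" are
    sketched, using the 0-based position q * (n - 1) in the sorted data.
    """
    if interpolation is not None:
        if method is not None:
            # Passing both names is ambiguous, so refuse it outright.
            raise TypeError("use only `method`; `interpolation` is deprecated")
        warnings.warn(
            "the `interpolation` argument is deprecated, use `method` instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        method = interpolation
    if method is None:
        method = "linear"  # the default stays unchanged

    x = sorted(data)
    pos = q * (len(x) - 1)
    below, above = x[math.floor(pos)], x[math.ceil(pos)]
    if method == "linear":
        return below + (pos - math.floor(pos)) * (above - below)
    if method == "lower":
        return below
    if method == "higher":
        return above
    if method == "midpoint":
        return (below + above) / 2
    raise ValueError(f"unknown method {method!r}")


print(quantile_sketch([1, 2, 3, 4], 0.5))                  # 2.5
print(quantile_sketch([1, 2, 3, 4], 0.5, method="lower"))  # 2
```

Using `None` as the default for `method` (rather than `"linear"`) is what lets the function detect a caller who passes both keywords. For reference, NumPy 1.22 did ship `method=` with `interpolation=` deprecated.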

participants (8)
- A.S. Khangura
- Abel AOUN
- bas van beek
- Freya Pachl
- harrytallh@gmail.com
- joeevansjoe6@gmail.com
- Norma Dye
- Sebastian Berg