Merging very limited weights support for quantiles/percentiles - NumPy-Discussion

27 Oct 2023

Hi all,

there is a PR to merge very limited support for weights in quantiles,
which, given no further input, I will probably merge based on sklearn
devs saying that they will use it.  This means adding a `weights`
kwarg [1]. See:

    https://github.com/numpy/numpy/pull/24254

Limited here means that it would only work for the "inverted_cdf"
method (which is not the default one).

Why is it very limited?  Because this limited version is the only form
we/I am pretty confident about getting right.

There are various problems with making it more broad:
1. Weights are not clearly defined and can have many meanings, e.g.:
   * frequency weights (repeated observations)
   * probability weights (removing sample biases)
   * "analytic"/"precision" weights (encoding observation
     precision/variance).

2. There is very little to no literature on how to deal with the
   following subtleties in the context of the various types of
   weights:
   * Interpolation (relevant to all interpolating methods)
   * Unbiasing (the main difference between the methods)
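
To make the interpolation point concrete, here is a small sketch in
plain Python (no NumPy needed, and the helper names are made up for
this example).  It expands integer frequency weights into repeated
observations and applies the default "linear" rule to the expanded
data.  Doubling every count leaves the relative weights unchanged but
moves the interpolated result, so under interpolation frequency
weights and normalized (probability-style) weights cannot simply be
treated as the same thing:

```python
import math


def linear_quantile(values, q):
    """Unweighted quantile using linear interpolation at position q * (n - 1)."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    frac = pos - lo
    return data[lo] + frac * (data[hi] - data[lo])


def expand(values, counts):
    """Turn integer frequency weights into repeated observations."""
    return [v for v, c in zip(values, counts) for _ in range(c)]


values = [1.0, 2.0]

# Same relative weights, different absolute counts.
once = linear_quantile(expand(values, [1, 1]), 0.25)
twice = linear_quantile(expand(values, [2, 2]), 0.25)

print(once, twice)  # the two results differ
```

This is only an illustration of the ambiguity, not a statement about
which answer is the right one.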

The PR adds the most minimal thing, where the different kinds of
weights are largely equivalent (no unbiasing issues, no
interpolation). [2]
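
As a rough illustration of what a weighted "inverted_cdf" quantile
means, here is a minimal sketch in plain Python.  It is not the code
from the PR, and `weighted_inverted_cdf` is a name invented for this
example.  It assumes non-negative weights with a positive sum, does a
full sort, and returns the smallest value whose cumulative share of the
total weight reaches `q`:

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_inverted_cdf(values, weights, q):
    """Smallest value whose cumulative weight share is at least q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    # Full sort by value; zero-weight entries carry no mass and are dropped.
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("at least one weight must be positive")
    cum = list(accumulate(w for _, w in pairs))
    # First index whose cumulative weight is >= q * total weight.
    idx = bisect_left(cum, q * cum[-1])
    return pairs[idx][0]


values = [3.0, 1.0, 2.0]
counts = [1, 2, 1]

weighted = weighted_inverted_cdf(values, counts, 0.5)
# Same data written out as repeated observations with unit weights.
repeated = weighted_inverted_cdf([1.0, 1.0, 2.0, 3.0], [1, 1, 1, 1], 0.5)
# Rescaling all weights by a constant does not change the answer.
rescaled = weighted_inverted_cdf(values, [10 * c for c in counts], 0.5)

print(weighted, repeated, rescaled)
```

In this sketch, integer counts, repeated observations, and rescaled
weights all give the same result, which is the sense in which the
weight types are largely equivalent for this method.  With the PR, the
corresponding NumPy call would presumably pass `weights=` together
with `method="inverted_cdf"`.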

Due to these complexities (and the lack of many statistics specialists
looking at it) there is a point to be made that we just shouldn't add
this in NumPy, but if nobody else has an opinion, I will go with the
sklearn devs who want it :).
(Also with weights we have to rely on full sorting for now, which can
be slow, which I can live with personally.)

- Sebastian

[1] There are different styles of weights, and for some methods that
clearly matters.  Thus, if we ever expand the definition, it may be
that `weights` has to be mapped to one of these, or that the generic
`weights` kwarg would raise an error for these methods, telling you to
pick a specific one like `pweights=` or `fweights=`.

[2] I am not quite sure about "analytic weights" here, but to me these
do not really make sense in the context of a discrete interpolation
method.

Sebastian Berg

tags

participants (1)