Merging very limited weights support for quantiles/percentiles
Hi all,

there is a PR to merge very limited support for weights in quantiles. Absent further input, I will probably merge it, since the sklearn devs say that they will use it. This means adding a `weights` kwarg [1]. See:

https://github.com/numpy/numpy/pull/24254

Limited here means that it would only work for the "inverted_cdf" method (which is not the default one).

Why is it very limited? Because this limited version is the only form that we are (or at least I am) pretty confident about getting right. There are various problems with making it broader:

1. Weights are not clearly defined and can have many meanings, e.g.:
   * frequency weights (repeated observations)
   * probability weights (removing sample biases)
   * "analytic"/"precision" weights (encoding observation precision/variance)
2. There is little to no literature on how to handle the following subtleties in the context of the various types of weights:
   * Interpolation (relevant to all interpolating methods)
   * Unbiasing (the main difference between the methods)

The PR adds the most minimal thing: the one case where the different kinds of weights are largely equivalent (no unbiasing issues, no interpolation). [2]

Due to these complexities (and the lack of many statistics specialists looking at it), there is a point to be made that we just shouldn't add this to NumPy. But if nobody else has an opinion, I will go with the sklearn devs who want it :).

(Also, with weights we have to rely on full sorting for now, which can be slow. I can live with that personally.)

- Sebastian

[1] There are different styles of weights, and for some methods that clearly matters. Thus, if we ever expand the definition, it may be that `weights` has to be mapped to one of these styles, or that the generic `weights` kwarg would raise an error for those methods, telling you to pick a specific one like `pweights=` or `fweights=`.

[2] I am not quite sure about "analytic weights" here, but to me these do not really make sense in the context of a discrete, non-interpolating method.
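To illustrate why the "inverted_cdf" case is the easy one, here is a minimal pure-Python sketch of a weighted inverted-CDF quantile: sort the data, accumulate the weights, and return the first value whose cumulative weight reaches `q` times the total weight. This is not the code from the PR. The helper name `weighted_inverted_cdf` is hypothetical, and dropping zero-weight observations is an assumption made here for simplicity.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_inverted_cdf(values, weights, q):
    """Return the smallest value whose weighted empirical CDF is >= q.

    Hypothetical helper for illustration only; not NumPy's implementation.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    # Assumption: observations with zero weight are ignored entirely.
    # A full sort is used, matching the remark about sorting in the mail.
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("need at least one positive weight")

    # Running sum of the weights in sorted order (unnormalized CDF).
    cumulative = list(accumulate(w for _, w in pairs))

    # First index whose cumulative weight is >= q * total weight.
    idx = bisect_left(cumulative, q * cumulative[-1])
    return pairs[idx][0]


# Integer (frequency) weights behave like repeating the observations.
values = [10, 20, 30]
weights = [1, 3, 1]
repeated = [10, 20, 20, 20, 30]

median_weighted = weighted_inverted_cdf(values, weights, 0.5)
median_repeated = weighted_inverted_cdf(repeated, [1] * len(repeated), 0.5)
print(median_weighted, median_repeated)
```

Because the result is always one of the observed values and only the cumulative weight fraction matters, no interpolation or unbiasing choice enters, so frequency and probability weights give the same answer in this sketch.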
Sebastian Berg