
On Thu, Mar 6, 2014 at 2:51 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Wed, Mar 5, 2014 at 4:45 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Hi all,
in Pull Request https://github.com/numpy/numpy/pull/3864 Noel Dawe suggested adding new parameters to our `cov` and `corrcoef` functions to implement weights, which already exist for `average` (the PR still needs to be adapted).
The idea right now would be to add `weights` and `frequencies` keyword arguments to these functions.
In more detail: the situation is a bit more complex for `cov` and `corrcoef` than for `average`, because there are different types of weights. The current plan would be to add two new keyword arguments:
* weights: Uncertainty weights, which cause `N` to be recalculated accordingly (this is R's `cov.wt` default, I believe).
* frequencies: When given, `N = sum(frequencies)` and the values are weighted by their frequency.
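To make the `frequencies` case concrete, here is a rough sketch of what I would expect it to mean: each observation counts as if it had been repeated that many times, so `N = sum(frequencies)`. The function name `cov_with_frequencies` is made up for illustration and is not what the PR implements; it assumes variables in rows, observations in columns, and the usual `N - 1` normalization.

```python
import numpy as np

def cov_with_frequencies(x, frequencies):
    """Hypothetical sketch: covariance with integer repeat counts.

    x has one variable per row and one observation per column;
    frequencies[i] says how many times observation i was seen.
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(frequencies, dtype=float)
    n = f.sum()                         # N = sum(frequencies)
    mean = (x * f).sum(axis=1) / n      # frequency-weighted mean per variable
    xc = x - mean[:, None]              # centered observations
    return (xc * f) @ xc.T / (n - 1)    # same N - 1 normalization as plain cov

x = np.array([[1.0, 2.0, 4.0],
              [1.0, 3.0, 2.0]])
f = np.array([1, 2, 3])

# Repeating each column f[i] times and calling plain np.cov should agree.
expanded = np.repeat(x, f, axis=1)
assert np.allclose(cov_with_frequencies(x, f), np.cov(expanded))
```

With all frequencies equal to 1 this reduces to the ordinary `np.cov(x)`.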
I don't understand this description at all. One of them recalculates N, and the other sets N according to some calculation?
Is there a standard reference on how these are supposed to be interpreted? When you talk about per-value uncertainties, I start imagining that we're trying to estimate a population covariance given a set of samples each corrupted by independent measurement noise, and then there's some natural hierarchical Bayesian model one could write down and get an ML estimate of the latent covariance via empirical Bayes or something. But this requires a bunch of assumptions and is that really what we want to do? (Or maybe it collapses down into something simpler if the measurement noise is Gaussian or something?)
I think the idea is that if you write formulas involving correlation or covariance using matrix notation, then these formulas can be generalized in several different ways by inserting some non-negative or positive diagonal matrices into the formulas in various places. The diagonal entries could be called 'weights'. If they are further restricted to sum to 1 then they could be called 'frequencies'. Or maybe this is too cynical and the jargon has a more standard meaning in this context.
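As a sketch of what I mean by inserting a diagonal matrix: write the centered data as `Xc`, put the weights on the diagonal of `D`, and compute `Xc D Xc^T` divided by some normalization. The helper `cov_diag` and its `norm` argument are made up for illustration. The two normalizations shown are assumptions on my part: `sum(f) - 1` for repeat counts, and `1 - sum(w**2)` for weights scaled to sum to 1, which I believe is what R's `cov.wt` uses for its unbiased estimate.

```python
import numpy as np

def cov_diag(x, d, norm):
    """Hypothetical sketch: covariance with a diagonal matrix inserted.

    x has one variable per row and one observation per column;
    d holds the non-negative diagonal entries; norm is the divisor.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    D = np.diag(d)                      # diagonal 'weights' matrix
    mean = x @ d / d.sum()              # weighted mean of each variable
    xc = x - mean[:, None]              # centered data
    return xc @ D @ xc.T / norm

x = np.array([[1.0, 2.0, 4.0],
              [1.0, 3.0, 2.0]])
n = x.shape[1]

# Unit repeat counts, normalized by sum(f) - 1.
f = np.ones(n)
c_freq = cov_diag(x, f, f.sum() - 1)

# Equal weights summing to 1, normalized by 1 - sum(w**2) (assumed cov.wt style).
w = np.full(n, 1.0 / n)
c_wt = cov_diag(x, w, 1 - (w ** 2).sum())

# Both choices collapse to the ordinary sample covariance in the unweighted case.
assert np.allclose(c_freq, np.cov(x))
assert np.allclose(c_wt, np.cov(x))
```

The two variants differ only in what goes on the diagonal and what the divisor is, which is why the terminology gets confusing.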