[Numpy-discussion] Adding weights to cov and corrcoef

Thu Mar 6 14:51:29 EST 2014

On Wed, Mar 5, 2014 at 4:45 PM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:
>
> Hi all,
>
> in Pull Request https://github.com/numpy/numpy/pull/3864 Noel Dawe
> suggested adding new parameters to our `cov` and `corrcoef` functions to
> implement weights, which already exist for `average` (the PR still
> needs to be adapted).
>
> The idea right now would be to add `weights` and `frequencies`
> keyword arguments to these functions.
>
> In more detail: The situation is a bit more complex for `cov` and
> `corrcoef` than `average`, because there are different types of weights.
> The current plan would be to add two new keyword arguments:
>   * weights: Uncertainty weights, which cause `N` to be recalculated
>     accordingly (this is R's `cov.wt` default, I believe).
>   * frequencies: When given, `N = sum(frequencies)` and the values
>     are weighted by their frequency.

I don't understand this description at all. One of them recalculates N,
and the other sets N according to some calculation?
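One common reading of the two kinds of weights in the quoted proposal is sketched below. This is an assumption; the proposal doesn't spell out the formulas. With frequency weights f_i, observation i counts f_i times, so N = sum(f_i) and the usual N - 1 correction applies. With reliability ("uncertainty") weights w_i, only the relative sizes matter, so N is recomputed as an effective N_eff = (sum w_i)^2 / sum(w_i^2). The helper names `weighted_mean`, `cov_frequency` and `cov_reliability` are hypothetical, not NumPy API, and plain Python lists are used to keep the sums explicit.

```python
from typing import Sequence


def weighted_mean(x: Sequence[float], w: Sequence[float]) -> float:
    # sum(w_i * x_i) / sum(w_i)
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def cov_frequency(x: Sequence[float], y: Sequence[float],
                  freq: Sequence[int]) -> float:
    # Frequency weights (assumed convention): observation i is counted
    # freq[i] times, so N = sum(freq) and the usual N - 1 correction applies.
    n = sum(freq)
    mx, my = weighted_mean(x, freq), weighted_mean(y, freq)
    s = sum(f * (xi - mx) * (yi - my) for f, xi, yi in zip(freq, x, y))
    return s / (n - 1)


def cov_reliability(x: Sequence[float], y: Sequence[float],
                    w: Sequence[float]) -> float:
    # Reliability weights (assumed convention): only relative sizes matter,
    # so N is recomputed as N_eff = sum(w)**2 / sum(w**2). Dividing by
    # v1 - v2/v1 equals the biased weighted covariance times N_eff/(N_eff - 1).
    v1 = sum(w)
    v2 = sum(wi * wi for wi in w)
    mx, my = weighted_mean(x, w), weighted_mean(y, w)
    s = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return s / (v1 - v2 / v1)


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0]
    y = [2.0, 3.0, 7.0]
    print(cov_frequency(x, y, [1, 2, 1]))    # 2.75, same as repeating the middle pair
    print(cov_reliability(x, y, [1, 2, 1]))  # 3.3
    print(cov_reliability(x, y, [2, 4, 2]))  # 3.3, rescaling w changes nothing
```

Under these assumptions, doubling every frequency weight doubles N and changes the result, while doubling every reliability weight leaves it unchanged. That is one way to read "sets N" versus "recalculates N" in the quoted description.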

Is there a standard reference on how these are supposed to be
interpreted? When you talk about per-value uncertainties, I start
imagining that we're trying to estimate a population covariance given
a set of samples each corrupted by independent measurement noise, and
then there's some natural hierarchical Bayesian model one could write
down and get an ML estimate of the latent covariance via empirical
Bayes or something. But this requires a bunch of assumptions and is
that really what we want to do? (Or maybe it collapses down into
something simpler if the measurement noise is Gaussian or something?)

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org