[Numpy-discussion] Adding weights to cov and corrcoef

Sebastian Berg sebastian at sipsolutions.net
Thu Mar 6 19:32:12 EST 2014


On Thu, 2014-03-06 at 19:51 +0000, Nathaniel Smith wrote:
> On Wed, Mar 5, 2014 at 4:45 PM, Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
> >
> > Hi all,
> >
> > in Pull Request https://github.com/numpy/numpy/pull/3864 Noel Dawe
> > suggested adding new parameters to our `cov` and `corrcoef` functions to
> > implement weights, which already exist for `average` (the PR still
> > needs to be adapted).
> >
> > The idea right now would be to add `weights` and `frequencies`
> > keyword arguments to these functions.
> >
> > In more detail: The situation is a bit more complex for `cov` and
> > `corrcoef` than `average`, because there are different types of weights.
> > The current plan would be to add two new keyword arguments:
> >   * weights: Uncertainty weights, which cause `N` to be recalculated
> >     accordingly (this is R's `cov.wt` default, I believe).
> >   * frequencies: When given, `N = sum(frequencies)` and the values
> >     are weighted by their frequency.
> 
> I don't understand this description at all. One of them recalculates N,
> and the other sets N according to some calculation?
> 
> Is there a standard reference on how these are supposed to be
> interpreted? When you talk about per-value uncertainties, I start
> imagining that we're trying to estimate a population covariance given
> a set of samples each corrupted by independent measurement noise, and
> then there's some natural hierarchical Bayesian model one could write
> down and get an ML estimate of the latent covariance via empirical
> Bayes or something. But this requires a bunch of assumptions and is
> that really what we want to do? (Or maybe it collapses down into
> something simpler if the measurement noise is gaussian or something?)
> 

I had really hoped someone who knows this stuff very well would show
up ;).
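
For concreteness, here is how I read the two proposals, assuming the
textbook definitions (frequency weights f_i count how often observation
x_i occurs; uncertainty or "reliability" weights a_i only matter relative
to each other). This is my interpretation, not something taken from the PR:

```latex
% Frequency weights: N = \sum_i f_i, as if row x_i appeared f_i times
\bar{x}_f = \frac{\sum_i f_i x_i}{\sum_i f_i}, \qquad
C_f = \frac{1}{\sum_i f_i - 1} \sum_i f_i \,(x_i - \bar{x}_f)(x_i - \bar{x}_f)^{\mathsf{T}}

% Reliability weights: V_1 = \sum_i a_i, \quad V_2 = \sum_i a_i^2
\bar{x}_a = \frac{\sum_i a_i x_i}{V_1}, \qquad
C_a = \frac{1}{V_1 - V_2 / V_1} \sum_i a_i \,(x_i - \bar{x}_a)(x_i - \bar{x}_a)^{\mathsf{T}}
```

Writing N_eff = V_1^2 / V_2, the second denominator is V_1 (1 - 1/N_eff),
which is the sense in which `N` gets "recalculated"; with all a_i equal it
reduces to the ordinary N - 1 estimator.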

I think these weights were uncertainties under a Gaussian assumption, and
the other types of weights are different (see `aweights` here:
http://www.stata.com/support/faqs/statistics/weights-and-summary-statistics/),
but I did not check a statistics book and don't have one here right now
(Wikipedia, for example, is less than helpful).
Frankly, unless there is some "obviously right" thing (for a
statistician), I would be careful about adding such new features. And while
I thought before that this might be the case, it isn't clear to me now.
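
A small NumPy sketch of the two kinds of weights, under the same
assumptions (frequency weights count repeated rows; uncertainty/reliability
weights only matter relative to each other). `weighted_cov`, `freq` and
`rel` are hypothetical names for illustration only, not the signature
proposed for `cov`:

```python
import numpy as np


def weighted_cov(x, freq=None, rel=None):
    """Weighted covariance of the columns of ``x`` (one observation per row).

    freq: integer frequency weights; row i counts as freq[i] observations,
          so the sample size is sum(freq).
    rel:  reliability ("uncertainty") weights; only their relative size
          matters, and the sample size is replaced by an effective one.
    Both parameter names are hypothetical, chosen for this sketch only.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]               # treat a 1-D input as a single variable
    n = x.shape[0]
    f = np.ones(n) if freq is None else np.asarray(freq, dtype=float)
    a = np.ones(n) if rel is None else np.asarray(rel, dtype=float)

    w = f * a                        # combined weight of each row
    v1 = w.sum()
    mean = w @ x / v1                # weighted mean of each column
    d = x - mean                     # deviations from the weighted mean
    scatter = (w[:, None] * d).T @ d  # sum_i w_i * outer(d_i, d_i)

    # Bias correction: sum(f) - 1 for frequency weights alone,
    # V1 - V2 / V1 for reliability weights alone, and n - 1 with no weights.
    denom = v1 - (f * a * a).sum() / v1
    return scatter / denom


# Usage: frequency weights give the same result as literally repeating rows.
x = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]])
counts = [1, 2, 1]
print(np.allclose(weighted_cov(x, freq=counts),
                  np.cov(np.repeat(x, counts, axis=0), rowvar=False)))  # True
```

Multiplying the two kinds of weights together, with the correction
V1 - sum(f * a**2) / V1, is my own assumption about how they should
combine; it is exactly the kind of choice that would need a statistician
to confirm.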

- Sebastian


> -n
> 




