[Numpy-discussion] weighted mean; weighted standard error of the mean (sem)

Christopher Barrington-Leigh cpblpublic+numpy at gmail.com
Fri Sep 10 13:58:08 EDT 2010


Interesting. Thanks Erin, Josef and Keith.

There is a nice article on this at
http://www.stata.com/support/faqs/stat/supweight.html. In my case, the
model I have in mind assumes that the expected value (mean) is the same
for each sample and that the weights are (or should be) normalised,
whence a consistent estimator for the sem is straightforward (if the
second moments can be assumed to be well behaved?). I suspect that this
(survey-like) case is also one of the two most common expressions that
people want when they ask for an s.e. of the mean of a weighted dataset.
The other is the case where the weights are not to be normalised, but
instead represent standard errors on the individual measurements.
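
For concreteness, the two cases could be written roughly as follows.
This is only a sketch under assumptions that are not spelled out in the
thread: independent observations x_i, a linearisation-type estimator
with an n/(n-1) correction for the survey-like case, and inverse-variance
weights w_i = 1/sigma_i^2 for the measurement-error case. These are
common textbook forms, not a statement of what any particular package
computes.

```latex
% Survey-like case: weights normalised so that \sum_i w_i = 1
\bar{x}_w = \sum_{i=1}^{n} w_i x_i ,
\qquad
\widehat{\mathrm{sem}}^2 = \frac{n}{n-1} \sum_{i=1}^{n} w_i^2 \,(x_i - \bar{x}_w)^2

% Measurement-error case: w_i = 1/\sigma_i^2, weights not normalised
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} ,
\qquad
\mathrm{sem} = \Bigl( \sum_{i=1}^{n} w_i \Bigr)^{-1/2}
```

With equal weights w_i = 1/n the first expression reduces to the usual
s/sqrt(n), and with equal sigma_i = sigma the second reduces to
sigma/sqrt(n).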

Surely what one wants, in the end, is a single function (or whatever)
called mean or sem which calculates different values for different
specified choices of model (assumptions)? And, where possible, one that
has a default model for when none is specified?
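
To make the idea concrete, here is a minimal sketch of what such a
function might look like. The name weighted_mean_sem, the model keyword,
and the labels "survey" and "measurement" are all hypothetical (nothing
like this exists in NumPy as far as this thread is concerned), and the
two formulas are assumptions: a linearisation-type estimator for
normalised weights, and inverse-variance weights w_i = 1/sigma_i**2 for
the measurement-error case.

```python
import numpy as np


def weighted_mean_sem(x, w, model="survey"):
    """Return (weighted mean, s.e. of the mean) for 1-D data.

    Hypothetical helper, not part of NumPy.

    model="survey" (the default): the weights are normalised to sum to 1
        and all observations are assumed to share the same mean.
    model="measurement": w is taken to be 1 / sigma_i**2, where sigma_i is
        the known standard error of observation i; weights are not
        normalised.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if x.ndim != 1 or x.shape != w.shape:
        raise ValueError("x and w must be 1-D arrays of the same length")

    if model == "survey":
        n = x.size
        p = w / w.sum()                      # normalised weights
        mean = np.sum(p * x)
        # Assumed estimator: n/(n-1) * sum_i p_i^2 (x_i - mean)^2
        var = n / (n - 1.0) * np.sum((p * (x - mean)) ** 2)
        return mean, np.sqrt(var)

    if model == "measurement":
        mean = np.sum(w * x) / np.sum(w)
        # Inverse-variance weighting: var(mean) = 1 / sum_i w_i
        return mean, 1.0 / np.sqrt(np.sum(w))

    raise ValueError("unknown model: %r" % (model,))
```

With equal weights the default model reproduces the ordinary
x.std(ddof=1) / sqrt(n), which seems like a sensible sanity check for
whatever default is eventually chosen.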

thanks,
Chris

On Thu, Sep 9, 2010 at 9:13 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> >>>> ma.std()
> >>   3.2548815339711115
> >
> > or maybe `w` reflects an underlying sampling scheme and you should
> > sample in the bootstrap according to w ?
>
> Yes....
>
> > if weighted average is a sum of linear functions of (normal)
> > distributed random variables, it still depends on whether the
> > individual observations have the same or different variances, e.g.
> > http://en.wikipedia.org/wiki/Weighted_mean#Statistical_properties
>
> ...lots of possibilities. As you have shown the problem is not yet
> well defined. Not much specification needed for the weighted mean,
> lots needed for the standard error of the weighted mean.
>
> > What I can't figure out is whether, if you assume sigma_i = sigma for
> > all observations i, we use the weighted or the unweighted variance
> > to get an estimate of sigma. And I'm not able to replicate with simple
> > calculations what statsmodels.WLS gives me.
>
> My guess: if all you want is sigma of the individual i and you know
> sigma is the same for all i, then I suppose you don't care about the
> weight.
>
> >
> > ???
> >
> > Josef


