Re: [Numpy-discussion] weighted mean; weighted standard error of the mean (sem)

10 Sep 2010


Excerpts from cpblpublic's message of Thu Sep 09 22:22:05 -0400 2010:
...
I am looking for some really basic statistical tools. I have some
sample data, some sample weights for those measurements, and I want to
calculate a mean and a standard error of the mean.
Here are obvious places to look:
  - numpy
  - scipy.stats
  - statsmodels
It seems to me that numpy's "mean" and "average" functions have their
names backwards. That is, often a mean is defined more generally than
average, and includes the possibility of weighting, but in this case
it is "average" that has a weights argument. Can these functions be
merged/renamed/deprecated in the future?  It's clear to me that "mean"
should allow for weights.
None of these modules, above, offer standard error of the mean which
incorporates weights. scipy.stats.sem() doesn't, and that's the closest
thing. numpy's "var" doesn't allow weights.
There aren't any weighted variances in the above modules.
Again, are there favoured codes for these functions? Can they be
incorporated appropriately in the future?
Most immediately, I'd love to get code for weighted sem. I'll write it
otherwise, but it might be crude and dumb...
Thanks!
Chris Barrington-Leigh
UBC
The code below should do what you want. It is part of the stat
sub-package of esutil: http://code.google.com/p/esutil/

Hope this helps,
Erin Scott Sheldon
Brookhaven National Laboratory

import numpy

def wmom(arrin, weights_in, inputmean=None, calcerr=False, sdev=False):
    """
    NAME:
      wmom()
      
    PURPOSE:
      Calculate the weighted mean, error, and optionally standard deviation of
      an input array.  By default error is calculated assuming the weights are
      1/err^2, but if you send calcerr=True this assumption is dropped and the
      error is determined from the weighted scatter.

    CALLING SEQUENCE:
     wmean,werr = wmom(arr, weights, inputmean=None, calcerr=False, sdev=False)
    
    INPUTS:
      arr: A numpy array or a sequence that can be converted.
      weights: A set of weights, one for each element of arr.
    OPTIONAL INPUTS:
      inputmean: 
          An input mean value, used in place of the weighted mean; the error
          and standard deviation are then calculated around it.
      calcerr=False: 
          Calculate the weighted error.  By default the error is calculated as
          1/sqrt( weights.sum() ).  If calcerr=True it is calculated as sqrt(
          (w**2 * (arr-mean)**2).sum() )/weights.sum()
      sdev=False: 
          If True, also return the weighted standard deviation as a third
          element in the tuple.

    OUTPUTS:
      wmean, werr: A tuple of the weighted mean and error. If sdev=True the
         tuple will also contain sdev: wmean,werr,wsdev

    REVISION HISTORY:
      Converted from IDL: 2006-10-23. Erin Sheldon, NYU

    """
    
    # asarray makes no copy if the input is already an array;
    # atleast_1d promotes scalars to 1-d
    arr = numpy.atleast_1d(numpy.asarray(arrin))
    
    # Weights is forced to be type double. All resulting calculations
    # will also be double
    weights = numpy.atleast_1d(numpy.asarray(weights_in, dtype='f8'))
  
    wtot = weights.sum()
        
    # compute the weighted mean unless the user has input a mean value
    if inputmean is None:
        wmean = ( weights*arr ).sum()/wtot
    else:
        wmean=float(inputmean)

    # how should error be calculated?
    if calcerr:
        werr2 = ( weights**2 * (arr-wmean)**2 ).sum()
        werr = numpy.sqrt( werr2 )/wtot
    else:
        werr = 1.0/numpy.sqrt(wtot)

    # should output include the weighted standard deviation?
    if sdev:
        wvar = ( weights*(arr-wmean)**2 ).sum()/wtot
        wsdev = numpy.sqrt(wvar)
        return wmean,werr,wsdev
    else:
        return wmean,werr
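
As a quick sanity check on the calcerr=True formula, here is a minimal
standalone sketch. The name weighted_mean_sem is made up for this
illustration and is not part of esutil; it uses numpy.average for the
weighted mean and the same scatter-based error expression as in the
docstring above. With equal weights the mean reduces to the plain mean and
the error reduces to std/sqrt(n), with std computed using ddof=0.

```python
import numpy as np

def weighted_mean_sem(values, weights):
    """Hypothetical helper: weighted mean and scatter-based error.

    Uses the calcerr=True expression:
        sqrt( sum( w**2 * (x - mean)**2 ) ) / sum(w)
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = np.average(x, weights=w)
    err = np.sqrt(np.sum(w**2 * (x - mean)**2)) / w.sum()
    return mean, err

# Equal weights: the result should match the unweighted case.
x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0, 1.0]
mean, err = weighted_mean_sem(x, w)

# Unweighted reference: population std (ddof=0) over sqrt(n).
reference = np.std(x) / np.sqrt(len(x))
print(mean, err, reference)
```

Note that scipy.stats.sem() defaults to ddof=1, so for small samples it
will give a somewhat larger value than this expression does with equal
weights.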