Excerpts from cpblpublic's message of Thu Sep 09 22:22:05 -0400 2010:
I am looking for some reaally basic statistical tools. I have some sample data, some sample weights for those measurements, and I want to calculate a mean and a standard error of the mean.
Here are obvious places to look:
numpy scipy.stats statsmodels
It seems to me that numpy's "mean" and "average" functions have their names backwards. That is, often a mean is defined more generally than average, and includes the possibility of weighting, but in this case it is "average" that has a weights argument. Can these functions be merged/renamed/deprecated in the future? It's clear to me that "mean" should allow for weights.
None of these modules, above, offer standard error of the mean which incorporates weights. scipy.stats.sem() doesn't, and that's the closest thing. numpy's "var" doesn't allow weights. There aren't any weighted variances in the above modules.
Again, are there favoured codes for these functions? Can they be incorporated appropriately in the future?
Most immediately, I'd love to get code for weighted sem. I'll write it otherwise, but it might be crude and dumb... Thanks! Chris Barrington-Leigh UBC
This code below should do what you want. It is part of the stat sub-package of esutil http://code.google.com/p/esutil/ Hope this helps, Erin Scott Sheldon Brookhaven National Laboratory def wmom(arrin, weights_in, inputmean=None, calcerr=False, sdev=False): """ NAME: wmom() PURPOSE: Calculate the weighted mean, error, and optionally standard deviation of an input array. By default error is calculated assuming the weights are 1/err^2, but if you send calcerr=True this assumption is dropped and the error is determined from the weighted scatter. CALLING SEQUENCE: wmean,werr = wmom(arr, weights, inputmean=None, calcerr=False, sdev=False) INPUTS: arr: A numpy array or a sequence that can be converted. weights: A set of weights for each elements in array. OPTIONAL INPUTS: inputmean: An input mean value, around which them mean is calculated. calcerr=False: Calculate the weighted error. By default the error is calculated as 1/sqrt( weights.sum() ). If calcerr=True it is calculated as sqrt( (w**2 * (arr-mean)**2).sum() )/weights.sum() sdev=False: If True, also return the weighted standard deviation as a third element in the tuple. OUTPUTS: wmean, werr: A tuple of the weighted mean and error. If sdev=True the tuple will also contain sdev: wmean,werr,wsdev REVISION HISTORY: Converted from IDL: 2006-10-23. Erin Sheldon, NYU """ # no copy made if they are already arrays arr = numpy.array(arrin, ndmin=1, copy=False) # Weights is forced to be type double. All resulting calculations # will also be double weights = numpy.array(weights_in, ndmin=1, dtype='f8', copy=False) wtot = weights.sum() # user has input a mean value if inputmean is None: wmean = ( weights*arr ).sum()/wtot else: wmean=float(inputmean) # how should error be calculated? if calcerr: werr2 = ( weights**2 * (arr-wmean)**2 ).sum() werr = numpy.sqrt( werr2 )/wtot else: werr = 1.0/numpy.sqrt(wtot) # should output include the weighted standard deviation? if sdev: wvar = ( weights*(arr-wmean)**2 ).sum()/wtot wsdev = numpy.sqrt(wvar) return wmean,werr,wsdev else: return wmean,werr