On Thu, Sep 9, 2010 at 11:32 PM, Keith Goodman wrote:
On Thu, Sep 9, 2010 at 8:07 PM, Keith Goodman wrote:
On Thu, Sep 9, 2010 at 7:22 PM, cpblpublic wrote:
I am looking for some really basic statistical tools. I have some sample data, some sample weights for those measurements, and I want to calculate a mean and a standard error of the mean.
How about using a bootstrap?
Array and weights:
    import numpy as np
    a = np.arange(100)
    w = np.random.rand(100)
    w = w / w.sum()
Initialize:
    n = 1000
    ma = np.zeros(n)
Save mean of each bootstrap sample:
    for i in range(n):
        idx = np.random.randint(0, 100, 100)
        ma[i] = np.dot(a[idx], w[idx])
Error in mean:
    ma.std()
    3.854023384833674
Sanity check:
    np.dot(w, a)
    49.231127299096954
    ma.mean()
    49.111478821225127
Hmm...should w[idx] be renormalized to sum to one in each bootstrap sample?
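One way to try the renormalized variant is sketched below. It is only an illustration, not a claim about which scheme is statistically right: it uses np.average, which divides by the sum of the weights it is given, so each resample's weights effectively sum to one. The seeded generator from np.random.default_rng is an assumption added here for reproducibility and is not part of the session above.

```python
import numpy as np

# Seeded generator (an assumption, added only so the run is reproducible).
rng = np.random.default_rng(0)

a = np.arange(100)
w = rng.random(100)
w = w / w.sum()

n = 1000
ma = np.zeros(n)
for i in range(n):
    idx = rng.integers(0, 100, 100)
    # np.average divides by w[idx].sum(), i.e. the resampled
    # weights are renormalized within each bootstrap sample.
    ma[i] = np.average(a[idx], weights=w[idx])

boot_se = ma.std()
print(boot_se, ma.mean(), np.dot(w, a))
```

With renormalization, ma.mean() should land close to np.dot(w, a), because every resample is a proper weighted average.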
Or perhaps there is no uncertainty about the weights, in which case:
    for i in range(n):
        idx = np.random.randint(0, 100, 100)
        ma[i] = np.dot(a[idx], w)

    ma.std()
    3.2548815339711115
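As a rough cross-check on the fixed-weights case, note that each resampled value a[idx][j] is an independent draw from the empirical distribution of a, so the variance of the weighted sum should be about a.var() * sum(w**2). The sketch below compares that closed form with the bootstrap estimate. The seeded generator and the variable names boot_se and analytic_se are assumptions introduced for illustration.

```python
import numpy as np

# Seeded generator (an assumption, added only so the run is reproducible).
rng = np.random.default_rng(0)

a = np.arange(100)
w = rng.random(100)
w = w / w.sum()

n = 1000
ma = np.zeros(n)
for i in range(n):
    idx = rng.integers(0, 100, 100)
    # Weights stay fixed; only the data are resampled.
    ma[i] = np.dot(a[idx], w)

boot_se = ma.std()

# Var(sum_j w_j X_j) = sigma^2 * sum_j w_j^2 for independent X_j
# with common variance sigma^2 (here the empirical variance of a).
analytic_se = a.std() * np.sqrt(np.sum(w ** 2))

print(boot_se, analytic_se)
```

The two numbers should agree to within bootstrap noise, which suggests the fixed-weights bootstrap is estimating sigma * sqrt(sum(w**2)) under an equal-variance assumption.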
Or maybe `w` reflects an underlying sampling scheme, and you should sample in the bootstrap according to `w`?

If the weighted average is a sum of linear functions of (normally) distributed random variables, the result still depends on whether the individual observations have the same or different variances, e.g.
http://en.wikipedia.org/wiki/Weighted_mean#Statistical_properties

What I can't figure out is whether, if you assume sigma_i = sigma for all observations i, we use the weighted or the unweighted variance to get an estimate of sigma. And I'm not able to replicate with simple calculations what statsmodels.WLS gives me.

???

Josef
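For the first suggestion, a minimal sketch of resampling according to `w` might look like the following. It assumes the weights can be read as selection probabilities, which may or may not match the original data, and it takes a plain (unweighted) mean of each resample because the weighting is already applied in the draw. The seeded generator is an assumption added for reproducibility.

```python
import numpy as np

# Seeded generator (an assumption, added only so the run is reproducible).
rng = np.random.default_rng(0)

a = np.arange(100)
w = rng.random(100)
w = w / w.sum()  # must sum to one to be used as probabilities

n = 1000
ma = np.zeros(n)
for i in range(n):
    # Draw 100 observations with replacement, element i chosen
    # with probability w[i]; then take an unweighted mean.
    sample = rng.choice(a, size=100, replace=True, p=w)
    ma[i] = sample.mean()

print(ma.mean(), np.dot(w, a), ma.std())
```

Under this reading, ma.mean() should again be close to np.dot(w, a), while ma.std() reflects the weighted spread of `a` rather than the spread of the weights.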