After some of the recent list mails advising checking results against R, I checked this code against the R t.test() calls. I hadn't bothered before since I had checked the t statistic and df values, and they were right, and as I was just calling distribution.t.sf() like every other t-test function assumed I should be OK. However, it turns out that the statistics book I used as a source had the df function wrong, believe it or not. Hence I closed the old PR and opened a new one with a fix and more tests here: https://github.com/scipy/scipy/pull/235 Apparently the thing to do is to beg for your PRs to go into the upcoming version. I'm certainly willing to do so, but I was wondering if the scipy devs respond better to other inveiglements - say, threats of world destruction, under the table 'gifts', favors of a lascivious nature, or cake. All proposals considered. Cheers, Gavin On 5/23/2012 12:38 AM, Junkshops wrote:
Hi all,
I've issued a pull request (http://github.com/scipy/scipy/pull/227) for a version of scipy/stats/stats.py with the following changes:
1) Adds a method for running a t-test with unequal or unknown population variances. ttest_ind assumes that population variances are equal. 2) Refactored common code in the 4 t-test methods into shared methods. 3) This section of code, which has variations in multiple methods, looks buggy to me:
d = np.mean(a,axis) - np.mean(b,axis) svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
t = d/np.sqrt(svar*(1.0/n1 + 1.0/n2)) t = np.where((d==0)*(svar==0), 1.0, t) #define t=0/0 = 0, identical means
Surely if d=0, regardless of svar, t should be set to 0, not 1. Similarly, if svar = 0 then both variances are zero (assuming that each data set has at least 2 points - perhaps there should be a check for this?). In that case, if d==0 t should be zero. Otherwise, t should be +/-inf. Hence, (svar==0) is redundant.
Accordingly, I've changed the lines in all functions to be the equivalent of
t = np.where((d==0), 0.0, t)
This handles the case where both d and svar are 0. The respective tests have also been changed.
If I'm missing something here, please let me know.
Thanks, Gavin