Adding t-test with unequal variances to stats.py

Hi all, I've issued a pull request (http://github.com/scipy/scipy/pull/227) for a version of scipy/stats/stats.py with the following changes: 1) Adds a method for running a t-test with unequal or unknown population variances. ttest_ind assumes that population variances are equal. 2) Refactored common code in the 4 t-test methods into shared methods. 3) This section of code, which has variations in multiple methods, looks buggy to me: d = np.mean(a,axis) - np.mean(b,axis) svar = ((n1-1)*v1+(n2-1)*v2) / float(df) t = d/np.sqrt(svar*(1.0/n1 + 1.0/n2)) t = np.where((d==0)*(svar==0), 1.0, t) #define t=0/0 = 0, identical means Surely if d=0, regardless of svar, t should be set to 0, not 1. Similarly, if svar = 0 then both variances are zero (assuming that each data set has at least 2 points - perhaps there should be a check for this?). In that case, if d==0 t should be zero. Otherwise, t should be +/-inf. Hence, (svar==0) is redundant. Accordingly, I've changed the lines in all functions to be the equivalent of t = np.where((d==0), 0.0, t) This handles the case where both d and svar are 0. The respective tests have also been changed. If I'm missing something here, please let me know. Thanks, Gavin

After some of the recent list mails advising checking results against R, I checked this code against the R t.test() calls. I hadn't bothered before since I had checked the t statistic and df values, and they were right, and as I was just calling distribution.t.sf() like every other t-test function assumed I should be OK. However, it turns out that the statistics book I used as a source had the df function wrong, believe it or not. Hence I closed the old PR and opened a new one with a fix and more tests here: https://github.com/scipy/scipy/pull/235 Apparently the thing to do is to beg for your PRs to go into the upcoming version. I'm certainly willing to do so, but I was wondering if the scipy devs respond better to other inveiglements - say, threats of world destruction, under the table 'gifts', favors of a lascivious nature, or cake. All proposals considered. Cheers, Gavin On 5/23/2012 12:38 AM, Junkshops wrote:
Hi all,
I've issued a pull request (http://github.com/scipy/scipy/pull/227) for a version of scipy/stats/stats.py with the following changes:
1) Adds a method for running a t-test with unequal or unknown population variances. ttest_ind assumes that population variances are equal. 2) Refactored common code in the 4 t-test methods into shared methods. 3) This section of code, which has variations in multiple methods, looks buggy to me:
d = np.mean(a,axis) - np.mean(b,axis) svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
t = d/np.sqrt(svar*(1.0/n1 + 1.0/n2)) t = np.where((d==0)*(svar==0), 1.0, t) #define t=0/0 = 0, identical means
Surely if d=0, regardless of svar, t should be set to 0, not 1. Similarly, if svar = 0 then both variances are zero (assuming that each data set has at least 2 points - perhaps there should be a check for this?). In that case, if d==0 t should be zero. Otherwise, t should be +/-inf. Hence, (svar==0) is redundant.
Accordingly, I've changed the lines in all functions to be the equivalent of
t = np.where((d==0), 0.0, t)
This handles the case where both d and svar are 0. The respective tests have also been changed.
If I'm missing something here, please let me know.
Thanks, Gavin

03.06.2012 08:13, Junkshops kirjoitti: [clip]
Apparently the thing to do is to beg for your PRs to go into the upcoming version. I'm certainly willing to do so, but I was wondering if the scipy devs respond better to other inveiglements - say, threats of world destruction, under the table 'gifts', favors of a lascivious nature, or cake. All proposals considered.
The only thing needed is someone (= *anyone*, this doesn't have to be a "scipy developer") with some familiarity in stats to look at it, and say: "I know this stuff, and looks correct to me." So far, it seems nobody has found the interest to chip in. However, having the comparison to R in this case may be enough by itself to make it clear even to non-experts that things work now as intended. We nevertheless do have a "blame list" of people who have promised to be available for scipy.stats on some time scale: https://github.com/scipy/scipy/blob/master/doc/MAINTAINERS.rst.txt -- Pauli Virtanen
participants (2)
-
Junkshops
-
Pauli Virtanen