[SciPy-User] Bug: t-test for identical means with no variance?

Skipper Seabold jsseabold at gmail.com
Fri Jul 8 19:06:42 EDT 2011


On Fri, Jul 8, 2011 at 6:51 PM,  <josef.pktd at gmail.com> wrote:
> On Fri, Jul 8, 2011 at 6:41 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
>> A ticket was filed [1] for ttest_ind (same issue with ttest_rel and
>> ttest_1samp) in the case of identical means and no variance.
>>
>> Same means, no variance
>>
>> d1 = np.ones(10)
>> d2 = np.array([1,1.])
>> stats.ttest_ind(d1,d2)
>> (1.0, 0.34089313230206009)
>>
>> Different means, no variance
>>
>> d1 = np.array([ 5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.])
>> d2 = np.array([ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.])
>> stats.ttest_ind(d1,d2)
>> (inf, 0.0)
>>
>> The first result doesn't make sense. The code contains notes about
>> catching this case that conflict with each other and with what the
>> code actually does:
>>
>> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2873
>> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2963
>> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L3044
>>
>> I think defining t = 0/0 to be 0 is the least wrong choice, but
>> defining it as 1 certainly is not, since that gives an arbitrary
>> p-value that depends on the sample sizes. Is there an accepted
>> definition for this case? Does returning (nan, 1.0) make more sense?
>>
>> Skipper
>>
>> [1] http://projects.scipy.org/scipy/ticket/1475
>
> See the scipy-dev mailing list thread "changes to stats t-tests" from
> Dec 20, 2008 for the original change.
>
> If anyone finds a justification for the 0/0 case, ....
>
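
For reference, with the first example both the mean difference and the
pooled standard error come out to exactly zero, so the statistic really
is 0/0. A quick check using the textbook pooled formula (not scipy's
internals, so take it as a sketch):

import numpy as np

d1 = np.ones(10)
d2 = np.array([1, 1.])

num = d1.mean() - d2.mean()    # 0.0
# pooled variance with n1 + n2 - 2 degrees of freedom
sp2 = (((d1 - d1.mean())**2).sum() +
       ((d2 - d2.mean())**2).sum()) / (len(d1) + len(d2) - 2)
den = np.sqrt(sp2 * (1.0/len(d1) + 1.0/len(d2)))    # 0.0

num / den    # nan (0.0/0.0), plus a RuntimeWarning about an invalid value

So whatever value gets returned for t in this case is a convention, not
something the formula itself pins down.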

I have the same intuition as your initial thought. Setting it to 1
*seems* arbitrary. I'd have to think about it more than I have time
for right now to come up with a justification, though.
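
For concreteness, this is roughly what the (nan, 1.0) convention from
the ticket would look like as a wrapper. It's a hypothetical helper,
not a proposed patch; the name and the +/-inf handling for the
different-means case are just placeholders:

import numpy as np
from scipy import stats

def ttest_ind_guarded(a, b):
    # Hypothetical wrapper: if both samples are constant, the pooled
    # statistic is 0/0 (equal means) or +/-inf (different means), so
    # return those values explicitly instead of relying on the current
    # behavior of stats.ttest_ind.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # (an exact zero-variance check is simplistic, but enough for a sketch)
    if a.var() == 0 and b.var() == 0:
        diff = a.mean() - b.mean()
        if diff == 0:
            return (np.nan, 1.0)
        return (np.sign(diff) * np.inf, 0.0)
    return stats.ttest_ind(a, b)

Returning t = nan at least makes it obvious to the caller that the
statistic is undefined there, while p = 1.0 says there is no evidence
against equal means.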

Apologies for not searching and making noise instead,

Skipper


