[SciPy-Dev] chi-square test for a contingency (R x C) table

Bruce Southey bsouthey at gmail.com
Wed Jun 2 14:10:12 EDT 2010


On 06/02/2010 11:26 AM, Neil Martinsen-Burrell wrote:
> On 2010-06-02 11:02 , Bruce Southey wrote:
>> On 06/02/2010 09:37 AM, josef.pktd at gmail.com wrote:
>>> On Wed, Jun 2, 2010 at 8:24 AM, Neil Martinsen-Burrell <nmb at wartburg.edu> wrote:
>>>
>>>> On 2010-06-01 23:28 , Warren Weckesser wrote:
>>>>
>>>>> I've been digging into some basic statistics recently, and developed
>>>>> the following function for applying the chi-square test to a
>>>>> contingency table.  Does something like this already exist in
>>>>> scipy.stats? If not, any objections to adding it?  (Tests are already
>>>>> written :)
>>>>>
>>>> Something like this would be great in scipy.stats since I end up doing
>>>> the exact same thing by hand whenever I grade introductory statistics
>>>> exams.  Thanks for writing this!
>>>>
>> You might find SAS helpful:
>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/freq_toc.htm 
>>
>
> I'm not sure what you mean by this.  I have no problem performing the 
> test, it's just inconvenient that it isn't already a part of scipy.stats.
I mention it because this is the main SAS procedure for contingency tables 
and the associated tests. Its documentation has useful information as well.
>
>> However, this code is only the chi-squared test part, as SAS will compute
>> the actual cell numbers. It is also an extension to
>> scipy.stats.chisquare(), so we cannot have both functions.
>
> Again, I don't understand what you mean when you say that we can't have 
> both functions.  I believe (from a statistics teacher's point of view) that 
> the Chi-Squared goodness of fit test (which is stats.chisquare) is a 
> different beast from the Chi-Square test for independence (which is 
> stats.chisquare_contingency).  The fact that the distribution of the 
> test statistic is the same should not tempt us to put them into the 
> same function.
Please read the docstring of scipy.stats.chisquare(), because 
scipy.stats.chisquare() is the 1-d case of yours.
Quote from the docstring:
    "The chi square test tests the null hypothesis that the categorical data
     has the given frequencies."
Also go to the web site provided in the docstring.
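
To illustrate the "1-d case" point, here is a rough sketch. It assumes the
R x C table is flattened into one vector of cells and that the expected
counts under independence are computed from the margins. The table values
and variable names are hypothetical, and the ddof adjustment is only an
illustration, not the code of the proposed function:

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 3 table of observed counts (illustrative numbers only).
observed = np.array([[10, 20, 30],
                     [20, 20, 20]])

# Expected counts under independence:
# (row total * column total) / grand total for each cell.
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected = np.outer(row_totals, col_totals) / float(observed.sum())

# chisquare() assumes k - 1 degrees of freedom for k cells, so ddof
# removes the difference needed to reach (R - 1) * (C - 1).
n_rows, n_cols = observed.shape
dof = (n_rows - 1) * (n_cols - 1)
ddof = (observed.size - 1) - dof

# Run the 1-d test on the flattened table with the supplied expectations.
chi2, p = stats.chisquare(observed.ravel(), f_exp=expected.ravel(), ddof=ddof)
print(chi2, p, dof)
```

The statistic is the same sum over cells in both tests; only the source of
the expected counts and the degrees of freedom differ.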

By default the expected frequencies are computed for you, but you can also 
supply your own using the f_exp argument. You could do the same in your code.
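
A minimal sketch of the default versus f_exp behaviour of
scipy.stats.chisquare(). The counts and hypothesized proportions are made
up for illustration and are not data from this thread:

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts in four categories (sum is 100).
observed = np.array([18, 22, 20, 40])

# Default: f_exp is omitted, so every category is assumed equally likely
# and the expected count is the mean of the observed counts.
chi2_default, p_default = stats.chisquare(observed)

# The same test with the uniform expectation written out explicitly.
uniform = np.full(observed.shape, observed.mean())
chi2_uniform, p_uniform = stats.chisquare(observed, f_exp=uniform)

# Custom expected frequencies from hypothesized (assumed) proportions.
proportions = np.array([0.2, 0.2, 0.2, 0.4])
custom = proportions * observed.sum()
chi2_custom, p_custom = stats.chisquare(observed, f_exp=custom)

print(chi2_default, chi2_uniform, chi2_custom)
```

The expected counts passed through f_exp should add up to the same total as
the observed counts.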
>
>> Really this should be combined with fisher.py in ticket 956:
>> http://projects.scipy.org/scipy/ticket/956
>
> Wow, apparently I have lots of disagreements today, but I don't think 
> that this should be combined with Fisher's Exact test.  (I would like 
> to see that ticket mature to the point where it can be added to 
> scipy.stats.)  I like the functions in scipy.stats to correspond in a 
> one-to-one manner with the statistical tests.  I think that the docs 
> should "See Also" the appropriate exact (and non-parametric) tests, 
> but I think that one function/one test is a good rule.  This is 
> particularly true for people (like me) who would like to someday be 
> able to use scipy.stats in a pedagogical context.
>
> -Neil
I don't see any 'disagreements', just different ways to do things and 
some areas that need to be addressed for more general use.

I accept your opinion here only because these functions accept only 
the digested (i.e. summarized) data.
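
For the case where only raw (undigested) observations are on hand, a small
sketch of summarizing them into a table of counts first. The labels and
records are invented for illustration, and this is just one possible way
to do the tabulation with NumPy:

```python
import numpy as np

# Hypothetical raw records: one (group, outcome) pair per observation.
group = np.array(["a", "a", "b", "b", "b", "a", "b", "a"])
outcome = np.array(["yes", "no", "yes", "yes", "no", "yes", "no", "yes"])

# Map each label to an integer code (labels come back sorted).
row_labels, row_codes = np.unique(group, return_inverse=True)
col_labels, col_codes = np.unique(outcome, return_inverse=True)

# Count how often each (row, column) pair occurs.
table = np.zeros((row_labels.size, col_labels.size), dtype=int)
np.add.at(table, (row_codes, col_codes), 1)

print(row_labels, col_labels)
print(table)
```

The resulting table of counts is the summarized form that the functions
discussed here take as input.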

Bruce


