[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?

Ralf Gommers ralf.gommers at googlemail.com
Mon Jun 13 12:36:31 EDT 2011


On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:

>  On 06/13/2011 02:46 AM, Ralf Gommers wrote:
>
> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>
>>  On Sun, Jun 12, 2011 at 7:52 PM,  <josef.pktd at gmail.com> wrote:
>> >
>> > All the p-values agree for the alternatives two-sided, less, and
>> > greater, the odds ratio is defined differently as explained pretty
>> > well in the docstring.
>> >
>> > Josef
>>  Yes, but you said to follow BOTH R and SAS - that means providing all
>> three:
>>
>> The FREQ Procedure
>>
>> Table of Exposure by Response
>>
>> Exposure     Response
>>
>> Frequency|       0|       1|  Total
>> ---------+--------+--------+
>>       0 |    190 |    800 |    990
>> ---------+--------+--------+
>>       1 |    200 |    900 |   1100
>> ---------+--------+--------+
>> Total         390     1700     2090
>>
>>
>> Statistics for Table of Exposure by Response
>>
>> Statistic                     DF       Value      Prob
>> ------------------------------------------------------
>> Chi-Square                     1      0.3503    0.5540
>> Likelihood Ratio Chi-Square    1      0.3500    0.5541
>> Continuity Adj. Chi-Square     1      0.2869    0.5922
>> Mantel-Haenszel Chi-Square     1      0.3501    0.5541
>> Phi Coefficient                       0.0129
>> Contingency Coefficient               0.0129
>> Cramer's V                            0.0129
>>
>>
>>     Pearson Chi-Square Test
>> ----------------------------------
>> Chi-Square                  0.3503
>> DF                               1
>> Asymptotic Pr >  ChiSq      0.5540
>> Exact      Pr >= ChiSq      0.5741
>>
>>
>>       Fisher's Exact Test
>> ----------------------------------
>> Cell (1,1) Frequency (F)       190
>> Left-sided Pr <= F          0.7416
>> Right-sided Pr >= F         0.2960
>>
>> Table Probability (P)       0.0376
>> Two-sided Pr <= P           0.5741
>>
>> Sample Size = 2090
>>
>> Thus providing all three is the correct answer.
>>
> Eh, we do. The interface is the same as that of R, and all three of
> {two-sided, less, greater} are extensively checked against R. It looks like
> you are reacting to only one statement Josef made to explain his
> interpretation of less/greater. Please check the actual commit and then
> comment if you see anything wrong.
>
> Ralf
>
>
>
>  I have looked at it (again) and the comments still stand:
> A user should not have to read a statistical book and then the code to
> figure out what was actually implemented here.  So I do strongly object to
> Josef's statements, as you just cannot interpret Fisher's test in that way.
> Just look at how SAS presents the results: that should give a huge clue that
> the two-sided test is different from the one-sided tests.
>

Okay, I am pasting the entire docstring below. You seem to know a lot about
this, so can you please suggest wording for things to be added/changed?

I have compared with the R doc (
http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and that's
not much different as far as I can tell.
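
To make the relation between the three p-values concrete, here is a minimal
sketch of how I understand them for the SAS table quoted above. It assumes the
usual formulation in which cell (1,1) follows a hypergeometric distribution
once the margins are fixed. The variable names and the small tie tolerance are
my own choices for illustration and are not taken from the commit.

```python
import numpy as np
from scipy import stats

# Exposure-by-response table from the SAS PROC FREQ output quoted above.
table = np.array([[190, 800], [200, 900]])
a = table[0, 0]                      # cell (1,1) frequency, "F" in SAS
row1 = table[0].sum()                # first row total
col1 = table[:, 0].sum()             # first column total
total = table.sum()                  # sample size

# With fixed margins, cell (1,1) is hypergeometric under the null.
dist = stats.hypergeom(total, row1, col1)
support = np.arange(max(0, row1 + col1 - total), min(row1, col1) + 1)
pmf = dist.pmf(support)
p_table = dist.pmf(a)                # "Table Probability (P)" in SAS

# One-sided p-values are tail sums in cell (1,1).
p_less = pmf[support <= a].sum()     # SAS "Left-sided Pr <= F"
p_greater = pmf[support >= a].sum()  # SAS "Right-sided Pr >= F"

# The two-sided p-value sums every table that is no more probable than
# the observed one; the factor (1 + 1e-7) is an assumed tie tolerance.
p_two_sided = pmf[pmf <= p_table * (1 + 1e-7)].sum()

print(f"less      = {p_less:.4f}")
print(f"greater   = {p_greater:.4f}")
print(f"P(table)  = {p_table:.4f}")
print(f"two-sided = {p_two_sided:.4f}")
```

If this reading is right, the four printed values should line up with the
Left-sided, Right-sided, Table Probability and Two-sided rows of the SAS
output. It also shows why the two-sided value is not simply twice a one-sided
one: it is defined through table probabilities, not through a tail in F.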

Thanks a lot,
Ralf


    Performs a Fisher exact test on a 2x2 contingency table.

    Parameters
    ----------
    table : array_like of ints
        A 2x2 contingency table.  Elements should be non-negative integers.
    alternative : {'two-sided', 'less', 'greater'}, optional
        Which alternative hypothesis to the null hypothesis the test uses.
        Default is 'two-sided'.

    Returns
    -------
    oddsratio : float
        This is the prior odds ratio and not a posterior estimate.
    p_value : float
        P-value, the probability of obtaining a distribution at least as
        extreme as the one that was actually observed, assuming that the
        null hypothesis is true.

    See Also
    --------
    chisquare : inexact alternative that can be used when sample sizes are
                large enough.

    Notes
    -----
    The calculated odds ratio is different from the one R uses. In R's
    terminology, this implementation returns the (more common)
    "unconditional Maximum Likelihood Estimate", while R uses the
    "conditional Maximum Likelihood Estimate".

    For tables with large numbers the (inexact) `chisquare` test can also be
    used.

    Examples
    --------
    Say we spend a few days counting whales and sharks in the Atlantic and
    Indian oceans. In the Atlantic ocean we find 8 whales and 1 shark, in
    the Indian ocean 2 whales and 5 sharks. Then our contingency table is::

                Atlantic  Indian
        whales     8        2
        sharks     1        5

    We use this table to find the p-value:

    >>> from scipy import stats
    >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]])
    >>> pvalue
    0.0349...

    The probability that we would observe this or an even more imbalanced
    ratio by chance is about 3.5%.  A commonly used significance level is
    5%; if we adopt that, we can conclude that our observed imbalance is
    statistically significant: whales prefer the Atlantic while sharks
    prefer the Indian ocean.
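
As a quick check of the example, here is a short sketch that runs the same
table through all three alternatives. It assumes a SciPy version in which
`fisher_exact` accepts the `alternative` keyword described in the Parameters
section above; the `results` dictionary is only there for illustration.

```python
from scipy import stats

# Whales/sharks table from the docstring example.
table = [[8, 2], [1, 5]]

results = {}
for alternative in ("two-sided", "less", "greater"):
    oddsratio, pvalue = stats.fisher_exact(table, alternative=alternative)
    results[alternative] = (oddsratio, pvalue)
    print(f"{alternative:>9}: odds ratio = {oddsratio:.1f}, p = {pvalue:.4f}")
```

The odds ratio is the same in all three calls, since it is just the sample
value (8 * 5) / (2 * 1) = 20; only the p-value depends on the alternative.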