[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?
Ralf Gommers
ralf.gommers at googlemail.com
Mon Jun 13 12:36:31 EDT 2011
On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> On 06/13/2011 02:46 AM, Ralf Gommers wrote:
>
> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>
>> On Sun, Jun 12, 2011 at 7:52 PM, <josef.pktd at gmail.com> wrote:
>> >
>> > All the p-values agree for the alternatives two-sided, less, and
>> > greater, the odds ratio is defined differently as explained pretty
>> > well in the docstring.
>> >
>> > Josef
>> Yes, but you said to follow BOTH R and SAS - that means providing all
>> three:
>>
>> The FREQ Procedure
>>
>> Table of Exposure by Response
>>
>> Exposure Response
>>
>> Frequency| 0| 1| Total
>> ---------+--------+--------+
>> 0 | 190 | 800 | 990
>> ---------+--------+--------+
>> 1 | 200 | 900 | 1100
>> ---------+--------+--------+
>> Total 390 1700 2090
>>
>>
>> Statistics for Table of Exposure by Response
>>
>> Statistic DF Value Prob
>> ------------------------------------------------------
>> Chi-Square 1 0.3503 0.5540
>> Likelihood Ratio Chi-Square 1 0.3500 0.5541
>> Continuity Adj. Chi-Square 1 0.2869 0.5922
>> Mantel-Haenszel Chi-Square 1 0.3501 0.5541
>> Phi Coefficient 0.0129
>> Contingency Coefficient 0.0129
>> Cramer's V 0.0129
>>
>>
>> Pearson Chi-Square Test
>> ----------------------------------
>> Chi-Square 0.3503
>> DF 1
>> Asymptotic Pr > ChiSq 0.5540
>> Exact Pr >= ChiSq 0.5741
>>
>>
>> Fisher's Exact Test
>> ----------------------------------
>> Cell (1,1) Frequency (F) 190
>> Left-sided Pr <= F 0.7416
>> Right-sided Pr >= F 0.2960
>>
>> Table Probability (P) 0.0376
>> Two-sided Pr <= P 0.5741
>>
>> Sample Size = 2090
>>
>> Thus providing all three is the correct answer.
>>
> Eh, we do. The interface is the same as that of R, and all three of
> {two-sided, less, greater} are extensively checked against R. It looks like
> you are reacting to only one statement Josef made to explain his
> interpretation of less/greater. Please check the actual commit and then
> comment if you see anything wrong.
>
> Ralf
>
>
>
> I have looked at it (again) and the comments still stand:
> A user should not have to read a statistics book and then the code to
> figure out what was actually implemented here. So I do strongly object to
> Josef's statements, as you just cannot interpret Fisher's test in that way.
> Just look at how SAS presents the results, which should give a huge clue that
> the two-sided test is different from the one-sided tests.
>
Okay, I am pasting the entire docstring below. You seem to know a lot about
this, so can you please suggest wording for things to be added/changed?
I have compared with the R doc (
http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and that's
not much different as far as I can tell.
Thanks a lot,
Ralf
Performs a Fisher exact test on a 2x2 contingency table.
Parameters
----------
table : array_like of ints
A 2x2 contingency table. Elements should be non-negative integers.
alternative : {'two-sided', 'less', 'greater'}, optional
Which alternative hypothesis to the null hypothesis the test uses.
Default is 'two-sided'.
Returns
-------
oddsratio : float
This is the prior odds ratio and not a posterior estimate.
p_value : float
P-value, the probability of obtaining a distribution at least as
extreme as the one that was actually observed, assuming that the
null hypothesis is true.
See Also
--------
chisquare : inexact alternative that can be used when sample sizes are
large enough.
Notes
-----
The calculated odds ratio is different from the one R uses. In R's
terminology, this implementation returns the (more common) "unconditional
Maximum Likelihood Estimate", while R uses the "conditional Maximum
Likelihood Estimate".
For tables with large numbers the (inexact) `chisquare` test can also be
used.
Examples
--------
Say we spend a few days counting whales and sharks in the Atlantic and
Indian oceans. In the Atlantic ocean we find 8 whales and 1 shark, in the
Indian ocean 2 whales and 5 sharks. Then our contingency table is::

            Atlantic  Indian
    whales     8        2
    sharks     1        5

We use this table to find the p-value:
>>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]])
>>> pvalue
0.0349...
The probability that we would observe this or an even more imbalanced ratio
by chance is about 3.5%. A commonly used significance level is 5%; if we
adopt that, we can therefore conclude that our observed imbalance is
statistically significant: whales prefer the Atlantic while sharks prefer
the Indian ocean.
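
To make the comparison with the SAS output quoted above concrete, here is a minimal sketch (not part of the docstring) that runs all three alternatives on the same 2x2 table. It assumes that 'less' and 'greater' are the lower and upper hypergeometric tails for cell (1,1), which is my reading of the R-style interface; the variable names are only illustrative.

```python
from scipy import stats

# 2x2 table from the SAS PROC FREQ output quoted earlier in this thread
# (rows: Exposure 0/1, columns: Response 0/1).
table = [[190, 800], [200, 900]]

# Run the test once per alternative and collect (odds ratio, p-value).
results = {}
for alternative in ("two-sided", "less", "greater"):
    oddsratio, pvalue = stats.fisher_exact(table, alternative=alternative)
    results[alternative] = (oddsratio, pvalue)
    print(f"{alternative:>9}: odds ratio = {oddsratio:.4f}, p = {pvalue:.4f}")

# Assumption being checked: the one-sided p-values are hypergeometric tails
# for cell (1,1). Both tails include the observed table, so together they
# add up to 1 + P(observed table).
M, n, N = 2090, 990, 390  # grand total, first-row total, first-column total
p_less = stats.hypergeom.cdf(190, M, n, N)    # P(X <= 190)
p_greater = stats.hypergeom.sf(189, M, n, N)  # P(X >= 190)
p_table = stats.hypergeom.pmf(190, M, n, N)   # P(X == 190)
print(f"tails: less = {p_less:.4f}, greater = {p_greater:.4f}, "
      f"table = {p_table:.4f}")
```

If that reading is right, I would expect 'less' and 'greater' to line up with SAS's "Left-sided Pr <= F" and "Right-sided Pr >= F", and 'two-sided' with "Two-sided Pr <= P". The odds ratio printed here is the unconditional estimate (190*900)/(800*200), so it will not match the conditional estimate that R reports.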