[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?
Bruce Southey
bsouthey at gmail.com
Mon Jun 13 16:38:12 EDT 2011
On Mon, Jun 13, 2011 at 2:19 PM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
>
>
> On Mon, Jun 13, 2011 at 8:56 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>
>> On Mon, Jun 13, 2011 at 11:36 AM, Ralf Gommers
>> <ralf.gommers at googlemail.com> wrote:
>> >
>> >
>> > On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey <bsouthey at gmail.com>
>> > wrote:
>> >>
>> >> On 06/13/2011 02:46 AM, Ralf Gommers wrote:
>> >>
>> >> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey <bsouthey at gmail.com>
>> >> wrote:
>> >>>
>> >>> On Sun, Jun 12, 2011 at 7:52 PM, <josef.pktd at gmail.com> wrote:
>> >>> >
>> >>> > All the p-values agree for the alternatives two-sided, less, and
>> >>> > greater, the odds ratio is defined differently as explained pretty
>> >>> > well in the docstring.
>> >>> >
>> >>> > Josef
>> >>> Yes, but you said to follow BOTH R and SAS - that means providing all
>> >>> three:
>> >>>
>> >>> The FREQ Procedure
>> >>>
>> >>> Table of Exposure by Response
>> >>>
>> >>> Exposure     Response
>> >>>
>> >>> Frequency|       0|       1|  Total
>> >>> ---------+--------+--------+
>> >>>         0|     190|     800|    990
>> >>> ---------+--------+--------+
>> >>>         1|     200|     900|   1100
>> >>> ---------+--------+--------+
>> >>> Total          390     1700    2090
>> >>>
>> >>>
>> >>> Statistics for Table of Exposure by Response
>> >>>
>> >>> Statistic                     DF     Value      Prob
>> >>> ------------------------------------------------------
>> >>> Chi-Square                     1    0.3503    0.5540
>> >>> Likelihood Ratio Chi-Square    1    0.3500    0.5541
>> >>> Continuity Adj. Chi-Square     1    0.2869    0.5922
>> >>> Mantel-Haenszel Chi-Square     1    0.3501    0.5541
>> >>> Phi Coefficient                     0.0129
>> >>> Contingency Coefficient             0.0129
>> >>> Cramer's V                          0.0129
>> >>>
>> >>>
>> >>> Pearson Chi-Square Test
>> >>> ----------------------------------
>> >>> Chi-Square 0.3503
>> >>> DF 1
>> >>> Asymptotic Pr > ChiSq 0.5540
>> >>> Exact Pr >= ChiSq 0.5741
>> >>>
>> >>>
>> >>> Fisher's Exact Test
>> >>> ----------------------------------
>> >>> Cell (1,1) Frequency (F)       190
>> >>> Left-sided Pr <= F          0.7416
>> >>> Right-sided Pr >= F         0.2960
>> >>>
>> >>> Table Probability (P)       0.0376
>> >>> Two-sided Pr <= P           0.5741
>> >>>
>> >>> Sample Size = 2090
>> >>>
>> >>> Thus providing all three is the correct answer.
>> >>>
>> >> Eh, we do. The interface is the same as that of R, and all three of
>> >> {two-sided, less, greater} are extensively checked against R. It looks
>> >> like you are reacting to only one statement Josef made to explain his
>> >> interpretation of less/greater. Please check the actual commit and then
>> >> comment if you see anything wrong.
>> >>
>> >> Ralf
>> >>
>> >>
>> >> _______________________________________________
>> >> SciPy-User mailing list
>> >> SciPy-User at scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>> >>
>> >> I have looked at it (again) and the comments still stand:
>> >> A user should not have to read a statistical book and then the code to
>> >> figure out what was actually implemented here. So I do strongly object
>> >> to Josef's statements, as you just cannot interpret Fisher's test in
>> >> that way. Just look at how SAS presents the results, as that should
>> >> give a huge clue that the two-sided test is different from the
>> >> one-sided tests.
>> >
>> > Okay, I am pasting the entire docstring below. You seem to know a lot
>> > about this, so can you please suggest wording for things to be
>> > added/changed?
>> >
>> > I have compared with the R doc
>> > (http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and
>> > that's not much different as far as I can tell.
>> >
>> > Thanks a lot,
>> > Ralf
>>
>> You are assuming a lot by saying that I even agree with R documentation
>> :-)
>
> Didn't assume that.
>
>>
>> If you noticed, I never referred to it because it is not correct
>> compared to SAS and the other sources given.
>>
>>
>> >
>> >
>> > Performs a Fisher exact test on a 2x2 contingency table.
>> >
>> > Parameters
>> > ----------
>> > table : array_like of ints
>> >     A 2x2 contingency table. Elements should be non-negative integers.
>> > alternative : {'two-sided', 'less', 'greater'}, optional
>> >     Which alternative hypothesis to the null hypothesis the test uses.
>> >     Default is 'two-sided'.
>> >
>> > Returns
>> > -------
>> > oddsratio : float
>> >     This is prior odds ratio and not a posterior estimate.
>> > p_value : float
>> >     P-value, the probability of obtaining a distribution at least as
>> >     extreme as the one that was actually observed, assuming that the
>> >     null hypothesis is true.
>> >
>> > See Also
>> > --------
>> > chisquare : inexact alternative that can be used when sample sizes are
>> >     large enough.
>> >
>> > Notes
>> > -----
>> > The calculated odds ratio is different from the one R uses. In R
>> > language, this implementation returns the (more common) "unconditional
>> > Maximum Likelihood Estimate", while R uses the "conditional Maximum
>> > Likelihood Estimate".
>> >
>> > For tables with large numbers the (inexact) `chisquare` test can also
>> > be used.
>> >
>> > Examples
>> > --------
>> > Say we spend a few days counting whales and sharks in the Atlantic and
>> > Indian oceans. In the Atlantic ocean we find 6 whales and 1 shark, in
>> > the Indian ocean 2 whales and 5 sharks. Then our contingency table is::
>> >
>> >             Atlantic  Indian
>> >     whales     8        2
>> >     sharks     1        5
>> >
>> > We use this table to find the p-value:
>> >
>> > >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]])
>> > >>> pvalue
>> > 0.0349...
>> >
>> > The probability that we would observe this or an even more imbalanced
>> > ratio by chance is about 3.5%. A commonly used significance level is
>> > 5%, if we adopt that we can therefore conclude that our observed
>> > imbalance is statistically significant; whales prefer the Atlantic
>> > while sharks prefer the Indian ocean.
>> >
>> >
>> >
>>
>> So did two of the six whales give birth?
>>
>> That docstring is incomplete and probably does not meet the Scipy
>> documentation guidelines because not everything is explained.
>
> Yes, which ones do? It's a lot better than it was, and more complete than
> your average scipy docstring. Same for the tests. So I'm just going to be
> satisfied with the bug fix and added functionality.
>
>> It is not a small amount of effort to clean this up to be technically
>> correct - 0.0349 is not 'about 3.5%'.
>
> Note the ellipsis? It's also not exactly 0.0349. So I fail to see the
> problem. There are bigger fish to fry.
>
> Ralf
>
Correct, as this is not the place to teach statistics, especially p-values.
Bruce
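
For anyone who wants to check the SAS figures quoted above against scipy, here is a minimal sketch. It assumes a SciPy version in which `fisher_exact` accepts the `alternative` argument discussed in this thread. The variable names are only illustrative, and the SAS values in the comments are copied from the PROC FREQ output above rather than computed here.

```python
from scipy.stats import fisher_exact, hypergeom

# 2x2 table from the SAS PROC FREQ output quoted in this thread
# (rows: Exposure 0/1, columns: Response 0/1).
table = [[190, 800], [200, 900]]

# fisher_exact returns (odds ratio, p-value); the odds ratio is the
# sample ("unconditional") one, i.e. (190 * 900) / (800 * 200).
oddsratio, p_two_sided = fisher_exact(table, alternative="two-sided")
_, p_less = fisher_exact(table, alternative="less")
_, p_greater = fisher_exact(table, alternative="greater")

# Probability of the observed table under the hypergeometric null:
# 2090 observations in total, 390 with Response 0, 990 in Exposure row 0.
p_table = hypergeom.pmf(190, 2090, 390, 990)

print(f"sample odds ratio : {oddsratio:.5f}")
print(f"less      (SAS left-sided  Pr <= F, 0.7416): {p_less:.4f}")
print(f"greater   (SAS right-sided Pr >= F, 0.2960): {p_greater:.4f}")
print(f"two-sided (SAS two-sided   Pr <= P, 0.5741): {p_two_sided:.4f}")
print(f"table probability (SAS P, 0.0376)          : {p_table:.4f}")

# Both one-sided tails include the observed table, so they overlap in
# exactly that one term and sum to 1 + p_table.
print(f"less + greater - p_table = {p_less + p_greater - p_table:.6f}")
```

The last line shows why the two-sided value is not a simple function of the two one-sided values. The one-sided tails are cumulative sums on either side of the observed cell count, while the two-sided value sums every table whose probability is no larger than that of the observed table, which is how SAS labels it (Pr <= P).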