[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?

Mon Jun 13 16:43:16 EDT 2011

On Mon, Jun 13, 2011 at 4:38 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> On Mon, Jun 13, 2011 at 2:19 PM, Ralf Gommers
> <ralf.gommers at googlemail.com> wrote:
>>
>>
>> On Mon, Jun 13, 2011 at 8:56 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>>
>>> On Mon, Jun 13, 2011 at 11:36 AM, Ralf Gommers
>>> <ralf.gommers at googlemail.com> wrote:
>>> >
>>> >
>>> > On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey <bsouthey at gmail.com>
>>> > wrote:
>>> >>
>>> >> On 06/13/2011 02:46 AM, Ralf Gommers wrote:
>>> >>
>>> >> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey <bsouthey at gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> On Sun, Jun 12, 2011 at 7:52 PM,  <josef.pktd at gmail.com> wrote:
>>> >>> >
>>> >>> > All the p-values agree for the alternatives two-sided, less, and
>>> >>> > greater, the odds ratio is defined differently as explained pretty
>>> >>> > well in the docstring.
>>> >>> >
>>> >>> > Josef
>>> >>> Yes, but you said to follow BOTH R and SAS - that means providing all
>>> >>> three:
>>> >>>
>>> >>> The FREQ Procedure
>>> >>>
>>> >>> Table of Exposure by Response
>>> >>>
>>> >>> Exposure     Response
>>> >>>
>>> >>> Frequency|       0|       1|  Total
>>> >>> ---------+--------+--------+
>>> >>>        0 |    190 |    800 |    990
>>> >>> ---------+--------+--------+
>>> >>>        1 |    200 |    900 |   1100
>>> >>> ---------+--------+--------+
>>> >>> Total         390     1700     2090
>>> >>>
>>> >>>
>>> >>> Statistics for Table of Exposure by Response
>>> >>>
>>> >>> Statistic                     DF       Value      Prob
>>> >>> ------------------------------------------------------
>>> >>> Chi-Square                     1      0.3503    0.5540
>>> >>> Likelihood Ratio Chi-Square    1      0.3500    0.5541
>>> >>> Continuity Adj. Chi-Square     1      0.2869    0.5922
>>> >>> Mantel-Haenszel Chi-Square     1      0.3501    0.5541
>>> >>> Phi Coefficient                       0.0129
>>> >>> Contingency Coefficient               0.0129
>>> >>> Cramer's V                            0.0129
>>> >>>
>>> >>>
>>> >>>     Pearson Chi-Square Test
>>> >>> ----------------------------------
>>> >>> Chi-Square                  0.3503
>>> >>> DF                               1
>>> >>> Asymptotic Pr >  ChiSq      0.5540
>>> >>> Exact      Pr >= ChiSq      0.5741
>>> >>>
>>> >>>
>>> >>>       Fisher's Exact Test
>>> >>> ----------------------------------
>>> >>> Cell (1,1) Frequency (F)       190
>>> >>> Left-sided Pr <= F          0.7416
>>> >>> Right-sided Pr >= F         0.2960
>>> >>>
>>> >>> Table Probability (P)       0.0376
>>> >>> Two-sided Pr <= P           0.5741
>>> >>>
>>> >>> Sample Size = 2090
>>> >>>
>>> >>> Thus providing all three is the correct answer.
>>> >>>
>>> >> Eh, we do. The interface is the same as that of R, and all three of
>>> >> {two-sided, less, greater} are extensively checked against R. It looks
>>> >> like
>>> >> you are reacting to only one statement Josef made to explain his
>>> >> interpretation of less/greater. Please check the actual commit and then
>>> >> comment if you see anything wrong.
>>> >>
>>> >> Ralf
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> SciPy-User mailing list
>>> >> SciPy-User at scipy.org
>>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >>
>>> >> I have looked at it (again) and the comments still stand:
>>> >> A user should not have to read a statistics book and then the code to
>>> >> figure out what was actually implemented here.  So I do strongly object
>>> >> to Josef's statements, as you just cannot interpret Fisher's test in
>>> >> that way. Just look at how SAS presents the results: that should give a
>>> >> huge clue that the two-sided test is different from the one-sided tests.
>>> >
>>> > Okay, I am pasting the entire docstring below. You seem to know a lot
>>> > about
>>> > this, so can you please suggest wording for things to be added/changed?
>>> >
>>> > I have compared with the R doc
>>> > (http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and
>>> > that's not much different as far as I can tell.
>>> >
>>> > Thanks a lot,
>>> > Ralf
>>>
>>> You are assuming a lot by saying that I even agree with the R
>>> documentation :-)
>>
>> Didn't assume that.
>>
>>>
>>> If you noticed, I never referred to it because it is not correct
>>> compared to SAS and the other sources given.
>>>
>>>
>>> >
>>> >
>>> >     Performs a Fisher exact test on a 2x2 contingency table.
>>> >
>>> >     Parameters
>>> >     ----------
>>> >     table : array_like of ints
>>> >         A 2x2 contingency table.  Elements should be non-negative integers.
>>> >     alternative : {'two-sided', 'less', 'greater'}, optional
>>> >         Which alternative hypothesis to the null hypothesis the test uses.
>>> >         Default is 'two-sided'.
>>> >
>>> >     Returns
>>> >     -------
>>> >     oddsratio : float
>>> >         This is the prior odds ratio and not a posterior estimate.
>>> >     p_value : float
>>> >         P-value, the probability of obtaining a distribution at least as
>>> >         extreme as the one that was actually observed, assuming that the
>>> >         null hypothesis is true.
>>> >
>>> >     See Also
>>> >     --------
>>> >     chisquare : inexact alternative that can be used when sample sizes are
>>> >                 large enough.
>>> >
>>> >     Notes
>>> >     -----
>>> >     The calculated odds ratio is different from the one R uses. In R language,
>>> >     this implementation returns the (more common) "unconditional Maximum
>>> >     Likelihood Estimate", while R uses the "conditional Maximum Likelihood
>>> >     Estimate".
>>> >
>>> >     For tables with large numbers the (inexact) `chisquare` test can also be
>>> >     used.
>>> >
>>> >     Examples
>>> >     --------
>>> >     Say we spend a few days counting whales and sharks in the Atlantic and
>>> >     Indian oceans. In the Atlantic ocean we find 6 whales and 1 shark, in the
>>> >     Indian ocean 2 whales and 5 sharks. Then our contingency table is::
>>> >
>>> >                 Atlantic  Indian
>>> >         whales     8        2
>>> >         sharks     1        5
>>> >
>>> >     We use this table to find the p-value:
>>> >
>>> >     >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]])
>>> >     >>> pvalue
>>> >     0.0349...
>>> >
>>> >     The probability that we would observe this or an even more imbalanced
>>> >     ratio by chance is about 3.5%.  A commonly used significance level is
>>> >     5%. If we adopt that, we can therefore conclude that our observed
>>> >     imbalance is statistically significant: whales prefer the Atlantic
>>> >     while sharks prefer the Indian ocean.
>>> >
>>> >
>>> >
>>>
>>> So did two of the six whales give birth?
>>>
>>> That docstring is incomplete and probably does not meet the Scipy
>>> documentation guidelines because not everything is explained.
>>
>> Yes, which ones do? It's a lot better than it was, and more complete than
>> your average scipy docstring. Same for the tests. So I'm just going to be
>> satisfied with the bug fix and added functionality.
>>
>>> It is not a small amount of effort to clean this up to be technically
>>> correct -  0.0349 is not 'about 3.5%'.
>>
>> Note the ellipsis? It's also not exactly 0.0349. So I fail to see the
>> problem. There are bigger fish to fry.
>>
>> Ralf
>>
>
> Correct, as this is not the place to teach statistics, especially p-values.

:)
Josef
(I learned a lot on the mailing lists)

> Bruce
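
For reference, the SAS numbers quoted earlier in this thread can be
reproduced along the following lines. This is only a sketch: it assumes a
SciPy version in which fisher_exact accepts the `alternative` keyword shown
in the quoted docstring. The hypergeometric cross-check (its variable names
and the 1e-7 relative tolerance) is an illustration added here, not the code
from the commit under discussion.

```python
import numpy as np
from scipy import stats

# 2x2 table from the SAS PROC FREQ output quoted in this thread.
table = np.array([[190, 800],
                  [200, 900]])

# p-values for the three alternatives offered by fisher_exact.
p_values = {}
for alt in ("less", "greater", "two-sided"):
    oddsratio, p_values[alt] = stats.fisher_exact(table, alternative=alt)

# Cross-check via the hypergeometric distribution of cell (1,1) with all
# margins fixed (illustrative only; not SciPy's internal implementation).
total = table.sum()           # sample size, 2090
row1 = table[0].sum()         # first row total, 990
col1 = table[:, 0].sum()      # first column total, 390
observed = table[0, 0]        # cell (1,1) frequency F, 190
dist = stats.hypergeom(total, row1, col1)

left = dist.cdf(observed)         # left-sided:  Pr(X <= F)
right = dist.sf(observed - 1)     # right-sided: Pr(X >= F)
table_prob = dist.pmf(observed)   # probability P of the observed table

# Two-sided: sum the probabilities of all tables that are no more likely
# than the observed one. The relative tolerance guards against rounding.
support = np.arange(max(0, row1 + col1 - total), min(row1, col1) + 1)
pmf = dist.pmf(support)
two_sided = pmf[pmf <= table_prob * (1 + 1e-7)].sum()

print("odds ratio:", oddsratio)
for alt, p in p_values.items():
    print(alt, p)
print("hypergeom:", left, right, two_sided, "table prob:", table_prob)
```

The two one-sided values are tail sums of one hypergeometric distribution
around the observed cell count F, while the two-sided value sums over every
table whose probability does not exceed the table probability P. That is why
SAS labels it "Pr <= P" rather than "Pr <= F" or "Pr >= F".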