[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?

Mon Jun 13 16:38:12 EDT 2011

On Mon, Jun 13, 2011 at 2:19 PM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
>
>
> On Mon, Jun 13, 2011 at 8:56 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>
>> On Mon, Jun 13, 2011 at 11:36 AM, Ralf Gommers
>> <ralf.gommers at googlemail.com> wrote:
>> >
>> >
>> > On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey <bsouthey at gmail.com>
>> > wrote:
>> >>
>> >> On 06/13/2011 02:46 AM, Ralf Gommers wrote:
>> >>
>> >> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey <bsouthey at gmail.com>
>> >> wrote:
>> >>>
>> >>> On Sun, Jun 12, 2011 at 7:52 PM,  <josef.pktd at gmail.com> wrote:
>> >>> >
>> >>> > All the p-values agree for the alternatives two-sided, less, and
>> >>> > greater, the odds ratio is defined differently as explained pretty
>> >>> > well in the docstring.
>> >>> >
>> >>> > Josef
>> >>> Yes, but you said to follow BOTH R and SAS - that means providing all
>> >>> three:
>> >>>
>> >>> The FREQ Procedure
>> >>>
>> >>> Table of Exposure by Response
>> >>>
>> >>> Exposure     Response
>> >>>
>> >>> Frequency|       0|       1|  Total
>> >>> ---------+--------+--------+
>> >>>       0 |    190 |    800 |    990
>> >>> ---------+--------+--------+
>> >>>       1 |    200 |    900 |   1100
>> >>> ---------+--------+--------+
>> >>> Total         390     1700     2090
>> >>>
>> >>>
>> >>> Statistics for Table of Exposure by Response
>> >>>
>> >>> Statistic                     DF       Value      Prob
>> >>> ------------------------------------------------------
>> >>> Chi-Square                     1      0.3503    0.5540
>> >>> Likelihood Ratio Chi-Square    1      0.3500    0.5541
>> >>> Continuity Adj. Chi-Square     1      0.2869    0.5922
>> >>> Mantel-Haenszel Chi-Square     1      0.3501    0.5541
>> >>> Phi Coefficient                       0.0129
>> >>> Contingency Coefficient               0.0129
>> >>> Cramer's V                            0.0129
>> >>>
>> >>>
>> >>>     Pearson Chi-Square Test
>> >>> ----------------------------------
>> >>> Chi-Square                  0.3503
>> >>> DF                               1
>> >>> Asymptotic Pr >  ChiSq      0.5540
>> >>> Exact      Pr >= ChiSq      0.5741
>> >>>
>> >>>
>> >>>       Fisher's Exact Test
>> >>> ----------------------------------
>> >>> Cell (1,1) Frequency (F)       190
>> >>> Left-sided Pr <= F          0.7416
>> >>> Right-sided Pr >= F         0.2960
>> >>>
>> >>> Table Probability (P)       0.0376
>> >>> Two-sided Pr <= P           0.5741
>> >>>
>> >>> Sample Size = 2090
>> >>>
>> >>> Thus providing all three is the correct answer.
>> >>>
>> >> Eh, we do. The interface is the same as that of R, and all three of
>> >> {two-sided, less, greater} are extensively checked against R. It looks
>> >> like you are reacting to only one statement Josef made to explain his
>> >> interpretation of less/greater. Please check the actual commit and then
>> >> comment if you see anything wrong.
>> >>
>> >> Ralf
>> >>
>> >>
>> >> _______________________________________________
>> >> SciPy-User mailing list
>> >> SciPy-User at scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>> >>
>> >> I have looked at it (again) and the comments still stand:
>> >> A user should not have to read a statistics book and then the code to
>> >> figure out what was actually implemented here.  So I do strongly object
>> >> to Josef's statements, as you just cannot interpret Fisher's test in
>> >> that way. Just look at how SAS presents the results; that should give a
>> >> huge clue that the two-sided test is different from the one-sided tests.
>> >
>> > Okay, I am pasting the entire docstring below. You seem to know a lot
>> > about this, so can you please suggest wording for things to be
>> > added/changed?
>> >
>> > I have compared with the R doc
>> > (http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and
>> > that's not much different as far as I can tell.
>> >
>> > Thanks a lot,
>> > Ralf
>>
>> You are assuming a lot by saying that I even agree with the R
>> documentation :-)
>
> Didn't assume that.
>
>>
>> If you noticed, I never referred to it because it is not correct
>> compared to SAS and the other sources given.
>>
>>
>> >
>> >
>> >     Performs a Fisher exact test on a 2x2 contingency table.
>> >
>> >     Parameters
>> >     ----------
>> >     table : array_like of ints
>> >         A 2x2 contingency table.  Elements should be non-negative integers.
>> >     alternative : {'two-sided', 'less', 'greater'}, optional
>> >         Which alternative hypothesis to the null hypothesis the test uses.
>> >         Default is 'two-sided'.
>> >
>> >     Returns
>> >     -------
>> >     oddsratio : float
>> >         This is the prior odds ratio and not a posterior estimate.
>> >     p_value : float
>> >         P-value, the probability of obtaining a distribution at least as
>> >         extreme as the one that was actually observed, assuming that the
>> >         null hypothesis is true.
>> >
>> >     See Also
>> >     --------
>> >     chisquare : inexact alternative that can be used when sample sizes are
>> >                 large enough.
>> >
>> >     Notes
>> >     -----
>> >     The calculated odds ratio is different from the one R uses. In R language,
>> >     this implementation returns the (more common) "unconditional Maximum
>> >     Likelihood Estimate", while R uses the "conditional Maximum Likelihood
>> >     Estimate".
>> >
>> >     For tables with large numbers the (inexact) `chisquare` test can also be
>> >     used.
>> >
>> >     Examples
>> >     --------
>> >     Say we spend a few days counting whales and sharks in the Atlantic and
>> >     Indian oceans. In the Atlantic ocean we find 6 whales and 1 shark, in the
>> >     Indian ocean 2 whales and 5 sharks. Then our contingency table is::
>> >
>> >                 Atlantic  Indian
>> >         whales     8        2
>> >         sharks     1        5
>> >
>> >     We use this table to find the p-value:
>> >
>> >     >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]])
>> >     >>> pvalue
>> >     0.0349...
>> >
>> >     The probability that we would observe this or an even more imbalanced ratio
>> >     by chance is about 3.5%.  A commonly used significance level is 5%, if we
>> >     adopt that we can therefore conclude that our observed imbalance is
>> >     statistically significant; whales prefer the Atlantic while sharks prefer
>> >     the Indian ocean.
>> >
>>
>> So did two of the six whales give birth?
>>
>> That docstring is incomplete and probably does not meet the Scipy
>> documentation guidelines because not everything is explained.
>
> Yes, which ones do? It's a lot better than it was, and more complete than
> your average scipy docstring. Same for the tests. So I'm just going to be
> satisfied with the bug fix and added functionality.
>
>> It is not a small amount of effort to clean this up to be technically
>> correct -  0.0349 is not 'about 3.5%'.
>
> Note the ellipsis? It's also not exactly 0.0349. So I fail to see the
> problem. There are bigger fish to fry.
>
> Ralf
>

Correct, as this is not the place to teach statistics, especially p-values.
Bruce
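
For anyone following the thread who wants to check the numbers by hand: below
is a minimal sketch (not the scipy implementation) that computes the three
p-values directly from the hypergeometric distribution, conditioning on the
table margins. The function name `fisher_exact_2x2` and its return order are
made up for illustration. It assumes the two-sided p-value sums every table
whose probability does not exceed that of the observed table, which is how the
SAS listing labels it ("Two-sided Pr <= P").

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(table):
    """Exact p-values for a 2x2 table, conditioning on its margins.

    Hypothetical helper for illustration only.
    Returns (less, greater, two_sided, p_observed) for cell (1,1).
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(k):
        # Hypergeometric probability that cell (1,1) equals k.
        return Fraction(comb(col1, k) * comb(n - col1, row1 - k), total)

    # Range of cell (1,1) values compatible with the fixed margins.
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = {k: prob(k) for k in range(lo, hi + 1)}
    p_obs = probs[a]

    less = sum(p for k, p in probs.items() if k <= a)        # Pr <= F
    greater = sum(p for k, p in probs.items() if k >= a)     # Pr >= F
    # Assumption: "two-sided" means all tables no more probable than observed.
    two_sided = sum(p for p in probs.values() if p <= p_obs)
    return float(less), float(greater), float(two_sided), float(p_obs)


whales = fisher_exact_2x2([[8, 2], [1, 5]])
sas = fisher_exact_2x2([[190, 800], [200, 900]])
print("whales/sharks: less=%.4f greater=%.4f two-sided=%.4f" % whales[:3])
print("SAS table:     less=%.4f greater=%.4f two-sided=%.4f" % sas[:3])
```

For the whale/shark table the two-sided value works out to 280/8008, roughly
0.03497, which is consistent with the 0.0349... shown in the docstring. If the
assumptions above hold, the second line should agree with the left-sided,
right-sided and two-sided values in the SAS listing to the printed precision.
Note that less + greater exceeds 1 by exactly the probability of the observed
table, since both tails include it.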