[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?
josef.pktd at gmail.com
Mon Jun 13 16:43:16 EDT 2011
On Mon, Jun 13, 2011 at 4:38 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> On Mon, Jun 13, 2011 at 2:19 PM, Ralf Gommers
> <ralf.gommers at googlemail.com> wrote:
>>
>>
>> On Mon, Jun 13, 2011 at 8:56 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>>
>>> On Mon, Jun 13, 2011 at 11:36 AM, Ralf Gommers
>>> <ralf.gommers at googlemail.com> wrote:
>>> >
>>> >
>>> > On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey <bsouthey at gmail.com>
>>> > wrote:
>>> >>
>>> >> On 06/13/2011 02:46 AM, Ralf Gommers wrote:
>>> >>
>>> >> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey <bsouthey at gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> On Sun, Jun 12, 2011 at 7:52 PM, <josef.pktd at gmail.com> wrote:
>>> >>> >
>>> >>> > All the p-values agree for the alternatives two-sided, less, and
>>> >>> > greater, the odds ratio is defined differently as explained pretty
>>> >>> > well in the docstring.
>>> >>> >
>>> >>> > Josef
>>> >>> Yes, but you said to follow BOTH R and SAS - that means providing all
>>> >>> three:
>>> >>>
>>> >>> The FREQ Procedure
>>> >>>
>>> >>> Table of Exposure by Response
>>> >>>
>>> >>> Exposure     Response
>>> >>>
>>> >>> Frequency|       0|       1|  Total
>>> >>> ---------+--------+--------+
>>> >>>        0 |    190 |    800 |    990
>>> >>> ---------+--------+--------+
>>> >>>        1 |    200 |    900 |   1100
>>> >>> ---------+--------+--------+
>>> >>> Total         390     1700     2090
>>> >>>
>>> >>>
>>> >>> Statistics for Table of Exposure by Response
>>> >>>
>>> >>> Statistic                     DF       Value      Prob
>>> >>> ------------------------------------------------------
>>> >>> Chi-Square                     1      0.3503    0.5540
>>> >>> Likelihood Ratio Chi-Square    1      0.3500    0.5541
>>> >>> Continuity Adj. Chi-Square     1      0.2869    0.5922
>>> >>> Mantel-Haenszel Chi-Square     1      0.3501    0.5541
>>> >>> Phi Coefficient                       0.0129
>>> >>> Contingency Coefficient               0.0129
>>> >>> Cramer's V                            0.0129
>>> >>>
>>> >>>
>>> >>> Pearson Chi-Square Test
>>> >>> ----------------------------------
>>> >>> Chi-Square                  0.3503
>>> >>> DF                               1
>>> >>> Asymptotic Pr >  ChiSq      0.5540
>>> >>> Exact      Pr >= ChiSq      0.5741
>>> >>>
>>> >>>
>>> >>> Fisher's Exact Test
>>> >>> ----------------------------------
>>> >>> Cell (1,1) Frequency (F)       190
>>> >>> Left-sided Pr <= F          0.7416
>>> >>> Right-sided Pr >= F         0.2960
>>> >>>
>>> >>> Table Probability (P)       0.0376
>>> >>> Two-sided Pr <= P           0.5741
>>> >>>
>>> >>> Sample Size = 2090
>>> >>>
>>> >>> Thus providing all three is the correct answer.
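For readers who want to check the SAS figures above independently, here is a minimal pure-Python sketch (not SciPy's code) of the three Fisher p-values. It conditions on both margins, so the top-left cell follows a hypergeometric distribution: "less" sums the probabilities of tables whose top-left count is <= 190, "greater" those with >= 190, and the two-sided value sums every table whose probability does not exceed that of the observed table (what SAS labels "Pr <= P"). The helper names and the 1e-7 relative tie tolerance are assumptions of this sketch; in SciPy the function under discussion is called as `scipy.stats.fisher_exact(table, alternative=...)`.

```python
from math import comb


def hypergeom_pmf(a, b, c, d):
    """P(top-left cell = x) for the 2x2 table [[a, b], [c, d]],
    conditioning on both margins (central hypergeometric)."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Exact integer counts; Python's int / int division handles huge values.
    return {x: comb(r1, x) * comb(r2, c1 - x) / total for x in support}


def fisher_pvalues(a, b, c, d, rtol=1e-7):
    """Return (table probability, less, greater, two-sided) p-values."""
    pmf = hypergeom_pmf(a, b, c, d)
    p_obs = pmf[a]
    less = sum(p for x, p in pmf.items() if x <= a)     # SAS "Left-sided Pr <= F"
    greater = sum(p for x, p in pmf.items() if x >= a)  # SAS "Right-sided Pr >= F"
    # Two-sided: every table no more probable than the observed one
    # (SAS "Two-sided Pr <= P"); rtol guards against floating-point ties.
    two_sided = sum(p for p in pmf.values() if p <= p_obs * (1 + rtol))
    return p_obs, less, greater, min(two_sided, 1.0)


# Exposure-by-response table from the SAS output: [[190, 800], [200, 900]]
p_obs, less, greater, two_sided = fisher_pvalues(190, 800, 200, 900)
print(f"Table probability (P)  {p_obs:.4f}")
print(f"Left-sided  Pr <= F    {less:.4f}")
print(f"Right-sided Pr >= F    {greater:.4f}")
print(f"Two-sided   Pr <= P    {two_sided:.4f}")
```

Note that less + greater exceeds 1 by exactly the table probability (0.7416 + 0.2960 - 0.0376 = 1), because the observed table is counted in both one-sided tails.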
>>> >>>
>>> >> Eh, we do. The interface is the same as that of R, and all three of
>>> >> {two-sided, less, greater} are extensively checked against R. It looks
>>> >> like you are reacting to only one statement Josef made to explain his
>>> >> interpretation of less/greater. Please check the actual commit and then
>>> >> comment if you see anything wrong.
>>> >>
>>> >> Ralf
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> SciPy-User mailing list
>>> >> SciPy-User at scipy.org
>>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >>
>>> >> I have looked at it (again) and the comments still stand:
>>> >> a user should not have to read a statistics book and then the code to
>>> >> figure out what was actually implemented here. So I do strongly object
>>> >> to Josef's statements, as you just cannot interpret Fisher's test in
>>> >> that way. Just looking at how SAS presents the results should give a
>>> >> huge clue that the two-sided test is different from the one-sided tests.
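To make concrete why the two-sided value is not simply derived from the one-sided ones, this sketch contrasts the "Pr <= P" definition that SAS prints with one common alternative convention, doubling the smaller one-sided p-value. Both function names are hypothetical, and the code illustrates the two conventions rather than describing what any particular package does.

```python
from math import comb


def table_pmf(a, b, c, d):
    """P(top-left cell = x) given both margins of [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    return {x: comb(r1, x) * comb(r2, c1 - x) / total for x in support}


def two_sided_minlike(a, b, c, d):
    """Sum of probabilities of tables no more likely than the observed one."""
    pmf = table_pmf(a, b, c, d)
    cutoff = pmf[a] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf.values() if p <= cutoff))


def two_sided_doubled(a, b, c, d):
    """Alternative convention: twice the smaller one-sided p-value, capped at 1."""
    pmf = table_pmf(a, b, c, d)
    less = sum(p for x, p in pmf.items() if x <= a)
    greater = sum(p for x, p in pmf.items() if x >= a)
    return min(1.0, 2 * min(less, greater))


table = (190, 800, 200, 900)  # the SAS exposure-by-response table
print(f"Pr <= P convention: {two_sided_minlike(*table):.4f}")
print(f"doubling convention: {two_sided_doubled(*table):.4f}")
```

On the SAS table the first convention gives roughly 0.574 and the second roughly 0.592 (twice the right-sided 0.2960), so the choice matters even with 2090 observations.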
>>> >
>>> > Okay, I am pasting the entire docstring below. You seem to know a lot
>>> > about this, so can you please suggest wording for things to be
>>> > added/changed?
>>> >
>>> > I have compared with the R doc
>>> > (http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and
>>> > that's not much different as far as I can tell.
>>> >
>>> > Thanks a lot,
>>> > Ralf
>>>
>>> You are assuming a lot by saying that I even agree with R documentation
>>> :-)
>>
>> Didn't assume that.
>>
>>>
>>> If you noticed, I never referred to it because it is not correct
>>> compared to SAS and the other sources given.
>>>
>>>
>>> >
>>> >
>>> > Performs a Fisher exact test on a 2x2 contingency table.
>>> >
>>> > Parameters
>>> > ----------
>>> > table : array_like of ints
>>> >     A 2x2 contingency table. Elements should be non-negative integers.
>>> > alternative : {'two-sided', 'less', 'greater'}, optional
>>> >     Which alternative hypothesis to the null hypothesis the test uses.
>>> >     Default is 'two-sided'.
>>> >
>>> > Returns
>>> > -------
>>> > oddsratio : float
>>> >     This is a prior odds ratio and not a posterior estimate.
>>> > p_value : float
>>> >     P-value, the probability of obtaining a distribution at least as
>>> >     extreme as the one that was actually observed, assuming that the
>>> >     null hypothesis is true.
>>> >
>>> > See Also
>>> > --------
>>> > chisquare : inexact alternative that can be used when sample sizes
>>> >     are large enough.
>>> >
>>> > Notes
>>> > -----
>>> > The calculated odds ratio is different from the one R uses. In R's
>>> > terminology, this implementation returns the (more common)
>>> > "unconditional Maximum Likelihood Estimate", while R uses the
>>> > "conditional Maximum Likelihood Estimate".
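The Notes distinguish two odds-ratio estimates. Below is a minimal sketch of both, assuming the table is laid out as [[a, b], [c, d]]: the unconditional estimate is the cross-product ratio ad/bc, while the conditional MLE is the value psi at which the mean of Fisher's noncentral hypergeometric distribution (the distribution of a given both margins) equals the observed a. The function names and the bisection solver on log(psi) are assumptions for illustration, not how SciPy or R compute it.

```python
import math


def sample_odds_ratio(a, b, c, d):
    """Unconditional MLE of the odds ratio: the cross-product ratio ad/bc."""
    return (a * d) / (b * c)


def conditional_mle_odds_ratio(a, b, c, d, tol=1e-12):
    """Conditional MLE: psi such that E[X; psi] == a, where X follows
    Fisher's noncentral hypergeometric distribution given the margins."""
    r1, r2, c1 = a + b, c + d, a + c
    xs = range(max(0, c1 - r2), min(r1, c1) + 1)
    if a in (xs[0], xs[-1]):
        raise ValueError("a is on the boundary of its support; the MLE is 0 or inf")
    # log C(r1, x) + log C(r2, c1 - x), computed once per support point
    log_base = [math.log(math.comb(r1, x)) + math.log(math.comb(r2, c1 - x)) for x in xs]

    def mean_x(log_psi):
        logw = [lb + x * log_psi for lb, x in zip(log_base, xs)]
        top = max(logw)  # subtract the max before exponentiating, for stability
        w = [math.exp(v - top) for v in logw]
        return sum(x * wi for x, wi in zip(xs, w)) / sum(w)

    lo, hi = -50.0, 50.0  # bracket for log(psi); mean_x increases with log(psi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_x(mid) < a:
            lo = mid
        else:
            hi = mid
    return math.exp((lo + hi) / 2)


whales_sharks = (8, 2, 1, 5)
print(sample_odds_ratio(*whales_sharks))
print(conditional_mle_odds_ratio(*whales_sharks))
```

For the whales/sharks table in the Examples below, the cross-product ratio is 8*5/(2*1) = 20.0, and the conditional estimate comes out smaller, since the expected top-left count at psi = 20 already exceeds the observed 8.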
>>> >
>>> > For tables with large counts, the (inexact) `chisquare` test can also
>>> > be used.
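The Notes point to a chi-square test as an inexact alternative for large tables. As a rough check on the SAS table earlier in the thread (n = 2090), this sketch recomputes the Pearson, continuity-adjusted (Yates) and Mantel-Haenszel statistics with their 1-df asymptotic p-values. The function names are hypothetical; in current SciPy the contingency-table version of this test is `scipy.stats.chi2_contingency`, whereas `chisquare` is a one-way goodness-of-fit test.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def chi2_stats_2x2(a, b, c, d):
    """Pearson, continuity-adjusted (Yates) and Mantel-Haenszel chi-square
    statistics for the 2x2 table [[a, b], [c, d]], with asymptotic p-values."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)  # product of the four margins
    diff = abs(a * d - b * c)
    stats = {
        "pearson": n * diff ** 2 / denom,
        "yates": n * max(0.0, diff - n / 2) ** 2 / denom,
        "mantel_haenszel": (n - 1) * diff ** 2 / denom,  # (n - 1) / n * Pearson
    }
    return {name: (stat, chi2_sf_1df(stat)) for name, stat in stats.items()}


for name, (stat, p) in chi2_stats_2x2(190, 800, 200, 900).items():
    print(f"{name:<16} {stat:.4f}  p = {p:.4f}")
```

Even at this sample size, the asymptotic Pearson p-value (about 0.554) differs in the second decimal from the exact two-sided Fisher value that SAS reports (0.5741).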
>>> >
>>> > Examples
>>> > --------
>>> > Say we spend a few days counting whales and sharks in the Atlantic and
>>> > Indian oceans. In the Atlantic ocean we find 6 whales and 1 shark, and
>>> > in the Indian ocean 2 whales and 5 sharks. Then our contingency table
>>> > is::
>>> >
>>> >             Atlantic  Indian
>>> >     whales     8        2
>>> >     sharks     1        5
>>> >
>>> > We use this table to find the p-value:
>>> >
>>> > >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]])
>>> > >>> pvalue
>>> > 0.0349...
>>> >
>>> > The probability that we would observe this or an even more imbalanced
>>> > ratio by chance is about 3.5%. A commonly used significance level is
>>> > 5%; if we adopt that, we can conclude that our observed imbalance is
>>> > statistically significant: whales prefer the Atlantic while sharks
>>> > prefer the Indian ocean.
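Because the whales/sharks counts are small, the Fisher p-values can be computed exactly with rational arithmetic. This sketch (function name hypothetical, standard library only) returns all three alternatives as `Fraction`s, which also removes the need for the tie tolerance used with floats.

```python
from fractions import Fraction
from math import comb


def fisher_exact_fractions(a, b, c, d):
    """Exact (rational) Fisher p-values for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)
    pmf = {x: Fraction(comb(r1, x) * comb(r2, c1 - x), total)
           for x in range(max(0, c1 - r2), min(r1, c1) + 1)}
    return {
        "less": sum(p for x, p in pmf.items() if x <= a),
        "greater": sum(p for x, p in pmf.items() if x >= a),
        # exact arithmetic, so ties are compared without any tolerance
        "two-sided": sum(p for p in pmf.values() if p <= pmf[a]),
    }


# whales/sharks table from the docstring: [[8, 2], [1, 5]]
for alternative, p in fisher_exact_fractions(8, 2, 1, 5).items():
    print(f"{alternative:>9}: {p} = {float(p):.6f}")
```

The two-sided value is exactly 5/143, consistent with the doctest's 0.0349...; the one-sided "greater" value (more whales in the Atlantic cell than expected under independence) is 7/286, about 0.0245.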
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>> So did two of the six whales give birth?
>>>
>>> That docstring is incomplete and probably does not meet the Scipy
>>> documentation guidelines because not everything is explained.
>>
>> Yes, which ones do? It's a lot better than it was, and more complete than
>> your average scipy docstring. Same for the tests. So I'm just going to be
>> satisfied with the bug fix and added functionality.
>>
>>> It is not a small amount of effort to clean this up to be technically
>>> correct - 0.0349 is not 'about 3.5%'.
>>
>> Note the ellipsis? It's also not exactly 0.0349. So I fail to see the
>> problem. There are bigger fish to fry.
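For the record, here is the arithmetic behind the disputed figure, using the whales/sharks table (10 whales in total, column totals 9 for the Atlantic and 7 for the Indian ocean, n = 16). Let X be the number of whales counted in the Atlantic; the two-sided value sums the hypergeometric probabilities of all tables no more likely than the observed one (X = 8).

```latex
\begin{aligned}
P(X = x) &= \frac{\binom{9}{x}\binom{7}{10-x}}{\binom{16}{10}}, \qquad x = 3, \dots, 9, \\
\binom{16}{10}\, P(X = x) &= 84,\ 882,\ 2646,\ 2940,\ 1260,\ 189,\ 7 \qquad (x = 3, \dots, 9), \\
p_{\text{two-sided}} &= \sum_{x:\ P(X = x) \le P(X = 8)} P(X = x)
  = \frac{84 + 189 + 7}{8008} = \frac{280}{8008} = \frac{5}{143} \approx 0.034965.
\end{aligned}
```

So the value rounds to 3.5% at two significant figures, and the doctest's 0.0349... is a truncated prefix of 0.034965...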
>>
>> Ralf
>>
>>
>>
>
> Correct, as this is not the place to teach statistics, especially p-values.
:)
Josef
(I learned a lot on the mailing lists)
> Bruce
>