[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?

Bruce Southey bsouthey at gmail.com
Mon Jun 13 12:18:17 EDT 2011


On 06/13/2011 02:46 AM, Ralf Gommers wrote:
>
>
> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>
>     On Sun, Jun 12, 2011 at 7:52 PM, <josef.pktd at gmail.com> wrote:
>     > On Sun, Jun 12, 2011 at 8:30 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>     >> On Sun, Jun 12, 2011 at 8:56 AM, <josef.pktd at gmail.com> wrote:
>     >>> On Sun, Jun 12, 2011 at 9:36 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>     >>>> On Sun, Jun 12, 2011 at 5:20 AM, Ralf Gommers
>     >>>> <ralf.gommers at googlemail.com> wrote:
>     >>>>>
>     >>>>>
>     >>>>> On Wed, Jun 8, 2011 at 12:56 PM, <josef.pktd at gmail.com> wrote:
>     >>>>>>
>     >>>>>> On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>     >>>>>> > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers
>     >>>>>> > <ralf.gommers at googlemail.com> wrote:
>     >>>>>> >>
>     >>>>>> >>
>     >>>>>> >> On Mon, Jun 6, 2011 at 9:34 PM, <josef.pktd at gmail.com> wrote:
>     >>>>>> >>>
>     >>>>>> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey <bsouthey at gmail.com>
>     >>>>>> >>> wrote:
>     >>>>>> >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote:
>     >>>>>> >>> >> What should be the policy on one-sided versus two-sided?
>     >>>>>> >>> > Yes :-)
>     >>>>>> >>> >
>     >>>>>> >>> >> The main reason right now for looking at this is
>     >>>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 which
>     specifies a
>     >>>>>> >>> >> "one-sided" alternative and provides both lower and
>     upper tail.
>     >>>>>> >>> > That refers to the Fisher's test rather than the more
>     'traditional'
>     >>>>>> >>> > one-sided tests. Each value of the Fisher's test has
>     special
>     >>>>>> >>> > meanings
>     >>>>>> >>> > about the value or probability of the 'first cell'
>     under the null
>     >>>>>> >>> > hypothesis.  So it is necessary to provide those
>     three values.
>     >>>>>> >>> >
>     >>>>>> >>> >> I would prefer that we follow the alternative
>     patterns similar to R
>     >>>>>> >>> >>
>     >>>>>> >>> >> currently only kstest has    alternative :
>     'two_sided' (default),
>     >>>>>> >>> >> 'less' or 'greater'
>     >>>>>> >>> >> but this should be added to other tests where it
>     makes sense
>     >>>>>> >>> > I think that these Kolmogorov-Smirnov  tests are not
>     the traditional
>     >>>>>> >>> > meaning either. It is a little mind-boggling to try
>     to think about
>     >>>>>> >>> > cdfs!
>     >>>>>> >>> >
>     >>>>>> >>> >> R fisher.test
>     >>>>>> >>> >> """alternative        indicates the alternative
>     hypothesis and must
>     >>>>>> >>> >> be
>     >>>>>> >>> >> one
>     >>>>>> >>> >> of "two.sided", "greater" or "less". You can specify
>     just the
>     >>>>>> >>> >> initial
>     >>>>>> >>> >> letter. Only used in the 2 by 2 case."""
>     >>>>>> >>> >>
>     >>>>>> >>> >> mannwhitneyu reports a one-sided test without
>     actually specifying
>     >>>>>> >>> >> which alternative is used  (I thought I remembered
>     other cases like
>     >>>>>> >>> >> this but don't find any right now)
>     >>>>>> >>> >>
>     >>>>>> >>> >> related:
>     >>>>>> >>> >> in many cases in the two-sided tests the test
>     statistic has a sign
>     >>>>>> >>> >> that indicates in which tail the test-statistic falls.
>     >>>>>> >>> >> This is useful in ttests for example, because the one-sided tests
>     >>>>>> >>> >> can be backed out from the two-sided tests. (With symmetric
>     >>>>>> >>> >> distributions the one-sided p-value is just half of the two-sided
>     >>>>>> >>> >> p-value when the statistic falls in the tail named by the
>     >>>>>> >>> >> alternative, and one minus that half otherwise.)
>     >>>>>> >>> >>
>     >>>>>> >>> >> In the discussion of
>     https://github.com/scipy/scipy/pull/8  I
>     >>>>>> >>> >> argued
>     >>>>>> >>> >> that this might mislead users to interpret a
>     two-sided result as a
>     >>>>>> >>> >> one-sided result. However, I doubt now that this is a strong
>     >>>>>> >>> >> argument against reporting the signed test statistic.
>     >>>>>> >>> > (I do not follow pull requests so is there a relevant
>     ticket?)
>     >>>>>> >>> >
>     >>>>>> >>> >> After going through scipy.stats.stats, it looks like
>     we always
>     >>>>>> >>> >> report
>     >>>>>> >>> >> the signed test statistic.
>     >>>>>> >>> >>
>     >>>>>> >>> >> The test statistic in ks_2samp is in all cases
>     defined as a max
>     >>>>>> >>> >> value
>     >>>>>> >>> >> and doesn't have a sign in R either, so adding a
>     sign there would
>     >>>>>> >>> >> break with the standard definition.
>     >>>>>> >>> >> one-sided option for ks_2samp would just require to
>     find the
>     >>>>>> >>> >> distribution of the test statistics D+, D-
>     >>>>>> >>> >>
>     >>>>>> >>> >> ---
>     >>>>>> >>> >>
>     >>>>>> >>> >> So my proposal for the general pattern (with
>     exceptions for special
>     >>>>>> >>> >> reasons) would be
>     >>>>>> >>> >>
>     >>>>>> >>> >> * add/offer alternative : 'two_sided' (default),
>     'less' or
>     >>>>>> >>> >> 'greater'
>     >>>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394  for now,
>     >>>>>> >>> >> and adjustments of existing tests in the future
>     (adding the option
>     >>>>>> >>> >> can
>     >>>>>> >>> >> be mostly done in a backwards compatible way and for
>     symmetric
>     >>>>>> >>> >> distributions like ttest it's just a convenience)
>     >>>>>> >>> >> mannwhitneyu seems to be the only "weird" one
>     >>>>>> >>
>     >>>>>> >> This would actually make the fisher_exact implementation
>     more
>     >>>>>> >> consistent,
>     >>>>>> >> since only one p-value is returned in all cases. I just
>     don't like the
>     >>>>>> >> R
>     >>>>>> >> naming much; alternative="greater" does not convey to me
>     that this is a
>     >>>>>> >> one-sided test using the upper tail. How about:
>     >>>>>> >>     test : {"two-tailed", "lower-tail", "upper-tail"}
>     >>>>>> >> with two-tailed the default?
>     >>>>>>
>     >>>>>> I think matlab uses (in general) larger and smaller. The advantage of
>     >>>>>> less/smaller and greater/larger is that it directly refers to the
>     >>>>>> alternative hypothesis, while the meaning in terms of tails is not
>     >>>>>> always clear (in kstest and I guess some others the test statistic is
>     >>>>>> just reversed and uses the same tail in both cases).
>     >>>>>>
>     >>>>>> So greater/smaller is mostly "future proof" across tests, while
>     >>>>>> reference to the tail can only be used where this is an unambiguous
>     >>>>>> statement. But see below.
>     >>>>>>
>     >>>>> I think I understand your terminology a bit better now, and
>     consistency
>     >>>>> across all tests is important. So I've updated the Fisher's
>     exact patch to
>     >>>>> use alternative={'two-sided', 'less', 'greater'} and sent a
>     pull request:
>     >>>>> https://github.com/scipy/scipy/pull/32
>     >>>>>
>     >>>>> Cheers,
>     >>>>> Ralf
>     >>>>>
>     >>>>>>
>     >>>>>>
>     >>>>>> >>
>     >>>>>> >> Ralf
>     >>>>>> >>
>     >>>>>> >>
>     >>>>>> >>>
>     >>>>>> >>> >>
>     >>>>>> >>> >> * report signed test statistic for two-sided
>     alternative (when a
>     >>>>>> >>> >> signed test statistic exists):  which is the status
>     quo in
>     >>>>>> >>> >> stats.stats, but I didn't know that this is actually
>     pretty
>     >>>>>> >>> >> consistent
>     >>>>>> >>> >> across tests.
>     >>>>>> >>> >>
>     >>>>>> >>> >> Opinions ?
>     >>>>>> >>> >>
>     >>>>>> >>> >> Josef
>     >>>>>> >>> >> _______________________________________________
>     >>>>>> >>> >> SciPy-User mailing list
>     >>>>>> >>> >> SciPy-User at scipy.org
>     >>>>>> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>     >>>>>> >>> > I think that there is some valid misunderstanding
>     here (as I was in
>     >>>>>> >>> > the
>     >>>>>> >>> > same situation) regarding what is meant here. My
>     understanding is
>     >>>>>> >>> > that
>     >>>>>> >>> > under a one-sided hypothesis, all the values of the
>     null hypothesis
>     >>>>>> >>> > only
>     >>>>>> >>> > exist in one tail of the test distribution. In
>     contrast the values
>     >>>>>> >>> > of
>     >>>>>> >>> > null distribution exist in both tails with a
>     two-sided hypothesis.
>     >>>>>> >>> > Yet
>     >>>>>> >>> > that interpretation does not have the same meaning as
>     the tails in
>     >>>>>> >>> > the
>     >>>>>> >>> > Fisher or Kolmogorov-Smirnov tests.
>     >>>>>> >>>
>     >>>>>> >>> The tests have a clear Null Hypothesis (equality) and
>     Alternative
>     >>>>>> >>> Hypothesis (not equal or directional, less or greater).
>     >>>>>> >>> So the "alternative" should be clearly specified in the
>     function
>     >>>>>> >>> argument, as in R.
>     >>>>>> >>>
>     >>>>>> >>> Whether this corresponds to left and right tails of the
>     distribution
>     >>>>>> >>> is an "implementation detail" which holds for ttests
>     but not for
>     >>>>>> >>> kstest/ks_2samp.
>     >>>>>> >>>
>     >>>>>> >>> kstest/ks_2samp   H0: cdf1 == cdf2  and H1:  cdf1 != cdf2 or H1:
>     >>>>>> >>> cdf1 < cdf2 or H1:  cdf1 > cdf2
>     >>>>>> >>> (looks similar to comparing two survival curves in
>     Kaplan-Meier ?)
>     >>>>>> >>>
>     >>>>>> >>> fisher_exact (2 by 2)  H0: odds-ratio == 1 and H1:
>     odds-ratio != 1 or
>     >>>>>> >>> H1: odds-ratio < 1 or H1: odds-ratio > 1
>     >>>>>> >>>
>     >>>>>> >>> I know the kolmogorov-smirnov tests, but for fisher
>     exact and
>     >>>>>> >>> contingency tables I rely on R
>     >>>>>> >>>
>     >>>>>> >>> from R-help:
>     >>>>>> >>> For 2 by 2 tables, the null of conditional independence
>     is equivalent
>     >>>>>> >>> to the hypothesis that the odds ratio equals one. <...> The
>     >>>>>> >>> alternative for a one-sided test is based on the odds
>     ratio, so
>     >>>>>> >>> alternative = "greater" is a test of the odds ratio
>     being bigger than
>     >>>>>> >>> or.
>     >>>>>> >>> Two-sided tests are based on the probabilities of the
>     tables, and take
>     >>>>>> >>> as ‘more extreme’ all tables with probabilities less
>     than or equal to
>     >>>>>> >>> that of the observed table, the p-value being the sum
>     of such
>     >>>>>> >>> probabilities.
>     >>>>>> >>>
>     >>>>>> >>> Josef
>     >>>>>> >>>
>     >>>>>> >>>
>     >>>>>> >>> >
>     >>>>>> >>> > I never paid much attention to the frequency-based tests, but it
>     >>>>>> >>> > would not surprise me if there are no one-sided tests. Most are
>     >>>>>> >>> > rank-based, so it is rather hard to do in a simple manner - actually
>     >>>>>> >>> > I am not even sure how to use a permutation test.
>     >>>>>> >>> >
>     >>>>>> >>> > Bruce
>     >>>>>> >>> >
>     >>>>>> >>> >
>     >>>>>> >>> >
>     >>>>>> >
>     >>>>>> > But that is NOT the correct interpretation  here!
>     >>>>>> > I tried to explain to you that this is not the usual idea of
>     >>>>>> > one-sided vs two-sided tests.
>     >>>>>> > For example:
>     >>>>>> > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt
>     >>>>>> > "The test holds the marginal totals fixed and computes the
>     >>>>>> > hypergeometric probability that n11 is at least as large
>     as the
>     >>>>>> > observed value"
>     >>>>>>
>     >>>>>> this still sounds like a less/greater test to me
>     >>>>>>
>     >>>>>>
>     >>>>>> > "The output consists of three p-values:
>     >>>>>> > Left: Use this when the alternative to independence is
>     that there is
>     >>>>>> > negative association between the variables.  That is, the
>     observations
>     >>>>>> > tend to lie in lower left and upper right.
>     >>>>>> > Right: Use this when the alternative to independence is
>     that there is
>     >>>>>> > positive association between the variables. That is, the
>     observations
>     >>>>>> > tend to lie in upper left and lower right.
>     >>>>>> > 2-Tail: Use this when there is no prior alternative.
>     >>>>>> > "
>     >>>>>> > There is also the book "Categorical data analysis: using
>     the SAS
>     >>>>>> > system  By Maura E. Stokes, Charles S. Davis, Gary G.
>     Koch" that came
>     >>>>>> > up via Google that also refers to the n11 cell.
>     >>>>>> >
>     >>>>>> > http://www.langsrud.com/fisher.htm
>     >>>>>>
>     >>>>>> I was trying to read the Agresti paper referenced there but
>     it has too
>     >>>>>> much detail to get through in 15 minutes :)
>     >>>>>>
>     >>>>>> > "The output consists of three p-values:
>     >>>>>> >
>     >>>>>> >    Left: Use this when the alternative to independence is
>     that there
>     >>>>>> > is negative association between the variables.
>     >>>>>> >    That is, the observations tend to lie in lower left
>     and upper right.
>     >>>>>> >    Right: Use this when the alternative to independence
>     is that there
>     >>>>>> > is positive association between the variables.
>     >>>>>> >    That is, the observations tend to lie in upper left
>     and lower right.
>     >>>>>> >    2-Tail: Use this when there is no prior alternative.
>     >>>>>> >
>     >>>>>> > NOTE: Decide to use Left, Right or 2-Tail before
>     collecting (or
>     >>>>>> > looking at) the data."
>     >>>>>> >
>     >>>>>> > But you will get a different p-value if you switch rows
>     and columns
>     >>>>>> > because of the dependence on the n11 cell. If you do that
>     then the
>     >>>>>> > p-values switch between left and right sides as these now
>     refer to
>     >>>>>> > different hypotheses regarding that first cell.
>     >>>>>>
>     >>>>>> switching row and columns doesn't change the p-value in R
>     >>>>>> reversing columns changes the definition of less and
>     greater, reverses
>     >>>>>> them
>     >>>>>>
>     >>>>>> The problem with 2 by 2 contingency tables with given
>     marginals, i.e.
>     >>>>>> row and column totals, is that we only have one free entry.
>     Any test
>     >>>>>> on one entry, e.g. element 0,0, pins down all the other
>     ones and
>     >>>>>> (many) tests then become equivalent.
>     >>>>>>
>     >>>>>>
>     >>>>>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm
>     >>>>>> some math got lost
>     >>>>>> """
>     >>>>>> For <2 by 2> tables, one-sided p-values for Fisher’s exact test are
>     >>>>>> defined in terms of the frequency of the cell in the first row and
>     >>>>>> first column of the table, the (1,1) cell. Denoting the observed (1,1)
>     >>>>>> cell frequency by n11, the left-sided p-value for Fisher’s exact test
>     >>>>>> is the probability that the (1,1) cell frequency is less than or equal
>     >>>>>> to n11. For the left-sided p-value, the set includes those tables with
>     >>>>>> a (1,1) cell frequency less than or equal to n11. A small left-sided
>     >>>>>> p-value supports the alternative hypothesis that the probability of an
>     >>>>>> observation being in the first cell is actually less than expected
>     >>>>>> under the null hypothesis of independent row and column variables.
>     >>>>>>
>     >>>>>> Similarly, for a right-sided alternative hypothesis, the set is the set
>     >>>>>> of tables where the frequency of the (1,1) cell is greater than or
>     >>>>>> equal to that in the observed table. A small right-sided p-value
>     >>>>>> supports the alternative that the probability of the first cell is
>     >>>>>> actually greater than that expected under the null hypothesis.
>     >>>>>>
>     >>>>>> Because the (1,1) cell frequency completely determines the table when
>     >>>>>> the marginal row and column sums are fixed, these one-sided
>     >>>>>> alternatives can be stated equivalently in terms of other cell
>     >>>>>> probabilities or ratios of cell probabilities. The left-sided
>     >>>>>> alternative is equivalent to an odds ratio less than 1, where the odds
>     >>>>>> ratio equals (p11 p22)/(p12 p21). Additionally, the left-sided
>     >>>>>> alternative is equivalent to the column 1 risk for row 1 being less
>     >>>>>> than the column 1 risk for row 2. Similarly, the right-sided
>     >>>>>> alternative is equivalent to the column 1 risk for row 1 being greater
>     >>>>>> than the column 1 risk for row 2. See Agresti (2007) for details.
>     >>>>>> """
>     >>>>>>
>     >>>>>> I'm not a user of Fisher's exact test (and I have a hard
>     time keeping
>     >>>>>> the different statements straight), so if left/right or
>     lower/upper
>     >>>>>> makes more sense to users, then I don't complain.
>     >>>>>>
>     >>>>>> To me they are all just independence tests with possible
>     one-sided
>     >>>>>> alternatives that one distribution dominates the other.
>     (with the same
>     >>>>>> pattern as ks_2samp or ttest_2samp)
>     >>>>>>
>     >>>>>> Josef
>     >>>>>>
>     >>>>>> >
>     >>>>>> >
>     >>>>>> > Bruce
>     >>>>>
>     >>>>>
>     >>>> This is just wrong and plain ignorant! Please read the
>     references and
>     >>>> stats books about what the tails actually mean!
>     >>>>
>     >>>> You really need all three tests because these have different
>     meanings
>     >>>> that you do not know in advance which you need.
>     >>>
>     >>> Sorry, but I'm perfectly happy to follow R and SAS in this.
>     >>>
>     >>> Josef
>     >>>
>     >>>>
>     >>>> Bruce
>     >>>
>     >> So am I which is NOT what is happening here!
>     >
>     > Why do you think that?
>     Because all the stuff given above including SAS which YOU provided
>     includes all three tests.
>
>     > I quoted all the relevant descriptions from the R and SAS help, and I
>     > checked the following and similar for the cases that are in the
>     > changeset for the tests:
>     >
>     >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='g')
>     >
>     >        Fisher's Exact Test for Count Data
>     >
>     > data:  t(matrix(c(190, 800, 200, 900), nrow = 2))
>     > p-value = 0.296
>     > alternative hypothesis: true odds ratio is greater than 1
>     > 95 percent confidence interval:
>     >  0.8828407       Inf
>     > sample estimates:
>     > odds ratio
>     >  1.068698
>     >
>     >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='l')
>     >
>     >        Fisher's Exact Test for Count Data
>     >
>     > data:  t(matrix(c(190, 800, 200, 900), nrow = 2))
>     > p-value = 0.7416
>     > alternative hypothesis: true odds ratio is less than 1
>     > 95 percent confidence interval:
>     >  0.000000 1.293552
>     > sample estimates:
>     > odds ratio
>     >  1.068698
>     >
>     >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='t')
>     >
>     >        Fisher's Exact Test for Count Data
>     >
>     > data:  t(matrix(c(190, 800, 200, 900), nrow = 2))
>     > p-value = 0.5741
>     > alternative hypothesis: true odds ratio is not equal to 1
>     > 95 percent confidence interval:
>     >  0.8520463 1.3401490
>     > sample estimates:
>     > odds ratio
>     >  1.068698
>     >
>     > All the p-values agree for the alternatives two-sided, less, and
>     > greater; the odds ratio is defined differently, as explained pretty
>     > well in the docstring.
>     >
>     > Josef
>     >
>     >
>     >>
>     >> Bruce
>     >
>
>     Yes, but you said to follow BOTH R and SAS - that means providing
>     all three:
>
>     The FREQ Procedure
>
>     Table of Exposure by Response
>
>     Exposure     Response
>
>     Frequency|       0|       1|  Total
>     ---------+--------+--------+
>           0 |    190 |    800 |    990
>     ---------+--------+--------+
>           1 |    200 |    900 |   1100
>     ---------+--------+--------+
>     Total         390     1700     2090
>
>
>     Statistics for Table of Exposure by Response
>
>     Statistic                     DF       Value      Prob
>     ------------------------------------------------------
>     Chi-Square                     1      0.3503    0.5540
>     Likelihood Ratio Chi-Square    1      0.3500    0.5541
>     Continuity Adj. Chi-Square     1      0.2869    0.5922
>     Mantel-Haenszel Chi-Square     1      0.3501    0.5541
>     Phi Coefficient                       0.0129
>     Contingency Coefficient               0.0129
>     Cramer's V                            0.0129
>
>
>         Pearson Chi-Square Test
>     ----------------------------------
>     Chi-Square                  0.3503
>     DF                               1
>     Asymptotic Pr >  ChiSq      0.5540
>     Exact      Pr >= ChiSq      0.5741
>
>
>           Fisher's Exact Test
>     ----------------------------------
>     Cell (1,1) Frequency (F)       190
>     Left-sided Pr <= F          0.7416
>     Right-sided Pr >= F         0.2960
>
>     Table Probability (P)       0.0376
>     Two-sided Pr <= P           0.5741
>
>     Sample Size = 2090
>
>     Thus providing all three is the correct answer.
>
> Eh, we do. The interface is the same as that of R, and all three of 
> {two-sided, less, greater} are extensively checked against R. It looks 
> like you are reacting to only one statement Josef made to explain his 
> interpretation of less/greater. Please check the actual commit and 
> then comment if you see anything wrong.
>
> Ralf
>
>
I have looked at it (again) and the comments still stand:
A user should not have to read a statistics book and then the code to 
figure out what was actually implemented here.  So I do strongly object 
to Josef's statements, as you just cannot interpret Fisher's test in 
that way. Just look at how SAS presents the results; that should give a 
huge clue that the two-sided test is different from the one-sided 
tests.
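
For anyone who wants to check the numbers, here is a minimal sketch (not
the code in the pull request) of how the three p-values in the SAS output
quoted above can be obtained from the hypergeometric distribution of the
(1,1) cell with the margins held fixed. The variable names are my own, the
relative tolerance used when comparing table probabilities is an
assumption, and the last loop assumes a SciPy recent enough that
fisher_exact accepts the alternative keyword.

```python
import numpy as np
from scipy.stats import fisher_exact, hypergeom

# The 2 by 2 table used in the R and SAS runs quoted in this thread.
table = np.array([[190, 800],
                  [200, 900]])

n11 = table[0, 0]            # observed (1,1) cell frequency
total = table.sum()          # sample size
row1 = table[0].sum()        # first row total
col1 = table[:, 0].sum()     # first column total

# With fixed margins the (1,1) cell is hypergeometric:
# population `total`, `col1` "successes", `row1` draws.
dist = hypergeom(total, col1, row1)

p_left = dist.cdf(n11)       # Pr(cell <= n11), SAS "Left-sided"
p_right = dist.sf(n11 - 1)   # Pr(cell >= n11), SAS "Right-sided"

# Two-sided: add up every table that is no more probable than the
# observed one. The small relative tolerance (an assumption) guards
# against floating-point ties.
support = np.arange(max(0, row1 + col1 - total), min(row1, col1) + 1)
pmf = dist.pmf(support)
p_table = dist.pmf(n11)      # SAS "Table Probability (P)"
p_two = pmf[pmf <= p_table * (1 + 1e-7)].sum()

print(f"left={p_left:.4f} right={p_right:.4f} "
      f"two-sided={p_two:.4f} table={p_table:.4f}")

# The same three alternatives through scipy.stats.fisher_exact.
for alt in ("less", "greater", "two-sided"):
    _, p = fisher_exact(table, alternative=alt)
    print(f"{alt}: {p:.4f}")
```

If the sketch is right, the first line should reproduce the Left-sided,
Right-sided and Two-sided rows of the SAS output (0.7416, 0.2960 and
0.5741). The left and right values add up to more than one because the
observed table is counted in both tails, while the two-sided value is
built from table probabilities rather than from either tail.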


Bruce

