[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?
josef.pktd at gmail.com
Sun Jun 12 20:52:32 EDT 2011
On Sun, Jun 12, 2011 at 8:30 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> On Sun, Jun 12, 2011 at 8:56 AM, <josef.pktd at gmail.com> wrote:
>> On Sun, Jun 12, 2011 at 9:36 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>>> On Sun, Jun 12, 2011 at 5:20 AM, Ralf Gommers
>>> <ralf.gommers at googlemail.com> wrote:
>>>>
>>>>
>>>> On Wed, Jun 8, 2011 at 12:56 PM, <josef.pktd at gmail.com> wrote:
>>>>>
>>>>> On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>>>> > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers
>>>>> > <ralf.gommers at googlemail.com> wrote:
>>>>> >>
>>>>> >>
>>>>> >> On Mon, Jun 6, 2011 at 9:34 PM, <josef.pktd at gmail.com> wrote:
>>>>> >>>
>>>>> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey <bsouthey at gmail.com>
>>>>> >>> wrote:
>>>>> >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote:
>>>>> >>> >> What should be the policy on one-sided versus two-sided?
>>>>> >>> > Yes :-)
>>>>> >>> >
>>>>> >>> >> The main reason right now for looking at this is
>>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a
>>>>> >>> >> "one-sided" alternative and provides both lower and upper tail.
>>>>> >>> > That refers to the Fisher's test rather than the more 'traditional'
>>>>> >>> > one-sided tests. Each value of the Fisher's test has special
>>>>> >>> > meanings
>>>>> >>> > about the value or probability of the 'first cell' under the null
>>>>> >>> > hypothesis. So it is necessary to provide those three values.
>>>>> >>> >
>>>>> >>> >> I would prefer that we follow the alternative patterns similar to R
>>>>> >>> >>
>>>>> >>> >> currently only kstest has alternative : 'two_sided' (default),
>>>>> >>> >> 'less' or 'greater'
>>>>> >>> >> but this should be added to other tests where it makes sense
>>>>> >>> > I think that these Kolmogorov-Smirnov tests are not the traditional
>>>>> >>> > meaning either. It is a little mind-boggling to try to think about
>>>>> >>> > cdfs!
>>>>> >>> >
>>>>> >>> >> R fisher.test
>>>>> >>> >> """alternative indicates the alternative hypothesis and must be
>>>>> >>> >> one of "two.sided", "greater" or "less". You can specify just the
>>>>> >>> >> initial letter. Only used in the 2 by 2 case."""
>>>>> >>> >>
>>>>> >>> >> mannwhitneyu reports a one-sided test without actually specifying
>>>>> >>> >> which alternative is used (I thought I remembered other cases like
>>>>> >>> >> this but don't find any right now)
>>>>> >>> >>
>>>>> >>> >> related:
>>>>> >>> >> in many cases in the two-sided tests the test statistic has a sign
>>>>> >>> >> that indicates in which tail the test statistic falls.
>>>>> >>> >> This is useful in ttests for example, because the one-sided tests
>>>>> >>> >> can be backed out from the two-sided tests. (With symmetric
>>>>> >>> >> distributions the one-sided p-value is just half of the two-sided
>>>>> >>> >> p-value when the sign of the statistic agrees with the alternative,
>>>>> >>> >> and one minus that half otherwise.)
>>>>> >>> >>
>>>>> >>> >> In the discussion of https://github.com/scipy/scipy/pull/8 I
>>>>> >>> >> argued
>>>>> >>> >> that this might mislead users to interpret a two-sided result as a
>>>>> >>> >> one-sided result. However, I doubt now that this is a strong
>>>>> >>> >> argument against reporting the signed test statistic.
>>>>> >>> > (I do not follow pull requests so is there a relevant ticket?)
>>>>> >>> >
>>>>> >>> >> After going through scipy.stats.stats, it looks like we always
>>>>> >>> >> report
>>>>> >>> >> the signed test statistic.
>>>>> >>> >>
>>>>> >>> >> The test statistic in ks_2samp is in all cases defined as a max
>>>>> >>> >> value
>>>>> >>> >> and doesn't have a sign in R either, so adding a sign there would
>>>>> >>> >> break with the standard definition.
>>>>> >>> >> one-sided option for ks_2samp would just require to find the
>>>>> >>> >> distribution of the test statistics D+, D-
>>>>> >>> >>
>>>>> >>> >> ---
>>>>> >>> >>
>>>>> >>> >> So my proposal for the general pattern (with exceptions for special
>>>>> >>> >> reasons) would be
>>>>> >>> >>
>>>>> >>> >> * add/offer alternative : 'two_sided' (default), 'less' or
>>>>> >>> >> 'greater'
>>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 for now,
>>>>> >>> >> and adjustments of existing tests in the future (adding the option
>>>>> >>> >> can
>>>>> >>> >> be mostly done in a backwards compatible way and for symmetric
>>>>> >>> >> distributions like ttest it's just a convenience)
>>>>> >>> >> mannwhitneyu seems to be the only "weird" one
>>>>> >>
>>>>> >> This would actually make the fisher_exact implementation more
>>>>> >> consistent,
>>>>> >> since only one p-value is returned in all cases. I just don't like the
>>>>> >> R
>>>>> >> naming much; alternative="greater" does not convey to me that this is a
>>>>> >> one-sided test using the upper tail. How about:
>>>>> >> test : {"two-tailed", "lower-tail", "upper-tail"}
>>>>> >> with two-tailed the default?
>>>>>
>>>>> I think matlab uses (in general) larger and smaller. The advantage of
>>>>> less/smaller and greater/larger is that it directly refers to the
>>>>> alternative hypothesis, while the meaning in terms of tails is not
>>>>> always clear (in kstest and I guess some others the test statistic is
>>>>> just reversed and uses the same tail in both cases).
>>>>>
>>>>> So greater/smaller is mostly "future proof" across tests, while
>>>>> reference to the tail can only be used where this is an unambiguous
>>>>> statement. But see below.
>>>>>
>>>> I think I understand your terminology a bit better now, and consistency
>>>> across all tests is important. So I've updated the Fisher's exact patch to
>>>> use alternative={'two-sided', 'less', 'greater'} and sent a pull request:
>>>> https://github.com/scipy/scipy/pull/32
>>>>
>>>> Cheers,
>>>> Ralf
>>>>
>>>>>
>>>>>
>>>>> >>
>>>>> >> Ralf
>>>>> >>
>>>>> >>
>>>>> >>>
>>>>> >>> >>
>>>>> >>> >> * report signed test statistic for two-sided alternative (when a
>>>>> >>> >> signed test statistic exists): which is the status quo in
>>>>> >>> >> stats.stats, but I didn't know that this is actually pretty
>>>>> >>> >> consistent
>>>>> >>> >> across tests.
>>>>> >>> >>
>>>>> >>> >> Opinions ?
>>>>> >>> >>
>>>>> >>> >> Josef
>>>>> >>> >> _______________________________________________
>>>>> >>> >> SciPy-User mailing list
>>>>> >>> >> SciPy-User at scipy.org
>>>>> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>> >>> > I think that there is some valid misunderstanding here (as I was in
>>>>> >>> > the same situation) regarding what is meant here. My understanding
>>>>> >>> > is that under a one-sided hypothesis, all the values of the null
>>>>> >>> > hypothesis only exist in one tail of the test distribution. In
>>>>> >>> > contrast, the values of the null distribution exist in both tails
>>>>> >>> > with a two-sided hypothesis. Yet that interpretation does not have
>>>>> >>> > the same meaning as the tails in the Fisher or Kolmogorov-Smirnov
>>>>> >>> > tests.
>>>>> >>>
>>>>> >>> The tests have a clear Null Hypothesis (equality) and Alternative
>>>>> >>> Hypothesis (not equal or directional, less or greater).
>>>>> >>> So the "alternative" should be clearly specified in the function
>>>>> >>> argument, as in R.
>>>>> >>>
>>>>> >>> Whether this corresponds to left and right tails of the distribution
>>>>> >>> is an "implementation detail" which holds for ttests but not for
>>>>> >>> kstest/ks_2samp.
>>>>> >>>
>>>>> >>> kstest/ks2sample H0: cdf1 == cdf2 and H1: cdf1 != cdf2 or H1:
>>>>> >>> cdf1 < cdf2 or H1: cdf1 > cdf2
>>>>> >>> (looks similar to comparing two survival curves in Kaplan-Meier ?)
>>>>> >>>
>>>>> >>> fisher_exact (2 by 2) H0: odds-ratio == 1 and H1: odds-ratio != 1 or
>>>>> >>> H1: odds-ratio < 1 or H1: odds-ratio > 1
>>>>> >>>
>>>>> >>> I know the kolmogorov-smirnov tests, but for fisher exact and
>>>>> >>> contingency tables I rely on R
>>>>> >>>
>>>>> >>> from R-help:
>>>>> >>> For 2 by 2 tables, the null of conditional independence is equivalent
>>>>> >>> to the hypothesis that the odds ratio equals one. <...> The
>>>>> >>> alternative for a one-sided test is based on the odds ratio, so
>>>>> >>> alternative = "greater" is a test of the odds ratio being bigger than
>>>>> >>> or [fisher.test's argument for the hypothesized odds ratio, 1 by default].
>>>>> >>> Two-sided tests are based on the probabilities of the tables, and take
>>>>> >>> as ‘more extreme’ all tables with probabilities less than or equal to
>>>>> >>> that of the observed table, the p-value being the sum of such
>>>>> >>> probabilities.
>>>>> >>>
>>>>> >>> Josef
>>>>> >>>
>>>>> >>>
>>>>> >>> >
>>>>> >>> > I never paid much attention to the frequency based tests but it
>>>>> >>> > would not surprise me if there are no one-sided tests. Most are
>>>>> >>> > rank-based so it is rather hard to do in a simple manner - actually
>>>>> >>> > I am not even sure how to use a permutation test.
>>>>> >>> >
>>>>> >>> > Bruce
>>>>> >>> >
>>>>> >>> >
>>>>> >>> >
>>>>> >>> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >
>>>>> > But that is NOT the correct interpretation here!
>>>>> > I tried to explain to you that this is not the usual idea of
>>>>> > one-sided vs two-sided tests.
>>>>> > For example:
>>>>> > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt
>>>>> > "The test holds the marginal totals fixed and computes the
>>>>> > hypergeometric probability that n11 is at least as large as the
>>>>> > observed value"
>>>>>
>>>>> this still sounds like a less/greater test to me
>>>>>
>>>>>
>>>>> > "The output consists of three p-values:
>>>>> > Left: Use this when the alternative to independence is that there is
>>>>> > negative association between the variables. That is, the observations
>>>>> > tend to lie in lower left and upper right.
>>>>> > Right: Use this when the alternative to independence is that there is
>>>>> > positive association between the variables. That is, the observations
>>>>> > tend to lie in upper left and lower right.
>>>>> > 2-Tail: Use this when there is no prior alternative.
>>>>> > "
>>>>> > There is also the book "Categorical data analysis: using the SAS
>>>>> > system By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that came
>>>>> > up via Google that also refers to the n11 cell.
>>>>> >
>>>>> > http://www.langsrud.com/fisher.htm
>>>>>
>>>>> I was trying to read the Agresti paper referenced there but it has too
>>>>> much detail to get through in 15 minutes :)
>>>>>
>>>>> > "The output consists of three p-values:
>>>>> >
>>>>> > Left: Use this when the alternative to independence is that there
>>>>> > is negative association between the variables.
>>>>> > That is, the observations tend to lie in lower left and upper right.
>>>>> > Right: Use this when the alternative to independence is that there
>>>>> > is positive association between the variables.
>>>>> > That is, the observations tend to lie in upper left and lower right.
>>>>> > 2-Tail: Use this when there is no prior alternative.
>>>>> >
>>>>> > NOTE: Decide to use Left, Right or 2-Tail before collecting (or
>>>>> > looking at) the data."
>>>>> >
>>>>> > But you will get a different p-value if you switch rows and columns
>>>>> > because of the dependence on the n11 cell. If you do that then the
>>>>> > p-values switch between left and right sides as these now refer to
>>>>> > different hypotheses regarding that first cell.
>>>>>
>>>>> switching row and columns doesn't change the p-value in R
>>>>> reversing columns changes the definition of less and greater, reverses
>>>>> them
>>>>>
>>>>> The problem with 2 by 2 contingency tables with given marginals, i.e.
>>>>> row and column totals, is that we only have one free entry. Any test
>>>>> on one entry, e.g. element 0,0, pins down all the other ones and
>>>>> (many) tests then become equivalent.
>>>>>
>>>>>
>>>>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm
>>>>> some math got lost
>>>>> """
>>>>> For <2 by 2> tables, one-sided p-values for Fisher’s exact test are
>>>>> defined in terms of the frequency of the cell in the first row and
>>>>> first column of the table, the (1,1) cell. Denoting the observed (1,1)
>>>>> cell frequency by n11, the left-sided p-value for Fisher’s exact test is
>>>>> the probability that the (1,1) cell frequency is less than or equal to
>>>>> n11. For the left-sided p-value, the set includes those tables with a
>>>>> (1,1) cell frequency less than or equal to n11. A small left-sided
>>>>> p-value supports the alternative hypothesis that the probability of an
>>>>> observation being in the first cell is actually less than expected
>>>>> under the null hypothesis of independent row and column variables.
>>>>>
>>>>> Similarly, for a right-sided alternative hypothesis, the set consists
>>>>> of the tables where the frequency of the (1,1) cell is greater than or
>>>>> equal to that in the observed table. A small right-sided p-value
>>>>> supports the alternative that the probability of the first cell is
>>>>> actually greater than that expected under the null hypothesis.
>>>>>
>>>>> Because the (1,1) cell frequency completely determines the table when
>>>>> the marginal row and column sums are fixed, these one-sided
>>>>> alternatives can be stated equivalently in terms of other cell
>>>>> probabilities or ratios of cell probabilities. The left-sided
>>>>> alternative is equivalent to an odds ratio less than 1, where the odds
>>>>> ratio equals (n11 n22)/(n12 n21). Additionally, the left-sided
>>>>> alternative is equivalent to the column 1 risk for row 1 being less
>>>>> than the column 1 risk for row 2. Similarly, the right-sided
>>>>> alternative is equivalent to the column 1 risk for row 1 being greater
>>>>> than the column 1 risk for row 2. See Agresti (2007) for details.
>>>>> """
>>>>>
>>>>> I'm not a user of Fisher's exact test (and I have a hard time keeping
>>>>> the different statements straight), so if left/right or lower/upper
>>>>> makes more sense to users, then I don't complain.
>>>>>
>>>>> To me they are all just independence tests with possible one-sided
>>>>> alternatives that one distribution dominates the other. (with the same
>>>>> pattern as ks_2samp or ttest_2samp)
>>>>>
>>>>> Josef
>>>>>
>>>>> >
>>>>> >
>>>>> > Bruce
>>>>> >
>>>>
>>>>
>>>>
>>>>
>>> This is just wrong and plain ignorant! Please read the references and
>>> stats books about what the tails actually mean!
>>>
>>> You really need all three tests because these have different meanings,
>>> and you do not know in advance which one you need.
>>
>> Sorry, but I'm perfectly happy to follow R and SAS in this.
>>
>> Josef
>>
>>>
>>> Bruce
>>>
>>
> So am I, which is NOT what is happening here!
Why do you think that?
I quoted all the relevant descriptions from the R and SAS help, and I
checked the following and similar for the cases that are in the
changeset for the tests:
> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='g')

        Fisher's Exact Test for Count Data

data:  t(matrix(c(190, 800, 200, 900), nrow = 2))
p-value = 0.296
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.8828407       Inf
sample estimates:
odds ratio
  1.068698

> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='l')

        Fisher's Exact Test for Count Data

data:  t(matrix(c(190, 800, 200, 900), nrow = 2))
p-value = 0.7416
alternative hypothesis: true odds ratio is less than 1
95 percent confidence interval:
 0.000000 1.293552
sample estimates:
odds ratio
  1.068698

> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='t')

        Fisher's Exact Test for Count Data

data:  t(matrix(c(190, 800, 200, 900), nrow = 2))
p-value = 0.5741
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.8520463 1.3401490
sample estimates:
odds ratio
  1.068698

All the p-values agree for the alternatives two-sided, less, and
greater; the odds ratio is defined differently, as explained pretty
well in the docstring.
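
For anyone who wants to cross-check the R numbers above without R, here
is a minimal pure-Python sketch. It is not the code in the pull request,
and the name fisher_exact_2x2 is made up for illustration. It conditions
on the margins, sums hypergeometric probabilities of the (1,1) cell for
less/greater, and for two-sided follows the rule quoted from R-help (add
up all tables that are no more probable than the observed one). The 1e-7
relative tolerance for ties is an assumption modelled on what R's
fisher.test does.

```python
from math import comb


def fisher_exact_2x2(table, alternative="two-sided"):
    """P-value of Fisher's exact test for a 2 by 2 table (illustrative sketch)."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n_tables = comb(row1 + row2, col1)

    def pmf(k):
        # Hypergeometric probability that the (1,1) cell equals k,
        # given the fixed row and column totals.
        return comb(row1, k) * comb(row2, col1 - k) / n_tables

    # Range of (1,1) cell values that is compatible with the margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)

    if alternative == "less":       # H1: odds ratio < 1
        return sum(pmf(k) for k in range(lo, a + 1))
    if alternative == "greater":    # H1: odds ratio > 1
        return sum(pmf(k) for k in range(a, hi + 1))
    if alternative == "two-sided":  # H1: odds ratio != 1
        # Assumed tie tolerance, so that equally probable tables are not
        # dropped because of floating point noise.
        cutoff = pmf(a) * (1 + 1e-7)
        return sum(p for p in map(pmf, range(lo, hi + 1)) if p <= cutoff)
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")


table = [[190, 800], [200, 900]]
transposed = [[190, 200], [800, 900]]     # rows and columns switched
reversed_cols = [[800, 190], [900, 200]]  # column order reversed

for alt in ("greater", "less", "two-sided"):
    print(alt, round(fisher_exact_2x2(table, alt), 4))

# Switching rows and columns should leave the p-value unchanged ...
print(fisher_exact_2x2(transposed, "greater"))
# ... while reversing the columns should swap "less" and "greater".
print(fisher_exact_2x2(reversed_cols, "less"))
```

Up to rounding, the first three lines should match the R p-values 0.296,
0.7416 and 0.5741, and the last two should both match the "greater"
p-value of the original table. The sample odds ratio for this table,
190*900/(800*200) = 1.06875, is slightly different from the 1.068698
that R prints because R reports a conditional maximum likelihood
estimate.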
Josef
>
> Bruce
>