[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?

Mon Jun 6 15:34:12 EDT 2011

On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote:
>> What should be the policy on one-sided versus two-sided?
> Yes :-)
>
>> The main reason right now for looking at this is
>> http://projects.scipy.org/scipy/ticket/1394 which specifies a
>> "one-sided" alternative and provides both lower and upper tail.
> That refers to Fisher's test rather than the more 'traditional'
> one-sided tests. Each value of Fisher's test has a special meaning
> about the value or probability of the 'first cell' under the null
> hypothesis.  So it is necessary to provide those three values.
>
>> I would prefer that we follow the alternative patterns similar to R
>>
>> currently only kstest has    alternative : 'two_sided' (default),
>> 'less' or 'greater'
>> but this should be added to other tests where it makes sense
> I think that these Kolmogorov-Smirnov tests do not have the traditional
> meaning either. It is a little mind-boggling to try to think about cdfs!
>
>> R fisher.test
>> """alternative        indicates the alternative hypothesis and must be one
>> of "two.sided", "greater" or "less". You can specify just the initial
>> letter. Only used in the 2 by 2 case."""
>>
>> mannwhitneyu reports a one-sided test without actually specifying
>> which alternative is used. (I thought I remembered other cases like
>> this but can't find any right now.)
>>
>> related:
>> in many cases in the two-sided tests the test statistic has a sign
>> that indicates in which tail the test-statistic falls.
>> This is useful in ttests for example, because the one-sided tests can
>> be backed out from the two-sided tests. (With symmetric distributions
>> one-sided p-value is just half of the two-sided pvalue)
>>
>> In the discussion of https://github.com/scipy/scipy/pull/8  I argued
>> that this might mislead users to interpret a two-sided result as a
>> one-sided result. However, I now doubt that this is a strong argument
>> against reporting the signed test statistic.
> (I do not follow pull requests so is there a relevant ticket?)
>
>> After going through scipy.stats.stats, it looks like we always report
>> the signed test statistic.
>>
>> The test statistic in ks_2samp is in all cases defined as a max value
>> and doesn't have a sign in R either, so adding a sign there would
>> break with the standard definition.
>> A one-sided option for ks_2samp would just require finding the
>> distribution of the test statistics D+ and D-.
>>
>> ---
>>
>> So my proposal for the general pattern (with exceptions for special
>> reasons) would be
>>
>> * add/offer alternative : 'two_sided' (default), 'less' or 'greater'
>> http://projects.scipy.org/scipy/ticket/1394  for now,
>> and adjustments of existing tests in the future (adding the option can
>> be mostly done in a backwards compatible way and for symmetric
>> distributions like ttest it's just a convenience)
>> mannwhitneyu seems to be the only "weird" one
>>
>> * report signed test statistic for two-sided alternative (when a
>> signed test statistic exists):  which is the status quo in
>> stats.stats, but I didn't know that this is actually pretty consistent
>> across tests.
>>
>> Opinions ?
>>
>> Josef
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
> I think that there is some valid misunderstanding here (as I was in the
> same situation) regarding what is meant here. My understanding is that
> under a one-sided hypothesis, all the values of the null hypothesis only
> exist in one tail of the test distribution. In contrast, the values of the
> null distribution exist in both tails with a two-sided hypothesis. Yet
> that interpretation does not have the same meaning as the tails in the
> Fisher or Kolmogorov-Smirnov tests.

The tests have a clear Null Hypothesis (equality) and Alternative
Hypothesis (not equal or directional, less or greater).
So the "alternative" should be clearly specified in the function
argument, as in R.

Whether this corresponds to left and right tails of the distribution
is an "implementation detail" which holds for ttests but not for
kstest/ks_2samp.
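
For the symmetric case, backing a one-sided p-value out of a signed
statistic and a two-sided p-value can be sketched as below. This is only
a rough illustration: the helper name one_sided_from_two_sided is made
up, and a standard normal from the Python stdlib stands in for the t
distribution so that the example needs no scipy call.

```python
from statistics import NormalDist


def one_sided_from_two_sided(stat, p_two_sided, alternative):
    """Hypothetical helper: one-sided p-value from a signed statistic.

    Only valid when the null distribution of `stat` is symmetric
    around zero (as for t or z statistics).
    """
    half = p_two_sided / 2.0
    if alternative == "greater":
        # Statistic in the upper tail: half of the two-sided p-value.
        return half if stat > 0 else 1.0 - half
    if alternative == "less":
        # Statistic in the lower tail: half of the two-sided p-value.
        return half if stat < 0 else 1.0 - half
    raise ValueError("alternative must be 'less' or 'greater'")


# Assumed example value; the normal distribution is a stand-in for t.
z = 1.7
p_two = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
p_greater = one_sided_from_two_sided(z, p_two, "greater")
p_less = one_sided_from_two_sided(z, p_two, "less")
```

The sign of the statistic decides which tail gets the "half"; without
the sign the two one-sided p-values could not be told apart.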

kstest/ks_2samp   H0: cdf1 == cdf2  and  H1: cdf1 != cdf2  or
H1: cdf1 < cdf2  or  H1: cdf1 > cdf2
(looks similar to comparing two survival curves in Kaplan-Meier ?)
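
To make the directional statistics concrete, here is a minimal sketch of
D+ and D- for two samples, computed from the empirical cdfs with the
stdlib only. The function name ks_2samp_statistics is hypothetical and
this is not how scipy implements ks_2samp; it computes the statistics
only, not their distributions or p-values, and which direction would be
labelled 'less' or 'greater' is a convention that the sketch leaves open.

```python
from bisect import bisect_right


def ks_2samp_statistics(x, y):
    """Hypothetical helper: return (D_plus, D_minus, D).

    D_plus  = max over t of F_x(t) - F_y(t)
    D_minus = max over t of F_y(t) - F_x(t)
    D       = max(D_plus, D_minus), the usual unsigned statistic
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d_plus = d_minus = 0.0
    # The empirical cdfs only change at observed points.
    for t in xs + ys:
        fx = bisect_right(xs, t) / n  # fraction of x values <= t
        fy = bisect_right(ys, t) / m  # fraction of y values <= t
        d_plus = max(d_plus, fx - fy)
        d_minus = max(d_minus, fy - fx)
    return d_plus, d_minus, max(d_plus, d_minus)


# Assumed toy data: x sits mostly to the left of y.
x = [0.1, 0.2, 0.3, 0.4]
y = [0.35, 0.5, 0.6, 0.7]
d_plus, d_minus, d = ks_2samp_statistics(x, y)
```

Both D+ and D- are non-negative by construction, which is why the
two-sided D has no sign to report.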

fisher_exact (2 by 2)  H0: odds-ratio == 1 and H1: odds-ratio != 1 or
H1: odds-ratio < 1 or H1: odds-ratio > 1

I know the Kolmogorov-Smirnov tests, but for Fisher's exact test and
contingency tables I rely on R.

from R-help:
For 2 by 2 tables, the null of conditional independence is equivalent
to the hypothesis that the odds ratio equals one. <...> The
alternative for a one-sided test is based on the odds ratio, so
alternative = "greater" is a test of the odds ratio being bigger than
"or" [the hypothesized odds ratio argument of fisher.test, default 1].
Two-sided tests are based on the probabilities of the tables, and take
as ‘more extreme’ all tables with probabilities less than or equal to
that of the observed table, the p-value being the sum of such
probabilities.
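
As a rough sketch of what the three alternatives mean for a 2 by 2
table, the p-values can be computed directly from the hypergeometric
probabilities of the first cell with fixed margins. The function name
fisher_exact_2x2, the 'two_sided' spelling and the relative tolerance
used for ties are assumptions for illustration, not the scipy or R
implementation.

```python
from math import comb


def fisher_exact_2x2(table, alternative="two_sided"):
    """Hypothetical helper: Fisher exact p-value for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def pmf(k):
        # Probability that the first cell equals k, margins held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    if alternative == "greater":
        # H1: odds ratio > 1, i.e. first cell at least as large as observed.
        return sum(pmf(k) for k in range(a, hi + 1))
    if alternative == "less":
        # H1: odds ratio < 1, i.e. first cell at most as large as observed.
        return sum(pmf(k) for k in range(lo, a + 1))
    if alternative == "two_sided":
        # Sum over all tables no more probable than the observed one.
        p_obs = pmf(a)
        tol = 1.0 + 1e-7  # assumed tolerance for floating-point ties
        return sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= p_obs * tol)
    raise ValueError("alternative must be 'two_sided', 'less' or 'greater'")


# Assumed example table.
table = [[3, 1], [1, 3]]
p_two = fisher_exact_2x2(table)
p_greater = fisher_exact_2x2(table, "greater")
p_less = fisher_exact_2x2(table, "less")
```

For this table the first cell takes the values 0..4 with probabilities
1/70, 16/70, 36/70, 16/70, 1/70, so 'greater' gives 17/70, 'less' gives
69/70 and the two-sided p-value is 34/70. Note that the two-sided value
is not defined through tails, which is the point made above.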

Josef

>
> I never paid much attention to the frequency-based tests, but it would not
> surprise me if there are no one-sided tests. Most are rank-based, so it is
> rather hard to do in a simple manner - actually I am not even sure how
> to use a permutation test.
>
> Bruce