[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?
Bruce Southey
bsouthey at gmail.com
Mon Jun 6 14:34:45 EDT 2011
On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote:
> What should be the policy on one-sided versus two-sided?
Yes :-)
> The main reason right now for looking at this is
> http://projects.scipy.org/scipy/ticket/1394 which specifies a
> "one-sided" alternative and provides both lower and upper tail.
That refers to Fisher's exact test rather than the more 'traditional'
one-sided tests. Each p-value of Fisher's test has a special meaning
about the value or probability of the 'first cell' under the null
hypothesis. So it is necessary to provide all three values.
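To make the three values concrete, here is a minimal sketch using only the Python standard library. It assumes the usual conditioning on fixed row and column totals, so that the first cell is hypergeometric under the null. The function name fisher_tails and the example table are made up for illustration; this is not the code from the ticket.

```python
from math import comb

def fisher_tails(a, b, c, d):
    """Return (lower, upper, two_sided) p-values for the table [[a, b], [c, d]].

    With row and column totals held fixed, the first cell `a` follows a
    hypergeometric distribution under the null hypothesis.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)  # support of the first cell

    def pmf(k):
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    probs = {k: pmf(k) for k in range(lo, hi + 1)}
    lower = sum(p for k, p in probs.items() if k <= a)  # P(first cell <= a)
    upper = sum(p for k, p in probs.items() if k >= a)  # P(first cell >= a)
    # Two-sided: add up all tables that are no more probable than the observed
    # one (the small relative tolerance guards against floating-point ties).
    cutoff = probs[a] * (1 + 1e-7)
    two_sided = sum(p for p in probs.values() if p <= cutoff)
    return lower, upper, two_sided

# Hypothetical 2x2 table, chosen only for illustration.
lower, upper, two_sided = fisher_tails(3, 1, 1, 3)
print(lower, upper, two_sided)
```

Note that the lower and upper tails both include the observed table, so they add up to more than one, and the two-sided value is not simply twice the smaller tail.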
> I would prefer that we follow the alternative patterns similar to R
>
> currently only kstest has alternative : 'two_sided' (default),
> 'less' or 'greater'
> but this should be added to other tests where it makes sense
I think that 'less' and 'greater' in these Kolmogorov-Smirnov tests do
not have the traditional meaning either. It is a little mind-boggling to
try to think in terms of cdfs!
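A small sketch of why the cdf view is confusing, using only plain Python. It computes the two one-sided distances (commonly written D+ and D-) and the usual two-sided D for a sample against a hypothesized cdf. The function name ks_statistics and the sample are invented for illustration, and the sketch makes no claim about which of 'less' or 'greater' kstest attaches to which distance.

```python
def ks_statistics(sample, cdf):
    """Return (d_plus, d_minus, d) for a sample against a hypothesized cdf."""
    xs = sorted(sample)
    n = len(xs)
    # D+: largest amount by which the empirical cdf exceeds the hypothesized cdf
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # D-: largest amount by which the hypothesized cdf exceeds the empirical cdf
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus, d_minus, max(d_plus, d_minus)

def uniform_cdf(x):
    """cdf of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

# Hypothetical sample whose values are all on the small side.
d_plus, d_minus, d = ks_statistics([0.1, 0.2, 0.3, 0.4], uniform_cdf)
print(d_plus, d_minus, d)
```

Here the sample values are small, yet it is D+ (empirical cdf above the hypothesized cdf) that is large. The one-sided alternatives are statements about which cdf lies above the other, which runs opposite to the intuition about which values are larger.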
> R fisher.exact
> """alternative indicates the alternative hypothesis and must be one
> of "two.sided", "greater" or "less". You can specify just the initial
> letter. Only used in the 2 by 2 case."""
>
> mannwhitneyu reports a one-sided test without actually specifying
> which alternative is used (I thought I remembered other cases like
> this but don't find any right now)
>
> related:
> in many cases in the two-sided tests the test statistic has a sign
> that indicates in which tail the test-statistic falls.
> This is useful in ttests for example, because the one-sided tests can
> be backed out from the two-sided tests. (With symmetric distributions
> one-sided p-value is just half of the two-sided pvalue)
>
> In the discussion of https://github.com/scipy/scipy/pull/8 I argued
> that this might mislead users to interpret a two-sided result as a
> one-sided result. However, I doubt now that this is a strong argument
> against reporting the signed test statistic.
(I do not follow pull requests so is there a relevant ticket?)
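On backing out the one-sided tests from a signed statistic: a minimal sketch, assuming a continuous null distribution that is symmetric about zero. A standard normal stands in for the t distribution here so that the example needs only the standard library. The helper name one_sided_from_two_sided and the value of z are hypothetical.

```python
from statistics import NormalDist

def one_sided_from_two_sided(stat, p_two_sided):
    """Back out (p_less, p_greater) from a signed statistic and a two-sided p-value.

    Only valid when the null distribution is continuous and symmetric about zero.
    """
    half = p_two_sided / 2
    # The halved p-value belongs to the tail that the sign points to;
    # the opposite alternative gets the complement.
    p_greater = half if stat > 0 else 1 - half
    p_less = 1 - p_greater
    return p_less, p_greater

z = -1.5  # hypothetical signed test statistic
p_two = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value under a standard normal
p_less, p_greater = one_sided_from_two_sided(z, p_two)
print(p_two, p_less, p_greater)
```

This also shows where a user could go wrong: halving the two-sided p-value without looking at the sign gives the right answer for only one of the two alternatives.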
> After going through scipy.stats.stats, it looks like we always report
> the signed test statistic.
>
> The test statistic in ks_2samp is in all cases defined as a max value
> and doesn't have a sign in R either, so adding a sign there would
> break with the standard definition.
> A one-sided option for ks_2samp would just require finding the
> distribution of the test statistics D+ and D-
>
> ---
>
> So my proposal for the general pattern (with exceptions for special
> reasons) would be
>
> * add/offer alternative : 'two_sided' (default), 'less' or 'greater'
> http://projects.scipy.org/scipy/ticket/1394 for now,
> and adjustments of existing tests in the future (adding the option can
> be mostly done in a backwards compatible way and for symmetric
> distributions like ttest it's just a convenience)
> mannwhitneyu seems to be the only "weird" one
>
> * report signed test statistic for two-sided alternative (when a
> signed test statistic exists): which is the status quo in
> stats.stats, but I didn't know that this is actually pretty consistent
> across tests.
>
> Opinions ?
>
> Josef
I think that there is some understandable confusion here (I was in the
same situation) about what is meant by one-sided. My understanding is
that under a one-sided hypothesis, all the values of the null hypothesis
exist in only one tail of the test distribution. In contrast, the values
of the null distribution exist in both tails under a two-sided
hypothesis. Yet that interpretation does not have the same meaning as
the tails in the Fisher or Kolmogorov-Smirnov tests.
I never paid much attention to the frequency-based tests, but it would
not surprise me if there are no one-sided tests. Most are rank-based, so
it is rather hard to do in a simple manner; actually, I am not even sure
how to use a permutation test.
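For what it is worth, one way a one-sided permutation test on ranks might look, as a rough sketch only. It assumes two small samples with no tied values and enumerates every possible assignment of ranks to the first sample. The function name rank_sum_permutation_test and the data are invented for illustration, and this is not how any scipy.stats function is implemented.

```python
from itertools import combinations

def rank_sum_permutation_test(x, y):
    """Exact one-sided permutation p-values for the rank sum of x.

    Assumes all pooled values are distinct (no ties) and small samples,
    since every possible assignment of ranks is enumerated.
    """
    pooled = sorted(x + y)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # ranks 1..N
    observed = sum(rank[v] for v in x)
    # Under the null, every choice of len(x) ranks out of N is equally likely.
    sums = [sum(c) for c in combinations(range(1, len(pooled) + 1), len(x))]
    p_less = sum(s <= observed for s in sums) / len(sums)     # x tends to be smaller
    p_greater = sum(s >= observed for s in sums) / len(sums)  # x tends to be larger
    return observed, p_less, p_greater

# Hypothetical data, chosen only for illustration.
x = [1.1, 2.3, 0.7]
y = [3.5, 4.2, 2.9, 5.0]
observed, p_less, p_greater = rank_sum_permutation_test(x, y)
print(observed, p_less, p_greater)
```

The direction is stated explicitly through which tail of the permutation distribution is summed, so the one-sided alternative is never left unspecified.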
Bruce