[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?
josef.pktd at gmail.com
Sun Jun 5 15:43:18 EDT 2011
What should be the policy on one-sided versus two-sided?
The main reason right now for looking at this is
http://projects.scipy.org/scipy/ticket/1394 which specifies a
"one-sided" alternative and provides both lower and upper tail.
I would prefer that we follow a pattern for the alternative similar to R's.
Currently only kstest has alternative : 'two_sided' (default),
'less' or 'greater',
but this should be added to other tests where it makes sense.
R fisher.test:
"""alternative indicates the alternative hypothesis and must be one
of "two.sided", "greater" or "less". You can specify just the initial
letter. Only used in the 2 by 2 case."""
mannwhitneyu reports a one-sided test without actually specifying
which alternative is used. (I thought I remembered other cases like
this but can't find any right now.)
related:
In many of the two-sided tests the test statistic has a sign
that indicates in which tail the test statistic falls.
This is useful in ttests, for example, because the one-sided tests can
be backed out from the two-sided tests. (With symmetric distributions the
one-sided p-value in the direction of the sign is just half of the
two-sided p-value, and the other one-sided p-value is one minus that.)
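To make the backing-out step concrete, here is a minimal sketch. The helper name one_sided_from_two_sided is hypothetical (it is not part of scipy.stats), and the conversion is only valid when the null distribution of the statistic is symmetric about zero. A standard normal statistic is used as a stand-in for the t statistic so that the example needs only the standard library.

```python
from statistics import NormalDist


def one_sided_from_two_sided(statistic, p_two_sided, alternative):
    """Hypothetical helper, not a scipy.stats function.

    Converts a two-sided p-value plus the signed statistic into a
    one-sided p-value. Assumes the null distribution of the statistic
    is symmetric about zero (t and z statistics, for example).
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    half = p_two_sided / 2.0
    # The statistic lies in the tail named by the alternative when its
    # sign agrees with the direction being tested.
    if alternative == "greater":
        in_tail = statistic > 0
    else:
        in_tail = statistic < 0
    return half if in_tail else 1.0 - half


# Check against the directly computed normal tail areas.
z = 1.5
norm = NormalDist()
p_two = 2.0 * norm.cdf(-abs(z))
p_greater = one_sided_from_two_sided(z, p_two, "greater")
p_less = one_sided_from_two_sided(z, p_two, "less")
```

Without the sign of the statistic this conversion would not be possible, which is the practical argument for reporting it.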
In the discussion of https://github.com/scipy/scipy/pull/8 I argued
that this might mislead users into interpreting a two-sided result as a
one-sided result. However, I now doubt that this is a strong argument
against reporting the signed test statistic.
After going through scipy.stats.stats, it looks like we always report
the signed test statistic.
The test statistic in ks_2samp is in all cases defined as a max value
and doesn't have a sign in R either, so adding a sign there would
break with the standard definition.
A one-sided option for ks_2samp would just require finding the
distribution of the one-sided test statistics D+ and D-.
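As an illustration of the statistics involved (not of their null distributions, which are the missing piece), the following sketch computes D+, D- and the usual D = max(D+, D-) from the two empirical CDFs. The function name ks_2samp_directional is hypothetical, and which of D+ or D- corresponds to 'less' or 'greater' is a convention that would still have to be chosen.

```python
from bisect import bisect_right


def ks_2samp_directional(x, y):
    """Hypothetical helper, not the scipy.stats.ks_2samp implementation.

    Returns (d_plus, d_minus, d) where
      d_plus  = max over t of F_x(t) - F_y(t)
      d_minus = max over t of F_y(t) - F_x(t)
      d       = max(d_plus, d_minus), the usual unsigned statistic.
    """
    xs, ys = sorted(x), sorted(y)
    d_plus = 0.0
    d_minus = 0.0
    # The empirical CDFs only change at observed points, so it is
    # enough to evaluate the differences there.
    for t in sorted(set(xs) | set(ys)):
        f_x = bisect_right(xs, t) / len(xs)
        f_y = bisect_right(ys, t) / len(ys)
        d_plus = max(d_plus, f_x - f_y)
        d_minus = max(d_minus, f_y - f_x)
    return d_plus, d_minus, max(d_plus, d_minus)


# x is shifted to the left of y, so only F_x - F_y is ever positive.
d_plus, d_minus, d = ks_2samp_directional([1, 2, 3], [2, 3, 4])
```

Both one-sided statistics are non-negative by construction, so the two-sided D stays unsigned, in line with the standard definition.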
---
So my proposal for the general pattern (with exceptions for special
reasons) would be
* add/offer alternative : 'two_sided' (default), 'less' or 'greater':
in http://projects.scipy.org/scipy/ticket/1394 for now,
and through adjustments of existing tests in the future (adding the option can
mostly be done in a backwards compatible way, and for symmetric
distributions like ttest it's just a convenience).
mannwhitneyu seems to be the only "weird" one.
* report signed test statistic for two-sided alternative (when a
signed test statistic exists): which is the status quo in
stats.stats, but I didn't know that this is actually pretty consistent
across tests.
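A minimal sketch of what the two points of the proposal could look like together, using a one-sample z-test with known standard deviation as a toy example. The function ztest_1samp and its signature are assumptions made for illustration only; it is not an existing scipy.stats function.

```python
from math import sqrt
from statistics import NormalDist, fmean


def ztest_1samp(sample, popmean, sigma, alternative="two_sided"):
    """Hypothetical test function following the proposed pattern.

    Always returns the signed statistic; `alternative` only changes
    which tail area is reported as the p-value.
    """
    z = (fmean(sample) - popmean) / (sigma / sqrt(len(sample)))
    norm = NormalDist()
    if alternative == "two_sided":
        pvalue = 2.0 * norm.cdf(-abs(z))
    elif alternative == "greater":
        # Alternative: the true mean is greater than popmean.
        pvalue = 1.0 - norm.cdf(z)
    elif alternative == "less":
        # Alternative: the true mean is less than popmean.
        pvalue = norm.cdf(z)
    else:
        raise ValueError(
            "alternative must be 'two_sided', 'less' or 'greater'"
        )
    return z, pvalue


data = [1.0, 2.0, 3.0, 4.0]
z_two, p_two = ztest_1samp(data, popmean=2.0, sigma=1.0)
z_gr, p_gr = ztest_1samp(data, 2.0, 1.0, alternative="greater")
z_le, p_le = ztest_1samp(data, 2.0, 1.0, alternative="less")
```

Because 'two_sided' is the default, existing calls that do not pass alternative would keep their current behavior, which is the backwards compatibility mentioned above.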
Opinions ?
Josef