[SciPy-User] stats.ranksums vs. stats.mannwhitneyu
josef.pktd at gmail.com
Tue Oct 9 13:05:38 EDT 2012
On Tue, Oct 9, 2012 at 12:13 PM, Nils Kölling <nkoelling at gmail.com> wrote:
> I am trying to perform a Mann-Whitney U (AKA rank sum) test using
> SciPy. My data consists of around 30 samples in total, with ties, so I
> get anything from 1:29 through 15:15 to 29:1 samples per group.
>
> As far as I can see there are two options:
>
> scipy.stats.ranksums: Does not handle ties, equivalent to R's
> wilcox.test with exact=False and correct=False
> scipy.stats.mannwhitneyu: Handles ties, equivalent to R's wilcox.test
> with exact=False and correct=use_continuity
>
> So at first glance the MWU function would seem to be the better
> choice, except the docs explicitly state that it should not be used
> with less than 20 samples per group.
>
> So what is the best function to use in this case? What kind of biases
> will I get when I use the mannwhitneyu function with less than 20
> samples? And what sort of problems do ties cause with ranksums?
If you have samples like 1:29, one observation in one sample and 29
observations in the other, or similarly asymmetric sizes, I would
definitely go for permutation tests. For the very asymmetric sample
sizes you could even do an exact enumeration instead of random
permutations. (The number of cases to enumerate is just a binomial
coefficient; see the quick check below.)
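Roughly, the number of distinct splits of the pooled observations into
the two groups is the binomial coefficient comb(n1 + n2, n1), so for a
total of 30 observations something like this gives the counts:

from scipy.special import comb   # scipy.misc.comb in older SciPy versions

# number of distinct ways to assign the pooled observations to two groups
for n1 in (1, 5, 15):
    n2 = 30 - n1
    print(n1, n2, comb(n1 + n2, n1, exact=True))
# 1 29 30
# 5 25 142506
# 15 15 155117520

So exact enumeration is trivial for 1:29 but impractical for 15:15.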
Then your p-values will be more accurate, but the power of the test
will be (very) low.
--------
I initially wrote a more general answer when I misread the question as
saying you have 30 observations per sample:
mannwhitneyu is the best scipy has. None of the tests similar to
mannwhitneyu has a small sample distribution, IIRC.
Some discussion and a comparison with other packages can be found in
http://projects.scipy.org/scipy/ticket/901
I don't have much experience with how good or bad the normal
approximation is for mannwhitneyu. My guess would be that if you don't
have a large number of ties, then it should be ok.
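For a quick feel for the difference between the two functions you can
just run both on data shaped like yours, something like the following
(keep in mind that the return values and the default one- vs. two-sided
p-value of mannwhitneyu have varied across SciPy versions, so check the
docs of your version):

import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
x = rng.randint(0, 5, size=5)     # small group of integer scores, so there are ties
y = rng.randint(1, 6, size=25)    # large group

z, p_rs = stats.ranksums(x, y)        # normal approximation, no tie correction
u, p_mwu = stats.mannwhitneyu(x, y)   # normal approximation with tie correction;
                                      # in older SciPy this p-value is one-sided
print(p_rs, p_mwu)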
As an alternative, and to see whether it makes a difference in your
case, you could also compute p-values based on permutation tests, along
the lines of https://gist.github.com/1270325
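A minimal version of that idea, a Monte Carlo permutation test on the
rank sum of the first sample (the function name and the details here
are just for illustration, not the code from the gist):

import numpy as np
from scipy import stats

def perm_ranksum_pvalue(x, y, n_perm=10000, seed=0):
    # two-sided permutation p-value for the rank sum of x
    rng = np.random.RandomState(seed)
    ranks = stats.rankdata(np.concatenate([x, y]))  # midranks, so ties are handled
    nx = len(x)
    expected = nx * (len(ranks) + 1) / 2.0          # mean rank sum of x under H0
    observed = abs(ranks[:nx].sum() - expected)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(ranks)
        if abs(perm[:nx].sum() - expected) >= observed:
            count += 1
    # add-one correction so the estimated p-value is never exactly zero
    return (count + 1.0) / (n_perm + 1.0)

For the very unbalanced cases you could replace the random shuffles by
a loop over all comb(n1 + n2, n1) assignments and get an exact p-value.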
(my "view": If the pvalue with mannwhitneyu is not close to your
acceptance level 0.05 or similar, then I wouldn't bother. If the
p-value is close, then I would feel safer with a permutation test.)
-------------
Josef
>
> Cheers
>
> Nils