<div dir="ltr">Hi <span style="font-size:12.8000001907349px;white-space:nowrap">Szymon Łęski,</span><div><span style="font-size:12.8000001907349px;white-space:nowrap"><br></span></div><div><span style="font-size:12.8000001907349px;white-space:nowrap">I was planning on writing a Monte Carlo (MC) permutation test for the Mann-Whitney U test in the future. </span></div><div><span style="font-size:12.8000001907349px;white-space:nowrap">I'm in the process of getting a <a href="https://github.com/scipy/scipy/pull/4440">permutation t-test</a> and a <a href="https://github.com/scipy/scipy/pull/4519">permutation ANOVA</a> reviewed.</span></div><div><span style="font-size:12.8000001907349px;white-space:nowrap"><br></span></div><div><span style="font-size:12.8000001907349px;white-space:nowrap">But for smaller sample sizes, an exact p-value calculation would probably be preferable. </span></div><div><span style="font-size:12.8000001907349px;white-space:nowrap">If you submit a pull request, I'd be willing to take a look at it.</span></div><div><span style="font-size:12.8000001907349px;white-space:nowrap"><br></span></div><div><span style="font-size:12.8000001907349px;white-space:nowrap">Jamie</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 5, 2015 at 7:45 AM, Szymon Łęski <span dir="ltr">&lt;<a href="mailto:s.leski@nencki.gov.pl" target="_blank">s.leski@nencki.gov.pl</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>
<br>
I wrote a Python implementation of exact p-values for the Mann-Whitney U test. The current test (scipy.stats.mannwhitneyu) uses a normal approximation and is valid only for samples with more than 20 observations (as stated in its Notes). The exact version is also correct for small samples.<br>
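<br>
For illustration, here is a minimal sketch of an exact two-sided p-value, assuming no ties between the samples. It is not the script from the Dropbox folder and may differ from the algorithm in the paper; the names u_counts and mannwhitney_exact_pvalue are hypothetical. It builds the null distribution of U with the recursion c(u; m, n) = c(u - n; m - 1, n) + c(u; m, n - 1), which follows from asking whether the largest observation belongs to x or to y.<br>

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def u_counts(m, n):
    """Number of rank arrangements giving U = 0, 1, ..., m*n (no ties).

    U counts pairs (x_i, y_j) with x_i > y_j. If the largest observation is an
    x, it beats all n values of y, adding n to U; if it is a y, it adds nothing.
    """
    if m == 0 or n == 0:
        return (1,)  # only U = 0 is possible, in exactly one arrangement
    counts = [0] * (m * n + 1)
    for u, c in enumerate(u_counts(m - 1, n)):  # largest observation is an x
        counts[u + n] += c
    for u, c in enumerate(u_counts(m, n - 1)):  # largest observation is a y
        counts[u] += c
    return tuple(counts)


def mannwhitney_exact_pvalue(x, y):
    """Two-sided exact p-value for the Mann-Whitney U test (hypothetical helper)."""
    pooled = list(x) + list(y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this exact null distribution assumes no ties")
    m, n = len(x), len(y)
    u = sum(1 for xi in x for yj in y if xi > yj)
    counts = u_counts(m, n)
    # The null distribution is symmetric about m*n/2, so the two-sided p-value
    # is twice the smaller tail probability, capped at 1.
    u_small = min(u, m * n - u)
    tail = sum(counts[: u_small + 1]) / comb(m + n, m)
    return min(1.0, 2 * tail)


print(mannwhitney_exact_pvalue([1, 2, 3], [4, 5, 6]))  # 0.1
```

The memoized table stores a distribution with up to m*n + 1 entries for every (m, n) pair it visits, so the cost grows quickly with sample size. This is consistent with keeping the normal approximation for very large samples.<br>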
<br>
I believe this would be a useful addition to scipy.stats. However, the current version is still better suited to very large samples, so I think both versions should be kept. I would like to ask for opinions on the best way to include the new version.<br>
A separate function? An optional argument controlling which method is used? A heuristic based on the sample sizes?<br>
<br>
I have put my script, and the paper I based the implementation on, in this Dropbox folder:<br>
<a href="https://www.dropbox.com/sh/0zxp9u8sliwijl5/AAARecyrwQ2z-8xU-LbKOpWna?dl=0" target="_blank">https://www.dropbox.com/sh/0zxp9u8sliwijl5/AAARecyrwQ2z-8xU-LbKOpWna?dl=0</a><br>
<br>
Feedback appreciated!<br>
<br>
Best regards,<br>
Szymon Leski<br>
_______________________________________________<br>
SciPy-Dev mailing list<br>
<a href="mailto:SciPy-Dev@scipy.org">SciPy-Dev@scipy.org</a><br>
<a href="http://mail.scipy.org/mailman/listinfo/scipy-dev" target="_blank">http://mail.scipy.org/mailman/listinfo/scipy-dev</a><br>
</blockquote></div><br></div>
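<br>
The Monte Carlo permutation test mentioned in the reply above can be sketched as follows. This is a minimal sketch, not the code in the linked pull requests; the name mannwhitney_mc_pvalue, its parameters, and the add-one p-value estimator are assumptions. It shuffles the pooled sample, recomputes U each time (with ties counted as 1/2), and reports the fraction of shuffles whose U is at least as far from m*n/2 as the observed value.<br>

```python
import random


def mannwhitney_mc_pvalue(x, y, n_resamples=10_000, seed=None):
    """Two-sided Monte Carlo permutation p-value for the Mann-Whitney U statistic.

    Hypothetical helper: the name, parameters and add-one estimator are assumptions.
    """
    rng = random.Random(seed)
    m, n = len(x), len(y)
    pooled = list(x) + list(y)

    def u_stat(a, b):
        # U for a: pairs with a_i > b_j, ties counted as 1/2
        return sum((ai > bj) + 0.5 * (ai == bj) for ai in a for bj in b)

    center = m * n / 2  # mean of U under the null hypothesis
    observed = abs(u_stat(x, y) - center)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        if abs(u_stat(pooled[:m], pooled[m:]) - center) >= observed:
            hits += 1
    # Counting the observed labelling as one resample keeps the estimate above zero
    return (hits + 1) / (n_resamples + 1)
```

This version handles ties, but its p-value is only an estimate, and the Monte Carlo error shrinks roughly as 1/sqrt(n_resamples).<br>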