On Wed, Dec 3, 2008 at 2:49 PM, Jarrod Millman <millman@berkeley.edu> wrote:
On Wed, Dec 3, 2008 at 11:43 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
def ks_2samp(data1, data2):
    """
    Computes the Kolmogorov-Smirnov statistic on 2 samples.
    Modified from Numerical Recipes in C, page 493.
    Returns KS D-value, prob. Not ufunc-like.
Wait - really? We can't use Numerical Recipes code, it has strict and incompatible licensing... If it's in there it really has to come out as fast as possible.
http://www.nr.com/licenses/redistribute.html
-- Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
The algorithm is essentially one loop that calculates the distance measure, and I would assume that such a simple algorithm cannot be copyright protected. For efficiency, though, it might be better anyway to come up with a vectorized version similar to kstest.

About correctness:
=============

A quick Monte Carlo shows that the test is pretty accurate under the null, even for small sample sizes. The power to reject when the alternative is true is reasonably high only in larger samples.

Null correct:
==================================================
Monte Carlo for K-S 2sample test (ks_2samp): sample size = 100, 1000 replications
sample 1: normal distribution (loc=1.000000,scale=2.000000)
sample 2: normal distribution (loc=1.000000,scale=2.000000)
ks_2samp: proportion of rejection at 1% significance: 0.003
ks_2samp: proportion of rejection at 5% significance: 0.049
ks_2samp: proportion of rejection at 10% significance: 0.101

Null not true:
==================================================
Monte Carlo for K-S 2sample test (ks_2samp): sample size = 500, 1000 replications
sample 1: normal distribution (loc=0.000000,scale=1.000000)
sample 2: t distribution (dof=10, loc=0.000000,scale=1.000000)
ks_2samp: proportion of rejection at 1% significance: 0.253
ks_2samp: proportion of rejection at 5% significance: 0.71
ks_2samp: proportion of rejection at 10% significance: 0.88

Josef
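
For illustration, here is a rough sketch of what a vectorized distance calculation could look like. It is not the code currently in scipy and not a port of the loop under discussion. It only assumes the textbook definition of the two-sample statistic, the largest absolute difference between the two empirical CDFs, and the function name ks_2samp_statistic is made up for this example.

```python
import numpy as np


def ks_2samp_statistic(data1, data2):
    """Two-sample K-S distance D = max |F1(x) - F2(x)| (hypothetical helper).

    F1 and F2 are the empirical CDFs of the two samples. This returns only
    the distance, not a p-value.
    """
    data1 = np.sort(np.asarray(data1, dtype=float))
    data2 = np.sort(np.asarray(data2, dtype=float))
    n1, n2 = data1.size, data2.size
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must contain at least one observation")

    # The empirical CDFs are step functions that only jump at observed
    # values, so it is enough to compare them at the pooled data points.
    pooled = np.concatenate([data1, data2])

    # searchsorted(..., side="right") counts the observations <= x,
    # which is n * F(x) for a sorted sample.
    cdf1 = np.searchsorted(data1, pooled, side="right") / n1
    cdf2 = np.searchsorted(data2, pooled, side="right") / n2

    return float(np.max(np.abs(cdf1 - cdf2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=1.0, scale=2.0, size=100)
    y = rng.normal(loc=1.0, scale=2.0, size=100)
    print("D =", ks_2samp_statistic(x, y))
```

This replaces the explicit loop with two calls to np.searchsorted. Turning D into a probability would still need the Kolmogorov distribution, which is left out here.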