kstest is reporting the wrong p-value?
Looking again at ticket 395 about the Kolmogorov-Smirnov test, I'm quite sure that kstest is wrong. The current implementation uses the absolute value of the deviation, so it is a two-sided test. A one-sided test takes either the max or the min of the signed deviations (not of the absolute deviations). However, the distribution used to calculate the p-value is ksone, the distribution for the one-sided Kolmogorov-Smirnov test. So the reported p-value should be off by roughly a factor of two, either half or double (?). There was a discussion about this in http://projects.scipy.org/pipermail/scipy-dev/2004-July/002181.html, but I'm not sure its conclusion is correct. Can someone knowledgeable in statistics, or with access to a good book, check this? If I am correct, I can fix the test next week.

Josef
On Thu, Nov 27, 2008 at 00:02, <josef.pktd@gmail.com> wrote:
> Looking again at ticket 395 about the Kolmogorov-Smirnov test, I'm quite sure that kstest is wrong.
>
> The current implementation uses the absolute value of the deviation, so it is a two-sided test. A one-sided test takes either the max or the min of the signed deviations (not of the absolute deviations). However, the distribution used to calculate the p-value is ksone, the distribution for the one-sided Kolmogorov-Smirnov test. So the reported p-value should be off by roughly a factor of two, either half or double (?).
No, it's only slightly off (but you are correct that it is off). The names "one-sided" and "two-sided" don't really correspond to their usual meanings for generic hypothesis tests. Rather, they describe the different statistics and their distributions. There are two kinds of "one-sided" K-S statistic: one uses the greatest signed difference between the ECDF and the CDF, and the other uses the greatest signed difference between the CDF and the ECDF. Note the order. Both statistics are positive values, and both follow the same "one-sided K-S distribution". The "two-sided K-S statistic" is the maximum of the two one-sided statistics. Its distribution is close to the one-sided distribution, but it is difficult to compute. The K-S hypothesis test can be conducted with any of these statistics, and it can be either one-sided (e.g. "is the fit poor?") or two-sided (e.g. "is the fit either too poor or too good to be true?") in the conventional hypothesis-testing sense. kstest() implements a one-sided test using the "one-sided K-S distribution", but it incorrectly uses the "two-sided K-S statistic". Is that a clear explanation?
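To make the three statistics concrete, here is a minimal pure-Python sketch. It assumes a continuous, fully specified null CDF; `ks_statistics` is a hypothetical helper for illustration, not scipy's API. The ECDF is a step function, so each signed difference has to be checked on the correct side of each jump: the top of the step for ECDF minus CDF, the bottom of the step for CDF minus ECDF.

```python
from typing import Callable, Sequence, Tuple


def ks_statistics(sample: Sequence[float],
                  cdf: Callable[[float], float]) -> Tuple[float, float, float]:
    """Return (D_plus, D_minus, D) for a sample against a continuous null CDF.

    D_plus  = sup_x (ECDF(x) - CDF(x))  -> max_i ( i/n      - F(x_(i)) )
    D_minus = sup_x (CDF(x) - ECDF(x))  -> max_i ( F(x_(i)) - (i-1)/n  )
    D       = max(D_plus, D_minus), the "two-sided" statistic
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    cdf_vals = [cdf(x) for x in xs]
    # The ECDF jumps from (i-1)/n to i/n at the i-th order statistic, so
    # ECDF - CDF peaks at the top of a step (D_plus) and CDF - ECDF peaks
    # just before a step (D_minus). enumerate() is 0-based, hence i + 1.
    d_plus = max((i + 1) / n - f for i, f in enumerate(cdf_vals))
    d_minus = max(f - i / n for i, f in enumerate(cdf_vals))
    return d_plus, d_minus, max(d_plus, d_minus)


if __name__ == "__main__":
    def uniform_cdf(x: float) -> float:
        """CDF of Uniform(0, 1)."""
        return min(max(x, 0.0), 1.0)

    # Roughly (0.3, 0.1, 0.3): the ECDF - CDF side dominates for this sample.
    print(ks_statistics([0.1, 0.4, 0.7], uniform_cdf))
```

Checking only |F(x_(i)) - i/n| looks at the top of each step; the gaps F(x_(i)) - (i-1)/n at the bottom of each step must be included too for D to be the true supremum.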
> There was a discussion about this in http://projects.scipy.org/pipermail/scipy-dev/2004-July/002181.html, but I'm not sure its conclusion is correct.
You are correct. The terminology was tripping me up at the time, too.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
I compared with R in more detail. Conclusions for small samples:

* stats.kstest() is quite wrong for fewer than 10 observations.
* The calculation of D differs quite a bit from R and MATLAB (those two give the same numbers).
* The exact method in R uses the same distribution as stats.ksone.sf(D,n)*2, to 4 decimal places. Note: times 2!
* The asymptotic distribution in R (not using exact) is exactly the same as kstwobign.(D*sqrt(n)), to more than 7 decimal places.

For larger samples, I tried 100 normally distributed random variables: stats.kstest() still gives the wrong D and p-value, but the difference is not as large as for small samples. With a sample of 1000 normal rvs, the D from stats.kstest() and from R are essentially identical, but the p-value reported by stats.kstest() is half the one from R.
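The "times 2" has a short explanation. Assuming a continuous null CDF, D_n = max(D_n^+, D_n^-) and the two one-sided statistics have the same distribution, so inclusion-exclusion gives an exact identity and an upper bound. The limiting distributions show the same factor of two in their leading terms:

```latex
\begin{aligned}
P(D_n \ge d) &= P(D_n^+ \ge d) + P(D_n^- \ge d) - P(D_n^+ \ge d,\ D_n^- \ge d) \\
             &= 2\,P(D_n^+ \ge d) - P(D_n^+ \ge d,\ D_n^- \ge d) \;\le\; 2\,P(D_n^+ \ge d), \\[4pt]
\lim_{n\to\infty} P(\sqrt{n}\,D_n^+ \ge x) &= e^{-2x^2}, \\
\lim_{n\to\infty} P(\sqrt{n}\,D_n \ge x) &= 2\sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2x^2}
  = 2e^{-2x^2} - 2e^{-8x^2} + \cdots
\end{aligned}
```

The joint term is small in the upper tail, so 2*ksone.sf(D, n) is a close, slightly conservative two-sided p-value, while ksone.sf(D, n) alone roughly halves it.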
>>> from scipy import stats
>>> xxrl = stats.norm.rvs(size=1000)
>>> resultrl = ksfn(xxrl, 'pnorm', exact=True)  # ksfn is R's ks.test, called through rpy
>>> resultrl['p.value']
0.2419499342788699
>>> resultrl['statistic']['D']
0.032317405617139472
>>> stats.kstest(xxrl, 'norm')
(0.032317405617139472, 0.12118954799968018)
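The p-value side can also be checked without scipy or R. The sketch below assumes a continuous, fully specified null CDF; `ksone_sf` and `kolmogorov_sf` are hypothetical helper names, not scipy API. `ksone_sf` evaluates the exact one-sided tail with the Birnbaum-Tingey sum, intended to be the same quantity as stats.ksone.sf(D, n), and `kolmogorov_sf` is the limiting two-sided (Kolmogorov) tail that stats.kstwobign.sf describes.

```python
import math


def ksone_sf(d: float, n: int) -> float:
    """Exact one-sided tail P(D_n^+ >= d) for a continuous null CDF.

    Birnbaum-Tingey sum:
        d * sum_{j=0}^{floor(n(1-d))} C(n, j) (1 - d - j/n)^(n-j) (d + j/n)^(j-1)
    Terms are built in log space so large n does not overflow.
    """
    if d <= 0.0:
        return 1.0
    if d >= 1.0:
        return 0.0
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for j in range(int(math.floor(n * (1.0 - d))) + 1):
        base = 1.0 - d - j / n
        if base <= 0.0:
            continue  # the term is 0 ** (n - j) with n - j >= 1
        log_term = (log_n_fact - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + (n - j) * math.log(base)
                    + (j - 1) * math.log(d + j / n))
        total += math.exp(log_term)
    return d * total


def kolmogorov_sf(x: float, terms: int = 100) -> float:
    """Limiting two-sided tail P(sqrt(n) D_n >= x), the Kolmogorov distribution.

    The alternating series converges slowly as x -> 0; 100 terms is ample for
    x values near 1, like the one below.
    """
    if x <= 0.0:
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


if __name__ == "__main__":
    # D and n from the 1000-sample session above.
    d, n = 0.032317405617139472, 1000
    p_one = ksone_sf(d, n)                     # one-sided tail, what kstest reported
    p_doubled = min(1.0, 2.0 * p_one)          # conservative two-sided p-value
    p_asymp = kolmogorov_sf(d * math.sqrt(n))  # asymptotic two-sided p-value
    print(f"one-sided: {p_one:.6f}  doubled: {p_doubled:.6f}  asymptotic: {p_asymp:.6f}")
```

With the D and n above, the one-sided value should come out close to the 0.1212 that stats.kstest() printed, doubling it lands just above R's exact 0.2419, and the asymptotic value is about 0.247.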
So stats.kstest() definitely needs to be fixed.

Josef
participants (2): josef.pktd@gmail.com, Robert Kern