[SciPy-dev] Problem with F distribution, or with me?
josef.pktd at gmail.com
Wed Aug 13 01:19:34 EDT 2008
I wanted to compare the distributions in numpy.random with
scipy.stats.distributions.
When I found the Kolmogorov-Smirnov test (stats.kstest) in
test_distributions.py, I wondered why this test did not find the bug in
the numpy random number generator. It seems that this test is much too
weak: the sample size is 30 and the parameters are drawn between 1 and 2.
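To put rough numbers on "much too weak", here is a small sketch (not part of
the test file) of the smallest KS distance D that is rejected at alpha = 0.01
for a given N, and conversely the N needed before a given D is rejected. It
assumes a current SciPy and uses the asymptotic Kolmogorov distribution
scipy.stats.kstwobign, so the N = 30 value is only approximate; the function
names ks_critical_d and ks_min_n are made up for illustration.

```python
import math

from scipy import stats


def ks_critical_d(n, alpha=0.01):
    """Approximate one-sample KS critical distance for sample size n.

    Uses the asymptotic (large-n) Kolmogorov distribution, so small n
    such as 30 only gives a rough value.
    """
    return stats.kstwobign.isf(alpha) / math.sqrt(n)


def ks_min_n(d, alpha=0.01):
    """Smallest n (asymptotically) at which a distance d is rejected."""
    return math.ceil((stats.kstwobign.isf(alpha) / d) ** 2)


# Critical distance for the old sample size and the two new ones
for n in (30, 10000, 100000):
    print(n, round(ks_critical_d(n), 4))
```

The critical distance shrinks like 1/sqrt(N): roughly 0.30 at N = 30 versus
about 0.016 at N = 10000. A discrepancy of the size reported below for
genhalflogistic (D of about 0.023) needs on the order of 5000 draws before it
is rejected at all, so a test with N = 30 cannot see it.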
After I made the test stricter (increased its power), I get the
rejection/test failure for the F distribution, but I also get 2 to 4
additional failures: fatiguelife and loggamma fail in all runs, and
genhalflogistic and genextreme fail only sometimes. The test results of an
example run are below. I did not see any obvious problem with my change to
the test, and from what I have seen in the docstrings or in a quick Google
search, the parameters used in the tests are not ruled out. I don't know
these distributions at all, or not well enough, to tell whether there is
anything wrong with the distributions or with the tests.
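One way to separate "generator is wrong" from "cdf or test is wrong" for the
F case is to compare against an independent construction: an F(dfn, dfd)
variate is (X1/dfn)/(X2/dfd) with X1, X2 independent chi-square variables.
The sketch below assumes a current NumPy (the Generator API from
np.random.default_rng did not exist in numpy 1.1.0) and a current SciPy; the
helper f_by_chi2_ratio is hypothetical. With a correct generator both samples
should agree with stats.f.cdf; if only the direct sample disagreed, that
would point at the generator rather than at the cdf.

```python
import numpy as np
from scipy import stats


def f_by_chi2_ratio(dfn, dfd, size, rng):
    """Draw F(dfn, dfd) variates from the chi-square ratio definition."""
    x1 = rng.chisquare(dfn, size)
    x2 = rng.chisquare(dfd, size)
    return (x1 / dfn) / (x2 / dfd)


rng = np.random.default_rng(12345)
dfn, dfd = 9.8771486774554127, 1.2819774801876884  # args from the test_f failure
n = 50000

ref = f_by_chi2_ratio(dfn, dfd, n, rng)  # reference sample
direct = rng.f(dfn, dfd, n)              # NumPy's built-in F generator

# Each sample against the theoretical F cdf, then the two samples against each other
d_ref, p_ref = stats.kstest(ref, stats.f.cdf, args=(dfn, dfd))
d_direct, p_direct = stats.kstest(direct, stats.f.cdf, args=(dfn, dfd))
d_two, p_two = stats.ks_2samp(ref, direct)
print(d_ref, d_direct, d_two)
```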
Josef
I'm using
>>> numpy.version.version
'1.1.0'
>>> scipy.version.version
'0.6.0'
Failures with changed test_distributions.py
===============================
>>> execfile(r'C:\Programs\Python24\Lib\site-packages\scipy\stats\tests\test_distributions.py')
Found 73/73 tests for stats.tests.test_distributions
Found 10/10 tests for stats.tests.test_morestats
Found 107/107 tests for stats.tests.test_stats
...................FF......F.F...............F.............................Ties preclude use of exact statistic.
..Ties preclude use of exact statistic.
.................................................................................................................
======================================================================
FAIL: check_cdf (stats.tests.test_distributions.test_f)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<string>", line 9, in check_cdf
AssertionError: D = 0.493585929987; pval = 0.0; alpha = 0.01
args = (9.8771486774554127, 1.2819774801876884)
======================================================================
FAIL: check_cdf (stats.tests.test_distributions.test_fatiguelife)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<string>", line 9, in check_cdf
AssertionError: D = 0.101323526498; pval = 0.0; alpha = 0.01
args = (3.3139748541207283,)
======================================================================
FAIL: check_cdf (stats.tests.test_distributions.test_genextreme)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<string>", line 9, in check_cdf
AssertionError: D = 0.02902; pval = 0.0; alpha = 0.01
args = (10.616290590132825,)
======================================================================
FAIL: check_cdf (stats.tests.test_distributions.test_genhalflogistic)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<string>", line 9, in check_cdf
AssertionError: D = 0.02343; pval = 0.0; alpha = 0.01
args = (8.4724627096253382,)
======================================================================
FAIL: check_cdf (stats.tests.test_distributions.test_loggamma)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<string>", line 9, in check_cdf
AssertionError: D = 1.0; pval = 0.0; alpha = 0.01
args = (4.4259066194420793,)
----------------------------------------------------------------------
Ran 190 tests in 5.250s
FAILED (failures=5)
>>>
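A KS failure mixes two things: the sampler (rvs) and the cdf. D = 1.0 for
loggamma suggests that the sampled values and the cdf barely overlap at all.
A deterministic first check is whether cdf and ppf invert each other at the
failing parameters; if they do, the cdf is at least internally consistent
with ppf, and the sampler becomes the more likely suspect. This is a sketch
assuming a current SciPy; cdf_ppf_roundtrip_error is a hypothetical helper,
and the check says nothing about whether the cdf is the right function.

```python
import numpy as np
from scipy import stats


def cdf_ppf_roundtrip_error(name, args, qs=None):
    """Return max |cdf(ppf(q)) - q| over a grid of probabilities q."""
    if qs is None:
        qs = np.linspace(0.01, 0.99, 99)
    dist = getattr(stats, name)
    x = dist.ppf(qs, *args)
    return np.max(np.abs(dist.cdf(x, *args) - qs))


# Parameters taken from the failing runs above
failing_cases = {
    "f": (9.8771486774554127, 1.2819774801876884),
    "fatiguelife": (3.3139748541207283,),
    "genextreme": (10.616290590132825,),
    "genhalflogistic": (8.4724627096253382,),
    "loggamma": (4.4259066194420793,),
}
for name, args in failing_cases.items():
    print(name, cdf_ppf_roundtrip_error(name, args))
```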
3 Changes I made to scipy\stats\tests\test_distributions.py
===========================================
* increase the spread of the random parameters by a factor of 10
* increase the sample size N (old was N=30): N=10000 for the first run and
  N=100000 for the retest
Note: this is from scipy 0.6.0, but the current trunk uses the same
parameters.
{{{
# dists, rand, stats and NumpyTestCase are defined at the top of
# test_distributions.py (Python 2 / numpy.testing of that era)
for dist in dists:
    distfunc = eval('stats.'+dist)
    nargs = distfunc.numargs
    alpha = 0.01
    if dist == 'fatiguelife':
        alpha = 0.001
    if dist == 'erlang':
        args = str((4,)+tuple(rand(2)))
    elif dist == 'frechet':
        args = str(tuple(2*rand(1))+(0,)+tuple(2*rand(2)))
    elif dist == 'triang':
        args = str(tuple(rand(nargs)))
    elif dist == 'reciprocal':
        vals = rand(nargs)
        vals[1] = vals[0] + 1.0
        args = str(tuple(vals))
    else:
        args = str(tuple(1.0+rand(nargs)*10))  # old was without *10
    # Generate one test class per distribution. alpha is substituted into
    # the failure message as a literal, so the message reports this
    # distribution's threshold instead of the module-level alpha at run time.
    exstr = r"""
class test_%s(NumpyTestCase):
    def check_cdf(self):
        D,pval = stats.kstest('%s','',args=%s,N=10000)  # old was N=30
        if (pval < %f):
            D,pval = stats.kstest('%s','',args=%s,N=100000)  # old was N=30
        #if (pval < %f):
        #    D,pval = stats.kstest('%s','',args=%s,N=30)
        assert (pval > %f), "D = " + str(D) + "; pval = " + str(pval) + "; alpha = " + str(%s) + "\nargs = " + str(%s)
""" % (dist,dist,args,alpha,dist,args,alpha,dist,args,alpha,alpha,args)
    exec exstr
}}}
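The same two-stage check (a large first sample, and one retest with an even
larger sample only if the first run rejects) can be written as a plain
function instead of generating source with exec, which also keeps alpha an
explicit parameter. This is a sketch assuming a current SciPy and NumPy;
two_stage_kstest is a hypothetical name, not part of scipy.stats. For a
correct distribution the two samples are independent, so a false rejection
needs both p-values below alpha, roughly alpha squared.

```python
import numpy as np
from scipy import stats


def two_stage_kstest(dist, args, alpha=0.01, n1=10000, n2=100000, rng=None):
    """Two-stage KS check of dist.rvs against dist.cdf.

    Draw n1 variates and compare them with the cdf; only if that rejects at
    level alpha, retry once with n2 fresh variates.
    Returns (passed, D, pval) from the last run.
    """
    rng = np.random.default_rng(rng)
    for n in (n1, n2):
        sample = dist.rvs(*args, size=n, random_state=rng)
        d, p = stats.kstest(sample, dist.cdf, args=args)
        if p > alpha:
            return True, d, p
    return False, d, p
```

For example, two_stage_kstest(stats.f, (9.877, 1.282)) runs the F case from
the failure list above against whatever SciPy and NumPy are installed.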