[SciPy-dev] status of stats.distributions

josef.pktd at gmail.com
Mon Nov 17 00:24:36 EST 2008


I have finished the basic cleanup of scipy.stats.distributions.

All generic methods work now, and all distributions (except logser.rvs) pass the
basic tests for the given parameter values. Test coverage according to
figleaf is about 91%.

There are some remaining problems:

The entropy and fit tests for the continuous rvs are not included
in the test suite. The entropy integration fails for 6 (out of more than 80)
continuous distributions and returns nans; I haven't looked at this in detail.
Also, the entropy test only checks for nan; I didn't find a quick, general test
for the numerical correctness of the entropy calculation.
The parameter estimation with fit also does not converge very well
for some distributions with sample sizes up to 10000, and it takes pretty
long to run.
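
To illustrate what a correctness check beyond "is it nan" could look like,
here is a minimal sketch in plain Python. It is not the scipy test code: the
helper names (normal_pdf, numerical_entropy, check_entropy) are made up for
this example, and the check only works where a closed-form entropy is known,
here the normal distribution with entropy 0.5*log(2*pi*e*sigma^2).

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution (hypothetical helper)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def numerical_entropy(pdf, lower, upper, n=20000):
    """Approximate -integral p(x) log p(x) dx with the composite trapezoid rule."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n + 1):
        p = pdf(lower + i * h)
        # Treat 0 * log(0) as 0 so that underflow in the tails gives no nan.
        term = -p * math.log(p) if p > 0.0 else 0.0
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * term
    return total * h


def check_entropy(pdf, lower, upper, expected, tol=1e-6):
    """Return True if the integral is not nan and matches the closed form."""
    value = numerical_entropy(pdf, lower, upper)
    return (not math.isnan(value)) and abs(value - expected) < tol


# Closed-form entropy of the standard normal: 0.5 * log(2 * pi * e).
standard_normal_entropy = 0.5 * math.log(2.0 * math.pi * math.e)
ok = check_entropy(normal_pdf, -12.0, 12.0, standard_normal_entropy)
```

A check like this only covers distributions with a known closed form, so it
does not replace a general test.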

Some methods defined in the specific distributions don't work
correctly, but either I did not find any mistakes, or I could not find enough
information on the statistical properties of these distributions by
googling, or the bugs are outside of scipy.stats.
I replaced these methods with their generic counterparts, which
work correctly although maybe slower. The skipped methods were
renamed by appending "_skip" to the method name. If someone finds
the correction, any help is appreciated.
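
The renaming mechanism can be sketched as follows. This is a toy model in
plain Python, not the actual scipy.stats code: GenericDist, ToyNormal and the
deliberately wrong formula are assumptions made up for the illustration. The
point is only that a method named "_cdf_skip" no longer overrides "_cdf", so
the generic (slower) implementation is used instead.

```python
import math


class GenericDist:
    """Toy base class; the names are hypothetical, not scipy's actual code."""

    lower = -10.0  # assumed effective lower end of the support

    def _pdf(self, x):
        raise NotImplementedError

    def _cdf(self, x, n=4000):
        # Generic fallback: integrate the pdf with the composite trapezoid rule.
        h = (x - self.lower) / n
        total = 0.5 * (self._pdf(self.lower) + self._pdf(x))
        for i in range(1, n):
            total += self._pdf(self.lower + i * h)
        return total * h

    def cdf(self, x):
        # Public method: dispatches to whichever _cdf the class provides.
        return self._cdf(x)


class ToyNormal(GenericDist):
    def _pdf(self, x):
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

    def _cdf_skip(self, x):
        # Stand-in for a faulty specific method (the 1/sqrt(2) scaling is
        # missing on purpose). Because of the "_skip" suffix it no longer
        # overrides _cdf, so cdf() uses the generic integration above.
        return 0.5 * (1.0 + math.erf(x))


dist = ToyNormal()
exact = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))
generic_error = abs(dist.cdf(1.0) - exact)
skipped_error = abs(dist._cdf_skip(1.0) - exact)
```

Once the specific method is corrected, removing the suffix restores it as the
override.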

All my tests are currently for chosen parameter values, but
I know of a few cases that are broken for some parameter
values that are in the valid (but maybe uncommon) range. I did
quite a bit of fuzz testing earlier on, but don't have the time now
to go over the remaining cases.
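
For anyone who wants to pick up the fuzz testing, the idea can be sketched
like this. It is a minimal stdlib-only illustration, not the test code I
used: toy_cdf and toy_ppf are hypothetical stand-ins for a distribution with
one shape parameter c (a Weibull-type cdf 1 - exp(-x**c)), and the parameter
range and tolerance are arbitrary assumptions.

```python
import math
import random


def toy_cdf(x, c):
    """cdf 1 - exp(-x**c) for x >= 0 and shape c > 0 (hypothetical stand-in)."""
    return -math.expm1(-(x ** c))


def toy_ppf(q, c):
    """Inverse of toy_cdf for 0 < q < 1."""
    return (-math.log1p(-q)) ** (1.0 / c)


def fuzz_roundtrip(n_trials=1000, seed=0, tol=1e-9):
    """Draw random shape parameters and check that cdf(ppf(q)) returns q.

    Returns a list of (c, q, problem) tuples for every failing draw.
    """
    rng = random.Random(seed)  # fixed seed so that failures are reproducible
    failures = []
    for _ in range(n_trials):
        # Log-uniform draw so that small and large shapes are both covered.
        c = math.exp(rng.uniform(math.log(0.05), math.log(50.0)))
        q = rng.uniform(0.01, 0.99)
        try:
            err = abs(toy_cdf(toy_ppf(q, c), c) - q)
        except (OverflowError, ValueError) as exc:
            failures.append((c, q, repr(exc)))
            continue
        if math.isnan(err) or err > tol:
            failures.append((c, q, err))
    return failures


failures = fuzz_roundtrip()
```

Recording the failing (c, q) pairs instead of stopping at the first one makes
it easier to see which part of the parameter range is broken.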

Tickets 697, 758, 766 and my ticket 745 can be closed now.
I would close ticket 620 as "won't fix", but I'm not sure how
important users would think this is.
I also just fixed 769, which looks correct to me.

Enhancement tickets 767, 768 are about including limiting cases
in distributions. I don't have a strong opinion about these. Is the
speed penalty important in this case or not? Are boundary cases
important in applications?

There are possibly also some details that I missed, e.g. I didn't
check return types, but I am basically finished and waiting to see
what more widespread testing will bring.

Josef


