[SciPy-dev] Problem with a jump in scipy.stats.betaprime.cdf

Mon Aug 25 00:39:40 EDT 2008

Hi,
I was running the kstests for those distributions that are not covered
by the tests in scipy.stats. I was testing with random arguments
between 0.1 and 5.

the kstest for betaprime failed for the following parameters:

>>> stats.kstest('betaprime','',[4.65283053384, 0.394739656544],N=1000)
(0.176, array(0.0))

Looking a bit more at what is going on I found a jump for this case in
the theoretical cdf in scipy.stats.betaprime.cdf at x=500

>>> stats.betaprime.cdf(490,4.65283053384, 0.394739656544)
array(0.82583087380901754)
>>> stats.betaprime.cdf(499,4.65283053384, 0.394739656544)
array(0.82706865985439781)
>>> stats.betaprime.cdf(499.99,4.65283053384, 0.394739656544)
array(0.82720292867835143)
>>> stats.betaprime.cdf(500.0,4.65283053384, 0.394739656544)
array(1.0)
>>> stats.betaprime.cdf(500.99,4.65283053384, 0.394739656544)
array(1.0)

The cumulative frequency count of a random sample follows quite
closely the theoretical cdf up to x<500 and does not have a mass point
at x=500.

>>> bprv=stats.betaprime.rvs(4.65283053384, 0.394739656544,size=10000)
>>> sum(bprv<499)/10000.0
0.8259
>>> sum(bprv<501)/10000.0
0.8265
>>>

So the error must be in the cdf calculation, but staring at the
function I didn't see any obvious numerical problems.

Note: from what I infer from the description and the calculations, the
mean, expected value is infinite if, as in this case, the second shape
parameter is smaller than 1. However, this case is not ruled out in
the description. Also the description for the vgam package for R has
the same formulas and description.

I did not see any problems if the second shape parameter is larger
than 1, even if the first shape parameter is smaller 1
>>> stats.kstest('betaprime','',[4.65283053384, 3.94739656544],N=1000)
(0.0295964079942, array(0.17006401162643214))
>>> stats.kstest('betaprime','',[0.394739656544,4.65283053384],N=1000)
(0.0345741586367, array(0.08946533058440953))

Also, inverting the random sample and interchanging the parameter
yields a passing kstest:

bprv = stats.betaprime.rvs(4.65283053384, 0.394739656544, size=10000)

>>> stats.kstest(1/bprv, 'betaprime', [0.394739656544,4.65283053384], N=1000)
(0.00847538770233, array(0.23638693612403028))

I haven't verified the pdf, but it looks easy enough, but I think the
cdf has some problems for large x and shape2 parameter < 1.

Josef