[SciPy-dev] fix for ncx2 bug?

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Jun 17 14:22:54 EDT 2009


On Wed, Jun 17, 2009 at 1:53 PM, Neal Becker<ndbecker2 at gmail.com> wrote:
> josef.pktd at gmail.com wrote:
>
>> On Wed, Jun 17, 2009 at 1:08 PM, Neal Becker<ndbecker2 at gmail.com> wrote:
>>> josef.pktd at gmail.com wrote:
>>>
>>>> On Wed, Jun 17, 2009 at 10:45 AM, Neal Becker<ndbecker2 at gmail.com>
>>>> wrote:
>>>>> As reported earlier, there is a bug in ncx2 that causes a strange
>>>>> discontinuity.
>>>>>
>>>>> Is there a patch available?
>>>>>
>>>>
>>>> It's a ticket, but it could take some time until someone finds all the
>>>> imprecision for "outlier" cases in scipy.special.
>>>>
>>>> I'm not sure these functions are even designed to have a high
>>>> precision, because for distributions such as chisquare or ncx2, that
>>>> are primarily used for test statistics, it make only sense to report a
>>>> few digits.
>>>>
>>>> I would have recommended ncx2.veccdf for your case, but a quick check
>>>> showed that over a large range ncx2.veccdf and ncx2.cdf only agree up
>>>> to 1e-8. For ncx2.cdf close to one, there might also be a precision
>>>> loss.
>>>>
>>>> What's your use case that you need ncx2 at high precision?
>>>>
>>>> Josef
>>>
>>> I'm using this to compute what's often called "Receiver Operating
>>> Characteristic", which is a signal detection problem.  You may see over
>>> some range that probability of miss or false detection is quite low.
>>> Probably you don't really care about accuracy here, but it's making the
>>> plots look bad.
>>
>> If speed doesn't matter too much, you can still use veccdf, I don't
>> think it's more accurate but it is smooth, no discontinuities and it
>> also goes to 1 in the upper tail.
>>
>> veccdf is a private method, that avoids the argument check, but as
>> long as you call it with valid arguments you get good solutions, and
>> it is vectorized (based on np.vectorize). The generic methods are
>> pretty heavily tested, so besides slower speed from the numerical
>> integration, I wouldn't expect any problems.
>>
>> In contrast to central chisquare, I didn't see any formulas that would
>> allow a direct calculation of the ncx2.cdf as function of some other
>> special functions.
>>
>> Josef
>
> I'm no expert on this.  I do see 2 refs:
>
> Continuous Univariate Distributions, Vol 2, 2nd edition, Johnson, Kotz,
> Balakrishnan has an entire chapter on the subject.
>
> http://www.boost.org/doc/libs/1_39_0/libs/math/doc/sf_and_dist/html/math_toolkit/dist/dist_ref/dists/nc_chi_squared_dist.html
>
> The code says:
>  //
>            // Computes the complement of the Non-Central Chi-Square
>            // Distribution CDF by summing a weighted sum of complements
>            // of the central-distributions.  The weighting factor is
>            // a Poisson Distribution.
>            //
>            // This is an application of the technique described in:
>            //
>            // Computing discrete mixtures of continuous
>            // distributions: noncentral chisquare, noncentral t
>            // and the distribution of the square of the sample
>            // multiple correlation coeficient.
>            // D. Benton, K. Krishnamoorthy.
>            // Computational Statistics & Data Analysis 43 (2003) 249 - 267
>
>
> I see that the 1st ref above also talks about the fact that cdf on ncx2 can
> be obtained as weighted sum of central distributions, but it also goes on to
> give other methods of calculation.
>

The problem is that the sum has an infinite number of terms. I looked
at JKB vol.2 but they only discuss the central chisquare distribution
extensively, but I didn't see much for non-central chi-square.

Wikipidia as the infinite series expression for the ncx2.cdf
 http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution


But I'm not a numerical precision person, and I usually don't worry
about numerical, floating point precision of convergence of a series
with an infinite number of terms. My preferred bugfix in this would be
special.hyp2f1.  But, I don't have the background and I prefer to
chase the "big bugs", 1e10 and 1e1 instead of 1e-8 and 1e-15.

The infinite mixture might be relatively easy to calculate for
standard cases, but it might be difficult to maintain a high precision
when users start to plug in "funny" parameters and/or expect it to
work for every tail case.

Josef



More information about the SciPy-Dev mailing list