[SciPy-Dev] Proposed improvement: recursive integrator for arbitrary dimensionality using QUADPACK

josef.pktd at gmail.com josef.pktd at gmail.com
Sat Dec 17 11:55:54 EST 2011


On Sat, Dec 17, 2011 at 10:55 AM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
>
>
> On Mon, Dec 12, 2011 at 9:17 PM, Thomas Robinson
> <trobinson at systemsbiology.org> wrote:
>>
>> That's quite a mouthful. In simpler terms: I've been playing with
>> scipy/integrate/quadpack.py lately, specifically using dblquad and
>> tplquad to meet my needs for continuous mutual information
>> (https://en.wikipedia.org/wiki/Mutual_information).
>>
>> After implementing this, I refactored my algorithm to perform a
>> deterministic walk of multivariate mutual information
>> (https://en.wikipedia.org/wiki/Interaction_information) for cases where
>> the shared information increased. And so entered my dilemma: what I
>> needed was a general-purpose, canned integrator that could handle an
>> arbitrary number of terms during multiple integration using
>> fundamentally the same algorithm (to minimize work).
>>
>>
>> Here's what I came up with:
>>
>> def _infuncn(x,func,a,b,count,epsabs,epsrel,more_args):
>>     myargs = (x,) + more_args
>>     if count == 1:
>>         return quad(func,a,b,args=myargs,epsabs=epsabs,epsrel=epsrel)[0]
>>     else:
>>         return
>>
>> quad(_infuncn,a,b,(func,a,b,count-1,epsabs,epsrel,myargs),epsabs=epsabs,epsrel=epsrel)
>>
>> def nquad(func, a, b, args=(), epsabs=1.49e-8, epsrel=1.49e-8, count=2):
>>     ''' Recursive, N-depth integrator with number of variables given '''
>>
>>     # Preconditions
>>     if not isinstance( count, int ):
>>         raise TypeError('Variable "count" must be an integer value')
>>     if count < 2:
>>         raise ValueError('Variable "count" must be at least 2 to perform \
>>             meaningful multivariate integration. Consider using
>> quad(...) instead.')
>>
>>     return
>>
>> quad(_infuncn,a,b,(func,a,b,count-1,epsabs,epsrel,args),epsabs=epsabs,epsrel=epsrel)
>>
> An n-D integration routine could be a useful addition. An obvious problem I
> see with your code is that the integration limits are the same for all
> dimensions. This makes it unsuitable for solving the majority of problems I
> can think of that would need n-D integration. It would be straigtforward to
> at least allow different limits per integration step, and worth thinking
> about if it's useful to allow limits such as in dblquad/tplquad.
>
>>
>> Basically, what you get is this construct that recursively calls into
>> QUADPACK with a series of simplifying assumptions (ie, fixed domain,
>> equivalent epsilon values propagated down to all invocations of quad)
>> and allowable calls from an arbitrarily-shaped function pointer.
>> Invoking this function for any number of terms is as easy as abusing the
>> lambda function and argument pointers in Python, like so:
>>
>> nquad(lambda *args: my_nspace_function(args), min_bound, max_bound,
>> count=number_of_args)
>>
>>
>> For example:
>>
>> # Define INF as float('inf')
>>
>> points = [[1,2,3],[1,1,2],[2,2,6]]
>> gkde_n = gaussian_kde(points)
>>
>> # ... compute Shannon entropy, store as variable "entropy".
>>
>> mutual_info = lambda *args: gkde_n(args) * \
>>                    math.log((gkde_n(args) / (entropy_mult + \
>> MIN_DOUBLE)) + MIN_DOUBLE)
>>
>> # Compute MI(X1..Xn)
>> (minfo_n, err_n) = nquad(mutual_info, -INF, INF, count=scale)
>>
>>
>> I bring this up here for two reasons. The first is I'd find this to be
>> greatly beneficial to the current Python interface into QUADPACK,
>> insofar as it solves the general case instead of forcing programmers to
>> rely on (for example) invocations directly into quad, dblquad, or
>> tplquad. The second is my surprise that a function to compute the
>> discrete and continuous Shannon-Weaver mutual information does not
>> appear to be extant in SciPy or NumPy,
>
>
> I had to look up what this is exactly, so I don't find it very surprising.
> It sounds a bit too specialized for scipy, scikit-learn is the first place
> I'd go look for it.

mutual information and other information criteria are getting pretty
popular, for example
http://stackoverflow.com/questions/8363085/continuous-mutual-information-in-python
,
scikit-learn uses it for cluster algorithms (AFAIK), statsmodels has
some in the sandbox.

It would be very useful to have good *tested* code somewhere available
for it. However, my impression is that there are many application
specific versions, and nobody seems to be much interested in
maintaining it in scipy,
for example, scipy.maxentropy didn't find a new maintainer, and nobody
seems to complain that entropy for the stats.distributions have "quite
a few" bugs.

My impression is also that many developers (including myself) have fun
writing code, but often don't go through the extra effort to make sure
the code is actually *correct* and tested. Until the code is tested it
might remain a very useful cookbook recipe.

"Maintainers needed"

Josef
after spending a day looking for reference results to verify my code,
with a sandbox that is growing faster than I'm able to move code out of it.

>
> Ralf
>
>
>>
>> which I found greatly distressing
>> given their usefulness in discovering a select set of exceptionally
>> useful properties about paired data sets that are not well expressed in
>> monotonically-dependent tests for independence like Pearson's or
>> Spearman's correlations.
>>
>> Hence my purpose bringing these issues up for general review. I am
>> absolutely convinced that my prototypical code above may be greatly
>> improved, and I would wholeheartedly support it insofar as I've been
>> spinning my wheels over the problem of a robust, fast implementation.
>> What I have is an implementation that reliably offers a good
>> approximation, complains bitterly when integration fails to rapidly
>> converge, and which makes relatively naive assumptions that could be
>> greatly improved upon by developers more actively involved in the code
>> than I am.
>>
>>
>> Thanks for reading. I do hope someone out there finds this post valuable.
>>
>> - Tom Robinson
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>



More information about the SciPy-Dev mailing list