[SciPy-user] computing Bayesian credible intervals

Tue May 6 03:18:41 EDT 2008

2008/5/4 Johann Cohen-Tanugi <cohen at slac.stanford.edu>:

>  integral_error = quad(density, a , b) - q
>
>  into
>
>  integral_error = quad(density, a , b)[0] - q
>
>  as quad returns a tuple with the result and the estimated error

This bites me every time I use quad. Still, it's valuable to check the
reported error, though it's not always clear what to do with the
results.

>  More interesting, as a test  I used :
>  q=0.96
>
>  def density(s):
>     if s<0.5:
>         return 4*s
>     else :
>         return 4-4*s
>
>  For which the solution is a=0.1, and b=0.9
>  If I keep
>  results=optimize.fsolve(solve_me, [0, 1])
>  fsolve fails because it wanders in directions a<0 and b>1.
>  With
>  results=optimize.fsolve(solve_me, [0.5, 0.5])
>  fsolve converges without trouble.... Is there a simple way to get it to
>  work with constraints like a>0 and b<1? I am afraid that the answer is
>  no, if I understood well the context of another recent thread entitled
>  "constrained optimization".

Multidimensional solving, as you are discovering, is a pain. If at all
possible it's better to turn it into one-dimensional optimization. It
can also be worth disposing of constraints - at least "open"
constraints - using funny reparameterizations (log, for example, is
great for ensuring that positive things stay positive)

Do you really need p(a)=p(b)? I mean, is this the right supplementary
condition to construct a credible interval? Would it be acceptable to
choose instead p(x<a)=p(x>b)? This will probably be easier to work
with, at least if you can get good numerical behaviour out of your
PDF.

Have you looked into writing down an analytical CDF? It looks
moderately ghastly - all those s**n exp(k*s) - but possibly something
that could be done with a bit of sweat. This would improve the
performance of whatever solution method you use - the roundoff errors
from quad() follow some irregular, discontinuous pattern. If you must
use numerical integration, look into using a non-adaptive Gaussian
quadrature, which won't have that particular problem. (There's one in
scipy for more or less this reason.) Analytical mean, mode, or median
could probably be made use of too. Or a different expansion for small
s. A session with SAGE/MAPLE/Mathematica might yield something.

If you do need p(a)=p(b), perhaps the way to turn the problem
one-dimensional and reduce your headaches is to use a as the sole
parameter and write b as a function of a using the fact that your
distribution is single-peaked (so that p(x)-p(a)==0 has just two
solutions, one of which you know about already; if you feel like being
devious you could do root-finding on (p(x)-p(a))/(x-a) ).

Good luck,
Anne