[Numpy-discussion] Log Arrays

David Cournapeau cournape at gmail.com
Thu May 8 13:10:34 EDT 2008


On Fri, May 9, 2008 at 1:54 AM, Charles R Harris
<charlesr.harris at gmail.com> wrote:

> Yes, and Gaussians are a delusion beyond a few sigma. One of my pet peeves.
> If you have more than 8 standard deviations, then something is fundamentally
> wrong in the concept and formulation.

If you have a mixture of Gaussians, and the components do not all
mostly overlap, you will get those ranges, and nothing is wrong with
the formulation. I mean, it is not like EM algorithms are new and
untested. They are used in many different fields, and all their
successful implementations use the logsumexp trick.
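
To make the point about ranges concrete, here is a minimal sketch in plain Python. The two-component mixture, its means (0 and 100), the shared sigma of 1, and the helper name log_gauss are all made up for illustration; they do not come from any particular EM implementation. It only shows that a point sitting on one component is very many sigmas away from the other, so that component's density cannot be represented outside the log domain.

```python
import math

def log_gauss(x, mean, sigma):
    """Log-density of a 1-D Gaussian, computed directly in the log domain."""
    z = (x - mean) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

# Hypothetical mixture: two well-separated components (illustrative values).
means = [0.0, 100.0]
sigma = 1.0

x = 0.0  # a point sitting right on the first component
log_p = [log_gauss(x, m, sigma) for m in means]
# log_p[0] is about -0.92; log_p[1] is about -5000.92 (100 sigma away).

# Leaving the log domain underflows: exp(-5000.92) is 0.0 in double precision.
p_far = math.exp(log_p[1])
```

Nothing is wrong with the model here; the far component simply contributes a density that a double cannot hold as a plain number.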

See here for the formulas involved:

http://en.wikipedia.org/wiki/Expectation-maximization_algorithm

If you need to compute log(exp(-1000) + exp(-1001)), how would you
do it? If you do it the naive way, you get -inf, and it quickly
propagates through your whole computation. Getting roughly -1000
instead of -inf seems like a precision win to me. Of course you are
trading precision for range, but when you are out of range for your
number representation, the tradeoff is not a loss anymore. It is
really like denormals: they are less precise than the normal format
*for the usual range*, but in the range where denormals are used, they
are much more precise; they are actually infinitely more precise,
since the normal representation would be 0 :)
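
For reference, a minimal sketch of the logsumexp trick on exactly this example, in plain Python so it runs without NumPy. The function name logsumexp and the variable names are my own choices for illustration, not a reference to any specific library routine. The idea is to factor out the largest exponent before summing, so the largest term becomes exp(0) = 1 and nothing underflows to zero. The last lines show the denormal (subnormal) comparison the same way.

```python
import math
import sys

def logsumexp(log_terms):
    """Return log(sum(exp(t) for t in log_terms)), staying in the log domain."""
    m = max(log_terms)
    if m == -math.inf:
        return -math.inf  # every term is exp(-inf) = 0
    # After factoring out exp(m), the largest term is exp(0) = 1,
    # so the sum is at least 1 and its log is well defined.
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

log_terms = [-1000.0, -1001.0]

# Naive route: both exponentials underflow to 0.0 in double precision.
naive_sum = sum(math.exp(t) for t in log_terms)  # 0.0
# math.log(0.0) raises ValueError in plain Python; with NumPy,
# np.log(0.0) returns -inf (with a warning) instead.

# Stable route: -1000 + log(1 + exp(-1)), about -999.6867.
stable = logsumexp(log_terms)

# Denormals: below the smallest normal double there are still nonzero
# values, with fewer significant bits, instead of a hard jump to 0.
smallest_normal = sys.float_info.min   # about 2.2e-308
subnormal = smallest_normal / 1024     # 2**-1032, still nonzero
```

The price is one extra max and one subtraction per term, which is small compared with losing the whole result to -inf.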

David


