[Numpy-discussion] calculating the mean and variance of a large float vector

Keith Goodman kwgoodman at gmail.com
Thu Jun 5 22:16:30 EDT 2008


On Thu, Jun 5, 2008 at 6:55 PM, Alan McIntyre <alan.mcintyre at gmail.com> wrote:
> On Thu, Jun 5, 2008 at 9:06 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> On Thu, Jun 5, 2008 at 4:54 PM, Christopher Marshall
>> Are you worried that the mean might overflow on the intermediate sum?
>
> I suspect (but please correct me if I'm wrong, Christopher) he's
> asking whether there are cases where small variations in the contents of
> the vector can produce relatively large changes in the value given as
> the mean or variance.  This is a wild guess, but if the intermediate
> sums are large enough, you could have a situation where (for example)
> the last half-million values aren't counted in the intermediate sum
> because they're too small relative to the intermediate sum.  (I hope
> my numerics prof from last year doesn't read this list...I should
> really have no trouble figuring out the condition number for mean/var
> :).

How can that lead to instability? If the last half-million values are
small then they won't have a big impact on the mean even if they are
ignored. The variance is a mean too (of the squared deviations from the
mean), so it should be stable too. Or am I, once again, missing the point?
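
For what it's worth, here is a rough sketch of the effect Alan describes,
in plain Python rather than NumPy. The data are made up for illustration
(one huge value followed by half a million ones), naive_sum is a
hypothetical helper, and math.fsum is used only as an accurately rounded
reference. It says nothing about how NumPy accumulates internally.

```python
import math

def naive_sum(values):
    """Left-to-right accumulation into one double-precision running total."""
    total = 0.0
    for v in values:
        total += v
    return total

# Hypothetical data: one huge value, then half a million small ones.
big = 2.0 ** 53                  # from here on, adjacent doubles are 2.0 apart
values = [big] + [1.0] * 500_000

naive = naive_sum(values)        # every "+ 1.0" rounds back to the old total
exact = math.fsum(values)        # accurately rounded reference sum

lost = exact - naive             # how much the running total dropped
rel_err = lost / exact           # the same loss relative to the true sum

print(naive == big)              # the small values never registered
print(lost, rel_err)
```

With these numbers the running total never moves off 2**53, so all of the
small values are dropped, yet the relative error in the sum (and hence in
the mean) stays tiny. Whether that counts as "unstable" depends on whether
you care about absolute or relative error.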


