please change mean to use dtype=float

Hello all, I just had someone from my lab come to my desk saying: "My god - SciPy is really stupid ... An array with only positive numbers claims to have a negative mean!!"
I have asked about this before ... the reason was of course that her array was of dtype int32 and had many large values, causing an overflow (wrap-around).
Now that the default for axis is None (for all functions having an axis argument), can we please change dtype to default to float64 !?
It is really a very confusing and shocking result to get a negative mean on all positive values.
It has been stated here before that numpy should target the "scientist" rather than the programmers ... I would argue that mean() almost always requires the precision of "double" (float64) to produce usable results.
Please consider this change before the 1.0 release ... Thanks, Sebastian Haase
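[Editor's aside: a minimal sketch of the overflow Sebastian describes, for illustration only. Modern NumPy already accumulates integer means in float64 by default; forcing dtype=np.int32 here just reproduces the wrap-around being complained about. The array values are made up.]

    import numpy as np

    a = np.full(100, 2**26, dtype=np.int32)   # all values positive
    print(a.sum(dtype=np.int32))              # int32 accumulator wraps around -> negative
    print(a.mean(dtype=np.int32))             # the "negative mean" of all-positive data
    print(a.mean())                           # accumulated in float64: correct result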

Sebastian Haase wrote:
Hello all, I just had someone from my lab coming to my desk saying: "My god - SciPy is really stupid .... An array with only positive numbers claims to have a negative mean !! "?
I was asking about this before ... the reason was of course that her array was of dtype int32 and had many large values to cause an overflow (wrap around) .
Now that the default for axis is None (for all functions having an axis argument), can we please change dtype to default to float64 !?
The default is float64 now (as long as you are not using numpy.oldnumeric).
I suppose more appropriately, we could reduce over float for integer data-types when calculating the mean as well (since a floating point is returned anyway).
-Travis

On Tuesday 19 September 2006 15:48, Travis Oliphant wrote:
Sebastian Haase wrote:
<snip>
can we please change dtype to default to float64 !?
The default is float64 now (as long as you are not using numpy.oldnumeric).
I suppose more appropriately, we could reduce over float for integer data-types when calculating the mean as well (since a floating point is returned anyway).
Is mean() now always "reducing over" float64? The svn note ("""Log: Fix mean, std, and var methods so that they reduce over double data-type with integer inputs.""") makes it sound like a float32 input stays float32?
For mean calculation this might introduce large errors - I usually would require double-precision for *any* input type ...
(don't know how to say this for complex types !? Are here real and imag treated separately / independently ?)
Thanks, Sebastian Haase

Sebastian Haase wrote:
On Tuesday 19 September 2006 15:48, Travis Oliphant wrote:
Sebastian Haase wrote:
<snip>
can we please change dtype to default to float64 !?
The default is float64 now (as long as you are not using numpy.oldnumeric).
I suppose more appropriately, we could reduce over float for integer data-types when calculating the mean as well (since a floating point is returned anyway).
Is now mean() always "reducing over" float64 ? The svn note """Log: Fix mean, std, and var methods so that they reduce over double data-type with integer inputs. """ makes it sound that a float32 input is stays float32 ?
Yes, that is true. Only integer inputs are changed because you are going to get a floating point output anyway.
For mean calculation this might introduce large errors - I usually would require double-precision for *any* input type ...
Of course. The system is not fool-proof. I hesitate to arbitrarily change this. The advantage of using single-precision calculation is that it is faster. We do rely on the user who expressly requests these things to be aware of the difficulties.
(don't know how to say this for complex types !? Are here real and imag treated separately / independently ?)
There is a complex add performed at a low-level as two separate adds. The addition is performed in the precision requested.
-Travis
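[Editor's aside: a small, hedged check of the point above -- the complex accumulation happens at the requested precision, with real and imaginary parts added component-wise. The array contents are arbitrary.]

    import numpy as np

    z = (np.random.random(10000) + 1j * np.random.random(10000)).astype(np.complex64)
    print(z.mean())                      # reduction carried out in complex64
    print(z.mean(dtype=np.complex128))   # reduction carried out in complex128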

On 9/19/06, Travis Oliphant oliphant@ee.byu.edu wrote:
Sebastian Haase wrote:
On Tuesday 19 September 2006 15:48, Travis Oliphant wrote:
Sebastian Haase wrote:
<snip>
can we please change dtype to default to float64 !?
The default is float64 now (as long as you are not using numpy.oldnumeric).
I suppose more appropriately, we could reduce over float for integer data-types when calculating the mean as well (since a floating point is returned anyway).
Is now mean() always "reducing over" float64 ? The svn note """Log: Fix mean, std, and var methods so that they reduce over double data-type with integer inputs. """ makes it sound that a float32 input stays float32 ?
Yes, that is true. Only integer inputs are changed because you are going to get a floating point output anyway.
For mean calculation this might introduce large errors - I usually would require double-precision for *any* input type ...
Of course. The system is not fool-proof. I hesitate to arbitrarily change this. The advantage of using single-precision calculation is that it is faster. We do rely on the user who expressly requests these things to be aware of the difficulties.
Speed depends on the architecture. Float is a trifle slower than double on my Athlon64, but faster on PPC750. I don't know about other machines. I think there is a good argument for accumulating in double and converting to float for output if required.
Chuck

Charles R Harris wrote:
Speed depends on the architecture. Float is a trifle slower than double on my Athlon64, but faster on PPC750. I don't know about other machines. I think there is a good argument for accumulating in double and converting to float for output if required.
Yes there is. It's just not what NumPy ever does so it would be an exception in this case and would need a more convincing argument in my opinion. You can always specify the accumulation type yourself with the dtype argument. We are only talking about what the default should be.
-Travis
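[Editor's aside: a minimal sketch of the workaround Travis points to -- specify the accumulation type yourself via the dtype argument, and cast back to float32 if a single-precision result is wanted. The data here is arbitrary.]

    import numpy as np

    a = np.random.random(10000).astype(np.float32)
    m64 = a.mean(dtype=np.float64)   # accumulate in double precision
    m32 = np.float32(m64)            # single-precision output, if required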

On Tuesday 19 September 2006 17:17, Travis Oliphant wrote:
Sebastian Haase wrote:
On Tuesday 19 September 2006 15:48, Travis Oliphant wrote:
Sebastian Haase wrote:
<snip>
can we please change dtype to default to float64 !?
The default is float64 now (as long as you are not using numpy.oldnumeric).
I suppose more appropriately, we could reduce over float for integer data-types when calculating the mean as well (since a floating point is returned anyway).
Is now mean() always "reducing over" float64 ? The svn note """Log: Fix mean, std, and var methods so that they reduce over double data-type with integer inputs. """ makes it sound that a float32 input is stays float32 ?
Yes, that is true. Only integer inputs are changed because you are going to get a floating point output anyway.
For mean calculation this might introduce large errors - I usually would require double-precision for *any* input type ...
Of course. The system is not fool-proof. I hesitate to arbitrarily change this. The advantage of using single-precision calculation is that it is faster. We do rely on the user who expressly requests these things to be aware of the difficulties.
I still would argue that getting a "good" (smaller rounding errors) answer should be the default -- if speed is wanted, then *that* could be still specified by explicitly using dtype=float32 (which would also be a possible choice for int32 input) . In image processing we always want means to be calculated in float64 even though input data is always float32 (if not uint16).
Also it is simpler to say "float64 is the default" (full stop.) - instead "float64 is the default unless you have float32"
Thanks, Sebastian Haase

Sebastian Haase wrote:
I still would argue that getting a "good" (smaller rounding errors) answer should be the default -- if speed is wanted, then *that* could be still specified by explicitly using dtype=float32 (which would also be a possible choice for int32 input) .
So you are arguing for using long double then.... ;-)
In image processing we always want means to be calculated in float64 even though input data is always float32 (if not uint16).
Also it is simpler to say "float64 is the default" (full stop.) - instead
"float64 is the default unless you have float32"
"the type you have is the default except for integers". Do you really want float64 to be the default for float96?
Unless we are going to use long double as the default, then I'm not convinced that we should special-case the "double" type.
-Travis

Travis Oliphant wrote:
Sebastian Haase wrote:
I still would argue that getting a "good" (smaller rounding errors) answer should be the default -- if speed is wanted, then *that* could be still specified by explicitly using dtype=float32 (which would also be a possible choice for int32 input) .
So you are arguing for using long double then.... ;-)
In image processing we always want means to be calculated in float64 even though input data is always float32 (if not uint16).
Also it is simpler to say "float64 is the default" (full stop.) - instead
"float64 is the default unless you have float32"
"the type you have is the default except for integers". Do you really want float64 to be the default for float96?
Unless we are going to use long double as the default, then I'm not convinced that we should special-case the "double" type.
I guess I'm not really aware of the float96 type ... Is that a "machine type" on any system ? I always thought that -- e.g. coming from C -- double is "as good as it gets"... Who uses float96 ? I heard somewhere that (some) CPUs use 80bits internally when calculating 64bit double-precision...
Is this not going into some academic argument !? For all I know, calculating mean()s (and sum()s, ...) is always done in double precision -- never in single precision, even when the data is in float32.
Having float32 be the default for float32 data is just requiring more typing, and more explaining ... it would compromise numpy usability as a day-to-day replacement for other systems.
Sorry, if I'm being ignorant ...
- Sebastian

On 9/19/06, Sebastian Haase haase@msg.ucsf.edu wrote:
Travis Oliphant wrote:
Sebastian Haase wrote:
I still would argue that getting a "good" (smaller rounding errors) answer should be the default -- if speed is wanted, then *that* could be still specified by explicitly using dtype=float32 (which would also be a possible choice for int32 input) .
So you are arguing for using long double then.... ;-)
In image processing we always want means to be calculated in float64 even though input data is always float32 (if not uint16).
Also it is simpler to say "float64 is the default" (full stop.) - instead
"float64 is the default unless you have float32"
"the type you have is the default except for integers". Do you really want float64 to be the default for float96?
Unless we are going to use long double as the default, then I'm not convinced that we should special-case the "double" type.
I guess I'm not really aware of the float96 type ... Is that a "machine type" on any system ? I always thought that -- e.g. coming from C -- double is "as good as it gets"... Who uses float96 ? I heard somewhere that (some) CPUs use 80bits internally when calculating 64bit double-precision...
Is this not going into some academic argument !? For all I know, calculating mean()s (and sum()s, ...) is always done in double precision -- never in single precision, even when the data is in float32.
Having float32 be the default for float32 data is just requiring more typing, and more explaining ... it would compromise numpy usability as a day-to-day replacement for other systems.
Sorry, if I'm being ignorant ...
I'm going to side with Travis here. It is only a default and easily overridden. And yes, there are precisions greater than double. I was using quad precision back in the eighties on a VAX for some inherently ill conditioned problems. And on my machine long double is 12 bytes.
Chuck
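[Editor's aside: a quick, hedged way to see what "long double" means on a given platform -- typically an 80-bit extended type padded to 12 or 16 bytes on x86, something else elsewhere.]

    import numpy as np

    print(np.dtype(np.longdouble).itemsize, "bytes")
    print(np.finfo(np.longdouble))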

On 9/20/06, Charles R Harris charlesr.harris@gmail.com wrote:
I guess I'm not really aware of the float96 type ... Is that a "machine type" on any system ? I always thought that -- e.g . coming from C -- double is "as good as it gets"... Who uses float96 ? I heard somewhere that (some) CPUs use 80bits internally when calculating 64bit double-precision...
I'm going to side with Travis here. It is only a default and easily overridden. And yes, there are precisions greater than double. I was using quad precision back in the eighties on a VAX for some inherently ill conditioned problems. And on my machine long double is 12 bytes.
And on Intel chips the internal fp precision is 80bits. The D programming language even exposes this 80-bit floating point type to the user. http://www.digitalmars.com/d/type.html http://www.digitalmars.com/d/faq.html#real
--bb

Charles R Harris wrote:
On 9/19/06, *Sebastian Haase* <haase@msg.ucsf.edu mailto:haase@msg.ucsf.edu> wrote:
<snip>
I'm going to side with Travis here. It is only a default and easily overridden. And yes, there are precisions greater than double. I was using quad precision back in the eighties on a VAX for some inherently ill conditioned problems. And on my machine long double is 12 bytes.
Chuck
I just did a web search for "long double" http://www.google.com/search?q=%22long+double%22
and it does not look like there is much agreement on what that is - see also http://en.wikipedia.org/wiki/Long_double
I really think that float96 *is* a special case - but computing mean()s and var()s in float32 would be "bad science". I hope I'm not alone in seeing numpy as a great "interactive platform" for evaluating data... I know that having too much knowledge of the details often makes one forget what the "newcomers" will do and expect. We are only talking about people who will a) work with single-precision data (e.g. large-scale image analysis) and who b) will tend to "just use the default" (dtype) --- How else can I say this: these people will just assume that arr.mean() *is* the mean of arr.
-Sebastian

Sebastian Haase wrote:
I know that having too much knowledge of the details often makes one forget what the "newcomers" will do and expect.
Please be more careful with such accusations. Repeated frequently, they can become quite insulting.
We are only talking about people that will a) work with single-precision data (e.g. large scale-image analysis) and who b) will tend to "just use the default" (dtype) --- How else can I say this: these people will just assume that arr.mean() *is* the mean of arr.
I don't understand what you mean, here. arr.mean() is almost never *the* mean of arr. Double precision can't fix that.

Robert Kern wrote:
Sebastian Haase wrote:
I know that having too much knowledge of the details often makes one forget what the "newcomers" will do and expect.
Please be more careful with such accusations. Repeated frequently, they can become quite insulting.
I did not mean to insult anyone - what I meant was that I'm for numpy becoming an easy platform to use. I have spent and enjoyed part of the last four years developing and evangelizing Python as an alternative to Matlab- and C/Fortran-based image analysis environments. I often find myself arguing for good support of the single-precision data format. So I find it actually somewhat ironic to see myself arguing now for wanting float64 over float32 ;-)
We are only talking about people that will a) work with single-precision data (e.g. large scale-image analysis) and who b) will tend to "just use the default" (dtype) --- How else can I say this: these people will just assume that arr.mean() *is* the mean of arr.
I don't understand what you mean, here. arr.mean() is almost never *the* mean of arr. Double precision can't fix that.
This was not supposed to be a scientific statement -- I'm (again) thinking of our students, who do not always appreciate the full complexity of computational numerics, data types and such. The best I can hope for is a "sound" default for most (practical) cases... I still think that 80-bit vs. 96-bit vs. 128-bit is rather academic for most people ... most people seem to only use float64, and then there are some who use float32 (like us) ...
Cheers, Sebastian

Sebastian Haase wrote:
Robert Kern wrote:
Sebastian Haase wrote:
I know that having too much knowledge of the details often makes one forget what the "newcomers" will do and expect.
Please be more careful with such accusations. Repeated frequently, they can become quite insulting.
I did not mean to insult anyone - what I meant was, that I'm for numpy becoming an easy platform to use. I have spend and enjoyed part the last four years developing and evangelizing Python as an alternative to Matlab and C/Fortran based image analysis environment. I often find myself arguing for good support of the single precision data format. So I find it actually somewhat ironic to see myself arguing now for wanting float64 over float32 ;-)
No one is doubting that you want numpy to be easy to use, and please don't think that the rest of us want otherwise. However, the fact that you *want* numpy to be easy to use does not mean that your suggestions *will* make numpy easy to use.
We haven't forgotten what newcomers will do; to the contrary, we are quite aware that new users need consistent behavior in order to learn how to use a system. Adding another special case in how dtypes implicitly convert to one another will impede new users being able to understand the whole system. See A. M. Archibald's question in the thread "ufunc.reduce and conversion" for an example. In our judgement this is a worse outcome than notational convenience for float32 users, who already need to be aware of the effects of their precision choice. Each of us can come to different conclusions in good faith without one of us forgetting the new user experience.
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a):
    m = a[0]
    for i in range(1, len(a)):
        m += (a[i] - m) / (i + 1)
    return m
def var(a):
    m = a[0]
    t = a.dtype.type(0)
    for i in range(1, len(a)):
        q = a[i] - m
        r = q / (i+1)
        m += r
        t += i * q * r
    t /= len(a)
    return t
Alternatively, from Knuth:
def var_knuth(a):
    m = a.dtype.type(0)
    variance = a.dtype.type(0)
    for i in range(len(a)):
        delta = a[i] - m
        m += delta / (i+1)
        variance += delta * (a[i] - m)
    variance /= len(a)
    return variance
If you will code up implementations of these for ndarray.mean() and ndarray.var(), I will check them in and then float32 arrays will avoid most of the catastrophes that the current implementations run into.
We are only talking about people that will a) work with single-precision data (e.g. large scale-image analysis) and who b) will tend to "just use the default" (dtype) --- How else can I say this: these people will just assume that arr.mean() *is* the mean of arr.
I don't understand what you mean, here. arr.mean() is almost never *the* mean of arr. Double precision can't fix that.
This was not supposed to be a scientific statement -- I'm (again) thinking of our students that not always appreciate the full complexity of computational numerics and data types and such.
They need to appreciate the complexity of computational numerics if they are going to do numerical computation. Double precision does not make it any simpler.

Robert Kern wrote:
This was not supposed to be a scientific statement -- I'm (again) thinking of our students that not always appreciate the full complexity of computational numerics and data types and such.
They need to appreciate the complexity of computational numerics if they are going to do numerical computation. Double precision does not make it any simpler.
This is were we differ.
We haven't forgotten what newcomers will do; to the contrary, we are quite aware that new users need consistent behavior in order to learn how to use a system. Adding another special case in how dtypes implicitly convert to one another will impede new users being able to understand the whole system.
All I'm proposing could be summarized in: mean(), sum(), var() ... produce output of dtype float64 (except for input float96 which produces float96)
A comment on this is also that for these operations the input type/precision is almost unrelated to the resulting output precision -- the int case already makes that clear. (This is different for e.g. min() or max().)
The proposed alternative implementations seem to have one or more multiplications (or divisions) per value -- this might be noticeably slower ...
Regards, Sebastian

Robert Kern wrote:
Sebastian Haase wrote:
Robert Kern wrote:
Sebastian Haase wrote:
I know that having too much knowledge of the details often makes one forget what the "newcomers" will do and expect.
Please be more careful with such accusations. Repeated frequently, they can become quite insulting.
I did not mean to insult anyone - what I meant was, that I'm for numpy becoming an easy platform to use. I have spend and enjoyed part the last four years developing and evangelizing Python as an alternative to Matlab and C/Fortran based image analysis environment. I often find myself arguing for good support of the single precision data format. So I find it actually somewhat ironic to see myself arguing now for wanting float64 over float32 ;-)
No one is doubting that you want numpy to be easy to use. Please don't doubt that the rest of us want otherwise. However, the fact that you *want* numpy to be easy to use does not mean that your suggestions *will* make numpy easy to use.
We haven't forgotten what newcomers will do; to the contrary, we are quite aware that new users need consistent behavior in order to learn how to use a system. Adding another special case in how dtypes implicitly convert to one another will impede new users being able to understand the whole system. See A. M. Archibald's question in the thread "ufunc.reduce and conversion" for an example. In our judgement this is a worse outcome than notational convenience for float32 users, who already need to be aware of the effects of their precision choice. Each of us can come to different conclusions in good faith without one of us forgetting the new user experience.
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a): m = a[0] for i in range(1, len(a)): m += (a[i] - m) / (i + 1) return m
def var(a): m = a[0] t = a.dtype.type(0) for i in range(1, len(a)): q = a[i] - m r = q / (i+1) m += r t += i * q * r t /= len(a) return t
Alternatively, from Knuth:
def var_knuth(a): m = a.dtype.type(0) variance = a.dtype.type(0) for i in range(len(a)): delta = a[i] - m m += delta / (i+1) variance += delta * (a[i] - m) variance /= len(a) return variance
If you will code up implementations of these for ndarray.mean() and ndarray.var(), I will check them in and then float32 arrays will avoid most of the catastrophes that the current implementations run into.
+1
We are only talking about people that will a) work with single-precision data (e.g. large scale-image analysis) and who b) will tend to "just use the default" (dtype) --- How else can I say this: these people will just assume that arr.mean() *is* the mean of arr.
I don't understand what you mean, here. arr.mean() is almost never *the* mean of arr. Double precision can't fix that.
This was not supposed to be a scientific statement -- I'm (again) thinking of our students that not always appreciate the full complexity of computational numerics and data types and such.
They need to appreciate the complexity of computational numerics if they are going to do numerical computation. Double precision does not make it any simpler.

On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a):
    m = a[0]
    for i in range(1, len(a)):
        m += (a[i] - m) / (i + 1)
    return m
This isn't really going to be any better than using a simple sum. It'll also be slower (a division per iteration). You do avoid accumulating large sums, but then doing the division a[i]/len(a) and adding that will do the same.
Now, if you want to avoid losing precision, you want to use a better summation technique, like compensated (or Kahan) summation:
def mean(a):
    s = e = a.dtype.type(0)
    for i in range(0, len(a)):
        temp = s
        y = a[i] + e
        s = temp + y
        e = (temp - s) + y
    return s / len(a)
Some numerical experiments in Maple using 5-digit precision show that your mean is maybe a bit better in some cases, but can also be much worse, than sum(a)/len(a); both, however, are quite poor in comparison to the Kahan summation.
(We could probably use a fast implementation of Kahan summation in addition to a.sum())
def var(a):
    m = a[0]
    t = a.dtype.type(0)
    for i in range(1, len(a)):
        q = a[i] - m
        r = q / (i+1)
        m += r
        t += i * q * r
    t /= len(a)
    return t

Alternatively, from Knuth:

def var_knuth(a):
    m = a.dtype.type(0)
    variance = a.dtype.type(0)
    for i in range(len(a)):
        delta = a[i] - m
        m += delta / (i+1)
        variance += delta * (a[i] - m)
    variance /= len(a)
    return variance
These formulas are good when you can only do one pass over the data (like in a calculator where you don't store all the data points), but are slightly worse than doing two passes. Kahan summation would probably also be good here too.

On 9/20/06, David M. Cooke cookedm@physics.mcmaster.ca wrote:
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a): m = a[0] for i in range(1, len(a)): m += (a[i] - m) / (i + 1) return m
This isn't really going to be any better than using a simple sum. It'll also be slower (a division per iteration). You do avoid accumulating large sums, but then doing the division a[i]/len(a) and adding that will do the same.
Now, if you want to avoid losing precision, you want to use a better summation technique, like compensated (or Kahan) summation:
def mean(a): s = e = a.dtype.type(0) for i in range(0, len(a)): temp = s y = a[i] + e s = temp + y e = (temp - s) + y return s / len(a)
A variant of this can also be used to generate the sin/cos for fourier transforms using repeated complex multiplication. It works amazingly well. But I suspect accumulating in higher precision would be just as good and faster if the hardware supports it.
Divide and conquer, like an fft where only the constant coefficient is computed, is also a good bet but requires lg(n) passes over the data and extra storage.
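[Editor's aside: a minimal sketch of the divide-and-conquer (pairwise) summation Chuck alludes to above; the function name and block size are made up, not NumPy internals. Rounding error grows roughly like O(log n) instead of O(n) for naive left-to-right accumulation, at the cost of recursion and extra partial sums.]

    import numpy as np

    def pairwise_sum(a, block=128):
        # Recursively split the array and add the two halves.
        n = len(a)
        if n <= block:
            s = a.dtype.type(0)
            for x in a:          # small blocks: plain accumulation
                s = s + x
            return s
        half = n // 2
        return pairwise_sum(a[:half], block) + pairwise_sum(a[half:], block)

    a = np.random.random(100000).astype(np.float32)
    print(pairwise_sum(a), a.sum(dtype=np.float64))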
Some numerical experiments in Maple using 5-digit precision show that
your mean is maybe a bit better in some cases, but can also be much worse, than sum(a)/len(a), but both are quite poor in comparision to the Kahan summation.
(We could probably use a fast implementation of Kahan summation in addition to a.sum())
def var(a): m = a[0] t = a.dtype.type(0) for i in range(1, len(a)): q = a[i] - m r = q / (i+1) m += r t += i * q * r t /= len(a) return t
Alternatively, from Knuth:
def var_knuth(a): m = a.dtype.type(0) variance = a.dtype.type(0) for i in range(len(a)): delta = a[i] - m m += delta / (i+1) variance += delta * (a[i] - m) variance /= len(a) return variance
These formulas are good when you can only do one pass over the data (like in a calculator where you don't store all the data points), but are slightly worse than doing two passes. Kahan summation would probably also be good here too.
I think we should leave things as they are. Choosing the right precision for accumulating sums is absolutely fundamental, I always think about it when programming such loops because it is always a potential gotcha. Making the defaults higher precision just kicks the can down the road where the unwary are still likely to trip over it at some point.
Perhaps these special tricks for special cases could be included in scipy somewhere.
Chuck

David M. Cooke wrote:
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a): m = a[0] for i in range(1, len(a)): m += (a[i] - m) / (i + 1) return m
This isn't really going to be any better than using a simple sum. It'll also be slower (a division per iteration).
With one exception, every test that I've thrown at it shows that it's better for float32. That exception is uniformly spaced arrays, like linspace().
You do avoid accumulating large sums, but then doing the division a[i]/len(a) and adding that will do the same.
Okay, this is true.
Now, if you want to avoid losing precision, you want to use a better summation technique, like compensated (or Kahan) summation:
def mean(a): s = e = a.dtype.type(0) for i in range(0, len(a)): temp = s y = a[i] + e s = temp + y e = (temp - s) + y return s / len(a)
Some numerical experiments in Maple using 5-digit precision show that your mean is maybe a bit better in some cases, but can also be much worse, than sum(a)/len(a), but both are quite poor in comparision to the Kahan summation.
(We could probably use a fast implementation of Kahan summation in addition to a.sum())
+1
def var(a): m = a[0] t = a.dtype.type(0) for i in range(1, len(a)): q = a[i] - m r = q / (i+1) m += r t += i * q * r t /= len(a) return t
Alternatively, from Knuth:
def var_knuth(a): m = a.dtype.type(0) variance = a.dtype.type(0) for i in range(len(a)): delta = a[i] - m m += delta / (i+1) variance += delta * (a[i] - m) variance /= len(a) return variance
These formulas are good when you can only do one pass over the data (like in a calculator where you don't store all the data points), but are slightly worse than doing two passes. Kahan summation would probably also be good here too.
Again, my tests show otherwise for float32. I'll condense my ipython log into a module for everyone's perusal. It's possible that the Kahan summation of the squared residuals will work better than the current two-pass algorithm and the implementations I give above.

Robert Kern wrote:
David M. Cooke wrote:
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a): m = a[0] for i in range(1, len(a)): m += (a[i] - m) / (i + 1) return m
This isn't really going to be any better than using a simple sum. It'll also be slower (a division per iteration).
With one exception, every test that I've thrown at it shows that it's better for float32. That exception is uniformly spaced arrays, like linspace().
Here's a version of mean based on the Kahan sum, including the compensation term, that seems to be consistently much better than the builtin mean. It is typically 4 orders of magnitude closer to the "correct" answer than the builtin mean, and 2 orders of magnitude better than just naively using the Kahan sum. I'm using the mean computed with dtype=float64 as the "correct" value.
def rawKahanSum(values):  # Stolen from mathworld
    if not len(values):
        return 0.0, 0.0
    total = values[0]
    c = type(total)(0.0)     # A running compensation for lost low-order bits.
    for x in values[1:]:
        y = x - c            # So far, so good: c is zero.
        t = total + y        # Alas, total is big, y small, so low-order digits of y are lost.
        c = (t - total) - y  # (t - total) recovers the high-order part of y; subtracting y recovers -(low part of y)
        total = t            # Algebraically, c should always be zero. Beware eagerly optimising compilers!
        # Next time around, the lost low part will be added to y in a fresh attempt.
    return total, c
def kahanSum(values):
    total, c = rawKahanSum(values)
    return total

def mean(values):
    total, c = rawKahanSum(values)
    n = float(len(values))
    return total / n - c / n
for i in range(100):
    data = random.random([10000]).astype(float32)
    baseline = data.mean(dtype=float64)
    delta_builtin_mean = baseline - data.mean()
    delta_compensated_mean = baseline - mean(data)
    delta_kahan_mean = baseline - (kahanSum(data) / len(data))
    if not abs(delta_builtin_mean) >= abs(delta_kahan_mean) >= abs(delta_compensated_mean):
        print ">>>",
    print delta_builtin_mean, delta_kahan_mean, delta_compensated_mean
-tim
You do avoid accumulating large sums, but then doing the division a[i]/len(a) and adding that will do the same.
Okay, this is true.
Now, if you want to avoid losing precision, you want to use a better summation technique, like compensated (or Kahan) summation:
def mean(a): s = e = a.dtype.type(0) for i in range(0, len(a)): temp = s y = a[i] + e s = temp + y e = (temp - s) + y return s / len(a)
Some numerical experiments in Maple using 5-digit precision show that your mean is maybe a bit better in some cases, but can also be much worse, than sum(a)/len(a), but both are quite poor in comparision to the Kahan summation.
(We could probably use a fast implementation of Kahan summation in addition to a.sum())
+1
def var(a): m = a[0] t = a.dtype.type(0) for i in range(1, len(a)): q = a[i] - m r = q / (i+1) m += r t += i * q * r t /= len(a) return t
Alternatively, from Knuth:
def var_knuth(a): m = a.dtype.type(0) variance = a.dtype.type(0) for i in range(len(a)): delta = a[i] - m m += delta / (i+1) variance += delta * (a[i] - m) variance /= len(a) return variance
These formulas are good when you can only do one pass over the data (like in a calculator where you don't store all the data points), but are slightly worse than doing two passes. Kahan summation would probably also be good here too.
Again, my tests show otherwise for float32. I'll condense my ipython log into a module for everyone's perusal. It's possible that the Kahan summation of the squared residuals will work better than the current two-pass algorithm and the implementations I give above.

[Sorry, this version should have less munged formatting since I clipped the comments. Oh, and the Kahan sum algorithm was grabbed from wikipedia, not mathworld]
Tim Hochberg wrote:
Robert Kern wrote:
David M. Cooke wrote:
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a): m = a[0] for i in range(1, len(a)): m += (a[i] - m) / (i + 1) return m
This isn't really going to be any better than using a simple sum. It'll also be slower (a division per iteration).
With one exception, every test that I've thrown at it shows that it's better for float32. That exception is uniformly spaced arrays, like linspace().
Here's a version of mean based on the Kahan sum, including the compensation term, that seems to be consistently much better than the builtin mean. It is typically 4 orders of magnitude closer to the "correct" answer than the builtin mean, and 2 orders of magnitude better than just naively using the Kahan sum. I'm using the mean computed with dtype=float64 as the "correct" value.
def rawKahanSum(values):
    if not len(values):
        return 0.0, 0.0
    total = values[0]
    c = type(total)(0.0)
    for x in values[1:]:
        y = x - c
        t = total + y
        c = (t - total) - y
        total = t
    return total, c
def kahanSum(values):
    total, c = rawKahanSum(values)
    return total

def mean(values):
    total, c = rawKahanSum(values)
    n = float(len(values))
    return total / n - c / n
for i in range(100):
    data = random.random([10000]).astype(float32)
    baseline = data.mean(dtype=float64)
    delta_builtin_mean = baseline - data.mean()
    delta_compensated_mean = baseline - mean(data)
    delta_kahan_mean = baseline - (kahanSum(data) / len(data))
    if not abs(delta_builtin_mean) >= abs(delta_kahan_mean) >= abs(delta_compensated_mean):
        print ">>>",
    print delta_builtin_mean, delta_kahan_mean, delta_compensated_mean
-tim
You do avoid accumulating large sums, but then doing the division a[i]/len(a) and adding that will do the same.
Okay, this is true.
Now, if you want to avoid losing precision, you want to use a better summation technique, like compensated (or Kahan) summation:
def mean(a): s = e = a.dtype.type(0) for i in range(0, len(a)): temp = s y = a[i] + e s = temp + y e = (temp - s) + y return s / len(a)
Some numerical experiments in Maple using 5-digit precision show that your mean is maybe a bit better in some cases, but can also be much worse, than sum(a)/len(a), but both are quite poor in comparision to the Kahan summation.
(We could probably use a fast implementation of Kahan summation in addition to a.sum())
+1
def var(a): m = a[0] t = a.dtype.type(0) for i in range(1, len(a)): q = a[i] - m r = q / (i+1) m += r t += i * q * r t /= len(a) return t
Alternatively, from Knuth:
def var_knuth(a): m = a.dtype.type(0) variance = a.dtype.type(0) for i in range(len(a)): delta = a[i] - m m += delta / (i+1) variance += delta * (a[i] - m) variance /= len(a) return variance
These formulas are good when you can only do one pass over the data (like in a calculator where you don't store all the data points), but are slightly worse than doing two passes. Kahan summation would probably also be good here too.
Again, my tests show otherwise for float32. I'll condense my ipython log into a module for everyone's perusal. It's possible that the Kahan summation of the squared residuals will work better than the current two-pass algorithm and the implementations I give above.

Robert Kern wrote:
David M. Cooke wrote:
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a): m = a[0] for i in range(1, len(a)): m += (a[i] - m) / (i + 1) return m
This isn't really going to be any better than using a simple sum. It'll also be slower (a division per iteration).
With one exception, every test that I've thrown at it shows that it's better for float32. That exception is uniformly spaced arrays, like linspace().
You do avoid accumulating large sums, but then doing the division a[i]/len(a) and adding that will do the same.
Okay, this is true.
Now, if you want to avoid losing precision, you want to use a better summation technique, like compensated (or Kahan) summation:
def mean(a): s = e = a.dtype.type(0) for i in range(0, len(a)): temp = s y = a[i] + e s = temp + y e = (temp - s) + y return s / len(a)
Some numerical experiments in Maple using 5-digit precision show that your mean is maybe a bit better in some cases, but can also be much worse, than sum(a)/len(a), but both are quite poor in comparision to the Kahan summation.
(We could probably use a fast implementation of Kahan summation in addition to a.sum())
+1
def var(a): m = a[0] t = a.dtype.type(0) for i in range(1, len(a)): q = a[i] - m r = q / (i+1) m += r t += i * q * r t /= len(a) return t
Alternatively, from Knuth:
def var_knuth(a): m = a.dtype.type(0) variance = a.dtype.type(0) for i in range(len(a)): delta = a[i] - m m += delta / (i+1) variance += delta * (a[i] - m) variance /= len(a) return variance
These formulas are good when you can only do one pass over the data (like in a calculator where you don't store all the data points), but are slightly worse than doing two passes. Kahan summation would probably also be good here too.
On the flip side, it doesn't take a very large array (somewhere in the vicinity of 10,000 values in my experience) before memory bandwidth becomes a limiting factor. In that region a two pass algorithm could well be twice as slow as a single pass algorithm even if the former is somewhat simpler in terms of numeric operations.
-tim
Again, my tests show otherwise for float32. I'll condense my ipython log into a module for everyone's perusal. It's possible that the Kahan summation of the squared residuals will work better than the current two-pass algorithm and the implementations I give above.

Robert Kern wrote:
David M. Cooke wrote:
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a): m = a[0] for i in range(1, len(a)): m += (a[i] - m) / (i + 1) return m
This isn't really going to be any better than using a simple sum. It'll also be slower (a division per iteration).
With one exception, every test that I've thrown at it shows that it's better for float32. That exception is uniformly spaced arrays, like linspace().
You do avoid accumulating large sums, but then doing the division a[i]/len(a) and adding that will do the same.
Okay, this is true.
Now, if you want to avoid losing precision, you want to use a better summation technique, like compensated (or Kahan) summation:
def mean(a): s = e = a.dtype.type(0) for i in range(0, len(a)): temp = s y = a[i] + e s = temp + y e = (temp - s) + y return s / len(a)
Some numerical experiments in Maple using 5-digit precision show that your mean is maybe a bit better in some cases, but can also be much worse, than sum(a)/len(a), but both are quite poor in comparision to the Kahan summation.
(We could probably use a fast implementation of Kahan summation in addition to a.sum())
+1
def var(a): m = a[0] t = a.dtype.type(0) for i in range(1, len(a)): q = a[i] - m r = q / (i+1) m += r t += i * q * r t /= len(a) return t
Alternatively, from Knuth:
def var_knuth(a): m = a.dtype.type(0) variance = a.dtype.type(0) for i in range(len(a)): delta = a[i] - m m += delta / (i+1) variance += delta * (a[i] - m) variance /= len(a) return variance
These formulas are good when you can only do one pass over the data (like in a calculator where you don't store all the data points), but are slightly worse than doing two passes. Kahan summation would probably also be good here too.
Again, my tests show otherwise for float32. I'll condense my ipython log into a module for everyone's perusal. It's possible that the Kahan summation of the squared residuals will work better than the current two-pass algorithm and the implementations I give above.
This is what my tests show as well: var_knuth outperformed any simple two-pass algorithm I could come up with, even ones using Kahan sums. Interestingly, for 1D arrays the built-in float32 variance performs better than it should. After a bit of twiddling around I discovered that it actually does most of its calculations in float64. It uses a two-pass calculation, the result of mean is a scalar, and in the process of converting that back to an array we end up with float64 values. Or something like that; I was mostly reverse engineering the sequence of events from the results.
-tim

Tim Hochberg wrote:
Robert Kern wrote:
David M. Cooke wrote:
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a): m = a[0] for i in range(1, len(a)): m += (a[i] - m) / (i + 1) return m
This isn't really going to be any better than using a simple sum. It'll also be slower (a division per iteration).
With one exception, every test that I've thrown at it shows that it's better for float32. That exception is uniformly spaced arrays, like linspace().
You do avoid accumulating large sums, but then doing the division a[i]/len(a) and adding that will do the same.
Okay, this is true.
Now, if you want to avoid losing precision, you want to use a better summation technique, like compensated (or Kahan) summation:
def mean(a): s = e = a.dtype.type(0) for i in range(0, len(a)): temp = s y = a[i] + e s = temp + y e = (temp - s) + y return s / len(a)
Some numerical experiments in Maple using 5-digit precision show that your mean is maybe a bit better in some cases, but can also be much worse, than sum(a)/len(a), but both are quite poor in comparision to the Kahan summation.
(We could probably use a fast implementation of Kahan summation in addition to a.sum())
+1
def var(a): m = a[0] t = a.dtype.type(0) for i in range(1, len(a)): q = a[i] - m r = q / (i+1) m += r t += i * q * r t /= len(a) return t
Alternatively, from Knuth:
def var_knuth(a): m = a.dtype.type(0) variance = a.dtype.type(0) for i in range(len(a)): delta = a[i] - m m += delta / (i+1) variance += delta * (a[i] - m) variance /= len(a) return variance
These formulas are good when you can only do one pass over the data (like in a calculator where you don't store all the data points), but are slightly worse than doing two passes. Kahan summation would probably also be good here too.
Again, my tests show otherwise for float32. I'll condense my ipython log into a module for everyone's perusal. It's possible that the Kahan summation of the squared residuals will work better than the current two-pass algorithm and the implementations I give above.
This is what my tests show as well var_knuth outperformed any simple two pass algorithm I could come up with, even ones using Kahan sums. Interestingly, for 1D arrays the built in float32 variance performs better than it should. After a bit of twiddling around I discovered that it actually does most of it's calculations in float64. It uses a two pass calculation, the result of mean is a scalar, and in the process of converting that back to an array we end up with float64 values. Or something like that; I was mostly reverse engineering the sequence of events from the results.
Here's a simple example of how var is a little wacky: a shape-[N] array will give you a different result than a shape-[1,N] array. The reason is clear -- in the second case the mean is not a scalar, so there isn't the inadvertent promotion to float64 -- but it's still odd.
data = (1000*(random.random([10000]) - 0.1)).astype(float32)
print data.var() - data.reshape([1, -1]).var(-1)
[ 0.1171875]
I'm going to go ahead and attach a module containing the versions of mean, var, etc that I've been playing with in case someone wants to mess with them. Some were stolen from traffic on this list, for others I grabbed the algorithms from wikipedia or equivalent.
-tim
from numpy import empty_like, add  # needed by var_builtin_eqv below

def raw_kahan_sum(values):
    """raw_kahan_sum(values) -> sum(values), residual

    where sum(values) is computed using Kahan's summation algorithm and the
    residual is the value of the lower order bits when finished.
    """
    total = c = values.dtype.type(0)
    for x in values:
        y = x + c
        t = total + y
        c = y - (t - total)
        total = t
    return total, c

def sum(values):
    """sum(values) -> sum of values performed using kahan summation"""
    total, c = raw_kahan_sum(values)
    return total

def mean(values):
    """mean(values) -> mean of values performed using kahan summation

    The residual of the Kahan sum is used to increase the accuracy.
    """
    total, residual = raw_kahan_sum(values)
    n = float(len(values))
    return total / n + residual / n

def var(values):
    """var(values) -> variance of values computed using Knuth's one pass algorithm"""
    m = variance = values.dtype.type(0)
    for i, x in enumerate(values):
        delta = values[i] - m
        m += delta / (i+1)
        variance += delta * (x - m)
    variance /= len(values)
    return variance

def var_kahan(a):
    m = mean(a)
    n = len(a)
    total, c = raw_kahan_sum((a-m)**2)
    return total / n + c / n

# This produces the same results as a.var() for 1D float32 arrays.
def var_builtin_eqv(a):
    x = a.mean()
    m = empty_like(a).astype(float)
    m.fill(x)
    x = m
    x = a - x
    x = x * x
    total = add.reduce(x)
    return total * (1.0 / len(a))

if __name__ == "__main__":
    from numpy import random, float32, float64
    builtin_better = 0
    kahan_better = 0
    for i in range(10):
        data = (1000*(random.random([10000]) - 0.1)).astype(float32)
        baseline = data.mean(dtype=float64)
        delta_builtin_mean = baseline - data.mean()
        delta_compensated_mean = baseline - mean(data)
        delta_kahan_mean = baseline - (sum(data) / len(data))
        builtin_better += (abs(delta_builtin_mean) < abs(delta_compensated_mean))
        kahan_better += (abs(delta_kahan_mean) < abs(delta_compensated_mean))
        #~ print delta_builtin_mean, delta_kahan_mean, delta_compensated_mean
    print "Builtin was better", builtin_better, "times"
    print "Kahan was better", kahan_better, "times"

    builtin_better = 0
    kahan_better = 0
    for i in range(10):
        data = (1000*(random.random([10000]) - 0.1)).astype(float32)
        baseline = data.var(dtype=float64)
        delta_builtin = baseline - data.var()
        delta_knuth = baseline - var(data)
        delta_kahan = baseline - var_kahan(data)
        builtin_better += (abs(delta_builtin) < abs(delta_knuth))
        kahan_better += (abs(delta_kahan) < abs(delta_knuth))
        print delta_builtin, delta_kahan, delta_knuth
    print "Builtin was better", builtin_better, "times"
    print "Kahan was better", kahan_better, "times"

    data = (1000*(random.random([10000]) - 0.1)).astype(float32)
    print data.var() - data.reshape([1, -1]).var(-1)

On Thu, 21 Sep 2006 11:34:42 -0700 Tim Hochberg tim.hochberg@ieee.org wrote:
Tim Hochberg wrote:
Robert Kern wrote:
David M. Cooke wrote:
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a): m = a[0] for i in range(1, len(a)): m += (a[i] - m) / (i + 1) return m
This isn't really going to be any better than using a simple sum. It'll also be slower (a division per iteration).
With one exception, every test that I've thrown at it shows that it's better for float32. That exception is uniformly spaced arrays, like linspace().
You do avoid accumulating large sums, but then doing the division a[i]/len(a) and adding that will do the same.
Okay, this is true.
Now, if you want to avoid losing precision, you want to use a better summation technique, like compensated (or Kahan) summation:
def mean(a): s = e = a.dtype.type(0) for i in range(0, len(a)): temp = s y = a[i] + e s = temp + y e = (temp - s) + y return s / len(a)
def var(a): m = a[0] t = a.dtype.type(0) for i in range(1, len(a)): q = a[i] - m r = q / (i+1) m += r t += i * q * r t /= len(a) return t
Alternatively, from Knuth:
def var_knuth(a): m = a.dtype.type(0) variance = a.dtype.type(0) for i in range(len(a)): delta = a[i] - m m += delta / (i+1) variance += delta * (a[i] - m) variance /= len(a) return variance
I'm going to go ahead and attach a module containing the versions of mean, var, etc that I've been playing with in case someone wants to mess with them. Some were stolen from traffic on this list, for others I grabbed the algorithms from wikipedia or equivalent.
I looked into this a bit more. I checked float32 (single precision) and float64 (double precision), using long doubles (float96) for the "exact" results. This is based on your code. Results are compared using abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in the range of [-100, 900]
First, the mean. In float32, the Kahan summation in single precision is better by about 2 orders of magnitude than simple summation. However, accumulating the sum in double precision is better by about 9 orders of magnitude than simple summation (7 orders more than Kahan).
In float64, Kahan summation is the way to go, by 2 orders of magnitude.
For the variance, in float32, Knuth's method is *no better* than the two-pass method. Tim's code does an implicit conversion of intermediate results to float64, which is why he saw a much better result. The two-pass method using Kahan summation (again, in single precision), is better by about 2 orders of magnitude. There is practically no difference when using a double-precision accumulator amongst the techniques: they're all about 9 orders of magnitude better than single-precision two-pass.
In float64, Kahan summation is again better than the rest, by about 2 orders of magnitude.
I've put my adaptation of Tim's code, and box-and-whisker plots of the results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/
Conclusions:
- If you're going to calculate everything in single precision, use Kahan summation. Using it in double-precision also helps.
- If you can use a double-precision accumulator, it's much better than any of the techniques in single-precision only.
- for speed+precision in the variance, either use Kahan summation in single precision with the two-pass method, or use double precision with simple summation with the two-pass method. Knuth buys you nothing, except slower code :-)
After 1.0 is out, we should look at doing one of the above.
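[Editor's aside: a hedged sketch of the "two-pass + Kahan" variance recommended in the conclusions above; kahan_sum and var_two_pass_kahan are illustrative names, not NumPy internals, and the test data is arbitrary.]

    import numpy as np

    def kahan_sum(values):
        # Compensated summation: carry the lost low-order bits along in c.
        total = c = values.dtype.type(0)
        for x in values:
            y = x + c
            t = total + y
            c = y - (t - total)
            total = t
        return total

    def var_two_pass_kahan(a):
        n = len(a)
        m = kahan_sum(a) / n                  # first pass: the mean
        return kahan_sum((a - m) ** 2) / n    # second pass: mean squared residual

    a = np.random.random(10000).astype(np.float32)
    print(var_two_pass_kahan(a), a.var(dtype=np.float64))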

David M. Cooke wrote:
Conclusions:
- If you're going to calculate everything in single precision, use Kahan summation. Using it in double-precision also helps.
- If you can use a double-precision accumulator, it's much better than any of the techniques in single-precision only.
- for speed+precision in the variance, either use Kahan summation in single precision with the two-pass method, or use double precision with simple summation with the two-pass method. Knuth buys you nothing, except slower code :-)
After 1.0 is out, we should look at doing one of the above.
+1

David M. Cooke wrote:
On Thu, 21 Sep 2006 11:34:42 -0700 Tim Hochberg tim.hochberg@ieee.org wrote:
Tim Hochberg wrote:
Robert Kern wrote:
David M. Cooke wrote:
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
def mean(a):
    m = a[0]
    for i in range(1, len(a)):
        m += (a[i] - m) / (i + 1)
    return m
This isn't really going to be any better than using a simple sum. It'll also be slower (a division per iteration).
With one exception, every test that I've thrown at it shows that it's better for float32. That exception is uniformly spaced arrays, like linspace().
You do avoid accumulating large sums, but then doing the division a[i]/len(a) and adding that will do the same.
Okay, this is true.
Now, if you want to avoid losing precision, you want to use a better summation technique, like compensated (or Kahan) summation:
def mean(a):
    s = e = a.dtype.type(0)
    for i in range(0, len(a)):
        temp = s
        y = a[i] + e
        s = temp + y
        e = (temp - s) + y
    return s / len(a)
def var(a):
    m = a[0]
    t = a.dtype.type(0)
    for i in range(1, len(a)):
        q = a[i] - m
        r = q / (i+1)
        m += r
        t += i * q * r
    t /= len(a)
    return t
Alternatively, from Knuth:
def var_knuth(a):
    m = a.dtype.type(0)
    variance = a.dtype.type(0)
    for i in range(len(a)):
        delta = a[i] - m
        m += delta / (i+1)
        variance += delta * (a[i] - m)
    variance /= len(a)
    return variance
I'm going to go ahead and attach a module containing the versions of mean, var, etc that I've been playing with in case someone wants to mess with them. Some were stolen from traffic on this list, for others I grabbed the algorithms from wikipedia or equivalent.
I looked into this a bit more. I checked float32 (single precision) and float64 (double precision), using long doubles (float96) for the "exact" results. This is based on your code. Results are compared using abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in the range of [-100, 900]
First, the mean. In float32, the Kahan summation in single precision is better by about 2 orders of magnitude than simple summation. However, accumulating the sum in double precision is better by about 9 orders of magnitude than simple summation (7 orders more than Kahan).
In float64, Kahan summation is the way to go, by 2 orders of magnitude.
For the variance, in float32, Knuth's method is *no better* than the two-pass method. Tim's code does an implicit conversion of intermediate results to float64, which is why he saw a much better result.
Doh! And I fixed that same problem in the mean implementation earlier too. I was astounded by how well Knuth was doing, but not astounded enough, apparently.
Does it seem weird to anyone else that in: numpy_scalar <op> python_scalar the precision ends up being controlled by the python scalar? I would expect the numpy_scalar to control the resulting precision just like numpy arrays do in similar circumstances. Perhaps the egg on my face is just clouding my vision though.
The two-pass method using Kahan summation (again, in single precision), is better by about 2 orders of magnitude. There is practically no difference when using a double-precision accumulator amongst the techniques: they're all about 9 orders of magnitude better than single-precision two-pass.
In float64, Kahan summation is again better than the rest, by about 2 orders of magnitude.
I've put my adaptation of Tim's code, and box-and-whisker plots of the results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/
Conclusions:
- If you're going to calculate everything in single precision, use Kahan summation. Using it in double-precision also helps.
- If you can use a double-precision accumulator, it's much better than any of the techniques in single-precision only.
- for speed+precision in the variance, either use Kahan summation in single precision with the two-pass method, or use double precision with simple summation with the two-pass method. Knuth buys you nothing, except slower code :-)
The two pass methods are definitely more accurate. I won't be convinced on the speed front till I see comparable C implementations slug it out. That may well mean never in practice. However, I expect that somewhere around 10,000 items, the cache will overflow and memory bandwidth will become the bottleneck. At that point the extra operations of Knuth won't matter as much as making two passes through the array and Knuth will win on speed. Of course the accuracy is pretty bad at single precision, so the possible, theoretical speed advantage at large sizes probably doesn't matter.
-tim
After 1.0 is out, we should look at doing one of the above.
+1

On Thursday 21 September 2006 15:28, Tim Hochberg wrote:
David M. Cooke wrote:
On Thu, 21 Sep 2006 11:34:42 -0700
Tim Hochberg tim.hochberg@ieee.org wrote:
Tim Hochberg wrote:
Robert Kern wrote:
David M. Cooke wrote:
<snip>
The two pass methods are definitely more accurate. I won't be convinced on the speed front till I see comparable C implementations slug it out. That may well mean never in practice. However, I expect that somewhere around 10,000 items, the cache will overflow and memory bandwidth will become the bottleneck. At that point the extra operations of Knuth won't matter as much as making two passes through the array and Knuth will win on speed. Of course the accuracy is pretty bad at single precision, so the possible, theoretical speed advantage at large sizes probably doesn't matter.
-tim
After 1.0 is out, we should look at doing one of the above.
+1
Hi, I don't understand much of these "fancy algorithms", but does the above mean that after 1.0 is released the float32 functions will catch up in terms of precision ("to some extent") with using dtype=float64 ?
- Sebastian Haase

Sebastian Haase wrote:
On Thursday 21 September 2006 15:28, Tim Hochberg wrote:
David M. Cooke wrote:
On Thu, 21 Sep 2006 11:34:42 -0700
Tim Hochberg tim.hochberg@ieee.org wrote:
Tim Hochberg wrote:
Robert Kern wrote:
David M. Cooke wrote:
<snip>
Hi, I don't understand much of these "fancy algorithms", but does the above mean that after 1.0 is released the float32 functions will catch up in terms of precision ("to some extent") with using dtype=float64 ?
It sounds like there is some consensus to do something to improve the precision after 1.0 comes out. I don't think the details are entirely nailed down, but it sounds like some combination of using Kahan summation and increasing the minimum precision of the accumulator is likely for sum, mean, var and std.
-tim

On 9/22/06, Tim Hochberg tim.hochberg@ieee.org wrote:
Sebastian Haase wrote:
On Thursday 21 September 2006 15:28, Tim Hochberg wrote:
David M. Cooke wrote:
On Thu, 21 Sep 2006 11:34:42 -0700
Tim Hochberg tim.hochberg@ieee.org wrote:
Tim Hochberg wrote:
Robert Kern wrote:
<snip>
It sounds like there is some consensus to do something to improve the precision after 1.0 comes out. I don't think the details are entirely nailed down, but it sounds like some combination of using Kahan summation and increasing the minimum precision of the accumulator is likely for sum, mean, var and std.
I would go with double as the default except when the base precision is greater. Higher precisions should be available as a check, i.e., if the results look funny use Kahan or extended to see if roundoff is a problem. I think Kahan should be available because it adds precision to *any* base precision, even extended or quad, so there is always something to check against. But I don't think Kahan should be the default because of the speed hit.
Chuck

Charles R Harris wrote:
[CHOP]
On 9/22/06, *Tim Hochberg* <tim.hochberg@ieee.org mailto:tim.hochberg@ieee.org> wrote:
It sounds like there is some consensus to do something to improve the precision after 1.0 comes out. I don't think the details are entirely nailed down, but it sounds like some combination of using Kahan summation and increasing the minimum precision of the accumulator is likely for sum, mean, var and std.
I would go with double as the default except when the base precision is greater. Higher precisions should be available as a check, i.e., if the results look funny use Kahan or extended to see if roundoff is a problem. I think Kahan should be available because it adds precision to *any* base precision, even extended or quad, so there is always something to check against. But I don't think Kahan should be the default because of the speed hit.
Note that add.reduce will be available for doing simple, fast reductions, so I consider some speed hit as acceptable. I'll be interested to see what the actual speed difference between Kahan and straight summation turns out to be. It may also be possible to unroll the core loop cleverly so that multiple ops can be scheduled on the FPU in parallel. It doesn't look like naive unrolling will work, since each iteration depends on the values of the previous one. However, it probably doesn't matter much where the low-order bits get added back in, so there's some flexibility in how the loop gets unrolled.
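One way to see the flavor of this (a sketch only, not anything implemented in numpy): do the fast vectorized sum inside fixed-size blocks and carry the Kahan compensation only across the block totals, so the strict sequential dependence is between blocks rather than between individual elements. This gives up some of Kahan's accuracy within a block in exchange for speed.

import numpy as np

def blocked_kahan_sum(a, block=4096):
    # Sum each block with the built-in reduction (fast, vectorizable),
    # then fold the block totals together with Kahan compensation.
    s = e = a.dtype.type(0)
    for start in range(0, len(a), block):
        temp = s
        y = a[start:start + block].sum() + e
        s = temp + y
        e = (temp - s) + y
    return s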
-tim
PS. Sorry about the untrimmed posts.

Tim Hochberg wrote:
David M. Cooke wrote:
On Thu, 21 Sep 2006 11:34:42 -0700 Tim Hochberg tim.hochberg@ieee.org wrote:
Tim Hochberg wrote:
Robert Kern wrote:
David M. Cooke wrote:
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
<snip>
The two pass methods are definitely more accurate. I won't be convinced on the speed front till I see comparable C implementations slug it out. That may well mean never in practice. However, I expect that somewhere around 10,000 items, the cache will overflow and memory bandwidth will become the bottleneck. At that point the extra operations of Knuth won't matter as much as making two passes through the array and Knuth will win on speed.
OK, let me take this back. I looked at this speed effect a little more and the effect is smaller than I remember. For example, "a+=a" slows down by about a factor of 2.5 on my box between 10,000 and 100,000 elements. That's not insignificant, but assuming (naively) that this means that memory effects account for the equivalent of 1.5 add operations, this doesn't come close to closing the gap between Knuth and the two-pass approach. The other thing is that while a+=a and similar operations show this behaviour, a.sum() and add.reduce(a) do not, hinting that perhaps it's only writing to memory that is a bottleneck. Or perhaps hinting that my mental model of what's going on here is badly flawed.
So, +1 for me on Kahan summation for computing means and two pass with Kahan summation for variances. It would probably be nice to expose the Kahan sum and maybe even the raw_kahan_sum somewhere. I can see use for the latter in summing a bunch of disparate matrices with high precision.
I'm on the fence on using the array dtype for the accumulator dtype versus always using at least double precision for the accumulator. The former is easier to explain and is probably faster, but the latter is a lot more accuracy for basically free. It depends on how the relative speeds shake out I suppose. If the speed of using float64 is comparable to that of using float32, we might as well. One thing I'm not on the fence about is the return type: it should always match the input type, or dtype if that is specified. Otherwise one gets mysterious, gratuitous upcasting.
On the subject of upcasting, I'm still skeptical of the behavior of mixed numpy-scalar and python-scalar ops. Since numpy-scalars are generally the results of indexing operations, not literals, I think that they should behave like arrays for purposes of determining the resulting precision, not like python-scalars. Otherwise situations arise where the rank of the input array determines the kind of output (float32 versus float64, for example): if the array is 1D, indexing into the array yields numpy-scalars, which then get promoted when they interact with python-scalars. However, if the array is 2D, indexing yields a vector, which is not promoted when interacting with python scalars, and the precision remains fixed.
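Concretely, the asymmetry looks something like the following (results as under the value-based promotion rules NumPy used at the time and through the 1.x series; NumPy 2.0's NEP 50 rules change the scalar case):

import numpy as np

a1 = np.zeros(3, dtype=np.float32)       # 1-D array
a2 = np.zeros((3, 3), dtype=np.float32)  # 2-D array

print((a1 * 2.0).dtype)     # float32: the python float does not upcast the array
print((a2[0] * 2.0).dtype)  # float32: indexing a 2-D array yields a 1-D array
print((a1[0] * 2.0).dtype)  # float64 under 1.x rules: the numpy scalar is promoted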
-tim
Of course the accuracy is pretty bad at single precision, so the possible, theoretical speed advantage at large sizes probably doesn't matter.
-tim
After 1.0 is out, we should look at doing one of the above.
+1

Hi,
On 9/22/06, Tim Hochberg tim.hochberg@ieee.org wrote:
Tim Hochberg wrote:
David M. Cooke wrote:
On Thu, 21 Sep 2006 11:34:42 -0700 Tim Hochberg tim.hochberg@ieee.org wrote:
Tim Hochberg wrote:
Robert Kern wrote:
David M. Cooke wrote:
<snip>
I'm on the fence on using the array dtype for the accumulator dtype versus always using at least double precision for the accumulator.
I keep going back and forth on this myself. So I ask myself what I do in practice. In practice, I accumulate in doubles, sometimes extended precision. I only think of accumulating in floats when the number of items is small and computation might be noticeably faster -- and these days it seldom is. The days of the Vax and a 2x speed difference are gone, now we generally have FPUs whose base type is double with float tacked on as an afterthought. The only lesson from those days that remains pertinent is: never, ever, do division.
So in practice, I would go with doubles. The only good reason to use floats for floats is a sort of pure consistency.
Chuck

Tim Hochberg wrote:
It would probably be nice to expose the Kahan sum and maybe even the raw_kahan_sum somewhere.
What about using it for .sum() by default? What is the speed hit anyway? In any case, having it available would be nice.
I'm on the fence on using the array dtype for the accumulator dtype versus always using at least double precision for the accumulator. The former is easier to explain and is probably faster, but the latter is a lot more accuracy for basically free.
I don't think the difficulty of explanation is a big issue at all -- I'm having a really hard time imagining someone getting confused and/or disappointed that their single precision calculation didn't overflow or was more accurate than expected. Anyone that did, would know enough to understand the explanation.
In general, users expect things to "just work". They only dig into the details when something goes wrong, like the mean of a bunch of positive integers coming out as negative (wasn't that the OP's example?). The fewer such instances we have, the fewer questions we have.
It depends on how the relative speeds shake out I suppose. If the speed of using float64 is comparable to that of using float32, we might as well.
Only testing will tell, but my experience is that with double precision FPUs, doubles are just as fast as floats unless you're dealing with enough memory to make a difference in caching. In this case, only the accumulator is double, so that wouldn't be an issue. I suppose the float to double conversion could take some time though...
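For what it's worth, a quick-and-dirty way to time just the accumulator cost is something like the sketch below; numbers will obviously vary by machine, and current NumPy reductions are not the 2006 ones.

import numpy as np
import timeit

a = np.random.rand(10**6).astype(np.float32)

# Same reduction, float32 accumulator versus float64 accumulator.
t32 = timeit.timeit(lambda: a.sum(), number=100)
t64 = timeit.timeit(lambda: a.sum(dtype=np.float64), number=100)
print("float32 accumulator: %.4f s" % t32)
print("float64 accumulator: %.4f s" % t64)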
One thing I'm not on the fence about is the return type: it should always match the input type, or dtype if that is specified.
+1
Since numpy-scalars are generally the results of indexing operations, not literals, I think that they should behave like arrays for purposes of determining the resulting precision, not like python-scalars.
+1
Of course the accuracy is pretty bad at single precision, so the possible, theoretical speed advantage at large sizes probably doesn't matter.
Good point. The larger the array, the more accuracy matters.
-Chris

David M. Cooke wrote:
On Thu, 21 Sep 2006 11:34:42 -0700 Tim Hochberg tim.hochberg@ieee.org wrote:
Tim Hochberg wrote:
Robert Kern wrote:
David M. Cooke wrote:
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values. The algorithms look like the following for 1D arrays in Python:
<snip>
One more little tidbit; it appears possible to "fix up" Knuth's algorithm so that it's comparable in accuracy to the two pass Kahan version by doing Kahan summation while accumulating the variance. Testing on this was far from thorough, but in the tests I did it nearly always produced identical results to kahan. Of course this is even slower than the original Knuth version, but it's interesting anyway:
# This is probably messier than it need to be
# but I'm out of time for today...
def var_knuth2(values, dtype=default_prec):
    """var(values) -> variance of values computed using Knuth's one pass algorithm"""
    m = variance = mc = vc = dtype(0)
    for i, x in enumerate(values):
        delta = values[i] - m
        y = delta / dtype(i+1) + mc
        t = m + y
        mc = y - (t - m)
        m = t
        y = delta * (x - m) + vc
        t = variance + y
        vc = y - (t - variance)
        variance = t
    assert type(variance) == dtype
    variance /= dtype(len(values))
    return variance
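A quick way to sanity-check a routine like this against a double-precision reference is sketched below, passing the dtype explicitly rather than relying on the module's default_prec (which would need to be bound, say to np.float32, for the definition above to evaluate on its own), and using the same relative-error metric as before:

import numpy as np

values = np.random.uniform(-100.0, 900.0, 10000).astype(np.float32)
ref = values.astype(np.float64).var()        # population variance in double
approx = var_knuth2(values, dtype=np.float32)
print(abs(ref - float(approx)) / float(np.abs(values).max()))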

Sebastian Haase wrote:
The best I can hope for is a "sound" default for most (practical) cases... I still think that 80bit vs. 128bit vs 96bit is rather academic for most people ... most people seem to only use float64 and then there are some that use float32 (like us) ...
I fully agree with Sebastian here. As Travis pointed out, "all we are talking about is the default". The default should follow the principle of least surprise for the less-knowledgeable users. Personally, I try to always use doubles, unless I have a real reason not to. The recent change of default types for zeros et al. will help.
Clearly, there is a problem with saying that the default accumulator for *all* types is double, as you wouldn't want to downcast float128s, even if they are obscure. However, is it that hard to say that the default accumulator will have *at least* the precision of double?
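Expressed as code, the "at least double" rule is just a dtype promotion. The sketch below uses np.promote_types, a later NumPy addition, only to state the rule compactly; it is not being proposed as the implementation.

import numpy as np

def acc_dtype(a):
    # Accumulate in at least double precision: integers and float32
    # promote to float64, while long doubles keep their own precision.
    return np.promote_types(a.dtype, np.float64)

print(acc_dtype(np.zeros(3, dtype=np.int32)))       # float64
print(acc_dtype(np.zeros(3, dtype=np.float32)))     # float64
print(acc_dtype(np.zeros(3, dtype=np.longdouble)))  # float96/float128, platform-dependent

# Usage: a.sum(dtype=acc_dtype(a)) / a.size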
Robert Kern wrote:
Let me offer a third path: the algorithms used for .mean() and .var() are substandard. There are much better incremental algorithms that entirely avoid the need to accumulate such large (and therefore precision-losing) intermediate values.
This, of course, is an even better solution, unless there are substantial performance impacts.
-Chris

On 9/19/06, Charles R Harris charlesr.harris@gmail.com wrote:
On 9/19/06, Sebastian Haase haase@msg.ucsf.edu wrote:
Travis Oliphant wrote:
Sebastian Haase wrote:
I still would argue that getting a "good" (smaller rounding errors) answer should be the default -- if speed is wanted, then *that* could still be specified by explicitly using dtype=float32 (which would also be a possible choice for int32 input).
So you are arguing for using long double then.... ;-)
In image processing we always want means to be calculated in float64 even though input data is always float32 (if not uint16). Also it is simpler to say "float64 is the default" (full stop.) - instead of "float64 is the default unless you have float32"
"the type you have is the default except for integers". Do you really want float64 to be the default for float96?
Unless we are going to use long double as the default, then I'm not convinced that we should special-case the "double" type.
I guess I'm not really aware of the float96 type ... Is that a "machine type" on any system ? I always thought that -- e.g . coming from C -- double is "as good as it gets"... Who uses float96 ? I heard somewhere that (some) CPUs use 80bits internally when calculating 64bit double-precision...
Is this not going into some academic argument !? For all I know, calculating mean()s (and sum()s, ...) is always done in double precision -- never in single precision, even when the data is in float32.
Having float32 be the default for float32 data is just requiring more typing, and more explaining ... it would compromise numpy usability as a day-to-day replacement for other systems.
Sorry, if I'm being ignorant ...
I'm going to side with Travis here. It is only a default and easily overridden. And yes, there are precisions greater than double. I was using quad precision back in the eighties on a VAX for some inherently ill conditioned problems. And on my machine long double is 12 bytes.
Here is the 754r (revision) spec: http://en.wikipedia.org/wiki/IEEE_754r
It includes quads (128 bits) and half precision (16 bits) floats. I believe the latter are used for some imaging stuff, radar for instance, and are also available in some high end GPUs from Nvidia and other companies. The 80 bit numbers you refer to were defined as extended precision in the original 754 spec and were mostly intended for temporaries in internal FPU computations. They have various alignment requirements for efficient use, which is why they show up as 96 bits (4 byte alignment) and sometimes 128 bits (8 byte alignment). So actually, float128 would not always distinguish between extended precision and quad precision. I see more work for Travis in the future ;)
Chuck

On 9/19/06, Charles R Harris charlesr.harris@gmail.com wrote:
On 9/19/06, Charles R Harris charlesr.harris@gmail.com wrote:
On 9/19/06, Sebastian Haase < haase@msg.ucsf.edu> wrote:
Travis Oliphant wrote:
Sebastian Haase wrote:
<snip>
I just checked this out. On amd64, 32-bit linux gives 12 bytes for long double and 64-bit linux gives 16 bytes, but they both have 64-bit mantissas, i.e., they are both 80-bit extended precision. Those sizes are the defaults and can be overridden by compiler flags. Anyway, we may need some way to tell the difference between float128 and quads, since they will both have the same length on 64-bit architectures. But that is a problem for the future.
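One way to check this from within numpy itself (a sketch; the finfo attribute names are as in current NumPy, and on platforms where long double is just double the numbers come out as 8 bytes / 52 bits instead):

import numpy as np

ld = np.finfo(np.longdouble)
print(np.dtype(np.longdouble).itemsize)  # typically 12 or 16 bytes on x86 Linux, from padding
print(ld.nmant)                          # 63 stored mantissa bits; with the explicit leading bit
                                         # of the 80-bit extended format that is the 64-bit mantissa
print(ld.eps)                            # noticeably smaller than float64's epsilon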
Chuck

Sebastian Haase wrote:
(don't know how to say this for complex types !? Are here real and imag treated separately / independently ?)
Yes. For mean(), there's really no alternative. Scalar variance is not a well-defined concept for complex numbers, but treating the real and imaginary parts separately is a sensible and (partially) informative thing to do. Simply applying the formula for estimating variance for real numbers to complex numbers (i.e. change "x" to "z") is a meaningless operation.
participants (9)
- Bill Baxter
- Charles R Harris
- Christopher Barker
- David M. Cooke
- Robert Kern
- Sebastian Haase
- Tim Hochberg
- Travis Oliphant
- Travis Oliphant