moving average, moving standard deviation, etc...
Does anyone know of a slick way to calculate a simple moving average and/or moving standard deviation? And when I say slick, I mean loop-free way, because it would be fairly easy to code a looping way of doing it, it would just be really slow. To be more specific, when I say moving standard deviation, I mean for example... if A is a 1-dimensional array with 100 elements, and I'm using a window size of 10, then the result at index 10 would be A[0:10].std(), the result at index 11 would be A[1:11].std() , etc... And similarly for moving average, and so forth. Is their a general way to do these kind of things? Or would it have to be a special case for each type of calculation to avoid looping? Any help is greatly appreciated. Thanks in advance, - Matt
Hi, For all linear operations a convolution will do the trick. For non-linear operations, such as the standard deviation an ugly but efficient way of doing it would be: sqrt(A[0:-10]**2 + A[1;-9]**2 + ... ) Maybe there is a more beautiful way of writing this. Gaël On Sat, Dec 23, 2006 at 01:42:39PM -0500, Matt Knox wrote:
Does anyone know of a slick way to calculate a simple moving average and/or moving standard deviation? And when I say slick, I mean loop-free way, because it would be fairly easy to code a looping way of doing it, it would just be really slow.
To be more specific, when I say moving standard deviation, I mean for example...
if A is a 1-dimensional array with 100 elements, and I'm using a window size of 10, then the result at index 10 would be A[0:10].std(), the result at index 11 would be A[1:11].std() , etc...
And similarly for moving average, and so forth. Is their a general way to do these kind of things? Or would it have to be a special case for each type of calculation to avoid looping?
Any help is greatly appreciated. Thanks in advance,
- Matt
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
On 23/12/06, Matt Knox <mattknox_ca@hotmail.com> wrote:
And similarly for moving average, and so forth. Is their a general way to do these kind of things? Or would it have to be a special case for each type of calculation to avoid looping?
I don't imagine it would be particularly efficient, but you could use fancy indexing (or some weird code I posted a few months ago to avoid copying) to construct a new array such that slices along one axis are the (overlapping) chunks of the original array. Then any operation you want to perform on all the chunks can just be performed along this axis of the new array. A. M. Archibald
Hi, The technique I've seen involves keeping some intermediate variables, and subtracting off the oldest value while adding on the newest. At each timestep then, you just update the variables and recalculate the result. xsum += (x[i] - x[i-window_size]) x2sum += (x[i]*x[i] - x[i-window_size]*x[i-window_size]) average = xsum/window_size stdev_squared = (x2sum - xsum*xsum)/window_size stdev = sqrt(stdev_squared) If you want the "sample" stdev, you can compute it directly, or use: stdev_sample_squared = window_size*stdev_squared/(window_size-1) stdev_sample = sqrt(stdev_squared) Don't get me started on this though... IMHO it's usually a waste of time. -Frank Matt Knox wrote:
Does anyone know of a slick way to calculate a simple moving average and/or moving standard deviation? And when I say slick, I mean loop-free way, because it would be fairly easy to code a looping way of doing it, it would just be really slow.
To be more specific, when I say moving standard deviation, I mean for example...
if A is a 1-dimensional array with 100 elements, and I'm using a window size of 10, then the result at index 10 would be A[0:10].std(), the result at index 11 would be A[1:11].std() , etc...
And similarly for moving average, and so forth. Is their a general way to do these kind of things? Or would it have to be a special case for each type of calculation to avoid looping?
Any help is greatly appreciated. Thanks in advance,
- Matt
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
Hi,
The technique I've seen involves keeping some intermediate variables, and subtracting off the oldest value while adding on the newest. At each timestep then, you just update the variables and recalculate the result.
xsum += (x[i] - x[i-window_size]) x2sum += (x[i]*x[i] - x[i-window_size]*x[i-window_size]) average = xsum/window_size stdev_squared = (x2sum - xsum*xsum)/window_size stdev = sqrt(stdev_squared)
If you want the "sample" stdev, you can compute it directly, or use:
stdev_sample_squared = window_size*stdev_squared/(window_size-1) stdev_sample = sqrt(stdev_squared)
Don't get me started on this though... IMHO it's usually a waste of time.
-Frank
Ok, thanks. So it sounds like there is no easy way to do this generally without doing some looping in python (for the standard deviation anyway), or writing some C code. I'm more interested in the general strategy for doing these kinds of calculations than these specific examples themselves necessarily. Although they are useful for making pretty charts sometimes. - Matt
Hi Matt On Sun, Dec 24, 2006 at 04:23:08PM +0000, Matt Knox wrote:
Ok, thanks. So it sounds like there is no easy way to do this generally without doing some looping in python (for the standard deviation anyway), or writing some C code. I'm more interested in the general strategy for doing these kinds of calculations than these specific examples themselves necessarily. Although they are useful for making pretty charts sometimes.
Like someone mentioned earlier, you can do averaging using numpy.convolve. For the rest, C code may be the easiest option -- and with ctypes it's a breeze. Take a look at Albert's page, http://www.scipy.org/Cookbook/Ctypes2 and let me know if you need further examples. Cheers Stéfan
Oops - I just realized that you said "no looping", as in "any looping is done inside the API function." Sorry about that - I don't know of any way to do it that way. -Frank
On 24/12/06, Matt Knox <mattknox_ca@hotmail.com> wrote:
Ok, thanks. So it sounds like there is no easy way to do this generally without doing some looping in python (for the standard deviation anyway), or writing some C code. I'm more interested in the general strategy for doing these kinds of calculations than these specific examples themselves necessarily. Although they are useful for making pretty charts sometimes.
Fancy indexing will do the job, more or less. This is the general idea I was trying to get at: chunk = 10 data = arange(10000)/10000. indices = arange(len(data)-chunk+1)[:,newaxis]+arange(chunk)[newaxis,:] data_chunks = data[indices] average(data_chunks,axis=0) var(data_chunks,axis=0) and so on... The only problem is that it expands your data rather a lot. Personally I wouldn't worry about that until my application was written. I posted, some time ago, a function that would allow the easy construction of data_chunks as a view of the original array, that is, with no data copying. (Sorry, I don't remember how to find the thread.) A. M. Archibald
On Sun, Dec 24, 2006 at 10:53:04PM -0400, A. M. Archibald wrote:
On 24/12/06, Matt Knox <mattknox_ca@hotmail.com> wrote:
Ok, thanks. So it sounds like there is no easy way to do this generally without doing some looping in python (for the standard deviation anyway), or writing some C code. I'm more interested in the general strategy for doing these kinds of calculations than these specific examples themselves necessarily. Although they are useful for making pretty charts sometimes.
Fancy indexing will do the job, more or less.
This is the general idea I was trying to get at:
chunk = 10 data = arange(10000)/10000.
indices = arange(len(data)-chunk+1)[:,newaxis]+arange(chunk)[newaxis,:]
data_chunks = data[indices]
average(data_chunks,axis=0) var(data_chunks,axis=0) and so on...
The only problem is that it expands your data rather a lot. Personally I wouldn't worry about that until my application was written. I posted, some time ago, a function that would allow the easy construction of data_chunks as a view of the original array, that is, with no data copying. (Sorry, I don't remember how to find the thread.)
http://thread.gmane.org/gmane.comp.python.scientific.user/9570/focus=9579 Cheers Stéfan
On Wed, Dec 27, 2006 at 10:33:28AM +0200, Stefan van der Walt wrote:
On Sun, Dec 24, 2006 at 10:53:04PM -0400, A. M. Archibald wrote:
On 24/12/06, Matt Knox <mattknox_ca@hotmail.com> wrote:
Ok, thanks. So it sounds like there is no easy way to do this generally without doing some looping in python (for the standard deviation anyway), or writing some C code. I'm more interested in the general strategy for doing these kinds of calculations than these specific examples themselves necessarily. Although they are useful for making pretty charts sometimes.
Fancy indexing will do the job, more or less.
This is the general idea I was trying to get at:
chunk = 10 data = arange(10000)/10000.
indices = arange(len(data)-chunk+1)[:,newaxis]+arange(chunk)[newaxis,:]
data_chunks = data[indices]
average(data_chunks,axis=0) var(data_chunks,axis=0) and so on...
The only problem is that it expands your data rather a lot. Personally I wouldn't worry about that until my application was written. I posted, some time ago, a function that would allow the easy construction of data_chunks as a view of the original array, that is, with no data copying. (Sorry, I don't remember how to find the thread.)
http://thread.gmane.org/gmane.comp.python.scientific.user/9570/focus=9579
Or even better: http://thread.gmane.org/gmane.comp.python.numeric.general/12365/focus=12367 Cheers Stéfan
For operations which may be reduced to sums (this includes average and std), you can use cumulative sum:
A = arange(10.0) w = 2 # window length C = zeros_like(A)
B = A.cumsum() C[w:] = (B[w:] - B[:-w]) / w # running average C[:w] = B[:w] / arange(1,w+1) # just average for first elements
print vstack((A, C)) [[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. ] [ 0. 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5]]
Denis On 12/23/06, Matt Knox <mattknox_ca@hotmail.com> wrote:
Does anyone know of a slick way to calculate a simple moving average and/or moving standard deviation? And when I say slick, I mean loop-free way, because it would be fairly easy to code a looping way of doing it, it would just be really slow.
To be more specific, when I say moving standard deviation, I mean for example...
if A is a 1-dimensional array with 100 elements, and I'm using a window size of 10, then the result at index 10 would be A[0:10].std(), the result at index 11 would be A[1:11].std() , etc...
And similarly for moving average, and so forth. Is their a general way to do these kind of things? Or would it have to be a special case for each type of calculation to avoid looping?
Any help is greatly appreciated. Thanks in advance,
- Matt
participants (6)
-
A. M. Archibald
-
Denis Simakov
-
Frank Palazzolo
-
Gael Varoquaux
-
Matt Knox
-
Stefan van der Walt