Managing Rolling Data

I'm new to numpy and looking for advice on setting up and managing array data for my particular problem. I'm collecting observations of P properties for N objects over a rolling horizon of H sample times. I could conceptually store the data in a three-dimensional array with shape (N,P,H) that would allow me to easily (and efficiently, with strided slices) compute the statistics over both N and H that I am interested in. This is great, but the rub is that H is a rolling horizon within a longer record of T sample times. T is too large to fit in memory, so I need to load up the first H samples, perform my calculations, pop the oldest N x P slice, and push the newest N x P slice into the data cube. What's the best way to do this that will maintain fast computations along the one-dimensional slices over N and H? Is there a commonly accepted idiom?

Fundamentally, I see two solutions. The first would be to essentially perform a memcpy to propagate the data. The second would be to manage the N x P slices as H discontiguous memory blocks and merely reorder the pointers with each new sample. Can I do either of these with numpy?

Thanks,
Alex

On 21/02/07, Alexander Michael <lxander.m@gmail.com> wrote:
I'm new to numpy and looking for advice on setting up and managing array data for my particular problem. I'm collecting observations of P properties for N objects over a rolling horizon of H sample times. I could conceptually store the data in a three-dimensional array with shape (N,P,H) that would allow me to easily (and efficiently, with strided slices) compute the statistics over both N and H that I am interested in. This is great, but the rub is that H is a rolling horizon within a longer record of T sample times. T is too large to fit in memory, so I need to load up the first H samples, perform my calculations, pop the oldest N x P slice, and push the newest N x P slice into the data cube. What's the best way to do this that will maintain fast computations along the one-dimensional slices over N and H? Is there a commonly accepted idiom?
Fundamentally, I see two solutions. The first would be to essentially perform a memcpy to propagate the data. The second would be to manage the N x P slices as H discontiguous memory blocks and merely reorder the pointers with each new sample. Can I do either of these with numpy?
Yes, and several other possibilities as well. To do a memcpy, all you need is

    buffer[...,:-1] = buffer[...,1:]
    buffer[...,-1] = new_data()
    somecomputation(buffer[13,5,:])

Discontiguous blocks are somewhat inconvenient; one of the key assumptions of numpy is that memory is stored in contiguous, homogeneous blocks. You can use python lists (which are lists of pointers), though, dropping the oldest block and appending the newest:

    listofbuffers = listofbuffers[1:] + [new_data()]

Extracting slices is now more awkward, either

    somecomputation([buffer[13,5] for buffer in listofbuffers])

or convert the list to an array (which involves copying all the elements):

    buffer = array(listofbuffers)
    somecomputation(buffer[13,5,:])

Alternatively, if most of your statistics don't care about the order of the data, you could maintain a rolling buffer:

    buffer[...,oldest] = new_data()
    oldest = (oldest + 1) % H
    somecomputationwhereorderdoesntmatter(buffer[13,5,:])
    somecomputation(concatenate((buffer[13,5,oldest:], buffer[13,5,:oldest])))

(This last copies the data for that one test.) Fancy indexing can also be used here to pull out the elements in the right order, and if your statistics can easily be rewritten to work on a wrapped array, this is probably the most efficient. Incidentally, if the array wants to be inhomogeneous along one dimension, you can use recarrays (apparently; I've never investigated them).

Good luck,
Anne

Anne Archibald wrote:
Discontiguous blocks are somewhat inconvenient; one of the key assumptions of numpy is that memory is stored in contiguous, homogeneous blocks.
Not to add anything really useful to this discussion, but I should correct this wording before it creates misconceptions. Actually, the key assumption NumPy makes is that memory is accessible through uniform "striding," not that it is necessarily contiguous. You are correct, however, that each element of the array must be of the same "data-type."
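A sliced view, for example, can be discontiguous in memory yet remain a perfectly ordinary array, because a fixed stride reaches every element. A minimal illustration (the array here is arbitrary):

    import numpy as np

    a = np.zeros((1000, 20, 50))
    v = a[:, 5, ::2]                  # a view: no data is copied
    print(v.flags['C_CONTIGUOUS'])    # False: not contiguous in memory
    print(v.strides)                  # but reachable by uniform strides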
Incidentally, if the array wants to be inhomogeneous along one dimension, you can use recarrays (apparently; I've never investigated them).
Your arrays are still "homogeneous" when you use record arrays (i.e. each element of the array is still the same "data-type"). It's just that the data-type can be complicated. You can have arrays whose elements consist of a 4-byte float and a 3-byte string, for example.

-Travis
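A minimal sketch of such a compound data-type (the field names are invented):

    import numpy as np

    # Each element holds one 4-byte float and one 3-byte string.
    dt = np.dtype([('value', np.float32), ('tag', 'S3')])
    a = np.zeros(4, dtype=dt)
    a[0] = (1.5, 'abc')
    print(a['value'])   # the float field, itself a homogeneous float32 array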

On 2/21/07, Alexander Michael <lxander.m@gmail.com> wrote:
... T is too large to fit in memory, so I need to load up the first H samples, perform my calculations, pop the oldest N x P slice, and push the newest N x P slice into the data cube. What's the best way to do this that will maintain fast computations along the one-dimensional slices over N and H? Is there a commonly accepted idiom?
Would loading your data via memmap, then slicing it, do your job (using numpy.memmap)? I work on 12 GB files with 4 GB of memory, and it is transparent to me since the OS takes care of moving data in and out of memory. It may not be the fastest solution possible, but for me it is a case where dev time is more significant than run time.

Mike

--
mike.ressler@alum.mit.edu
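A minimal sketch of that approach, assuming the full T-long record already lives on disk as raw float64s with time on the first axis (the filename and sizes are placeholders):

    import numpy as np

    T, H, N, P = 100000, 1000, 5000, 20
    data = np.memmap('observations.dat', dtype=np.float64,
                     mode='r', shape=(T, N, P))

    # Slide the horizon; the OS pages slices in and out on demand.
    for t in range(T - H + 1):
        window = data[t:t + H]        # no explicit copy
        stats = window.mean(axis=0)   # a statistic over H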

On 2/21/07, Mike Ressler <mike.ressler@alum.mit.edu> wrote:
Would loading your data via memmap, then slicing it, do your job (using numpy.memmap)? ...
Interesting idea. I think Anne's suggestion that sliced assignment will reduce to an efficient memcpy fits my needs a bit better than memmap, because I'll be pushing new N x P samples into the array as they arrive while the monitor is running. Actually, I'm hoping sliced self-assignment is as efficient as memmove (i.e. without creating temporaries), since the dst and src are overlapping, but I haven't tried it yet to confirm that it is relatively efficient.

Thank you both for your ideas,
Alex

On 21/02/07, Alexander Michael <lxander.m@gmail.com> wrote:
On 2/21/07, Mike Ressler <mike.ressler@alum.mit.edu> wrote:
Would loading your data via memmap, then slicing it, do your job (using numpy.memmap)? ...
Interesting idea. I think Anne's suggestion that sliced assignment will reduce to an efficient memcpy fits my needs a bit better than memmap, because I'll be pushing new N x P samples into the array as they arrive while the monitor is running.
If you want a record of all of them on disk anyway, with careful management you can read from a file as it's being written, though you'll need some rather exotic numpy hackery to have an ever-growing array. It might actually be nice to have a little wrapper object for this sort of thing, as you can't be the only one who needs it.
Actually, I'm hoping sliced self-assignment is as efficient as memmove (i.e. without creating temporaries), since the dst and src are overlapping, but I haven't tried it yet to confirm that it is relatively efficient.
I think it is almost as efficient as memmove; in particular, it doesn't create any temporaries (be careful which way you do the assignment, in fact, or you'll have an array of all the same value), but it does use a for loop in C (maybe several) instead of a highly optimized C library bulk copy. (I'm not sure about this - it could be special-cased in the assignment code, but I don't think it is.)

Anne

If none of the suggested methods turn out to be efficient enough due to copying overhead, here's a way to reduce that overhead by trading memory (and a bit of complexity) for it. The general thrust is to allocate M extra time slices and then shift the data every M time steps instead of every time step.

First you would allocate a block of memory N*P*(H+M) in size:

    buffer = zeros([H+M,N,P], float)

Then you'd look at the first H time slices:

    data = buffer[:H]

To pop one piece of data off the stack, you'd simply shift data to look at a different place in the buffer. The first time, you'd have something like this:

    data = buffer[1:1+H]

Every M time steps you need to recopy the data. I expect that this should reduce your copying overhead a bunch, since you're not copying as frequently. It's pretty tunable too. You'd want to wrap some convenience functions around stuff to automate the copying and popping, but that should be easy enough.

I haven't tried this though, so caveat emptor.

-tim
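A minimal sketch of what those convenience wrappers might look like (the class and its names are invented, and buffer[:H] is assumed to be pre-filled with the first H samples):

    import numpy as np

    class RollingWindow(object):
        """Over-allocated window: one bulk copy every M pushes, not every push."""

        def __init__(self, H, N, P, M):
            self.H, self.M = H, M
            self.buffer = np.zeros((H + M, N, P))
            self.start = 0  # index of the oldest live slice

        def push(self, sample):
            if self.start == self.M:
                # The window has drifted to the end of the spare region;
                # slide it back to the front in one bulk copy. The copy
                # moves data toward lower addresses, so the overlap is safe.
                self.buffer[:self.H] = self.buffer[self.M:]
                self.start = 0
            self.buffer[self.start + self.H] = sample  # append the newest slice
            self.start += 1

        def window(self):
            # Contiguous (H, N, P) view of the current horizon; no copy.
            return self.buffer[self.start:self.start + self.H]

Each push costs one N x P write, plus a copy of H slices every M pushes, so the amortized copying per push is roughly H/M slices.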

On 2/21/2007 11:03 PM, Anne Archibald wrote:
I think it is almost as efficient as memmove; in particular, it doesn't create any temporaries
A ring buffer is O(1) per update, whereas a memmove is linear in the amount of data moved. Unless that amount is very small, this makes the ring buffer the more attractive solution. Slicing becomes a little more complicated with a ring, but not very much: the data are stored in two ordered, contiguous segments, after and before the stack pointer (in that order).

Sturla Molden
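In numpy terms, order-insensitive statistics can work on the two segments directly, as views and without any copying. A minimal sketch (the names are mine; 'head' is the index of the oldest slice, with time on the first axis):

    import numpy as np

    H, N, P = 1000, 5000, 20
    ring = np.zeros((H, N, P))
    head = 0  # index of the oldest time slice

    older = ring[head:]   # first ordered segment (a view, no copy)
    newer = ring[:head]   # second ordered segment (also a view)

    # Combine the segments without stitching them together:
    total = older.sum(axis=0) + newer.sum(axis=0)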

On 2/22/07, Sturla Molden <sturla@molden.no> wrote:
A ring buffer is O(1) per update, whereas a memmove is linear in the amount of data moved. Unless that amount is very small, this makes the ring buffer the more attractive solution.
Slicing becomes a little more complicated with a ring, but not very much: the data are stored in two ordered, contiguous segments, after and before the stack pointer (in that order).
I like the ring buffer idea as well. As order matters for some of my operations, I will need to put the two sides together somehow, but I want to avoid blindly copying into a new array every time I slice on H. I would also like to avoid writing wrappers for every possible array operation. Does numpy provide a facility for creating a concatenated "view" that can be handled like a "normal" array and only gets copied into contiguous memory when push comes to shove?

This discussion has been really helpful for me as a numpy neophyte. Thanks, everyone!

Alex

Timothy's refinement of Anne's idea will work for me:
    >>> import timeit
    >>> print '%.2fms/push' % (1000 * timeit.Timer(
    ...     "a[...,:-1] = a[...,1:]",
    ...     "from numpy import empty; a = empty((5000,20,1000))"
    ...     ).timeit(number=10)/10)
    537.86ms/push

I still find the ring buffer solution appealing, but I did not see a way to stack two arrays together without creating copies. Am I missing a bit of numpy cleverness?

Thank you everyone for your help,
Alex

On 23/02/07, Alexander Michael <lxander.m@gmail.com> wrote:
I still find the ring buffer solution appealing, but I did not see a way to stack two arrays together without creating copies. Am I missing a bit of numpy cleverness?
The short answer is no; the stride in memory from one element to the next must always be the same within an array, so it's impossible to join two arrays like that without copying. There is a ring-buffer-like idea that lets you do less copying: if you need slices of length N, you can choose M >= 2N and maintain a ring buffer of length M instead; when you push something into the (M-N+j)th position, you also push (i.e., copy) it into the jth position. Then you can always take a contiguous slice starting anywhere from 0 to M-N-1, and you need only copy a small fraction (N/M or so) of the values pushed on. You never move values in the ring buffer, and there are no big spikes in CPU time.

Anne
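A minimal sketch of one concrete reading of this scheme (the names, and the choice to cycle the write pointer over the first M-N slots while mirroring the first N of them into the tail, are mine):

    import numpy as np

    N = 4                # length of the slice needed
    M = 3 * N            # ring length, M >= 2N
    ring = np.empty(M)   # one value per time step, for simplicity
    w = 0                # next write position; cycles over [0, M-N)

    def push(x):
        global w
        ring[w] = x
        if w < N:                  # roughly N/M of the pushes copy twice
            ring[M - N + w] = x    # mirror into the tail region
        w = (w + 1) % (M - N)

    def last_N():
        # The latest N values always form one contiguous, ordered slice
        # (assumes at least N values have been pushed).
        s = w - N if w >= N else w + M - 2 * N
        return ring[s:s + N]

No stored value ever moves afterwards, each push costs at most two writes, and there is no periodic bulk copy, so the per-push cost stays flat.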
Alexander Michael wrote:
I'm new to numpy and looking for advice on setting up and managing array data for my particular problem. I'm collecting observations of P properties for N objects over a rolling horizon of H sample times. I could conceptually store the data in a three-dimensional array with shape (N,P,H) that would allow me to easily (and efficiently, with strided slices) compute the statistics over both N and H that I am interested in. This is great, but the rub is that H is a rolling horizon within a longer record of T sample times. T is too large to fit in memory, so I need to load up the first H samples, perform my calculations, pop the oldest N x P slice, and push the newest N x P slice into the data cube. What's the best way to do this that will maintain fast computations along the one-dimensional slices over N and H? Is there a commonly accepted idiom?
Fundamentally, I see two solutions. The first would be to essentially perform a memcpy to propagate the data. The second would be to manage the N x P slices as H discontiguous memory blocks and merely reorder the pointers with each new sample. Can I do either of these with numpy?
The 'pointer reordering' can be done very nicely via a deque from the collections module - it behaves like a list, but has some additional methods like rotate(). I have used it successfully as a circular buffer for numpy arrays, but I can see that you need efficient slicing over H, so contiguous memory would be better for you IMHO.

r.
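A minimal sketch of the deque approach (the sizes are placeholders):

    from collections import deque
    import numpy as np

    N, P, H = 5000, 20, 1000
    ring = deque(np.zeros((N, P)) for _ in range(H))

    # A push only shuffles pointers; no array data moves.
    ring.append(np.ones((N, P)))   # newest slice in
    ring.popleft()                 # oldest slice out

    # Slicing over H, however, must gather from H separate blocks:
    series = np.array([block[13, 5] for block in ring])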