Running mean: numpy or pyrex?
I need to run a series of convolution filters over swath bathymetry data. The data is arranged in track lines approximately 4048 samples across and several million lines long. Currently, I have the data in a Python 2D list and send several variations of a running mean down the track.

I was fairly impressed with the filter performance using psyco until I met some truly huge files. Now I am wondering if I would get any speed benefit from using numpy (for the 2D array access), or would I be better off trying pyrex for the running mean loops? One of the issues I am facing is the possibility that very large files may need to be read from disk as needed instead of holding the whole thing in RAM. I also need to distribute this code to my team, so minimizing the pain of installing numeric and other packages is an issue. Should I expect significant speed benefits from a numpy version of this type of code?

Note that I have already made the simple optimization of calculating the full mean only on the first iteration and just updating the mean with the new data for the remaining iterations.

Thanks for any advice you might have.

-- David Finlayson
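[Editor's note: a minimal sketch of the incremental-update optimization described above, in plain Python. The function name, window width, and data layout are illustrative, not David's actual code.]

    # Sketch of the incremental update: compute the full window sum once,
    # then slide the window by adding the sample that enters and dropping
    # the one that leaves, so each step is O(1) instead of O(width).
    def running_mean(track, width):
        n = len(track)
        if n < width:
            return []
        total = sum(track[:width])            # full sum, first window only
        means = [total / float(width)]
        for i in range(width, n):
            total += track[i] - track[i - width]   # update, don't recompute
            means.append(total / float(width))
        return means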
On 30/09/06, David Finlayson <david.p.finlayson@gmail.com> wrote:
I was fairly impressed with the filter performance using psyco until I met some truly huge files. Now I am wondering if I would get any speed benefit from using numpy (for the 2D array access), or would I be better off trying pyrex for the running mean loops? One of the issues I am facing is the possibility that very large files may need to be read from disk as needed instead of holding the whole thing in RAM. I also need to distribute this code to my team, so minimizing the pain of installing numeric and other packages is an issue. Should I expect significant speed benefits from a numpy version of this type of code?
Numpy is designed for exactly this kind of computation. It will almost certainly be simpler to express your calculation in numpy, and it may be faster (as the internal loops are written in C). Numpy is also capable of operating on on-disk arrays by mmap()ing them (so that pages are loaded and unloaded as needed). Once in numpy, there are also tools such as weave which should allow you to accelerate calculations further (by running more code in C).

On the other hand, depending on what you're doing, psyco may accelerate your program, and it is certainly easy to add (two lines: import psyco; psyco.full() ), though it has never actually accelerated any computation that I tried it on.

If you don't care about linear algebra speed, I think numpy is relatively easy to install. (For me it was just a question of clicking on a checkbox, as it is packaged for my distribution of Linux.) If you're doing major calculations, you may need to ensure that your favourite fast linear algebra package gets detected.

If you do end up trying it, please report on your experiences. Performance is not always what we expect. (For example, I tried converting a number of examples from the Great Computer Language Shootout to use Numeric only to discover that they ran more slowly.)

A. M. Archibald
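[Editor's note: a sketch of what the reply suggests, assuming a recent numpy: the running mean vectorized with cumsum so the whole pass runs in C, plus a memory-mapped on-disk array. The file name, dtype, and array shape are assumptions for the example, not details from the thread.]

    import numpy as np

    # Sliding-window mean via a cumulative sum: each window sum is the
    # difference of two cumsum entries, so no Python-level loop remains.
    def running_mean_numpy(track, width):
        csum = np.cumsum(np.asarray(track, dtype=np.float64))
        csum[width:] = csum[width:] - csum[:-width]
        return csum[width - 1:] / width

    # mmap()ed on-disk array: numpy pages data in and out as slices are
    # touched, so the full swath need not fit in RAM. Shape, dtype, and
    # file name here are made up for illustration.
    swath = np.memmap('swath.dat', dtype=np.float32, mode='r',
                      shape=(2000000, 4048))
    line_means = running_mean_numpy(swath[0], 9)   # filter one track line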
participants (2)
- A. M. Archibald
- David Finlayson