I need to run a series of convolution filters over swath bathymetry data. The data comes in track lines approximately 4048 samples across and several million lines long. Currently, I hold the data in a Python 2D list and then send several variations of a running mean down the track.
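
In case it is useful, here is a simplified sketch of what the inner loops currently look like (names and boundary handling are simplified, and the real code does not re-sum the window on every step; see the note further down):

    def smooth_down_track(data, window):
        """data: 2D list (lines x samples); trailing running mean down each column."""
        n_lines, n_samples = len(data), len(data[0])
        out = [[None] * n_samples for _ in range(n_lines)]
        for j in range(n_samples):                    # one sample position at a time
            for i in range(window - 1, n_lines):      # slide the window down the track
                total = 0.0
                for k in range(i - window + 1, i + 1):
                    total += data[k][j]
                out[i][j] = total / window
        return out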

I was fairly impressed with the filter performance using psyco until I met some truly huge files. Now I am wondering whether I would get any speed benefit from using numpy (for the 2D array access), or whether I would be better off trying pyrex for the running mean loops. One of the issues I am facing is that very large files may need to be read from disk as needed instead of holding the whole thing in RAM. I also need to distribute this code to my team, so minimizing the pain of installing Numeric and other packages is a concern. Should I expect significant speed benefits from a numpy version of this type of code?
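
Very roughly, this is what I imagine the numpy version would look like (an untested sketch; the filename and shape are just placeholders, and I am assuming numpy.memmap could stand in for reading the whole array into RAM when the files are huge):

    import numpy as np

    def smooth_down_track_np(data, window):
        """data: 2D float array (lines x samples); trailing running mean along axis 0."""
        csum = np.cumsum(data, axis=0, dtype=np.float64)
        out = np.empty(csum.shape, dtype=np.float64)
        out[:window] = csum[:window] / np.arange(1, window + 1)[:, None]  # growing window at the start
        out[window:] = (csum[window:] - csum[:-window]) / window          # constant window afterwards
        return out

    # For the truly huge files, something along these lines (hypothetical file/shape):
    # data = np.memmap("trackline.dat", dtype=np.float32, mode="r", shape=(n_lines, 4048))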

Note that I have already made the simple optimization of calculating the full mean only on the first iteration and just updating the mean with the new data for the remaining iterations.
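
In other words, per column it boils down to something like this (illustrative only, not the actual code):

    def running_mean_1d(column, window):
        """Trailing running mean of one column, updating the mean incrementally."""
        mean = sum(column[:window]) / float(window)   # full mean, first iteration only
        means = [mean]
        for i in range(window, len(column)):
            # add the sample entering the window, drop the one leaving it
            mean += (column[i] - column[i - window]) / float(window)
            means.append(mean)
        return means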

Thanks for any advice you might have.

--
David Finlayson