On 26 Jul 2005, at 18:41, Sebastian Haase wrote:
Hi, This is not supposed to be an evil question; instead I'm hoping for the answer: "No, generally we get >=95% of the speed of a pure C/Fortran implementation" ;-)
You won't, generally. The question is: since you are certainly not going to gain an order of magnitude by doing it in C, do you really care?
But as I am the strongest Python/numarray advocate in our group, I often get the answer that Matlab is (of course) also very convenient, but that memory handling and overall execution performance are generally so bad that for the final implementation one would have to reimplement in C.
Well, it's true that implementations in C will be faster. And memory handling in Numeric/numarray can be a pain, since the tendency is to create and destroy a lot of arrays if you are not careful.
We are a bio-physics group at UCSF developing new algorithms for deconvolution (often in 3D). Our data sets are regularly larger than several hundred MB. When we decided on numarray I assumed that the "Hubble Crowd" had a similar situation and that all the operations are therefore very much optimized for this type of data.
Funny you mention that example. I did my PhD in exactly the same field (considering you are from Sedat's lab, I guess you are in exactly the same field as I was/am, i.e. fluorescence microscopy. What are you guys up to these days?) and I developed all my algorithms in C at the time. Now, about 7 years later, I have returned to the field to re-implement and extend some of my old algorithms for use with microscopy data that can consist of multiple sets, each at least several hundred MB. Now I use Python with numarray, and I am actually quite happy with it. I am pushing it by using up to 2GB of memory (per process, after splitting the problem up and distributing it on a cluster...), but it works. I am sure I could squeeze out maybe a factor of two or three in speed and memory usage by rewriting in C, but that is currently not worth my time. So I guess that counts as using numarray as a prototyping environment, but the result is also suitable for production use.
Is 95% a reasonable number to hope for? I did wrap my own version of FFTW (with "plan-caching"), which should give 100% of the C speed.
That should help a lot, as the standard FFTs that come with numarray/Numeric suck big time. I do use them, but I have to go through all kinds of tricks to get decent memory usage in 32-bit floating point. The FFT module is in fact very badly written for use with large multi-dimensional data sets.
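The plan-caching idea is roughly the following (only a sketch: 'fftw' and its plan()/execute() calls are made-up names standing in for whatever a hand-written wrapper exposes; the point is the dictionary keyed on shape and type):

    # Sketch of plan caching. 'fftw' is a hypothetical hand-written
    # FFTW wrapper; plan() and execute() are assumed names, not a real API.
    import fftw

    _plan_cache = {}

    def cached_fft(arr, direction='forward'):
        # FFTW planning is expensive; one plan per (shape, type, direction)
        # suffices, since iterative algorithms reuse the same shapes.
        key = (arr.shape, arr.typecode(), direction)
        plan = _plan_cache.get(key)
        if plan is None:
            plan = fftw.plan(arr.shape, arr.typecode(), direction)  # slow, done once
            _plan_cache[key] = plan
        return fftw.execute(plan, arr)  # fast: reuses the cached plan

Since deconvolution is iterative and keeps transforming arrays of the same shape, amortizing the planning cost this way is where the near-C speed would come from.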
But concerns arise from expressions like "a = b + c*a" (think "convenience"!): if a, b, c are each 3D data stacks, the creation of temporary arrays for 'c*a' AND then also for 'b+...' would be very costly. (I think this is at least what happens in Numeric - I don't know about Matlab or numarray.)
That is indeed a problem, although I think in your case you may be limited by your FFTs anyway, at least in terms of speed. One thing you should consider is replacing expressions such as 'c = a + b' with add(a, b, c). If you do that cleverly you can avoid quite a few memory allocations, and you 'should' get closer to C (see the sketch below). That does not solve everything though:

1) Complex expressions still need to be broken up into sequences of operations, which is likely slower than iterating once over your array and evaluating the expression at each point.

2) Unfortunately not all numarray functions support an output array (maybe only the ufuncs?). This can be a big problem, as temporary arrays must then be allocated. (It sure was a problem for me.)

You can of course always re-implement the critical parts in C and wrap them (as you did with FFTW). In fact, I think numarray now provides a relatively easy way to write ufuncs, which would allow you to write a single function in C for complex expressions.
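Concretely, something like this (an untested sketch in numarray spelling; Numeric is analogous):

    from numarray import zeros, Float32, multiply, add

    shape = (64, 256, 256)        # stand-in for a large 3D stack
    a = zeros(shape, Float32)
    b = zeros(shape, Float32)
    c = zeros(shape, Float32)

    # Naive 'a = b + c*a' allocates one temporary for c*a and a second
    # one for the sum. With output arguments the same computation runs
    # entirely in place, with no temporaries:
    multiply(c, a, a)             # a <- c * a
    add(b, a, a)                  # a <- b + (c * old a)

The price is readability: the arithmetic is no longer visible as a single expression, so it is probably only worth doing in the inner loops where the temporaries actually hurt.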
Hoping for comments,
Hope this gives some insights. I guess I have had similar experiences; there are definitely some limits to the use of numarray/Numeric that could be lifted, for instance by a consistent implementation of output arrays. That would allow writing algorithms in which you can strictly control the allocation and de-allocation of arrays, which would be a big help when working with large arrays.

Cheers, Peter

PS. I would not mind hearing a bit about your experiences doing the big deconvolutions in numarray/Numeric, but that may not be a good topic for this list.
Thanks,
Sebastian Haase
UCSF, Sedat Lab
--
Dr Peter J Verveer
European Molecular Biology Laboratory
Meyerhofstrasse 1
D-69117 Heidelberg
Germany
Tel. +49 6221 387 8245
Fax. +49 6221 397 8306