
If I read your C++ right (and I may not have, I'm a C++ novice), you allocated the memory for all three arrays once and then ran your loop. In the Python version, the result array is allocated when the multiplication is performed, so you are allocating and freeing the result array each time through the loop. That may slow things down a little. In a real application you are less likely to redo the same computation over and over, so the allocation would happen only once.

You might try something like the alternative at the end of the code below, which passes a preallocated result array as the third argument to multiply, and see if it is any faster (it is more memory efficient). Unfortunately, MA doesn't seem to support the third argument to multiply.

Note also that there is some overhead in function calls in Python, so you may get some speedup if you inline the call to mult_test. You can decide for yourself whether this would still be a fair comparison.

My version (I don't have TimerUtility, so I used time.clock instead) got these times:

Your code:

    completed 1000 in 99.050000 seconds
    3.74e+06 checked multiplies/second

My code:

    alternative completed 1000 in 80.070000 seconds
    4.62e+06 checked multiplies/second

So it did buy you something. Here is the code:

    #!/usr/bin/env python2.1

    import sys
    # test harness for Masked array performance
    #from MA import *
    from Numeric import *
    from time import clock

    def mult_test(a1, a2):
        # allocates a new result array on every call
        res = a1 * a2

    if __name__ == '__main__':
        repeat = 100
        gates = 1000
        beams = 370
        if len(sys.argv) > 1:
            repeat = int(sys.argv[1])

        t1 = ones((beams, gates), Float)
        a1 = t1
        a2 = t1
    #    a1 = masked_values(t1, -327.68)
    #    a2 = masked_values(t1, -327.68)

        # original: function call plus a fresh result array each pass
        i = 0
        start = clock()
        while (i < repeat):
            i = i+1
            res = mult_test(a1, a2)

        elapsed = clock() - start
        print 'completed %d in %f seconds' % (repeat , elapsed)
        cntMultiply = repeat*gates*beams
        print '%8.3g checked multiplies/second' % (cntMultiply/elapsed)
        print

        # alternative: allocate the result once and reuse it
        res = zeros(a1.shape,Float)
        i = 0
        start = clock()
        while (i < repeat):
            i = i+1
            multiply(a1, a2, res)
        elapsed = clock() - start
        print 'alternative completed %d in %f seconds' % (repeat , elapsed)
        cntMultiply = repeat*gates*beams
        print '%8.3g checked multiplies/second' % (cntMultiply/elapsed)
        print

Another note: calling ones with Float as your type gives you a Python float, which is a C double. Use 'f' or Float32 to get a C float. I've found that on Intel hardware doubles are just as fast (the FPU works in at least double precision internally anyway), but they do use more memory, so this could make a difference.

-Chris

--
Christopher Barker, Ph.D.
ChrisHBarker@home.net
http://members.home.net/barkerlohmann
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics
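
P.S. To put a rough number on the Float versus Float32 memory point above, here is a minimal sketch. It is an assumption-laden stand-in, not the benchmark itself: it uses the standard library array module rather than Numeric, with typecode 'd' standing in for Float (C double) and 'f' for Float32 (C float), and it assumes the usual 8-byte and 4-byte sizes for those types.

```python
from array import array

# Same dimensions as the benchmark in the post.
beams, gates = 370, 1000
count = beams * gates

# 'd' stores C doubles, 'f' stores C floats (stand-ins for
# Numeric's Float and Float32; this is not Numeric itself).
doubles = array('d', [1.0]) * count
floats = array('f', [1.0]) * count

# Raw storage needed for one array of each type.
bytes_double = doubles.itemsize * len(doubles)
bytes_float = floats.itemsize * len(floats)

print('double: %d bytes' % bytes_double)
print('float:  %d bytes' % bytes_float)
```

With three arrays of this shape in play (two inputs and a result), the difference between the two types is multiplied by three, which is where the single-precision choice might start to matter.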