Hello,
I have two 2D arrays, Q is 3byq and R is rby3. At each q, I need to sum q(r) over R, so I take a dot product RQ and then sum along one axis to get a 1byq result.
I'm doing this with dot products because it is much faster than the equivalent for or while loop. The intermediate rbyq array can get very large though (200MB in my case), so I was wondering if there is a better way to go about it?
If not, I can slice up R and deal with it one chunk at a time, then the intermediate arrays fit within the available system resources. Would somebody offer a suggestion of how to do this intelligently? Should the intermediate array be about the size of the processor cache, some fraction of the available memory, or is there something else I need to consider?
Thank you, Darren
Darren Dale wrote:
Hello,
I have two 2D arrays, Q is 3byq and R is rby3. At each q, I need to sum q(r) over R, so I take a dot product RQ and then sum along one axis to get a 1byq result.
I'm doing this with dot products because it is much faster than the equivalent for or while loop. The intermediate rbyq array can get very large though (200MB in my case), so I was wondering if there is a better way to go about it?
I think so. I believe you are doing something like this:
result_1 = na.sum(na.dot(R,Q), 0)
I'm fairly certain (but I urge you to double check), that this reduces to:
result_2 = na.dot(na.sum(R, 0), Q)
which will take up much less intermediate storage and be faster to boot. In more quasimathematical notations:
result_1 => sum_i sum_j R_ij Qjk = sum_j sum_i R_ij Q_jk = sum_j Q_jk sum_i R_ij => result_2
A quick test seems to confirm this:
import numarray as na from numarray import random_array
q = 10 r = 12
R = random_array.random((r,3)) Q = random_array.random((3,q))
x1 = na.sum(na.dot(R,Q), 0) x2 = na.dot(na.sum(R, 0), Q)
print na.allclose(x1, x2)
tim
If not, I can slice up R and deal with it one chunk at a time, then the intermediate arrays fit within the available system resources. Would somebody offer a suggestion of how to do this intelligently? Should the intermediate array be about the size of the processor cache, some fraction of the available memory, or is there something else I need to consider?
Thank you, Darren
This SF.net email is sponsored by: IT Product Guide on ITManagersJournal Use IT products in your business? Tell us what you think of them. Give us Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more http://productguide.itmanagersjournal.com/guidepromo.tmpl _______________________________________________ Numpydiscussion mailing list Numpydiscussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpydiscussion
Thank you for your response, Tim,
On Friday 15 October 2004 06:07 pm, Tim Hochberg wrote:
Darren Dale wrote:
Hello,
I have two 2D arrays, Q is 3byq and R is rby3. At each q, I need to sum q(r) over R, so I take a dot product RQ and then sum along one axis to get a 1byq result.
I'm doing this with dot products because it is much faster than the equivalent for or while loop. The intermediate rbyq array can get very large though (200MB in my case), so I was wondering if there is a better way to go about it?
I'm fairly certain (but I urge you to double check), that this reduces to:
result_2 = na.dot(na.sum(R, 0), Q)
Yes. As usual, I left out a bit of information that turned out to be important. See below
A modified test:
from numarray import * from numarray import random_array
q = 10 r = 12
R = random_array.random((r,3)) Q = random_array.random((3,q))
x1 = sum( exp(1j*dot(R,Q)), 0) #note complex argument to exp() x2 = exp(1j*dot(sum(R, 0), Q))
print allclose(x1, x2)
The complex arithmetic changes things. I am still learning how to keep my code efficient. The following code is actually almost as fast as using the large dot product, apparently I had some other sinks in my original tests:
phase = zeros(len(Q[0]),'d') for i in range(len(Q[0])): phase[i] = phase[i] + sum(exp(1j*dot(R,Q[:,i])), 0)
If q=1000 and r=2500, the for loop takes about 13% longer than the dot product method. Incredibly, if q=10,000 and r=2500, the for loop is 17% faster. So I am going to use it instead. Apparently I had some other time sink in my original test.
from numarray import * from numarray import random_array from time import clock
q = 10000 r = 2500
R = random_array.random((r,3)) Q = random_array.random((3,q))
t0 = clock() x1 = sum(exp(1j*dot(R,Q)), 0) #note complex argument to exp() t1 = clock() dt1 = t1t0
phase = zeros(len(Q[0]),'d') for i in range(len(Q[0])): phase[i] = phase[i] + sum(exp(1j*dot(R,Q[:,i])), 0)
t2 = clock() dt2 = t2t1
print (dt2dt1)/dt1
participants (2)

Darren Dale

Tim Hochberg