
On Friday 19 March 2010 18:13:33 Anne Archibald wrote: [clip]
What I didn't go into in detail in the article was that there's a trade-off available between processing and memory access: we could reduce the memory load by a factor of eight by doing the interpolation on the fly instead of all at once in a giant FFT. But that would cost cache space and flops, and we're not memory-dominated.
One thing I didn't try, and should: running four of these jobs at once on a four-core machine. If I understand the architecture correctly, that won't affect the cache issues, but it will effectively quadruple the memory bandwidth needed without increasing the memory bandwidth available. (Which, honestly, makes me wonder what the point of building multicore machines is.)
Maybe I should look into that interpolation stuff.
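To make the interpolation trade-off discussed above concrete, here is a minimal NumPy sketch. It assumes the interpolation is Fourier-domain oversampling by zero-padding to eight times the input length. That is an assumption; the thread doesn't spell out the method, and this is not the code from the article. The function names are hypothetical. The "all at once" version materializes the whole 8x spectrum in one big FFT. The "on the fly" version evaluates only the requested oversampled bins as direct DFT sums over the original N samples, which needs O(N) memory but O(N) flops per bin. A practical on-the-fly scheme would more likely use a short interpolation kernel than the full sum.

```python
import numpy as np

OVERSAMPLE = 8  # interpolation factor mentioned in the post


def oversampled_spectrum_fft(x, factor=OVERSAMPLE):
    """All at once: zero-pad to factor*N and take one big real FFT.

    Memory for the result grows as O(factor * N).
    """
    return np.fft.rfft(x, n=factor * len(x))


def oversampled_bins_direct(x, bins, factor=OVERSAMPLE):
    """On the fly (hypothetical helper): evaluate only the requested
    oversampled bins as direct DFT sums over the N original samples.

    Uses O(N) working memory, but costs O(N) flops per bin instead of
    the FFT's amortized O(log N).
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    bins = np.asarray(bins)
    out = np.empty(len(bins), dtype=complex)
    for i, k in enumerate(bins):
        # Bin k of the zero-padded FFT sits at k / (factor * N) cycles/sample;
        # the padded zeros contribute nothing, so sum only the real samples.
        out[i] = np.dot(x, np.exp(-2j * np.pi * k * n / (factor * len(x))))
    return out
```

For example, both routes give the same values for a handful of bins:

```python
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
wanted = [100, 101, 102, 103]
print(np.allclose(oversampled_spectrum_fft(x)[wanted],
                  oversampled_bins_direct(x, wanted)))  # True
```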
Please do. Although you may be increasing the data rate by 4x, your program is already very efficient in how it handles data, so chances are that you will still get a good speed-up. I'd be glad to hear back about your experience.

-- Francesc Alted