
On Tue, Mar 07, 2006 at 11:33:45AM -0700, Tim Hochberg wrote:
Tim Hochberg wrote:
3. Reduction. I figure this could be done at the end of the program in each loop: sum or multiply the output register. Downcasting the output could be done here too.
I'm still not excited about summing over the whole output buffer though. That ends up allocating and scanning through a whole extra buffer, which may result in a significant speed and memory hit for large arrays. Since we're only doing this on the way out, there should be no problem just allocating a single double (or complex) to do the sum in. On the way in, this could be set to zero or one based on what the last opcode is (sum or product). Then the SUM opcode could simply do something like:
No, no, we'd just sum over the 128-element output vector (mem[0]) and add the result to the cumulative sum. That vector should already be in cache, as the last op would put it there.
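
A rough illustration of the blockwise reduction described above, written in plain Python rather than the interpreter's own code. The name `blocked_reduce`, its parameters, and the use of a Python list in place of the mem[0] output vector are all assumptions made for this sketch; only the 128-element block size and the zero-or-one initial value come from the discussion.

```python
import operator
from functools import reduce

BLOCK_SIZE = 128  # block length mentioned in the thread


def blocked_reduce(values, op="sum", block_size=BLOCK_SIZE):
    """Reduce `values` one block at a time, keeping a single scalar accumulator.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    if op == "sum":
        combine, identity = operator.add, 0.0
    elif op == "prod":
        combine, identity = operator.mul, 1.0
    else:
        raise ValueError("op must be 'sum' or 'prod'")

    # The accumulator starts at 0 for a sum and at 1 for a product.
    acc = identity
    for start in range(0, len(values), block_size):
        # Stand-in for the output vector written by the last op of the program.
        block = values[start:start + block_size]
        # Reduce the small block first, then fold it into the running total.
        partial = reduce(combine, block, identity)
        acc = combine(acc, partial)
    return acc


if __name__ == "__main__":
    print(blocked_reduce(list(range(1000))))
    print(blocked_reduce([1, 2, 3, 4], op="prod", block_size=3))
```

Only one block and one scalar are live at any time, so no full-size output buffer is needed for the reduction.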
BTW, the cleanup of the interpreter looks pretty slick.
Not finished yet :-) Look for a checkin today (if I have time).

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm@physics.mcmaster.ca