[Numpy-discussion] numexpr thoughts

Tue Mar 7 10:35:02 EST 2006

Tim Hochberg wrote:

>> 3. Reduction. I figure this could be done at the end of the program in
>>   each loop: sum or multiply the output register. Downcasting the
>>   output could be done here too.
>>  
>>
> Do you mean that sum() and friends could only occur as the outermost 
> function? That is:
>    "sum(a+3*b)"
> would work, but:
>   "where(a, sum(a+3*b), c)"
> would not? Or am I misunderstanding you here? I don't think I like 
> that limitation if that's the case. I don't think it should be 
> necessary either.

OK, I thought about this some more and I think I was mostly wrong. I 
suspect that reduction does need to happen as the last step. Still it 
would be nice for "where(a, sum(a+3*b), c)" to work. This could be done 
by doing the following transformation:
    a = evaluate("where(a, sum(a+3*b), c)")
  =>
    temp = evaluate("sum(a+3*b)"); a = evaluate("where(a, temp, c)")
I suspect that this would be fairly easy to do automagically as 
long as it was at the Python level. That is, numexpr would return a 
python object that would call the lower level interpreter.numexpr 
appropriately. This would have some other advantages as well -- it would 
make it easy to deal with keyword arguments for one. It would also make 
it easy to do the bytecode rewriting if we decide to go that route. It 
does add more call overhead, but if that turns out to be a problem we 
can move stuff into C later.
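
A rough Python-level sketch of that rewriting step, to make the idea concrete. Everything here is hypothetical: the helper name hoist_reductions, the temporary names (_tmp0, ...) and the plain string scanning are assumptions for illustration, not how numexpr does or would do it. It does not handle a reduction nested inside another reduction, and it assumes the parentheses are balanced.

```python
def hoist_reductions(expr, funcs=("sum", "prod")):
    """Split expr into (temporaries, rewritten_expr).

    Each inner call to a reduction in `funcs` is replaced by a temporary
    name. An expression that is itself one outermost reduction is
    returned unchanged, since it can already run as the last step.
    """
    temps = []   # list of (temporary name, reduction expression)
    out = []     # pieces of the rewritten expression
    i = 0
    while i < len(expr):
        matched = None
        for name in funcs:
            prev_ok = i == 0 or not (expr[i - 1].isalnum() or expr[i - 1] == "_")
            if prev_ok and expr.startswith(name + "(", i):
                matched = name
                break
        if matched is None:
            out.append(expr[i])
            i += 1
            continue
        # Scan forward to the parenthesis that closes this call.
        depth = 0
        j = i + len(matched)
        while True:
            if expr[j] == "(":
                depth += 1
            elif expr[j] == ")":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        call = expr[i:j + 1]
        if i == 0 and j == len(expr) - 1:
            return temps, expr   # already an outermost reduction
        tmp = "_tmp%d" % len(temps)
        temps.append((tmp, call))
        out.append(tmp)
        i = j + 1
    return temps, "".join(out)
```

A Python-level wrapper could then evaluate each temporary first and pass the results in as ordinary variables when evaluating the rewritten expression.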

I'm still not excited about summing over the whole output buffer though. 
That ends up allocating and scanning through a whole extra buffer which 
may result in a significant speed and memory hit for large arrays. Since 
we're only doing this on the way out, there should be no problem just 
allocating a single double (or complex) to do the sum in.  On the way 
in, this could be set to zero or one based on what the last opcode is 
(sum or product). Then the SUM opcode could simply do something like:
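
To illustrate the scalar-accumulator idea, here is a rough sketch in Python rather than the interpreter's C. The function name reduce_blocks, the block_size parameter and the block-at-a-time loop are assumptions made for the example; the point is only that the accumulator is a single value, initialized to zero for a sum or one for a product, so no extra output buffer is allocated.

```python
def reduce_blocks(values, op="sum", block_size=4):
    """Reduce `values` block by block into one scalar accumulator.

    `op` is "sum" or "prod". Both names are hypothetical stand-ins for
    whatever the last opcode of the program would be.
    """
    # Initialize on the way in: 0 for a sum, 1 for a product.
    acc = 0.0 if op == "sum" else 1.0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]  # one inner-loop chunk
        for x in block:
            if op == "sum":
                acc += x
            else:
                acc *= x
    return acc
```

For example, reduce_blocks([1, 2, 3, 4, 5]) accumulates across two blocks and returns 15.0, and the same input with op="prod" returns 120.0.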

BTW, the cleanup of the interpreter looks pretty slick. I'm going to 
look at timings for using COPY_C versus using add directly and see about 
reducing the number of opcodes. If this works out OK, the number of 
comparison opcodes could be reduced a lot. Sorry about munging the line 
endings.

-tim