[Numpy-discussion] Numpy and PEP 343

Tue Feb 28 11:41:15 EST 2006

<pie-in-the-sky>

An idea that has popped up from time to time is delaying evaluation of a 
complicated expression so that the result can be computed more 
efficiently. For instance, the matrix expression:

a = b*c + d*e

results in the creation of two potentially large temporary matrices, 
and also makes a couple more passes over the data at the C level than 
the equivalent expression implemented directly in C would.
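To make the temporaries concrete, here is a rough sketch of what the expression amounts to when written out step by step, next to a version that reuses one scratch array. The names t1, t2 and temp are only for illustration, and the sizes are kept small:

```python
import numpy as np

b = c = d = e = np.arange(5.0)

# Written out: each product lands in its own temporary,
# and the sum allocates the result.
t1 = b * c
t2 = d * e
a_naive = t1 + t2

# Fewer allocations: one result buffer plus one reusable scratch array.
a = np.multiply(b, c)          # allocates the result
temp = np.empty_like(a)        # scratch space, could be reused across calls
np.multiply(d, e, out=temp)    # write the second product into the scratch array
a += temp                      # accumulate in place
```

This still loops over the data three times; it only cuts down on allocation.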

The general idea has been to construct some sort of pseudo-object when 
the numerical operations are indicated, then do the actual numerical 
operations at some later time. This would be very problematic if 
implemented for all arrays since it would quickly become impossible to 
figure out what was going on, particularly with view semantics. However, 
it could result in large performance improvements without becoming 
incomprehensible if implemented in small enough chunks.
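As a very rough illustration of the pseudo-object idea, something like the following could record the operations and run them later. Everything here (the Deferred class, wrap, evaluate, the block size) is hypothetical and not part of numpy, and it assumes float operands of identical shape with no broadcasting and at least one dimension. It evaluates block by block into a preallocated result, so any temporaries stay small:

```python
import numpy as np

class Deferred:
    """Hypothetical lazy node: records an operation instead of performing it."""

    def __init__(self, op, left, right=None):
        # op is None for a leaf, in which case 'left' holds the array itself.
        self.op, self.left, self.right = op, left, right

    @classmethod
    def wrap(cls, array):
        return cls(None, np.asarray(array))

    def __add__(self, other):
        return Deferred(np.add, self, other)

    def __mul__(self, other):
        return Deferred(np.multiply, self, other)

    @property
    def shape(self):
        # Assumes all leaves share one shape, so the leftmost leaf will do.
        node = self
        while node.op is not None:
            node = node.left
        return node.left.shape

    def _eval(self, sl):
        # Evaluate this subtree on one slice of the first axis.
        if self.op is None:
            return self.left[sl]
        return self.op(self.left._eval(sl), self.right._eval(sl))

    def evaluate(self, block=4096):
        # Allocate the result once, then fill it one block at a time.
        out = np.empty(self.shape)
        for start in range(0, out.shape[0], block):
            sl = slice(start, start + block)
            out[sl] = self._eval(sl)
        return out

b = c = d = e = np.arange(100000.0)
expr = Deferred.wrap(b) * Deferred.wrap(c) + Deferred.wrap(d) * Deferred.wrap(e)
a = expr.evaluate()   # nothing is computed until this call
```

Note that this sidesteps the view problem entirely: if 'b' is modified between building expr and calling evaluate, the result silently changes.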

A "straightforward" approach would look something like:

    numpy.begin_defer()  # Now all numpy operations (in this thread)
                         # are deferred
    a = b*c + d*e        # 'a' is a special object that holds pointers to
                         # 'b', 'c', 'd' and 'e' and knows what ops to
                         # perform.
    numpy.end_defer()    # 'a' performs the operations and now looks like
                         # an array

Since 'a' knows the whole series of operations in advance it can perform 
them more efficiently than would be possible using the basic numpy 
machinery. Ideally, the space for 'a' could be allocated up front, and 
all of the operations could be done in a single loop. In practice the 
optimization might be somewhat less ambitious, depending on how much 
energy people put into this. However, this approach has some problems. 
One is the syntax, which is clunky and a bit unsafe (a missing end_defer in 
a function could cause stuff to break very far away). The other is that 
I suspect that this sort of deferred evaluation makes multiple views of 
an array even more likely to bite the unwary.

The syntax issue can be cleanly addressed now that PEP 343 (the 'with' 
statement) is going into Python 2.5. Thus the above would look like:

with numpy.deferral():
    a = b*c + d*e
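For what it's worth, the bookkeeping side of such a context manager is small. The sketch below is only a guess at the shape of it: deferral() and deferring() are hypothetical names, nothing like them exists in numpy, and it uses contextlib from the standard library. It tracks a per-thread nesting depth and always restores it on exit, which is exactly the safety that the begin_defer/end_defer pair lacks:

```python
import threading
from contextlib import contextmanager

_state = threading.local()   # per-thread, so other threads are unaffected

def deferring():
    """True while the current thread is inside a deferral() block."""
    return getattr(_state, "depth", 0) > 0

@contextmanager
def deferral():
    _state.depth = getattr(_state, "depth", 0) + 1
    try:
        yield
    finally:
        # Runs even if the body raises, so deferral can never be left on.
        _state.depth -= 1
        # A real implementation would evaluate pending expressions here
        # once the depth drops back to zero.

with deferral():
    inside = deferring()     # array operations would check this flag
outside = deferring()
```

The hard part, of course, is what the array operations do when the flag is set.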

Just removing the extra allocation of temporary arrays can result in 
a 30% speedup for this case[1], so the payoff would likely be large. On 
the down side, it could be quite a can of worms, and would likely 
require a lot of work to implement.

Food for thought anyway.

</pie-in-the-sky>

-tim

[1]

from timeit import Timer
print(Timer('a = b*c + d*e',
            'from numpy import arange;'
            'b=c=d=e=arange(100000.)').timeit(10000))
print(Timer('a = b*c; multiply(d,e,temp); a+=temp',
            'from numpy import arange, zeros, multiply;'
            'b=c=d=e=arange(100000.);'
            'temp=zeros([100000], dtype=float)').timeit(10000))

=>

94.8665989672
62.6143562939