[Numpy-discussion] allocated memory cache for numpy

Stefan Seefeld stefan at seefeld.name
Mon Feb 17 15:55:53 EST 2014


On 02/17/2014 03:42 PM, Nathaniel Smith wrote:
> Another optimization we should consider that might help a lot in the
> same situations where this would help: for code called from the
> cpython eval loop, it's afaict possible to determine which inputs are
> temporaries by checking their refcnt. In the second call to __add__ in
> '(a + b) + c', the temporary will have refcnt 1, while the other
> arrays will all have refcnt >1. In such cases (subject to various
> sanity checks on shape, dtype, etc) we could elide temporaries by
> reusing the input array for the output. The risk is that there may be
> some code out there that calls these operations directly from C with
> non-temp arrays that nonetheless have refcnt 1, but we should at least
> investigate the feasibility. E.g. maybe we can do the optimization for
> tp_add but not PyArray_Add. 
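
A rough, Python-level sketch of that refcount heuristic may help make the idea
concrete. The real check would be Py_REFCNT(op) == 1 inside the C implementation
of tp_add; the threshold below is only an empirical guess for current CPython
(sys.getrefcount and the call machinery add their own references), and the class
name is made up for illustration:

import sys
import numpy as np

class Elider(np.ndarray):
    # Empirical guess: a temporary that lives only on the interpreter's
    # evaluation stack shows a lower count than an array that is also
    # bound to a name.  The exact value depends on CPython internals.
    _TEMP_REFCOUNT = 3

    def __add__(self, other):
        if (sys.getrefcount(self) <= self._TEMP_REFCOUNT
                and isinstance(other, np.ndarray)
                and self.shape == other.shape           # sanity checks, as
                and self.dtype == np.result_type(self, other)):  # mentioned above
            # 'self' looks like a temporary: reuse its buffer for the output.
            return np.add(self, other, out=self)
        return np.add(self, other)

a = np.ones(5).view(Elider)
b = np.ones(5).view(Elider)
c = np.ones(5).view(Elider)
d = (a + b) + c   # the second __add__ may reuse the (a + b) temporary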

For element-wise operations such as the above, wouldn't it be even
better to use loop fusion, evaluating the entire compound expression
per element instead of each individual operation? That would require
methods such as __add__ to return an operation object rather than the
result value. I believe a technique like that is used in the numexpr
package (https://github.com/pydata/numexpr), which I saw announced here
recently...
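
A minimal sketch of that deferred-evaluation idea (the class and function names
below are made up for illustration; numexpr's actual interface is string-based,
e.g. numexpr.evaluate("(a + b) + c")). Here __add__ returns an expression node
instead of a result, and the whole tree is then evaluated block by block in a
single fused pass, without materializing full-size temporaries:

import numpy as np

class Expr:
    def __add__(self, other):
        return BinOp(np.add, self, wrap(other))
    def __mul__(self, other):
        return BinOp(np.multiply, self, wrap(other))

class Leaf(Expr):
    def __init__(self, array):
        self.array = np.asarray(array)
    def eval_block(self, sl):
        return self.array[sl]

class BinOp(Expr):
    def __init__(self, ufunc, lhs, rhs):
        self.ufunc, self.lhs, self.rhs = ufunc, lhs, rhs
    def eval_block(self, sl):
        # Evaluate both operands on the same small block, so the compound
        # expression is computed in one pass over each block rather than
        # building full-size temporaries for every intermediate result.
        return self.ufunc(self.lhs.eval_block(sl), self.rhs.eval_block(sl))

def wrap(x):
    return x if isinstance(x, Expr) else Leaf(x)

def evaluate(expr, size, blocksize=4096):
    out = np.empty(size)
    for start in range(0, size, blocksize):
        sl = slice(start, min(start + blocksize, size))
        out[sl] = expr.eval_block(sl)
    return out

a, b, c = (np.random.rand(100000) for _ in range(3))
result = evaluate(Leaf(a) + Leaf(b) + Leaf(c), a.size)
assert np.allclose(result, (a + b) + c)

(Working per block rather than strictly per element is also what numexpr does
in practice; it keeps each intermediate small enough to stay in cache.)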

FWIW,
            Stefan

PS: Such a loop-fusion technique would also open the door to other
optimizations, such as vectorization (simd)...

-- 

      ...I still have a suitcase in Berlin...
