[Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

Julian Taylor jtaylor.debian at googlemail.com
Fri Jun 6 10:01:17 CEST 2014


On 06.06.2014 04:18, Sturla Molden wrote:
> On 05/06/14 22:51, Nathaniel Smith wrote:
> 
>> This gets evaluated as:
>>
>>     tmp1 = a + b
>>     tmp2 = tmp1 + c
>>     result = tmp2 / c
>>
>> All these temporaries are very expensive. Suppose that a, b, c are
>> arrays with N bytes each, and N is large. For simple arithmetic like
>> this, then costs are dominated by memory access. Allocating an N byte
>> array requires the kernel to clear the memory, which incurs N bytes of
>> memory traffic.
> 
> It seems to be the case that a large portion of the run-time in Python
> code using NumPy can be spent in the kernel zeroing pages (which the
> kernel does for security reasons).
> 
> I think this can also be seen as a 'malloc problem'. It comes about
> because each new NumPy array starts with a fresh buffer allocated by
> malloc. Perhaps buffers can be reused?
> 
> Sturla
> 
> 

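For illustration (not from the thread itself), the cost Nathaniel describes can be avoided by hand today with NumPy's `out=` parameter; the array names match the placeholders in the quoted expression:

```python
import numpy as np

N = 1_000_000
a = np.ones(N)
b = np.ones(N)
c = np.ones(N)

# Naive expression: allocates tmp1, tmp2, and the result buffer.
result = (a + b + c) / c

# Hand-elided version: one allocation, then in-place updates reuse it.
out = a + b                   # the only allocation; becomes the result buffer
np.add(out, c, out=out)       # out += c, no new allocation
np.divide(out, c, out=out)    # out /= c, no new allocation

assert np.allclose(result, out)
```

The point of the thread is that users should not have to rewrite expressions this way; the interpreter (or NumPy) would do it automatically.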
Caching memory inside NumPy would indeed solve this issue too. There has even been a paper written on this, which contains more serious benchmarks than the Laplace case (which runs on very old hardware; note also that its in-place and out-of-place cases are not equivalent: one computes array / scalar, the other array * (1 / scalar)):

hiperfit.dk/pdf/Doubling.pdf
"The result is an improvement of as much as 2.29 times speedup, on
average 1.32 times speedup across a benchmark suite of 15 applications"

The problem with this approach is that memory handling in NumPy is already difficult enough. A cache that potentially holds gigabytes of memory out of the user's sight would just make things worse.
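For concreteness, the kind of cache under discussion could look roughly like this free-list sketch (illustrative only; the names and structure are my own, not NumPy's actual allocator):

```python
import numpy as np

# A free list keyed by (shape, dtype): released buffers are reused instead
# of going back through malloc and the kernel's page zeroing.
_free_lists = {}

def alloc(shape, dtype=np.float64):
    key = (tuple(shape), np.dtype(dtype))
    bucket = _free_lists.get(key)
    if bucket:
        return bucket.pop()        # reuse a cached buffer, no fresh pages
    return np.empty(shape, dtype)  # fall back to a real allocation

def release(arr):
    key = (arr.shape, arr.dtype)
    _free_lists.setdefault(key, []).append(arr)

buf = alloc((1000,))
release(buf)
buf2 = alloc((1000,))
assert buf2 is buf  # the second request got the cached buffer back
```

This is exactly the "out of sight" memory the paragraph above worries about: nothing bounds the size of `_free_lists` here.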

None of this would be needed if we could come up with a way for Python to help NumPy elide the temporaries.
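To sketch what such elision might mean (my own illustration, not a concrete proposal from the thread): if an operand's reference count shows that nothing else can observe it, its buffer can be mutated in place instead of allocating a new one. The refcount threshold below is a fragile heuristic that varies across CPython versions, which is precisely why interpreter support is being asked for:

```python
import sys
import numpy as np

# Heuristic: a temporary is referenced only by the evaluation stack, the
# method's frame, and getrefcount's argument. The exact number is
# CPython-version dependent and can misfire on named variables.
TEMP_REFCOUNT = 4  # assumed threshold, illustrative only

class Elide:
    """Toy array wrapper that reuses a temporary's buffer in __add__."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float64)

    def __add__(self, other):
        if sys.getrefcount(self) <= TEMP_REFCOUNT:
            # self looks like a temporary: overwrite its buffer in place
            np.add(self.data, other.data, out=self.data)
            return self
        return Elide(self.data + other.data)

a, b, c = Elide([1.0]), Elide([2.0]), Elide([3.0])
total = a + b + c  # (a + b) creates one temporary; "+ c" can reuse its buffer
```

A real implementation would do this check in C (where `Py_REFCNT(op) == 1` is meaningful), which is what interpreter cooperation would make reliable.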

