[Numpy-discussion] allocated memory cache for numpy

Stefan Seefeld stefan at seefeld.name
Mon Feb 17 19:45:05 EST 2014

On 02/17/2014 06:56 PM, Nathaniel Smith wrote:
> On Mon, Feb 17, 2014 at 3:55 PM, Stefan Seefeld <stefan at seefeld.name> wrote:
>> On 02/17/2014 03:42 PM, Nathaniel Smith wrote:
>>> Another optimization we should consider that might help a lot in the
>>> same situations where this would help: for code called from the
>>> cpython eval loop, it's afaict possible to determine which inputs are
>>> temporaries by checking their refcnt. In the second call to __add__ in
>>> '(a + b) + c', the temporary will have refcnt 1, while the other
>>> arrays will all have refcnt >1. In such cases (subject to various
>>> sanity checks on shape, dtype, etc) we could elide temporaries by
>>> reusing the input array for the output. The risk is that there may be
>>> some code out there that calls these operations directly from C with
>>> non-temp arrays that nonetheless have refcnt 1, but we should at least
>>> investigate the feasibility. E.g. maybe we can do the optimization for
>>> tp_add but not PyArray_Add.
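The elision Nathaniel describes would happen automatically in C, keyed off the operand's refcount; a minimal sketch of its manual equivalent uses the ufunc `out=` argument to overwrite the temporary in place (array names here are illustrative):

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.ones_like(a)
c = np.full_like(a, 2.0)

# Naive: two allocations -- one for the temporary (a + b),
# one for the final result.
naive = (a + b) + c

# Elided: the buffer from the first add is reused as the output of the
# second add, which is what the proposed refcnt check would arrange
# automatically when the operand is known to be a temporary.
tmp = a + b
np.add(tmp, c, out=tmp)   # second add writes over the temporary

assert np.array_equal(naive, tmp)
```

The saving is one large allocation (and the cache pressure of touching a third buffer) per elided temporary.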
>> For element-wise operations such as the above, wouldn't it be even
>> better to use loop fusion, by evaluating the entire compound expression
>> per element, instead of each individual operation? That would require
>> methods such as __add__ to return an operation object, rather than the
>> result value. I believe a technique like that is used in the numexpr
>> package (https://github.com/pydata/numexpr), which I saw announced here
>> recently...
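A minimal pure-Python sketch of such loop fusion, in which `__add__` returns an expression node and the compound expression is evaluated in a single pass (class and method names are illustrative, not numexpr's actual API):

```python
import numpy as np

class Lazy:
    """Wraps an ndarray; '+' builds an expression tree instead of computing."""
    def __init__(self, data):
        self.data = np.asarray(data)
    def __add__(self, other):
        return AddNode(self, other)

class AddNode:
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
    def __add__(self, other):
        return AddNode(self, other)
    def evaluate(self):
        # Collect the leaf arrays of the (addition-only) expression tree.
        leaves = []
        def collect(node):
            if isinstance(node, AddNode):
                collect(node.lhs)
                collect(node.rhs)
            else:
                leaves.append(node.data)
        collect(self)
        # Fused loop: one pass, one output buffer, no temporaries.
        out = np.empty_like(leaves[0])
        for i in np.ndindex(out.shape):
            out[i] = sum(leaf[i] for leaf in leaves)
        return out

a, b, c = (Lazy(np.arange(4.0)) for _ in range(3))
expr = (a + b) + c          # no arithmetic has happened yet
print(expr.evaluate())      # -> [0. 3. 6. 9.]
```

numexpr does the equivalent in compiled code, blocking the loop for cache efficiency; the Python loop above only illustrates the fusion idea.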
> Hi Stefan (long time no see!),

Indeed! :-)

> Sure, that would be an excellent thing, but adding a delayed
> evaluation engine to numpy is a big piece of new code, and we'd want
> to make it something you opt-in to explicitly. (There are too many
> weird potential issues with e.g. errors showing up at some far away
> place from the actual broken code, due to evaluation being delayed to
> there.) By contrast, the optimization suggested here is a tiny change
> we could do now, and would still be useful even in the hypothetical
> future where we do have lazy evaluation, for anyone who doesn't use
> it.

Sure, I fully agree. I didn't mean to suggest this as an alternative to
a focused memory management optimization.
Still, it seems this would be a nice project (perhaps even under the
GSoC umbrella). It could be controlled by a metaclass (substituting the
appropriate ndarray methods), and thus could be enabled separately and
explicitly.
Anyhow, just an idea for someone else to pick up. :-)



      ...ich hab' noch einen Koffer in Berlin...
      ("...I still have a suitcase in Berlin...")
