[Numpy-discussion] automatically avoiding temporary arrays

Julian Taylor jtaylor.debian at googlemail.com
Mon Oct 3 14:43:16 EDT 2016


On 03.10.2016 20:23, Chris Barker wrote:
> 
> 
> On Mon, Oct 3, 2016 at 3:16 AM, Julian Taylor
> <jtaylor.debian at googlemail.com> wrote:
> 
>     the problem with this approach is that we don't really want numpy
>     holding on to hundreds of megabytes of memory by default, so it would
>     need to be a user option.
> 
> 
> indeed -- but one could set an LRU cache to be very small (a few items,
> not a small memory limit), and then it gets used within expressions but
> doesn't hold on to much outside of them.

numpy doesn't see the whole expression, so we can't really do much.
(technically we could on Python 3.6 via PEP 523's frame evaluation API,
but that would be a larger undertaking)
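
As a concrete illustration of what numpy actually sees (a sketch using
plain NumPy semantics; the array sizes are arbitrary):

    import numpy as np

    a = np.ones(10**7)
    b = np.ones(10**7)
    c = np.ones(10**7)

    # NumPy receives this as two independent ufunc calls, roughly
    #   tmp = np.add(a, b)    # full-size temporary array
    #   out = np.add(tmp, c)  # tmp is freed afterwards
    # Neither call knows it is part of a larger expression, so the
    # temporary cannot be elided without interpreter-level help such
    # as PEP 523.
    out = a + b + c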

> 
> However, is the allocation the only (or even the biggest) source of the
> performance hit?
>  

on large arrays the allocation itself is insignificant. What does cost
some time is faulting the memory into the process, which means the kernel
has to supply zeroed pages (one page at a time, as they are first touched).
By storing memory blocks in numpy we would save this cost. This is
really the job of the libc, but its allocators are usually tuned for
general-purpose workloads and thus tend to give memory back to the system
much earlier than numerical workloads would like.
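
A rough way to observe that cost (a sketch; it assumes a glibc-like
allocator that services large requests with freshly mmap'd pages, so
results will vary by system):

    import numpy as np
    import timeit

    n = 10**7  # ~80 MB of float64, well above typical mmap thresholds

    def fresh():
        # every page of the new buffer must be faulted in on first write
        np.empty(n).fill(1.0)

    buf = np.empty(n)

    def reuse():
        # pages are already mapped into the process; only the writes remain
        buf.fill(1.0)

    print("fresh:", timeit.timeit(fresh, number=20))
    print("reuse:", timeit.timeit(reuse, number=20))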

Note that numpy already has a small memory block cache, but it is only
used for very small arrays where the allocation cost itself is significant,
and it is limited to a couple of megabytes at most.
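
To sketch the idea of such a block cache in general (a hypothetical
illustration in Python; NumPy's real cache is implemented in C and, as
said above, deliberately tiny):

    from collections import defaultdict, deque

    class BlockCache:
        # keep recently freed buffers around, bounded by a byte budget,
        # so a later allocation of the same size can reuse pages that
        # are already faulted into the process
        def __init__(self, max_bytes=16 * 2**20):
            self.max_bytes = max_bytes
            self.used = 0
            self.buckets = defaultdict(deque)  # size -> freed buffers
            self.order = deque()               # sizes, oldest first

        def allocate(self, size):
            bucket = self.buckets[size]
            if bucket:
                self.used -= size
                return bucket.pop()            # hit: warm pages
            return bytearray(size)             # miss: fresh allocation

        def free(self, block):
            size = len(block)
            self.buckets[size].append(block)
            self.order.append(size)
            self.used += size
            # drop the oldest cached blocks once over the byte budget
            while self.used > self.max_bytes and self.order:
                old = self.order.popleft()
                if self.buckets[old]:
                    self.buckets[old].popleft()
                    self.used -= old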


