[Numpy-discussion] allocated memory cache for numpy
njs at pobox.com
Mon Feb 17 15:42:13 EST 2014
On 17 Feb 2014 15:17, "Sturla Molden" <sturla.molden at gmail.com> wrote:
> Julian Taylor <jtaylor.debian at googlemail.com> wrote:
> > When an array is created it tries to get its memory from the cache and
> > when it's deallocated it returns it to the cache.
> Good idea, however there is already a C function that does this. It uses a
> heap to keep the cached memory blocks sorted according to size. You know it
> as malloc, and that is why we call this allocation from the heap. Which by the
> way is what NumPy already does. ;-)
Common malloc implementations are not well optimized for programs that have
frequent, short-lived, large-sized allocations. Usually they optimize for
short-lived allocations of small sizes. It's totally plausible
that we could do a better job in the common case of array operations like
'a + b + c + d' that allocate and free a bunch of same-sized temporary
arrays as they go. (Right now, if those arrays are large, that expression
will always generate multiple mmap/munmap calls.) The question is to what
extent numpy programs are bottlenecked by such allocations.
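
To make the temporaries concrete, here is a rough sketch (not a benchmark, and the function names are made up for illustration). The first function is the plain expression; the second shows what the same sum looks like when one buffer is reused via the `out=` argument of `np.add`, which is the allocation pattern a smarter allocator or elision scheme would be approximating.

```python
import numpy as np


def sum_with_temporaries(a, b, c, d):
    # Without any elision, each binary + allocates a fresh result array:
    # (a + b) and ((a + b) + c) are temporaries that are freed right away.
    return a + b + c + d


def sum_reusing_buffer(a, b, c, d):
    # One allocation for the result, then accumulate into it in place.
    out = np.add(a, b)
    np.add(out, c, out=out)
    np.add(out, d, out=out)
    return out
```

Both return the same values; the difference is only in how many same-sized buffers are allocated and freed along the way.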
Also, I'd be pretty wary of caching large chunks of unused memory. People
already have a lot of trouble understanding their program's memory usage,
and getting rid of 'greedy free' will make this even worse.
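
A toy model of the trade-off, purely for illustration: `BufferCache` and all of its attribute names are hypothetical and do not correspond to anything in numpy. It keeps freed buffers in a size-keyed free list, and `cached_bytes` is exactly the "unused but still held" memory that would confuse people looking at their process size. A cap is one possible way to bound it.

```python
from collections import defaultdict

import numpy as np


class BufferCache:
    """Toy size-keyed free list; every name here is hypothetical."""

    def __init__(self, max_cached_bytes):
        self.max_cached_bytes = max_cached_bytes
        self.cached_bytes = 0  # memory held by the cache but not in use
        self._free = defaultdict(list)  # nbytes -> list of idle buffers

    def get(self, nbytes):
        # Reuse an idle buffer of exactly this size if one is available.
        bucket = self._free[nbytes]
        if bucket:
            self.cached_bytes -= nbytes
            return bucket.pop()
        return np.empty(nbytes, dtype=np.uint8)

    def put(self, buf):
        # Keep the buffer only while under the cap; otherwise drop it so
        # the memory is released just as it would be without a cache.
        if self.cached_bytes + buf.nbytes <= self.max_cached_bytes:
            self._free[buf.nbytes].append(buf)
            self.cached_bytes += buf.nbytes
```

With a cap of zero this degenerates to the current 'greedy free' behaviour; the larger the cap, the more memory sits idle between operations.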
Another optimization we should consider that might help a lot in the same
situations where this would help: for code called from the cpython eval
loop, it's afaict possible to determine which inputs are temporaries by
checking their refcnt. In the second call to __add__ in '(a + b) + c', the
temporary will have refcnt 1, while the other arrays will all have
refcnt > 1. In such cases (subject to various sanity checks on shape, dtype, etc.)
we could elide temporaries by reusing the input array for the output. The
risk is that there may be some code out there that calls these operations
directly from C with non-temp arrays that nonetheless have refcnt 1, but we
should at least investigate the feasibility. E.g. maybe we can do the
optimization for tp_add but not PyArray_Add.
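
A Python-level sketch of the decision logic, under loud assumptions: the real check would live in C and look at the refcount directly, which cannot be reproduced reliably from Python, so here a `left_is_temp` flag stands in for "refcnt is 1". The names `can_reuse` and `add_eliding` are hypothetical, and the particular sanity checks are just one plausible set.

```python
import numpy as np


def can_reuse(candidate, other):
    # Hypothetical sanity checks: the reused buffer must already have the
    # shape and dtype the result would have, be writable, and own its data.
    result_shape = np.broadcast_shapes(candidate.shape, other.shape)
    result_dtype = np.result_type(candidate.dtype, other.dtype)
    return (
        candidate.shape == result_shape
        and candidate.dtype == result_dtype
        and candidate.flags.writeable
        and candidate.flags.owndata
    )


def add_eliding(left, right, left_is_temp):
    # left_is_temp stands in for the refcount test done at the C level.
    if left_is_temp and can_reuse(left, right):
        # Write the result into the temporary instead of allocating.
        return np.add(left, right, out=left)
    return np.add(left, right)
```

The risk described above corresponds to calling `add_eliding` with `left_is_temp=True` for an array the caller still cares about: its contents would be silently overwritten.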