[Numpy-discussion] caching large allocations on gnu/linux

Tue Mar 14 11:21:33 EDT 2017

On 13.03.2017 19:54, Francesc Alted wrote:
> 2017-03-13 18:11 GMT+01:00 Julian Taylor <jtaylor.debian at googlemail.com>:
> 
>     On 13.03.2017 16:21, Anne Archibald wrote:
>     >
>     >
>     > On Mon, Mar 13, 2017 at 12:21 PM Julian Taylor
>     > <jtaylor.debian at googlemail.com> wrote:
>     >
>     >     Should it be agreed that caching is worthwhile, I would propose a very
>     >     simple implementation. We only really need to cache a small handful of
>     >     array data pointers for the fast allocate/deallocate cycles that appear
>     >     in common numpy usage.
>     >     For example, a small list of maybe 4 pointers storing the 4 largest
>     >     recent deallocations. New allocations just pick the first memory block
>     >     of sufficient size.
>     >     The cache would only be active on systems that support MADV_FREE (which
>     >     is Linux 4.5 and probably BSD too).
>     >
>     >     So what do you think of this idea?
>     >
>     >
>     > This is an interesting thought, and potentially a nontrivial speedup
>     > with zero user effort. But coming up with an appropriate caching policy
>     > is going to be tricky. The thing is, for each array, numpy grabs a block
>     > "the right size", and that size can easily vary by orders of magnitude,
>     > even within the temporaries of a single expression as a result of
>     > broadcasting. So simply giving each new array the smallest cached block
>     > that will fit could easily result in small arrays in giant allocated
>     > blocks, wasting non-reclaimable memory.  So really you want to recycle
>     > blocks of the same size, or nearly, which argues for a fairly large
>     > cache, with smart indexing of some kind.
>     >
> 
>     The nice thing about MADV_FREE is that we don't need any clever cache.
>     The same process that marked the pages free can reclaim them in another
>     allocation, at least that is what my testing indicates it allows.
>     So a small allocation getting a huge memory block does not waste memory
>     as the top unused part will get reclaimed when needed, either by numpy
>     itself doing another allocation or a different program on the system.
> 
> 
> Well, what you say makes a lot of sense to me, so if you have tested
> that then I'd say that this is worth a PR and see how it works on
> different workloads.
>  
> 
> 
>     An issue that does arise though is that this memory is not available for
>     the page cache used for caching on-disk data. A cache that is too large
>     might then be detrimental for IO-heavy workloads that rely on the page cache.
> 
> 
> Yeah.  Also, memory mapped arrays use the page cache intensively, so we
> should test this use case and see how the caching affects memory map
> performance.
>  
> 
>     So we might want to cap it to some max size, provide an explicit on/off
>     switch and/or have numpy IO functions clear the cache.
> 
> 
> Definitely, allowing this feature to be disabled dynamically would be
> desirable.  That would provide an easy path for testing how it affects
> performance.  Would that be feasible?
> 
> 

I have created a PR with such a simple cache implemented:
https://github.com/numpy/numpy/pull/8783
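
To spell out the policy from the proposal quoted above, here is a toy
Python model. It is only a sketch of the idea (keep a handful of the
largest freed blocks, hand out the first one that is big enough), not the
C code in the PR. The class name BlockCache and its methods are made up
for illustration, and evicting the smallest block is an assumed reading
of "the 4 largest recent deallocations".

```python
class BlockCache:
    """Toy model of the proposed cache; tracks block sizes only."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.blocks = []  # sizes (in bytes) of cached free blocks

    def free(self, size):
        """Cache a freed block; return the size released instead, if any."""
        self.blocks.append(size)
        if len(self.blocks) > self.capacity:
            smallest = min(self.blocks)
            self.blocks.remove(smallest)  # keep only the largest blocks
            return smallest
        return None

    def alloc(self, size):
        """Return the first cached block of sufficient size, else None."""
        for i, cached in enumerate(self.blocks):
            if cached >= size:
                return self.blocks.pop(i)
        return None  # cache miss: a real allocator would call malloc here


cache = BlockCache(capacity=4)
released = [cache.free(size) for size in (10, 80, 40, 20, 5)]
reused = cache.alloc(30)    # served from the 80-byte block
missed = cache.alloc(100)   # nothing cached is large enough
```

The request for 30 bytes is served from the 80-byte block, which is the
oversized reuse raised earlier in the thread. The argument for MADV_FREE
is that the untouched tail of such a block stays reclaimable.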

This sets the maximum number of memory pointers to cache and returns the
old value:
np.core.multiarray.set_memory_cache_size(4)
On systems where the cache works, it returns a value greater than 0
(currently 4).
The size of the cache in bytes is currently unbounded.
Setting the value to 0 clears and disables the cache.
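
Since the setter returns the old value, it is easy to disable the cache
for one section of code and restore it afterwards. The following is a
minimal sketch, assuming a NumPy build with the PR applied. The helper
names _find_setter and memory_cache_size are hypothetical, and on any
build without set_memory_cache_size the context manager does nothing.

```python
import contextlib
import importlib


def _find_setter():
    """Return set_memory_cache_size, or None if this build lacks it."""
    try:
        multiarray = importlib.import_module("numpy.core.multiarray")
    except ImportError:
        return None
    return getattr(multiarray, "set_memory_cache_size", None)


@contextlib.contextmanager
def memory_cache_size(n):
    """Temporarily set the cache size; a no-op without the PR."""
    setter = _find_setter()
    if setter is None:
        yield None
        return
    old = setter(n)  # the call returns the previous setting
    try:
        yield old
    finally:
        setter(old)  # restore whatever was configured before


# Example: run an IO-heavy section with the cache cleared and disabled.
with memory_cache_size(0) as previous:
    pass  # e.g. work on memory mapped arrays here
```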

You should probably not expect large performance improvements. The cache
only has an effect in applications with measurable page-faulting
overhead, which only happens if you have lots of relatively short
operations that create copies of large arrays. That mostly means
operations with temporaries and maybe some indexing operations.
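
The cache relies on MADV_FREE, so it may help to see what that call looks
like from user space. Below is a small, hedged Python sketch using the
standard mmap module (Python 3.8 or later) rather than NumPy's allocator.
The function name try_madv_free is made up for illustration. It maps
private anonymous memory, touches it, marks it MADV_FREE and then writes
to it again. On platforms or kernels without MADV_FREE it only reports
that the advice is unavailable.

```python
import mmap


def try_madv_free(size=1 << 20):
    """Return True if MADV_FREE was accepted for a scratch mapping."""
    advice = getattr(mmap, "MADV_FREE", None)
    needed = ("MAP_PRIVATE", "MAP_ANONYMOUS")
    if advice is None or not all(hasattr(mmap, name) for name in needed):
        return False  # this platform does not expose MADV_FREE

    # Private anonymous mapping, the kind of memory MADV_FREE applies to.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        buf[:] = b"\x01" * size  # touch every page so it is backed by RAM
        try:
            buf.madvise(advice)  # the kernel may now drop these pages lazily
        except OSError:
            return False  # e.g. a Linux kernel older than 4.5
        buf[0:1] = b"\x02"  # writing again puts that page back in use
        return True
    finally:
        buf.close()


supported = try_madv_free()
```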