[Numpy-discussion] allocated memory cache for numpy

Tue Feb 18 04:05:31 EST 2014

On Tue, Feb 18, 2014 at 1:47 AM, David Cournapeau <cournape at gmail.com> wrote:
>
> On Mon, Feb 17, 2014 at 7:31 PM, Julian Taylor
> <jtaylor.debian at googlemail.com> wrote:
>>
>> hi,
>> I noticed that during some simplistic benchmarks (e.g.
>> https://github.com/numpy/numpy/issues/4310) a lot of time is spent in
>> the kernel zeroing pages.
>> This is because on Linux, glibc always allocates large memory
>> blocks with mmap. As these pages can come from other processes, the
>> kernel must zero them for security reasons.
>
>
> Do you have numbers for 'a lot of time'? Is the above script the exact one
> you used to benchmark this issue?
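A small Python sketch of the glibc/kernel behaviour described in the quoted message, assuming Linux with glibc. First, fresh anonymous mappings arrive zero-filled, which is the kernel work being measured. Second, mallopt(M_MMAP_THRESHOLD, ...) raises the request size at which malloc switches to mmap. The helper names are hypothetical, the constant -3 is copied from glibc's <malloc.h>, and calling mallopt through ctypes only affects the current process.

```python
import ctypes
import mmap
import platform

# Value of M_MMAP_THRESHOLD in glibc's <malloc.h>; ctypes cannot read C macros.
M_MMAP_THRESHOLD = -3


def anonymous_mapping_is_zeroed(size: int = 1 << 20) -> bool:
    """Create a fresh anonymous mapping and check that it arrives zero-filled."""
    with mmap.mmap(-1, size) as region:
        return region[:] == bytes(size)


def set_mmap_threshold(nbytes: int):
    """Ask glibc to serve requests smaller than nbytes from the heap, not mmap.

    Returns mallopt's success flag as a bool, or None when not running on glibc.
    """
    if platform.libc_ver()[0] != "glibc":
        return None  # mallopt and M_MMAP_THRESHOLD are glibc-specific
    libc = ctypes.CDLL("libc.so.6")
    return libc.mallopt(M_MMAP_THRESHOLD, nbytes) == 1
```

Per the mallopt(3) man page, the default threshold is 128 KiB, and glibc raises it dynamically (up to 32 MiB on 64-bit systems) as mmap'd blocks are freed. Setting M_MMAP_THRESHOLD explicitly disables that adjustment. The same parameter can also be set without code through the MALLOC_MMAP_THRESHOLD_ environment variable.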

I saw it in many of the benchmarks I ran over time for the numerous little
improvements I added.
But I'm aware these are overly simplistic; that's why I'm asking for
numbers from real applications.
The paper I found ran somewhat more thorough benchmarks, and they seem to
indicate that more applications profit from it.

>
>>
>> For memory within the numpy process this is unnecessary and possibly a
>> large overhead for the many temporaries numpy creates.
>>
>> The behavior of glibc can be tuned to change the threshold at which it
>> starts using mmap but that would be a platform specific fix.
>>
>> I was thinking about adding a thread-local cache of pointers to
>> allocated memory.
>> When an array is created, it tries to get its memory from the cache, and
>> when it is deallocated, it returns the memory to the cache.
>> The threshold and cached memory block sizes could be adaptive depending
>> on the application workload.
>>
>> For simplistic, temporary-heavy benchmarks this eliminates the time spent
>> in the kernel (the system time reported by time).
>
>
> For this kind of setup, I would advise looking into perf on Linux. It should
> be much more precise than time.
>
> If nobody beats me to it, I can try to look at this over the weekend.
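A minimal Python sketch of the proposed thread-local cache. It assumes exact-size buckets and a fixed per-size limit instead of the adaptive thresholds the proposal mentions. BlockCache, acquire and release are hypothetical names, and bytearray stands in for a raw malloc'd data buffer; an actual implementation would sit in NumPy's C allocation path.

```python
import threading
from collections import defaultdict


class BlockCache:
    """Hypothetical per-thread cache of freed buffers, bucketed by exact size."""

    def __init__(self, max_blocks_per_size: int = 4):
        self.max_blocks_per_size = max_blocks_per_size
        self._local = threading.local()  # attributes set on this are private to each thread

    def _free_lists(self) -> defaultdict:
        # Lazily create this thread's size -> [blocks] mapping on first use.
        if not hasattr(self._local, "free"):
            self._local.free = defaultdict(list)
        return self._local.free

    def acquire(self, nbytes: int) -> bytearray:
        """Return a cached block of exactly nbytes if available, else allocate one."""
        bucket = self._free_lists()[nbytes]
        if bucket:
            return bucket.pop()  # reused block: previous contents are NOT cleared
        return bytearray(nbytes)

    def release(self, block: bytearray) -> None:
        """Keep the block for reuse by this thread, up to the per-size limit."""
        bucket = self._free_lists()[len(block)]
        if len(bucket) < self.max_blocks_per_size:
            bucket.append(block)
        # Otherwise the block is simply dropped and reclaimed by the normal allocator.
```

Because each thread has its own free lists, acquire and release need no lock. A reused block still holds its previous contents, which is fine for uninitialized allocations but means anything that must return zeroed memory would have to clear it explicitly.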

I'm using perf for most of my benchmarks.
But in this case time is sufficient, as the system time is all you need to know.
perf confirms that this time is almost entirely spent zeroing pages.
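The user/system split that time reports can also be read from inside Python with os.times(). This sketch uses hypothetical helpers cpu_times and temporaries, with bytearray standing in for NumPy temporaries so it runs without NumPy. The 64 MiB default is an assumption, chosen to stay above the 32 MiB ceiling of glibc's dynamic mmap threshold on 64-bit systems so that each buffer is likely to come from a fresh mmap.

```python
import mmap
import os


def cpu_times(fn, *args):
    """Run fn(*args); return (user_seconds, system_seconds) spent by this process."""
    before = os.times()
    fn(*args)
    after = os.times()
    return after.user - before.user, after.system - before.system


def temporaries(nbytes: int = 64 * 1024 * 1024, repeats: int = 20) -> None:
    """Repeatedly create, write and drop a large buffer, like a NumPy temporary."""
    ones = b"\x01" * len(range(0, nbytes, mmap.PAGESIZE))
    for _ in range(repeats):
        tmp = bytearray(nbytes)       # fresh large allocation
        tmp[::mmap.PAGESIZE] = ones   # write one byte to every page
        del tmp                       # free it again before the next iteration
```

Calling cpu_times(temporaries) returns the (user, sys) pair; a large system share relative to user time is the signature of page-zeroing overhead discussed in this thread.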