[Numpy-discussion] allocated memory cache for numpy

David Cournapeau cournape at gmail.com
Mon Feb 17 19:47:30 EST 2014

On Mon, Feb 17, 2014 at 7:31 PM, Julian Taylor <
jtaylor.debian at googlemail.com> wrote:

> hi,
> I noticed that during some simplistic benchmarks (e.g.
> https://github.com/numpy/numpy/issues/4310) a lot of time is spent in
> the kernel zeroing pages.
> This is because under Linux, glibc will always allocate large memory
> blocks with mmap. As these pages can come from other processes, the
> kernel must zero them for security reasons.

Do you have numbers for 'a lot of time'? Is the above script the exact one
you used for benchmarking this issue?

> For memory within the numpy process this is unnecessary and possibly a
> large overhead for the many temporaries numpy creates.
> The behavior of glibc can be tuned to change the threshold at which it
> starts using mmap, but that would be a platform-specific fix.
> I was thinking about adding a thread-local cache of pointers to
> allocated memory.
> When an array is created it tries to get its memory from the cache, and
> when it's deallocated it returns it to the cache.
> The threshold and cached memory block sizes could be adaptive depending
> on the application workload.
> For simplistic temporary-heavy benchmarks this eliminates the time spent
> in the kernel (the system time reported by time).
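
The caching scheme described in the quoted text can be sketched in a few lines. This is only an illustration of the idea, not NumPy code: bytearray stands in for an array's data block, and every name and constant (cache_alloc, cache_free, _MIN_CACHED, _MAX_PER_SIZE) is hypothetical. The adaptive threshold mentioned above is replaced by fixed values to keep the sketch short.

```python
import threading
from collections import defaultdict

# Hypothetical names and limits throughout; this is not NumPy's allocator.
_MIN_CACHED = 128 * 1024   # assumed: only cache blocks at least this large
_MAX_PER_SIZE = 4          # assumed: cap on cached blocks per size

_tls = threading.local()


def _free_lists():
    # Each thread owns its own mapping of size -> list of free buffers,
    # so the cache needs no locking.
    lists = getattr(_tls, "free_lists", None)
    if lists is None:
        lists = _tls.free_lists = defaultdict(list)
    return lists


def cache_alloc(nbytes):
    """Return a buffer of nbytes, reusing a cached one when possible."""
    bucket = _free_lists()[nbytes]
    if bucket:
        return bucket.pop()      # reused block: contents are stale, not zeroed
    return bytearray(nbytes)     # cache miss: go to the system allocator


def cache_free(buf):
    """Hand a buffer back; keep it only if it is large and there is room."""
    nbytes = len(buf)
    bucket = _free_lists()[nbytes]
    if nbytes >= _MIN_CACHED and len(bucket) < _MAX_PER_SIZE:
        bucket.append(buf)
    # Otherwise the reference is simply dropped and the block is freed normally.
```

A reused block is handed out without being cleared, which is the point of the cache: code that needs zeroed memory would have to clear it itself.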

For this kind of setup, I would advise looking into perf on Linux. It
should be much more precise than time.

If nobody beats me to it, I can try to look at this over the weekend.
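
Short of running perf, a rough user/system split can also be taken from inside the process. The sketch below is an illustration using only the standard library, not the benchmark from the issue: cpu_split and churn are hypothetical helpers, bytearray stands in for a numpy temporary, and a 4096-byte page size is assumed. It has the same coarse resolution as time, so it shows where to look rather than replacing a profiler.

```python
import resource


def cpu_split(func):
    """Run func() and return the (user, system) CPU seconds it consumed."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    func()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


def churn(nbytes=16 * 1024 * 1024, repeats=20, page=4096):
    # Allocate a large block, write one byte per (assumed 4096-byte) page so
    # the kernel has to supply real pages, then drop the block again.
    ones = b"\x01" * len(range(0, nbytes, page))
    for _ in range(repeats):
        buf = bytearray(nbytes)
        buf[::page] = ones
        del buf


if __name__ == "__main__":
    user, system = cpu_split(churn)
    print(f"user {user:.3f}s  system {system:.3f}s")
```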

> But I don't know how relevant this is for real world applications.
> Have you noticed large amounts of time spent in the kernel in your apps?

In my experience, more time is spent on figuring out how to save memory
than on speeding up this kind of operation for 'real life applications' (TM).

What happens to your benchmark if you tune malloc to not use mmap at all?
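
One way to try this from Python without rebuilding anything is to call glibc's mallopt through ctypes before running the benchmark. This is a sketch under assumptions: disable_malloc_mmap is a hypothetical helper, the parameter numbers are the ones defined in glibc's malloc.h, and raising M_TRIM_THRESHOLD as well is my own addition, on the assumption that freed heap memory would otherwise be handed back to the kernel and zeroed again on reuse. On a libc without mallopt it just returns False.

```python
import ctypes
import ctypes.util

# Parameter numbers as defined in glibc's <malloc.h>.
M_TRIM_THRESHOLD = -1
M_MMAP_MAX = -4


def disable_malloc_mmap():
    """Ask glibc malloc to stop serving requests with mmap.

    Returns True if both mallopt calls succeeded, False if mallopt is
    unavailable or rejected a setting.
    """
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"))
        mallopt = libc.mallopt
    except (OSError, AttributeError, TypeError):
        return False               # no usable libc or no mallopt symbol
    mallopt.argtypes = [ctypes.c_int, ctypes.c_int]
    mallopt.restype = ctypes.c_int
    # mallopt returns 1 on success and 0 on error.
    ok_mmap = mallopt(M_MMAP_MAX, 0)
    # Assumption: also keep freed heap memory inside the process.
    ok_trim = mallopt(M_TRIM_THRESHOLD, 2**31 - 1)
    return bool(ok_mmap and ok_trim)


if __name__ == "__main__":
    print("malloc mmap disabled:", disable_malloc_mmap())
```

The call should be made before the allocations being measured; blocks that were already obtained through mmap are not affected.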


> I also found this paper which describes pretty much exactly what I'm
> proposing:
> pyhpc.org/workshop/papers/Doubling.pdf
> Does anyone know why their changes were never incorporated in numpy? I
> couldn't find a reference in the list archive.