[Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

Nathaniel Smith njs at pobox.com
Tue Apr 15 05:48:14 EDT 2014


Hey all,

The well-known memory_profiler module [1] is super-useful, but it has a
fundamental limitation: the only way it can track allocations is by
constantly polling the OS for the size of the total process address
space. This is a crude and unreliable way of making measurements.

In Python 3.4, there's a new "allocation hooks" infrastructure, along
with a tracemalloc module built on top of it, that allows one to
precisely track the lifetime and size of every allocation [2]. So this
is pretty awesome, and we can expect there will be more tools growing
up around this interface.
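
For anyone who hasn't tried it yet, here is a minimal sketch of what tracing looks like with the tracemalloc module from [2]. The helper build_buffers and the sizes are made up for illustration; the example deliberately uses plain Python objects, since those already go through the Python allocator.

```python
import tracemalloc


def build_buffers(count, size):
    # Hypothetical helper: each bytes object is allocated through
    # Python's own allocator, so tracemalloc can see it.
    return [bytes(size) for _ in range(count)]


tracemalloc.start()
before = tracemalloc.take_snapshot()

buffers = build_buffers(1000, 1000)  # roughly 1 MB of payload

after = tracemalloc.take_snapshot()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Attribute the growth between the two snapshots to source lines,
# largest difference first.
top = after.compare_to(before, "lineno")[0]
frame = top.traceback[0]
print(top.size_diff, "bytes from", frame.filename, "line", frame.lineno)
print("traced now:", current, "peak:", peak)
```

The point is that the growth is attributed to the exact line that allocated it, rather than inferred from a change in total process size.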

But unfortunately, this system is useless for numpy right now, because
numpy does not use the Python memory allocation interface; this means
that numpy data is "invisible" to tracemalloc and related tools. Why
doesn't numpy use the Python memory allocation interface? Because
numpy needs calloc(), but Python doesn't expose calloc(), only
malloc()/realloc()/free().

Good news, though! python-dev is in favor of adding calloc() to the
core allocation interfaces, which will let numpy join the party. See
python-dev thread:
  https://mail.python.org/pipermail/python-dev/2014-April/133985.html

It would be especially nice if we could get this into 3.5, since it
seems likely that lots of numpy users will be switching to 3.5 when it
comes out, and having a good memory tracing infrastructure there
waiting for them would make it even more awesome.

Anyone interested in picking this up?
  http://bugs.python.org/issue21233

-n

[1] https://pypi.python.org/pypi/memory_profiler
[2] https://docs.python.org/3.4/library/tracemalloc.html
-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


