[Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

Julian Taylor jtaylor.debian at googlemail.com
Tue Apr 15 14:27:52 EDT 2014

On 15.04.2014 18:39, Nathaniel Smith wrote:
> On Tue, Apr 15, 2014 at 4:08 PM, Julian Taylor
> <jtaylor.debian at googlemail.com> wrote:
>> On Tue, Apr 15, 2014 at 3:07 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>> On Tue, Apr 15, 2014 at 12:06 PM, Julian Taylor
>>> <jtaylor.debian at googlemail.com> wrote:
>>>>> Good news, though! python-dev is in favor of adding calloc() to the
>>>>> core allocation interfaces, which will let numpy join the party. See
>>>>> python-dev thread:
>>>>> https://mail.python.org/pipermail/python-dev/2014-April/133985.html
>>>>> It would be especially nice if we could get this into 3.5, since it
>>>>> seems likely that lots of numpy users will be switching to 3.5 when it
>>>>> comes out, and having a good memory tracing infrastructure there
>>>>> waiting for them makes it even more awesome.
>>>>> Anyone interested in picking this up?
>>>>> http://bugs.python.org/issue21233
>>>> Hi,
>>>> I think it would be a better idea if, instead of adding API functions for
>>>> one more type of allocator, we got direct access to the Python hooks and
>>>> could use them with whatever allocator we want.
>>> Unfortunately, that's not how the API works. The way that third-party
>>> tracers register a 'hook' is by providing a new implementation of
>>> malloc/free/etc. So there's no general way to say "please pretend to
>>> have done a malloc".
>>> I guess we could potentially request the addition of
>>> fake_malloc/fake_free functions.
>> Unfortunately, looking at the PEP it seems you can have either a custom
>> allocator or tracing, but not both (unless you do the tracing
>> yourself).
>> This seems like quite a limitation.
> I don't think this is right - notice the PyMem_GetAllocator function,
> which lets you grab the old allocator. This means you can write a
> tracing "allocator" which just does its tracing and then delegates to
> the old allocator. (And looking at _tracemalloc.c this does seem to be
> how it works.) This means that any full allocator replacement has to be
> enabled first before any tracing allocator is enabled, but that's okay,
> because a full allocator has to be inserted *very* early in any case
> (like, before any allocations have happened) and can never be removed,
> so this doesn't seem so bad.
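A minimal C model of the delegation pattern described above may help. It is deliberately self-contained rather than built against Python.h: `allocator_t` is a simplified stand-in for the PEP 445 allocator struct (a `ctx` pointer plus function pointers, with calloc/realloc left out), and `get_allocator`, `set_allocator`, `mem_malloc` and `mem_free` are hypothetical stand-ins for `PyMem_GetAllocator`, `PyMem_SetAllocator` and the allocation entry points. The tracer saves whatever allocator is installed, installs itself, and forwards every call to the saved one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the PEP 445 allocator struct: a context pointer plus
 * function pointers (calloc/realloc omitted). Not CPython's header. */
typedef struct {
    void *ctx;
    void *(*malloc)(void *ctx, size_t size);
    void (*free)(void *ctx, void *ptr);
} allocator_t;

static void *raw_malloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
static void raw_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }

/* Hypothetical global playing the role of the interpreter's allocator slot. */
static allocator_t current_allocator = { NULL, raw_malloc, raw_free };

static void get_allocator(allocator_t *out) { *out = current_allocator; }
static void set_allocator(const allocator_t *in) { current_allocator = *in; }

/* Allocation sites always go through whatever table is installed. */
static void *mem_malloc(size_t size) { return current_allocator.malloc(current_allocator.ctx, size); }
static void mem_free(void *ptr) { current_allocator.free(current_allocator.ctx, ptr); }

/* Tracer state: the allocator it replaced, plus simple counters. */
typedef struct {
    allocator_t prev;
    size_t n_malloc;
    size_t n_free;
} tracer_t;

static void *trace_malloc(void *ctx, size_t size) {
    tracer_t *t = ctx;
    void *p = t->prev.malloc(t->prev.ctx, size);  /* delegate first */
    if (p != NULL)
        t->n_malloc++;                            /* then record */
    return p;
}

static void trace_free(void *ctx, void *ptr) {
    tracer_t *t = ctx;
    if (ptr != NULL)
        t->n_free++;
    t->prev.free(t->prev.ctx, ptr);               /* delegate */
}

static void tracer_start(tracer_t *t) {
    allocator_t hook;
    get_allocator(&t->prev);                      /* remember the old allocator */
    t->n_malloc = t->n_free = 0;
    hook.ctx = t;
    hook.malloc = trace_malloc;
    hook.free = trace_free;
    set_allocator(&hook);
}

static void tracer_stop(tracer_t *t) {
    /* Sanity check for this toy: we expect to be the innermost hook. */
    assert(current_allocator.ctx == t);
    set_allocator(&t->prev);
}
```

Because the tracer only forwards calls, it can sit on top of a full allocator replacement, provided that replacement was installed first, as noted above.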
> OTOH I don't think they've really thought about the case of stacking
> multiple tracing allocators. tracemalloc.stop() just unconditionally
> resets the allocator to whatever it was when tracemalloc.start() was
> called, and there are no guidelines on how to handle the lifetime of the
> ctx pointer. I'm not sure these issues cause any problems in practice
> though.
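The stacking concern can be reproduced with the same kind of toy model. Here `tracer_stop` restores whatever was saved at start time without checking whether another tracer was installed later, mirroring the behaviour described for tracemalloc.stop(). The types and function names are hypothetical, and this is not tracemalloc's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy allocator table (ctx + malloc/free), standing in for the PEP 445 struct. */
typedef struct {
    void *ctx;
    void *(*malloc)(void *ctx, size_t size);
    void (*free)(void *ctx, void *ptr);
} allocator_t;

static void *raw_malloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
static void raw_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }
static allocator_t current_allocator = { NULL, raw_malloc, raw_free };

static void *mem_malloc(size_t size) { return current_allocator.malloc(current_allocator.ctx, size); }
static void mem_free(void *ptr) { current_allocator.free(current_allocator.ctx, ptr); }

typedef struct {
    allocator_t prev;   /* allocator saved at start time */
    size_t n_malloc;    /* number of malloc calls seen */
} tracer_t;

static void *count_malloc(void *ctx, size_t size) {
    tracer_t *t = ctx;
    t->n_malloc++;
    return t->prev.malloc(t->prev.ctx, size);
}

static void count_free(void *ctx, void *ptr) {
    tracer_t *t = ctx;
    t->prev.free(t->prev.ctx, ptr);
}

/* Save the current allocator and install this tracer on top of it. */
static void tracer_start(tracer_t *t) {
    t->prev = current_allocator;
    t->n_malloc = 0;
    current_allocator.ctx = t;
    current_allocator.malloc = count_malloc;
    current_allocator.free = count_free;
}

/* Unconditionally restore what was saved at start time. */
static void tracer_stop(tracer_t *t) { current_allocator = t->prev; }
```

In this toy, if tracer A is started before tracer B and A is stopped first, B is silently dropped; stopping B afterwards reinstalls A's hook even though A was already stopped. Had A freed its ctx in the meantime, the next allocation would dereference a dangling pointer, which is where the ctx-lifetime question comes in.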
>> Maybe it would have been more flexible if Python instead provided
>> three functions:
>> PyMem_RegisterAlloc(size);
>> PyMem_RegisterReAlloc(size);
>> PyMem_RegisterFree(size);
>> + possibly nogil variants
>> These functions call into registered tracing functions (registered
>> e.g. by tracemalloc.start()) or do nothing.
>> Our allocator (and Python's) then just always calls these functions and
>> otherwise carries on as before.
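A minimal sketch of this proposal follows, taking the size-only signatures literally. `trace_hooks_t`, `set_trace_hooks`, `notify_alloc` and `notify_free` are hypothetical names, not CPython API. Realloc and the nogil variants are left out, and `custom_zeroed_alloc` merely stands in for a custom allocator such as numpy's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical notification API modelled on the size-only proposal.
 * A tracer installs callbacks; with none installed the calls are no-ops. */
typedef struct {
    void (*on_alloc)(size_t size);
    void (*on_free)(size_t size);
} trace_hooks_t;

static trace_hooks_t registered_hooks;   /* zero-initialised: no tracer */

static void set_trace_hooks(const trace_hooks_t *hooks) {
    if (hooks != NULL) {
        registered_hooks = *hooks;
    } else {
        registered_hooks.on_alloc = NULL;
        registered_hooks.on_free = NULL;
    }
}

static void notify_alloc(size_t size) {
    if (registered_hooks.on_alloc != NULL)
        registered_hooks.on_alloc(size);
}

static void notify_free(size_t size) {
    if (registered_hooks.on_free != NULL)
        registered_hooks.on_free(size);
}

/* A custom allocator keeps its own strategy (calloc here, as an example)
 * and only reports what it did. */
static void *custom_zeroed_alloc(size_t size) {
    void *p = calloc(1, size);
    if (p != NULL)
        notify_alloc(size);
    return p;
}

static void custom_free(void *p, size_t size) {
    if (p != NULL)
        notify_free(size);   /* the caller must supply the size */
    free(p);
}

/* Toy tracer: running total of bytes reported as live. */
static size_t traced_bytes;
static void count_alloc(size_t size) { traced_bytes += size; }
static void count_free(size_t size) { traced_bytes -= size; }
```

In this sketch the caller of `custom_free` has to supply the size, and a block allocated before the tracer was registered must not be reported to it as freed, otherwise the running total underflows.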
> You'd need to add some void* arguments as well -- tracemalloc actually
> tracks every allocation independently, so you can do things like ask
> "which line of code was responsible for allocating the largest portion
> of the memory that is still in use".
> And unfortunately once you add these arguments the resulting signatures
> don't quite match regular malloc/realloc/free (you have to pass a void*
> into malloc instead of receiving one), so we can't just define a
> PYMEM_NULL domain. (Or rather, we could, but then it would have to
> return an opaque void* used only for memory tracking, and we'd have to
> keep track of this alongside every allocation we did, and that would suck.)
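To illustrate why the pointer is needed, here is a toy pointer-aware variant of the same idea. The allocation site passes in the pointer it obtained (the reversed direction relative to malloc mentioned above), and the tracer keeps a table from pointer to size. As a result, frees need no size argument and the tracer can report what is still live. All names are hypothetical. The fixed 64-entry table with linear search is only for brevity, and tracemalloc additionally records which line of code made each allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pointer-aware notifications: the pointer is passed *in*. */
typedef struct {
    void (*on_alloc)(void *ptr, size_t size);
    void (*on_free)(void *ptr);
} trace_hooks_t;

static trace_hooks_t registered_hooks;   /* NULL members: tracing disabled */

static void notify_alloc(void *ptr, size_t size) {
    if (registered_hooks.on_alloc != NULL)
        registered_hooks.on_alloc(ptr, size);
}

static void notify_free(void *ptr) {
    if (registered_hooks.on_free != NULL)
        registered_hooks.on_free(ptr);
}

/* Toy tracer: a fixed-size table of live allocations keyed by pointer. */
#define MAX_TRACES 64
typedef struct { void *ptr; size_t size; } trace_t;
static trace_t traces[MAX_TRACES];
static size_t n_traces;

static void track(void *ptr, size_t size) {
    assert(n_traces < MAX_TRACES);
    traces[n_traces].ptr = ptr;
    traces[n_traces].size = size;
    n_traces++;
}

static void untrack(void *ptr) {
    for (size_t i = 0; i < n_traces; i++) {
        if (traces[i].ptr == ptr) {
            traces[i] = traces[--n_traces];   /* swap-remove */
            return;
        }
    }
    /* Unknown pointer (allocated before tracing started): ignore it. */
}

static size_t traced_live_bytes(void) {
    size_t total = 0;
    for (size_t i = 0; i < n_traces; i++)
        total += traces[i].size;
    return total;
}

/* Allocation sites: free no longer needs to know the size. */
static void *traced_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL)
        notify_alloc(p, size);
    return p;
}

static void traced_free(void *p) {
    if (p != NULL)
        notify_free(p);
    free(p);
}
```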

The tracer could register its context at the same time as it registers its
tracing functions, and then retrieve it from Python when it needs it.
I assume you can currently only have one custom allocator per
interpreter instance, so a global context for tracing should not be a big
problem.
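The following sketches the registration scheme suggested here, assuming a single global slot (one tracer per interpreter, as supposed above). The tracer hands over its context together with its callbacks, and the notification functions pass it back on every call. `get_tracer_ctx` stands in for retrieving the context later. `register_tracer`, `get_tracer_ctx` and the other names are hypothetical, not CPython API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks that receive the tracer's context on every call. */
typedef struct {
    void (*on_alloc)(void *ctx, void *ptr, size_t size);
    void (*on_free)(void *ctx, void *ptr);
} trace_hooks_t;

/* One global slot: assumes a single tracer per interpreter. */
static trace_hooks_t global_hooks;
static void *global_ctx;

static void register_tracer(const trace_hooks_t *hooks, void *ctx) {
    assert(hooks != NULL);
    global_hooks = *hooks;
    global_ctx = ctx;       /* registered together with the callbacks */
}

static void *get_tracer_ctx(void) { return global_ctx; }

static void notify_alloc(void *ptr, size_t size) {
    if (global_hooks.on_alloc != NULL)
        global_hooks.on_alloc(global_ctx, ptr, size);
}

static void notify_free(void *ptr) {
    if (global_hooks.on_free != NULL)
        global_hooks.on_free(global_ctx, ptr);
}

/* Example tracer whose state lives in its context rather than in globals. */
typedef struct { size_t n_alloc, n_free, bytes_requested; } counter_t;

static void counter_alloc(void *ctx, void *ptr, size_t size) {
    counter_t *c = ctx;
    (void)ptr;
    c->n_alloc++;
    c->bytes_requested += size;
}

static void counter_free(void *ctx, void *ptr) {
    counter_t *c = ctx;
    (void)ptr;
    c->n_free++;
}
```

Because the context travels with the callbacks, allocation sites never need to store any per-allocation tracing state of their own.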
