[Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

Julian Taylor jtaylor.debian at googlemail.com
Tue Apr 15 14:27:52 EDT 2014


On 15.04.2014 18:39, Nathaniel Smith wrote:
> On Tue, Apr 15, 2014 at 4:08 PM, Julian Taylor
> <jtaylor.debian at googlemail.com> wrote:
>> On Tue, Apr 15, 2014 at 3:07 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>> On Tue, Apr 15, 2014 at 12:06 PM, Julian Taylor
>>> <jtaylor.debian at googlemail.com> wrote:
>>>>> Good news, though! python-dev is in favor of adding calloc() to the
>>>>> core allocation interfaces, which will let numpy join the party. See
>>>>> python-dev thread:
>>>>> https://mail.python.org/pipermail/python-dev/2014-April/133985.html
>>>>>
>>>>> It would be especially nice if we could get this into 3.5, since it
>>>>> seems likely that lots of numpy users will be switching to 3.5 when it
>>>>> comes out, and having a good memory tracing infrastructure there
>>>>> waiting for them will make it even more awesome.
>>>>>
>>>>> Anyone interested in picking this up?
>>>>> http://bugs.python.org/issue21233
>>>>
>>>> Hi,
>>>> I think that instead of API functions for one particular type of
>>>> allocator, it would be a better idea for us to get access to the
>>>> Python tracing hooks directly, so we can use them with whatever
>>>> allocator we want.
>>>
>>> Unfortunately, that's not how the API works. The way that third-party
>>> tracers register a 'hook' is by providing a new implementation of
>>> malloc/free/etc. So there's no general way to say "please pretend to
>>> have done a malloc".
>>>
>>> I guess we could potentially request the addition of
>>> fake_malloc/fake_free functions.
>>
>> Unfortunately, looking at the PEP it seems you can either have a custom
>> allocator or have tracing, but not both (unless you do the tracing
>> yourself).
>> This seems like quite a limitation.
> 
> I don't think this is right - notice the PyMem_GetAllocator function,
> which lets you grab the old allocator. This means you can write a
> tracing "allocator" which just does its tracing and then delegates to
> the old allocator. (And looking at _tracemalloc.c this does seem to be
> how it works.) This means that any full allocator replacement has to be
> installed before any tracing allocator is enabled, but that's okay,
> because a full allocator has to be inserted *very* early in any case
> (like, before any allocations have happened) and can never be removed,
> so this doesn't seem so bad.
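
For reference, this is roughly what such a delegating tracer looks like with
the PEP 445 API as it stands in 3.4 (a minimal sketch; the trace_* names and
the fprintf bookkeeping are only placeholders for whatever tracking one
actually wants to do):

    #include <Python.h>
    #include <stdio.h>

    /* keep the previously installed raw allocator and delegate to it */
    static PyMemAllocator old_raw;

    static void *
    trace_malloc(void *ctx, size_t size)
    {
        void *ptr = old_raw.malloc(old_raw.ctx, size);
        fprintf(stderr, "malloc(%zu) -> %p\n", size, ptr);  /* record event */
        return ptr;
    }

    static void *
    trace_realloc(void *ctx, void *ptr, size_t new_size)
    {
        void *newptr = old_raw.realloc(old_raw.ctx, ptr, new_size);
        fprintf(stderr, "realloc(%p, %zu) -> %p\n", ptr, new_size, newptr);
        return newptr;
    }

    static void
    trace_free(void *ctx, void *ptr)
    {
        fprintf(stderr, "free(%p)\n", ptr);
        old_raw.free(old_raw.ctx, ptr);
    }

    static void
    install_tracer(void)
    {
        PyMemAllocator tracer = {NULL, trace_malloc, trace_realloc, trace_free};
        /* grab whatever allocator is installed right now, then wrap it */
        PyMem_GetAllocator(PYMEM_DOMAIN_RAW, &old_raw);
        PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &tracer);
    }

(Which, as you say, appears to be what _tracemalloc.c itself does, just for
all three domains.)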
> 
> OTOH I don't think they've really thought about the case of stacking
> multiple tracing allocators. tracemalloc.stop() just unconditionally
> resets the allocator to whatever it was when tracemalloc.start() was
> called, and there are no guidelines on how to handle the lifetime of the
> ctx pointer. I'm not sure these issues cause any problems in practice
> though.
> 
>> Maybe it would have been more flexible if Python instead provided
>> three functions:
>>
>> PyMem_RegisterAlloc(size);
>> PyMem_RegisterReAlloc(size);
>> PyMem_RegisterFree(size);
>> + possibly nogil variants
>> These functions call into registered tracing functions (registered
>> e.g. by tracemalloc.start()) or do nothing.
>>
>> Our allocator (and Python's) then just always calls these functions and
>> continues doing its stuff.
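
To make that concrete, a hypothetical sketch (none of these functions exist
in CPython, and npy_alloc/npy_dealloc are just illustrative names, not the
real numpy entry points): our allocator keeps doing whatever it does and
merely reports the events:

    static void *
    npy_alloc(size_t size)
    {
        void *ptr = malloc(size);       /* or calloc, or whatever numpy
                                           actually uses internally */
        if (ptr != NULL) {
            PyMem_RegisterAlloc(size);  /* no-op unless a tracer registered */
        }
        return ptr;
    }

    static void
    npy_dealloc(void *ptr, size_t size)
    {
        PyMem_RegisterFree(size);
        free(ptr);
    }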
> 
> You'd need to add some void* arguments as well -- tracemalloc actually
> tracks every allocation independently, so you can do things like ask
> "which line of code was responsible for allocating the largest portion
> of the memory that is still in use".
> 
> And unfortunately once you add these arguments the resulting signatures
> don't quite match regular malloc/realloc/free (you have to pass a void*
> into malloc instead of receiving one), so we can't just define a
> PYMEM_NULL domain. (Or rather, we could, but then it would have to
> return an opaque void* used only for memory tracking, and we'd have to
> keep track of this alongside every allocation we did, and that would suck.)
> 
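
Right, spelled out the two shapes would be (hypothetical signatures only,
nothing like this exists in CPython):

    /* a tracking-only hook gets handed the pointer we already obtained ... */
    void PyMem_RegisterAlloc(void *ptr, size_t size);

    /* ... while a PYMEM_NULL-style domain would have to hand back an opaque
     * token that we would then need to store next to every real allocation */
    void *null_domain_malloc(void *ctx, size_t size);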

The tracer could register its context at the same time it registers its
tracing functions, and then retrieve it from Python when it needs it.
I assume you can currently only have one custom allocator per
interpreter instance, so a global context for tracing should not be a
big issue.
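
Something along these lines, say (purely hypothetical; the names and the
struct are invented and nothing like this exists in CPython; it also carries
the extra pointer argument you mention):

    typedef struct {
        void *ctx;                          /* the tracer's own context */
        void (*alloc)(void *ctx, void *ptr, size_t size);
        void (*free)(void *ctx, void *ptr);
    } PyMemTraceHooks;                      /* invented name */

    static PyMemTraceHooks trace_hooks;     /* one global slot per interpreter */

    void
    PyMem_RegisterTracer(const PyMemTraceHooks *hooks)
    {
        trace_hooks = *hooks;    /* functions and ctx are registered together */
    }

    void
    PyMem_RegisterAlloc(void *ptr, size_t size)
    {
        if (trace_hooks.alloc != NULL)
            trace_hooks.alloc(trace_hooks.ctx, ptr, size);
    }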


