[Numpy-discussion] tracing numpy data allocation with python callbacks

Nathaniel Smith njs at pobox.com
Thu May 17 15:52:32 EDT 2012


On Thu, May 17, 2012 at 7:50 PM, Stéfan van der Walt <stefan at sun.ac.za> wrote:
> On Wed, May 16, 2012 at 12:34 PM, Thouis Jones <thouis.jones at curie.fr> wrote:
>> I wondered, however, if there were a better way to accomplish the same
>> goal, preferably in pure python.
>
> Fabien recently posted this; not sure if it addresses your use case:
>
> http://fseoane.net/blog/2012/line-by-line-report-of-memory-usage/

I'd be wary of that blog's technique... getting accurate/meaningful
memory usage numbers out of the kernel is *very* complicated. It'll
more-or-less work in some cases, but definitely not all. You'll get
spurious size changes if you use mmap (or just load new shared
libraries); the portion of the heap that holds small allocations will
report a size that only grows, never shrinks, even when your actual
usage drops; etc. I'm not saying it's not useful (and it could be
somewhat more accurate if it read /proc/self/smaps where available
instead of statm), but real heap tracing has a lot of advantages.
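
For the curious, here's a minimal sketch of the smaps variant --
Linux-only, and it assumes /proc/self/smaps is readable (exact fields
vary a bit by kernel version):

    import numpy as np

    def private_dirty_kb():
        # Sum the Private_Dirty field over all mappings; this counts
        # memory the process has actually written to, unlike the lump
        # RSS number in /proc/self/statm.
        total = 0
        with open('/proc/self/smaps') as f:
            for line in f:
                if line.startswith('Private_Dirty:'):
                    total += int(line.split()[1])  # value is in kB
        return total

    before = private_dirty_kb()
    a = np.ones(10 ** 6)  # writes ~8 MB, so the pages show up as dirty
    print("grew by %d kB" % (private_dirty_kb() - before))

But even this only sees pages, not allocations, so it inherits most
of the problems above.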

I can't see any way to trace numpy's allocations and deallocations
out of the box. If the PyDataMem_* functions were real, dynamically
linked symbols, you could use the usual tricks (an LD_PRELOAD shim,
for instance) to intercept calls to them; but in fact they're just
preprocessor aliases for malloc/free/realloc, so there's no way to
separate numpy's usage from other heap allocations without
recompiling.
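
(Concretely, the definitions in numpy's public headers boil down to
roughly

    #define PyDataMem_NEW(size) ((char *)malloc(size))
    #define PyDataMem_FREE(ptr) free(ptr)
    #define PyDataMem_RENEW(ptr, size) ((char *)realloc(ptr, size))

so by the time the compiler is done there's no PyDataMem symbol left
to hook.)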

If there were a compelling reason, I can't see why anyone would
object to adding a memory-tracing API to numpy... it's not like we
call malloc in tight loops, and when disabled it would just cost a
single branch per malloc/free. Not sure how generally useful that
would be, though.
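
Just to make that concrete, the Python-facing side of such a hook
might look something like this -- entirely hypothetical, nothing like
set_allocation_hooks exists in numpy today:

    import numpy as np

    live = {}  # pointer -> size, for every live numpy data buffer

    def on_alloc(ptr, size):
        live[ptr] = size

    def on_free(ptr):
        live.pop(ptr, None)

    # Hypothetical API: callbacks fired from PyDataMem_NEW/FREE on
    # the C side, behind a single "is a hook installed?" branch.
    np.set_allocation_hooks(on_alloc, on_free)

    a = np.ones(10 ** 6)
    print("%d bytes in live numpy buffers" % sum(live.values()))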

I'd be tempted to just see if I could get by with massif or another
"real" heap profiler -- unfortunately the ones I know of are
C-oriented, but they might still be useful...
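
(For anyone who hasn't tried it, the incantation is just

    $ valgrind --tool=massif python yourscript.py
    $ ms_print massif.out.<pid>

which shows totals and call stacks for the C-level allocators -- at
least you learn how much memory is going through malloc, even if it
can't tell you which arrays it belongs to.)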

- N


