[Python-Dev] Add a new tracemalloc module to trace memory allocations

Sat Aug 31 19:09:09 CEST 2013

First, I really like this.  +1

On Wed, Aug 28, 2013 at 6:07 PM, Victor Stinner <victor.stinner at gmail.com>wrote:

> 2013/8/29 Victor Stinner <victor.stinner at gmail.com>:
> > My proposed implementation for Python 3.4 is different:
> >
> > * no enable() / disable() function: tracemalloc can only be enabled
> > before startup by setting PYTHONTRACEMALLOC=1 environment variable
> >
> > * traces (size of the memory block, Python filename, Python line
> > number) are stored directly in the memory block, not in a separated
> > hash table
> >
> > I chose PYTHONTRACEMALLOC env var instead of enable()/disable()
> > functions to be able to really trace *all* memory allocated by Python,
> > especially memory allocated at startup, during Python initialization.
>
> I'm not sure that having to set an environment variable is the most
> convinient option, especially on Windows.
>
> Storing traces directly into memory blocks should use less memory, but
> it requires to start tracemalloc before the first memory allocation.
> It is possible to add again enable() and disable() methods to
> dynamically install/uninstall the hook on memory allocators. I solved
> this issue in the current implementation by using a second hash table
> (pointer => trace).
>
> We can keep the environment variable as PYTHONFAULTHANDLER which
> enables faulthandler at startup. faulthandler has also a command line
> option: -X faulthandler. We may add -X tracemalloc.
>

We should be consistent with faulthandler's options. Why do you not want to
support both the env var and enable()/disable() functions?

Users are likely to want snapshots captured by enable()/disable() around
particular pieces of code just as much as whole program information.

Think of the possibilities, you could even setup a test runner to
enable/disable before and after each test, test suite or test module to
gather narrow statistics as to what code actually _caused_ the allocations
rather than the ultimate individual file/line doing it.

Taking that further: file and line information is great, but what if you
extend the concept: could you allow for C API or even Python hooks to
gather additional information at the time of each allocation or free? for
example: Gathering the actual C and Python stack traces for correlation to
figure out what call patterns lead allocations is powerful.

(Yes, this gets messy fast as hooks should not trigger calls back into
themselves when they allocate or free, similar to the "fun" involved in
writing coverage tools)

let me know if you think i'm crazy. :)

-gps
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20130831/78e48ba7/attachment.html>