First, I really like this.  +1

On Wed, Aug 28, 2013 at 6:07 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
2013/8/29 Victor Stinner <victor.stinner@gmail.com>:
> My proposed implementation for Python 3.4 is different:
>
> * no enable() / disable() function: tracemalloc can only be enabled
> before startup by setting PYTHONTRACEMALLOC=1 environment variable
>
> * traces (size of the memory block, Python filename, Python line
> number) are stored directly in the memory block, not in a separate
> hash table
>
> I chose PYTHONTRACEMALLOC env var instead of enable()/disable()
> functions to be able to really trace *all* memory allocated by Python,
> especially memory allocated at startup, during Python initialization.

I'm not sure that having to set an environment variable is the most
convenient option, especially on Windows.

Storing traces directly in memory blocks should use less memory, but
it requires starting tracemalloc before the first memory allocation.
It is possible to add enable() and disable() methods back to
dynamically install/uninstall the hook on memory allocators. I solved
this issue in the current implementation by using a second hash table
(pointer => trace).

We can keep the environment variable, like PYTHONFAULTHANDLER, which
enables faulthandler at startup. faulthandler also has a command-line
option: -X faulthandler. We could add -X tracemalloc.

We should be consistent with faulthandler's options. Why do you not want to support both the env var and enable()/disable() functions?
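
For reference, here is a small sketch of the three ways faulthandler can be turned on today: the env var, the -X option, and a runtime call. The helper name run_child is made up for illustration; each check runs in a child interpreter so the current process is left alone.

```python
import os
import subprocess
import sys

# Code run in the child: report whether faulthandler is enabled there.
CHECK = "import faulthandler; print(faulthandler.is_enabled())"
# Same check, but the child enables faulthandler itself at runtime first.
CHECK_AFTER_ENABLE = (
    "import faulthandler; faulthandler.enable(); "
    "print(faulthandler.is_enabled())"
)


def run_child(code, extra_args=(), extra_env=None):
    """Hypothetical helper: run `code` in a child interpreter, return True
    if the child printed "True"."""
    env = dict(os.environ)
    env.pop("PYTHONFAULTHANDLER", None)  # start from a clean environment
    env.update(extra_env or {})
    out = subprocess.check_output(
        [sys.executable, *extra_args, "-c", code], env=env
    )
    return out.decode().strip() == "True"


# 1. Environment variable, read at interpreter startup.
via_env = run_child(CHECK, extra_env={"PYTHONFAULTHANDLER": "1"})
# 2. Command-line option.
via_flag = run_child(CHECK, extra_args=["-X", "faulthandler"])
# 3. Runtime call from Python code.
via_call = run_child(CHECK_AFTER_ENABLE)
```

A tracemalloc that offered the same three entry points would behave the way faulthandler users already expect.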

Users are likely to want snapshots captured by enable()/disable() around particular pieces of code just as much as whole program information.

Think of the possibilities: you could even set up a test runner to enable/disable tracing before and after each test, test suite, or test module, gathering narrow statistics about which code actually _caused_ the allocations rather than just the individual file/line that performed them.
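
A rough sketch of that idea, written against the tracemalloc API as it eventually shipped in Python 3.4, where the functions are named start()/stop() rather than enable()/disable(). The traced() context manager and the results dict are hypothetical names for illustration, not part of any proposal.

```python
import contextlib
import tracemalloc


@contextlib.contextmanager
def traced(results, label):
    """Hypothetical helper: record the allocations made inside the block.

    Stores a list of StatisticDiff objects (grouped by file and line)
    in results[label].
    """
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    before = tracemalloc.take_snapshot()
    try:
        yield
    finally:
        after = tracemalloc.take_snapshot()
        if not was_tracing:
            tracemalloc.stop()  # only undo what this helper turned on
        results[label] = after.compare_to(before, "lineno")


results = {}
with traced(results, "build_list"):
    # Stand-in for a single test body: allocate roughly 100 kB.
    data = [bytes(1000) for _ in range(100)]

# Net bytes still allocated when the block ended, summed over all lines.
total = sum(stat.size_diff for stat in results["build_list"])
```

A test runner could wrap each test (or each module) in something like traced() and report the top entries per label.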

Taking that further: file and line information is great, but what if you extended the concept? Could you allow C API or even Python hooks to gather additional information at the time of each allocation or free? For example, gathering the actual C and Python stack traces for correlation, to figure out which call patterns lead to allocations, would be powerful.

(Yes, this gets messy fast, as hooks should not trigger calls back into themselves when they allocate or free, similar to the "fun" involved in writing coverage tools.)

Let me know if you think I'm crazy. :)

-gps