[Python-Dev] PEP 454 (tracemalloc) disable ==> clear?

Wed Oct 30 20:58:20 CET 2013

On Wed, Oct 30, 2013 at 6:02 AM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> 2013/10/30 Jim J. Jewett <jimjjewett at gmail.com>:
>> Well, unless I missed it... I don't see how to get anything beyond
>> the return value of get_traces, which is a (time-ordered?) list
>> of allocation size with then-current call stack.  It doesn't mention
>> any attribute for indicating that some entries are de-allocations,
>> let alone the actual address of each allocation.

> get_traces() does return the traces of the currently allocated memory
> blocks. It's not a log of alloc/dealloc calls. The list is not sorted.
> If you want a sorted list, use take_snapshot.statistics('lineno') for
> example.

Any list is sorted somehow; I had assumed that it was defaulting to
order-of-creation, though if you use a dict internally, that might not
be the case.  If you return it as a list instead of a dict, but that list is
NOT in time-order, that is worth documenting.

Also, am I misreading the documentation of the get_traces() function?

    Get traces of memory blocks allocated by Python.
    Return a list of (size: int, traceback: tuple) tuples.
    traceback is a tuple of (filename: str, lineno: int) tuples.
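
To make that documented shape concrete, here is a minimal sketch of
totalling such a list per source line.  The sample data, the helper name
total_by_line, and the choice of traceback[0] as "the frame that made the
allocation" are all my assumptions; the documentation quoted above does
not say which end of the traceback is the most recent frame.

```python
from collections import defaultdict


def total_by_line(traces):
    """Sum allocation sizes per (filename, lineno) of the first frame."""
    totals = defaultdict(int)
    for size, traceback in traces:
        if not traceback:
            # No frames recorded for this block; nothing to attribute it to.
            continue
        # Assumption: the first frame is the one that made the allocation.
        filename, lineno = traceback[0]
        totals[(filename, lineno)] += size
    return dict(totals)


# Made-up data in the documented (size, traceback) shape.
sample_traces = [
    (512, (("app.py", 10), ("main.py", 3))),
    (256, (("app.py", 10), ("main.py", 7))),
    (64, (("util.py", 42),)),
]

totals = total_by_line(sample_traces)
```

Nothing in this shape marks an entry as a de-allocation or carries an
address, so a consumer can only total what is currently listed.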

So it now sounds like you don't bother to emit de-allocation
events because you just remove the allocation from your
internal data structure.

In other words, you provide a snapshot, but not a history --
except that the snapshot isn't complete either, because it
only shows things that appeared after a certain event
(the most recent enablement).

I still don't see anything here(*) that requires even saving
the address, let alone preventing re-use.

(*) get_object_traceback(obj) might require a stored
    address for efficiency, but the base functionality of
    getting traces doesn't.

    I still wouldn't worry about address re-use though,
    because the address should not be re-used until
    the object has been deleted -- and is no longer
    available to be passed to get_object_traceback.
    So the worst that can happen is that an object which
    was not traced might return a bogus answer
    instead of failing.
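
    A toy model of that worst case, as a sketch only.  ToyTraceTable
    and its methods are hypothetical and say nothing about how
    tracemalloc is really implemented; the one assumption that matters
    is that frees go unnoticed while tracing is off.

```python
class ToyTraceTable:
    """Hypothetical address -> traceback table (not tracemalloc internals)."""

    def __init__(self):
        self.enabled = True
        self.by_address = {}

    def on_alloc(self, address, traceback):
        # Only remember allocations made while tracing is enabled.
        if self.enabled:
            self.by_address[address] = traceback

    def on_free(self, address):
        # Assumption: frees are only noticed while tracing is enabled.
        if self.enabled:
            self.by_address.pop(address, None)

    def get_object_traceback(self, address):
        return self.by_address.get(address)


table = ToyTraceTable()
table.on_alloc(0x1000, ("a.py", 10))   # traced object A
table.enabled = False
table.on_free(0x1000)                  # A is freed; the table misses it
table.on_alloc(0x1000, ("b.py", 99))   # untraced object B reuses the address

# Looking up B finds A's stale entry instead of finding nothing.
stale = table.get_object_traceback(0x1000)
```

    The live, traced object never gets a wrong answer here; only the
    untraced newcomer does.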

>> In that case, I would expect disabling (and filtering) to stop
>> capturing new allocation events for me, but I would still expect
>> tracemalloc to do proper internal maintenance.

> tracemalloc has an important overhead in term of performances and
> memory. The purpose of disable() is to... disable the module, to
> remove completely the overhead.
> ...  Why would you like to keep traces and disable the module?

Because of that very overhead.  I think my typical use case would
be similar to Kristján Valur's, but I'll try to spell it out in more
detail here.

(1)  Whoa -- memory hog!  How can I fix this?

(2)  I know -- track all allocations, with a traceback showing why they
were made.  (At a minimum, I would like to be able to subclass your
tool to do this -- preferably without also keeping the full history in
memory.)

(3)  Oh, maybe I should skip the ones that really are temporary and
get cleaned up.  (You make this easy by handling the de-allocs,
though I'm not sure those events get exposed to anyone working at
the python level, as opposed to modifying and re-compiling.)

(4)  hmm... still too big ... I should use filters.  (But will changing those
filters while tracing is enabled mess up your current implementation?)

(5)  Argh.  What I really want is to know what gets allocated at times
like XXX.  I can do that if times-like-XXX only ever occur once per
process.  I *might* be able to do it with filters.  But I would rather
do it by saying "trace on" and "trace off".  Maybe even with a context
manager around the suspicious places.

(6)  Then, at the end of the run, I would say "give me the info about how much
was allocated when tracing was on."  Some of that might be going away
again when tracing is off, but at least I know what is making the allocations
in the first place.  And I know that they're sticking around "long enough".
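
Steps (5) and (6) could be sketched roughly as below.  This is a toy
stand-in, not the tracemalloc API: ToyTracer, record(), tracing() and
summary() are names I made up, and allocations are reported by hand
rather than hooked.  The only point is the behavior I would expect:
turning tracing off stops collecting but keeps what was collected.

```python
from contextlib import contextmanager


class ToyTracer:
    """Hypothetical tracer whose data survives being switched off."""

    def __init__(self):
        self.enabled = False
        self.traces = []  # (size, where) pairs, kept across on/off

    def record(self, size, where):
        # Called for every allocation; ignored while tracing is off.
        if self.enabled:
            self.traces.append((size, where))

    @contextmanager
    def tracing(self):
        # "trace on" for the block, then restore the previous state
        # WITHOUT discarding anything already collected.
        previous = self.enabled
        self.enabled = True
        try:
            yield self
        finally:
            self.enabled = previous

    def summary(self):
        # Step (6): total bytes per allocation site, over all traced regions.
        totals = {}
        for size, where in self.traces:
            totals[where] = totals.get(where, 0) + size
        return totals


tracer = ToyTracer()
tracer.record(100, "startup.py:1")        # tracing off: not recorded
with tracer.tracing():
    tracer.record(4096, "suspect.py:10")  # first time-like-XXX
tracer.record(50, "shutdown.py:3")        # tracing off again
with tracer.tracing():
    tracer.record(1024, "suspect.py:10")  # second time-like-XXX

report = tracer.summary()
```

Both suspicious regions end up in one report, and nothing had to be
serialized in between.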

Under your current proposal, step (5) turns into

    set filters
    trace on
    ...
    get_traces
    serialize to some other storage
    trace off

and step (6) turns into

    read in from that other storage I just made up on the fly, and do my own
    summarizing, because my format is almost by definition non-standard.
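
For concreteness, the workaround looks roughly like the sketch below.
It assumes the start()/stop() spelling of enable/disable and a snapshot
object with dump() and load(), which are the names in the tracemalloc
module in the standard library rather than the ones used in this thread;
trace_region, summarize and the file name are hypothetical, and filters
are left out.

```python
import os
import tempfile
import tracemalloc


def trace_region(work, path):
    """Trace only while work() runs; save the data before tracing stops."""
    tracemalloc.start()
    try:
        result = work()  # keep the result alive until the snapshot is taken
        snapshot = tracemalloc.take_snapshot()
        snapshot.dump(path)  # "serialize to some other storage"
    finally:
        tracemalloc.stop()  # collected traces are discarded at this point
    return result


def summarize(path, limit=3):
    """Step (6): read the saved data back in and summarize it per line."""
    snapshot = tracemalloc.Snapshot.load(path)
    return snapshot.statistics("lineno")[:limit]


path = os.path.join(tempfile.mkdtemp(), "suspect.snapshot")
kept = trace_region(lambda: [bytes(1000) for _ in range(100)], path)
top = summarize(path)
```

Each traced region needs its own file, and combining several of them
into one report is left to the caller.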

This complication isn't intolerable, but neither is it what I expect
from Python.  And it certainly isn't what I expect from a binary toggle
like enable/disable.  (So yes, changing the name to clear_traces would
help, because I would still be disappointed, but at least I wouldn't be
surprised.)

Also, if you do stick with the current limitations, then why even have
get_traces, as opposed to just take_snapshot?  Is there some difference
between them, except that a snapshot has some convenience methods and
some simple metadata?

Later, he wrote:
> I don't see why disable() would return data.

disable is indeed a bad name for something that returns data.

The only reason to return data from "disable" is that (currently)
you're throwing the data away, so either you want the data now or you
should have turned it off earlier.

-jJ