Mailman 3 PEP 454 (tracemalloc) disable ==> clear? - Python-Dev

PEP 454 (tracemalloc) disable ==> clear?


Jim Jewett

29 Oct 2013

2 a.m.

reset() function:

Clear traces of memory blocks allocated by Python.

Does this do anything besides clear? If not, why not just re-use the 'clear' name from dicts?

disable() function:

Stop tracing Python memory allocations and clear traces of memory blocks allocated by Python.

I would disable to stop tracing, but I would not expect it to clear out the traces it had already captured. If it has to do that, please put in some sample code showing how to save the current traces before disabling.

-jJ


Kristján Valur Jónsson

29 Oct

10 a.m.

...

disable() function:

Stop tracing Python memory allocations and clear traces of memory blocks allocated by Python.

I would disable to stop tracing, but I would not expect it to clear out the traces it had already captured. If it has to do that, please put in some sample code showing how to save the current traces before disabling.

I was thinking something similar. It would be useful to be able to "pause" and "resume" if one is doing any analysis work in the live environment. This would reduce the need to have "Filter" objects. K

Victor Stinner

11:37 a.m.

2013/10/29 Jim Jewett <jimjjewett@gmail.com>:

...

reset() function:

Clear traces of memory blocks allocated by Python.

Does this do anything besides clear? If not, why not just re-use the 'clear' name from dicts?

(I like the reset() name. Charles-François suggested this name inspired by OProfile API.)

...

disable() function:

Stop tracing Python memory allocations and clear traces of memory blocks allocated by Python.

I would disable to stop tracing, but I would not expect it to clear out the traces it had already captured. If it has to do that, please put in some sample code showing how to save the current traces before disabling.

For consistency, you cannot keep traces when tracing is disabled. The free() must be enabled to remove allocated memory blocks, or the next malloc() may get the same address, which would raise an assertion error (you cannot have two memory blocks at the same address).

Just call get_traces() to get traces before clearing them. I can explain it in the doc.

2013/10/29 Kristján Valur Jónsson <kristjan@ccpgames.com>:

...

I was thinking something similar. It would be useful to be able to "pause" and "resume" if one is doing any analysis work in the live environment. This would reduce the need to have "Filter" objects.

For the reason explained above, it's not possible to disable the whole module temporarily. Internally, tracemalloc uses a thread-local variable (called the "reentrant" flag) to temporarily disable tracing of allocations in the current thread. It only disables tracing of new allocations; deallocations are still processed.

Victor

Jim J. Jewett

30 Oct

3:45 a.m.

(Tue Oct 29 12:37:52 CET 2013) Victor Stinner wrote:

...

For consistency, you cannot keep traces when tracing is disabled. The free() must be enabled to remove allocated memory blocks, or next malloc() may get the same address which would raise an assertion error (you cannot have two memory blocks at the same address).

That seems like a quirk of the implementation, particularly since the actual address is not returned to the user. Nor do I see any way of knowing when that allocation is freed.

Well, unless I missed it... I don't see how to get anything beyond the return value of get_traces, which is a (time-ordered?) list of allocation size with then-current call stack. It doesn't mention any attribute for indicating that some entries are de-allocations, let alone the actual address of each allocation.

...

For the reason explained above, it's not possible to disable the whole module temporarily.

...

Internally, tracemalloc uses a thread-local variable (called the "reentrant" flag) to temporarily disable tracing of allocations in the current thread. It only disables tracing of new allocations; deallocations are still processed.

Even assuming the restriction is needed, this just seems to mean that disabling (or filtering) should not affect de-allocation events, for fear of corrupting tracemalloc's internal structures.

In that case, I would expect disabling (and filtering) to stop capturing new allocation events for me, but I would still expect tracemalloc to do proper internal maintenance.

It would at least explain why you need both disable *and* reset; reset would empty those internal structures, so that tracemalloc could shortcut that maintenance. I would NOT assume that I needed to call reset when changing the filters, nor would I assume that changing them threw out existing traces.

-jJ

--

If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ

Victor Stinner

10:02 a.m.

Hi, 2013/10/30 Jim J. Jewett <jimjjewett@gmail.com>:

...

Well, unless I missed it... I don't see how to get anything beyond the return value of get_traces, which is a (time-ordered?) list of allocation size with then-current call stack. It doesn't mention any attribute for indicating that some entries are de-allocations, let alone the actual address of each allocation.

get_traces() does return the traces of the currently allocated memory blocks. It's not a log of alloc/dealloc calls. The list is not sorted. If you want a sorted list, use take_snapshot().statistics('lineno') for example.

...

In that case, I would expect disabling (and filtering) to stop capturing new allocation events for me, but I would still expect tracemalloc to do proper internal maintenance.

tracemalloc has a significant overhead in terms of performance and memory. The purpose of disable() is to... disable the module, to remove the overhead completely. In practice, enable() installs hooks on memory allocators, disable() uninstalls these hooks.

I don't understand why you are so concerned by disable(). Why would you like to keep traces and disable the module? I never called disable() in my own tests; the module is automatically disabled at exit.

Victor

Jim Jewett

7:58 p.m.

On Wed, Oct 30, 2013 at 6:02 AM, Victor Stinner <victor.stinner@gmail.com> wrote:

...

2013/10/30 Jim J. Jewett <jimjjewett@gmail.com>:

...
Well, unless I missed it... I don't see how to get anything beyond the return value of get_traces, which is a (time-ordered?) list of allocation size with then-current call stack. It doesn't mention any attribute for indicating that some entries are de-allocations, let alone the actual address of each allocation.

...

get_traces() does return the traces of the currently allocated memory blocks. It's not a log of alloc/dealloc calls. The list is not sorted. If you want a sorted list, use take_snapshot().statistics('lineno') for example.

Any list is sorted somehow; I had assumed that it was defaulting to order-of-creation, though if you use a dict internally, that might not be the case. If you return it as a list instead of a dict, but that list is NOT in time-order, that is worth documenting.

Also, am I misreading the documentation of get_traces() function?

Get traces of memory blocks allocated by Python. Return a list of (size: int, traceback: tuple) tuples. traceback is a tuple of (filename: str, lineno: int) tuples.

So it now sounds like you don't bother to emit de-allocation events because you just remove the allocation from your internal data structure. In other words, you provide a snapshot, but not a history -- except that the snapshot isn't complete either, because it only shows things that appeared after a certain event (the most recent enablement).

I still don't see anything here(*) that requires even saving the address, let alone preventing re-use.

(*) get_object_traceback(obj) might require a stored address for efficiency, but the base functionality of getting traces doesn't. I still wouldn't worry about address re-use though, because the address should not be re-used until the object has been deleted -- and is no longer available to be passed to get_object_traceback. So the worst that can happen is that an object which was not traced might return a bogus answer instead of failing.

...

...
In that case, I would expect disabling (and filtering) to stop capturing new allocation events for me, but I would still expect tracemalloc to do proper internal maintenance.

...

tracemalloc has a significant overhead in terms of performance and memory. The purpose of disable() is to... disable the module, to remove the overhead completely. ... Why would you like to keep traces and disable the module?

Because of that very overhead. I think my typical use case would be similar to Kristján Valur's, but I'll try to spell it out in more detail here.

(1) Whoa -- memory hog! How can I fix this?

(2) I know -- track all allocations, with a traceback showing why they were made. (At a minimum, I would like to be able to subclass your tool to do this -- preferably without also keeping the full history in memory.)

(3) Oh, maybe I should skip the ones that really are temporary and get cleaned up. (You make this easy by handling the de-allocs, though I'm not sure those events get exposed to anyone working at the python level, as opposed to modifying and re-compiling.)

(4) hmm... still too big ... I should use filters. (But will changing those filters while tracing is enabled mess up your current implementation?)

(5) Argh. What I really want is to know what gets allocated at times like XXX. I can do that if times-like-XXX only ever occur once per process. I *might* be able to do it with filters. But I would rather do it by saying "trace on" and "trace off". Maybe even with a context manager around the suspicious places.

(6) Then, at the end of the run, I would say "give me the info about how much was allocated when tracing was on." Some of that might be going away again when tracing is off, but at least I know what is making the allocations in the first place. And I know that they're sticking around "long enough".

Under your current proposal, step (5) turns into

set filters
trace on
...
get_traces
serialize to some other storage
trace off

and step (6) turns into

read in from that other storage I just made up on the fly, and do my own summarizing, because my format is almost by definition non-standard.

This complication isn't intolerable, but neither is it what I expect from python. And it certainly isn't what I expect from a binary toggle like enable/disable. (So yes, changing the name to clear_traces would help, because I would still be disappointed, but at least I wouldn't be surprised.)

Also, if you do stick with the current limitations, then why even have get_traces, as opposed to just take_snapshot? Is there some difference between them, except that a snapshot has some convenience methods and some simple metadata?

Later, he wrote:

...

I don't see why disable() would return data.

disable is indeed a bad name for something that returns data. The only reason to return data from "disable" is that (currently) you're throwing the data away, so either you want the data now or you should have turned it off earlier. -jJ
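The "trace on" / "trace off" context manager from step (5) can be approximated as follows. This is only a sketch: it assumes the function names of the tracemalloc module as it later shipped in the standard library (start(), stop(), take_snapshot()) rather than the enable()/disable() names in this draft, and `traced_region` is a hypothetical helper, not part of any proposal in this thread.

```python
import contextlib
import tracemalloc


@contextlib.contextmanager
def traced_region(results):
    """Hypothetical helper: trace one region and keep its snapshot."""
    tracemalloc.start()
    try:
        yield
    finally:
        # Save before stopping, because stop() clears the traces.
        results.append(tracemalloc.take_snapshot())
        tracemalloc.stop()


snapshots = []
with traced_region(snapshots):
    suspicious = [str(i) * 100 for i in range(1000)]

# Summarize the blocks that were still alive when the region ended.
for stat in snapshots[0].statistics("lineno")[:3]:
    print(stat)
```

Only blocks that are still allocated when the region exits show up in the saved snapshot; anything allocated and freed inside the region leaves no trace.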

Victor Stinner

8:40 p.m.

Le 30 oct. 2013 20:58, "Jim Jewett" <jimjjewett@gmail.com> a écrit :

...

though if you use a dict internally, that might not be the case.

Tracemalloc uses a {address: trace} dict internally.

...

If you return it as a list instead of a dict, but that list is NOT in time-order, that is worth documenting

Ok i will document it.

...

Also, am I misreading the documentation of get_traces() function?

Get traces of memory blocks allocated by Python. Return a list of (size: int, traceback: tuple) tuples. traceback is a tuple of (filename: str, lineno: int) tuples.

So it now sounds like you don't bother to emit de-allocation events because you just remove the allocation from your internal data structure.

I don't understand your question. Tracemalloc does not store events but traces. When a memory block is deallocated, it is removed from the internal dict (and so from the get_traces() list).

...

I still don't see anything here(*) that requires even saving the address, let alone preventing re-use.

The address must be stored internally to maintain the internal dict. See the C code.
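A toy model in pure Python may help show why frees have to keep being processed while any traces exist. This is not tracemalloc's C implementation: `ToyTracer`, its method names, the `trace_new` flag and the integer "addresses" are all invented for illustration, assuming only the {address: trace} mapping described above.

```python
class ToyTracer:
    """Toy stand-in for an {address: trace} table (not the real C code)."""

    def __init__(self):
        self.traces = {}          # address -> (size, traceback)
        self.trace_new = True     # hypothetical "pause" flag

    def on_malloc(self, address, size, traceback):
        if not self.trace_new:
            return
        # Two live blocks can never share an address.
        assert address not in self.traces, "stale trace at reused address"
        self.traces[address] = (size, traceback)

    def on_free(self, address):
        # Always processed, even when paused, so no stale entry survives.
        self.traces.pop(address, None)


tracer = ToyTracer()
tracer.on_malloc(0x1000, 64, ("app.py", 10))

tracer.trace_new = False      # pause: new allocations are ignored
tracer.on_malloc(0x2000, 32, ("app.py", 20))
tracer.on_free(0x1000)        # but the free still removes the old trace

tracer.trace_new = True       # resume: the allocator reuses 0x1000
tracer.on_malloc(0x1000, 128, ("app.py", 30))
print(tracer.traces)
```

If `on_free` were skipped while paused, the last `on_malloc` call would hit the assertion, which is the inconsistency described in this thread.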

...

(1) Whoa -- memory hog! How can I fix this?

(2) I know -- track all allocations, with a traceback showing why they were made. (At a minimum, I would like to be able to subclass your tool to do this -- preferably without also keeping the full history in memory.)

What do you mean by "full history" and "subclass your tool"?

...

(3) Oh, maybe I should skip the ones that really are temporary and get cleaned up. (You make this easy by handling the de-allocs, though I'm not sure those events get exposed to anyone working at the python level, as opposed to modifying and re-compiling.)

If your temporary objects are destroyed before you call get_traces(), you will not see them in get_traces(). I don't understand.

...

(4) hmm... still too big ... I should use filters. (But will changing those filters while tracing is enabled mess up your current implementation?)

If you call add_filter(), new traces will be filtered. Not the old ones, as explained in the doc. What do you mean by "mess up"?

...

(5) Argh. What I really want is to know what gets allocated at times like XXX. I can do that if times-like-XXX only ever occur once per process. I *might* be able to do it with filters. But I would rather do it by saying "trace on" and "trace off". Maybe even with a context manager around the suspicious places.

I don't understand "times like XXX", what is it? To see what happened between two lines of code, you can compare two snapshots. No need to disable tracing.

...

(6) Then, at the end of the run, I would say "give me the info about how much was allocated when tracing was on." Some of that might be going away again when tracing is off, but at least I know what is making the allocations in the first place. And I know that they're sticking around "long enough".

I think you misunderstood how tracemalloc works. You should compile it and play with it. In my opinion, you already have everything in tracemalloc for your scenario.

...

Under your current proposal, step (5) turns into

set filters
trace on
...
get_traces
serialize to some other storage
trace off

s1 = take_snapshot()
...
s2 = take_snapshot()
...
diff = s2.statistics("lines", compare_to=s1)
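For comparison, here is a sketch of the same two-snapshot pattern written against the tracemalloc module as it later shipped in the standard library, where the comparison is spelled Snapshot.compare_to() rather than the draft's statistics(..., compare_to=...). The `retained` list is just a made-up allocation to have something to measure.

```python
import tracemalloc

tracemalloc.start()

s1 = tracemalloc.take_snapshot()
retained = [bytearray(4096) for _ in range(50)]   # allocations of interest
s2 = tracemalloc.take_snapshot()

tracemalloc.stop()

# Differences between the two snapshots, grouped by source line.
diff = s2.compare_to(s1, "lineno")
for stat in diff[:3]:
    print(stat)
```

No pause or disable step is involved: tracing stays on throughout, and the difference between the snapshots isolates what was allocated in between and is still alive.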

...

why even have get_traces, as opposed to just take_snapshot? Is there some difference between them, except that a snapshot has some convenience methods and some simple metadata?

See the doc: Snapshot.traces is the result of get_traces(). get_traces() is here if you want to write your own tool without Snapshot.

Victor

Stephen J. Turnbull

31 Oct

5:08 a.m.

Jim Jewett writes:

...

Later, he wrote:

...
I don't see why disable() would return data.

disable is indeed a bad name for something that returns data.

Note that I never proposed that disable() *return* anything, only that it *get* the trace. It could store it in some specified object, or a file, rather than return it, for example. I deliberately left what it does with the retrieved data unspecified. The important thing to me is that it not be dropped on the floor by something named "disable".

Stephen J. Turnbull

30 Oct

4:09 a.m.

Victor Stinner writes:

...

2013/10/29 Jim Jewett <jimjjewett@gmail.com>:

...
reset() function:

Clear traces of memory blocks allocated by Python.

Does this do anything besides clear? If not, why not just re-use the 'clear' name from dicts?

(I like the reset() name. Charles-François suggested this name inspired by OProfile API.)

Just "reset" implies to me that you're ready to start over. Not just traced memory blocks but accumulated statistics and any configuration (such as Filters) would also be reset. Also tracing would be disabled until started explicitly. If you want it to apply just to the traces, reset_traces() would be more appropriate.

...

...
disable() function:

Stop tracing Python memory allocations and clear traces of memory blocks allocated by Python.

I would disable to stop tracing, but I would not expect it to clear out the traces it had already captured. If it has to do that, please put in some sample code showing how to save the current traces before disabling.

For consistency, you cannot keep traces when tracing is disabled. The free() must be enabled to remove allocated memory blocks, or next malloc() may get the same address which would raise an assertion error (you cannot have two memory blocks at the same address).

Then I would not call this "disable". disable() should not "destroy" data.

...

Just call get_traces() to get traces before clearing them. I can explain it in the doc.

Shouldn't disable() do this automatically, perhaps with an optional discard_traces flag (which would be False by default)? But I definitely agree with Jim: You *must* provide an example here showing how to save the traces (even though it's trivial to do so), because that will make clear that disable() is a destructive operation. (It is not destructive in any other debugging tool that I've used.) Even with documentation, be prepared for user complaints.

Victor Stinner

10:09 a.m.

2013/10/30 Stephen J. Turnbull <stephen@xemacs.org>:

...

Just "reset" implies to me that you're ready to start over. Not just traced memory blocks but accumulated statistics and any configuration (such as Filters) would also be reset. Also tracing would be disabled until started explicitly.

If the name is really the problem, I propose to restore the previous name: clear_traces(). It's symmetric with get_traces(), like add_filter()/get_filters()/clear_filters().

...

Shouldn't disable() do this automatically, perhaps with an optional discard_traces flag (which would be False by default)?

The pattern is something like that:

enable()
snapshot1 = take_snapshot()
...
snapshot2 = take_snapshot()
disable()

I don't see why disable() would return data.

...

But I definitely agree with Jim: You *must* provide an example here showing how to save the traces (even though it's trivial to do so), because that will make clear that disable() is a destructive operation. (It is not destructive in any other debugging tool that I've used.) Even with documentation, be prepared for user complaints.

I added "Call get_traces() or take_snapshot() function to get traces before clearing them." to the doc: http://www.haypocalc.com/tmp/tracemalloc/library/tracemalloc.html#tracemallo... Victor
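A minimal sketch of the "save the traces before clearing them" pattern requested in this thread. It assumes the function names of the tracemalloc module as it later shipped in the standard library (start(), stop(), take_snapshot()), not the enable()/disable()/get_traces() names of this draft; `workload` is a made-up placeholder.

```python
import tracemalloc


def workload():
    # Hypothetical placeholder for the code being investigated.
    return [bytes(1000) for _ in range(100)]


tracemalloc.start()
data = workload()

# Capture the traces first: stop() discards everything collected so far.
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# The snapshot is an independent copy and is still usable after stop().
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:3]:
    print(stat)
```

The order matters: taking the snapshot after stop() would not recover the traces.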

Victor Stinner

31 Oct

10:41 a.m.

2013/10/29 Victor Stinner <victor.stinner@gmail.com>:

...

2013/10/29 Kristján Valur Jónsson <kristjan@ccpgames.com>:

...
I was thinking something similar. It would be useful to be able to "pause" and "resume" if one is doing any analysis work in the live environment. This would reduce the need to have "Filter" objects.

Internally, tracemalloc uses a thread-local variable (called the "reentrant" flag) to temporarily disable tracing of allocations in the current thread. It only disables tracing of new allocations; deallocations are still processed.

If I give access to this flag, it would be possible to temporarily disable tracing in the current thread, but tracing would still be enabled in other threads. Would it fit your requirement?

Example:

---------------
tracemalloc.enable()
# start your application
...
# spawn many threads
...
# oh no, I don't want to trace this ugly function
tracemalloc.disable_local()
ugly_function()
tracemalloc.enable_local()
...
snapshot = take_snapshot()
---------------

You can imagine a context manager based on these two functions:

---------------
with disable_tracing_temporarily_in_current_thread():
    ugly_function()
---------------

I still don't understand why you would need to stop tracing temporarily. When I use tracemalloc, I never disable it.

Victor

Victor Stinner

12:20 p.m.

2013/10/31 Victor Stinner <victor.stinner@gmail.com>:

...

If I give access to this flag, it would be possible to disable temporarily tracing in the current thread, but tracing would still be enabled in other threads. Would it fit your requirement?

It's probably not what you are looking for :-)

As I wrote in the PEP, the API of tracemalloc was inspired by the faulthandler module. enable() / disable() makes sense in faulthandler because faulthandler is passive: it only does something on a trigger (synchronous signals like SIGFPE or SIGSEGV). I realized that tracemalloc is different: as written in the documentation, enable() *starts* tracing. After enable() has been called, tracemalloc becomes active. So tracemalloc should use the names start() / stop() rather than enable() / disable().

I did another experiment. I replaced enable/disable/is_enabled with start/stop/is_tracing, and added enable/disable/is_enabled functions to temporarily disable tracing.

API:

- clear_traces(): clear traces
- start(): start tracing (the old "enable")
- stop(): stop tracing and clear traces (the old "disable")
- disable(): temporarily disable tracing
- enable(): reenable tracing
- is_tracing(): True if tracemalloc is tracing, False otherwise (the old "is_enabled")
- is_enabled(): True if tracemalloc is enabled, False otherwise

All these functions are process-wide (affect all threads). tracemalloc is only tracing new allocations if is_tracing() and is_enabled() are True.

If is_tracing() is True and is_enabled() is False, deallocations still remove traces (otherwise, the internal dictionary of traces would become inconsistent).

Example:

---------------
tracemalloc.start()
# start your application
...
useful = UsefulObject()
huge = HugeObject()
...
snapshot1 = take_snapshot()
...
# oh no, I don't want to trace this ugly object, but please don't trash old traces
tracemalloc.disable()
ugly = ugly_object()
...
# release memory of the huge object
huge = None
...
# restart tracing (ugly is still alive)
tracemalloc.enable()
...
snapshot2 = take_snapshot()
tracemalloc.stop()
---------------

snapshot1 contains traces of objects:

- useful
- huge

snapshot2 contains traces of objects:

- useful

huge is missing from snapshot2 even if the module was disabled. ugly is missing from snapshot2 because tracing was disabled.

Does it look better? I don't see the use case for disable() / enable() yet, but it's cheap (it just adds a flag).

Victor
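The "huge is missing from snapshot2" part of this example can be reproduced with the tracemalloc module as it later shipped in the standard library. The sketch below deliberately leaves out the temporary disable() / enable() pair proposed here and does not assume it exists; `traced_bytes` is a hypothetical helper, and the bytearray objects stand in for UsefulObject and HugeObject.

```python
import tracemalloc


def traced_bytes(snapshot):
    # Total size of all blocks recorded in a snapshot.
    return sum(stat.size for stat in snapshot.statistics("filename"))


tracemalloc.start()

useful = bytearray(1_000)
huge = bytearray(10_000_000)
snapshot1 = tracemalloc.take_snapshot()

huge = None                      # release the huge object
snapshot2 = tracemalloc.take_snapshot()

tracemalloc.stop()

print(traced_bytes(snapshot1), traced_bytes(snapshot2))
```

The second total is smaller by roughly the size of the released buffer, because a snapshot only lists blocks that are still allocated when it is taken.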

Ethan Furman

2:32 p.m.

On 10/31/2013 05:20 AM, Victor Stinner wrote:

...

I did another experiment. I replaced enable/disable/is_enabled with start/stop/is_tracing, and added enable/disable/is_enabled functions to disable temporarily tracing.

API:

- clear_traces(): clear traces
- start(): start tracing (the old "enable")
- stop(): stop tracing and clear traces (the old "disable")
- disable(): disable temporarily tracing
- enable(): reenable tracing
- is_tracing(): True if tracemalloc is tracing, False otherwise (the old "is_enabled")
- is_enabled(): True if tracemalloc is enabled, False otherwise

These names make more sense. However, `stop` is still misleading as it both stops and destroys data. An easy fix for that is for stop to save the data somewhere so get_traces (or whatever) can still retrieve it.

If `stop` really must destroy the data, perhaps it should be called `close` instead; StringIO has a similar close method that when called destroys any stored data, and getvalue must be called first if that data is wanted.

--
~Ethan~
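The StringIO behaviour used as the analogy here can be checked with a few lines of standard-library code; the string "trace data" is just a placeholder.

```python
import io

buffer = io.StringIO()
buffer.write("trace data")

saved = buffer.getvalue()   # retrieve the contents before closing
buffer.close()              # discards the underlying text buffer

try:
    buffer.getvalue()
except ValueError as exc:
    print("after close:", exc)

print(saved)
```

As with the traces discussed in this thread, the data has to be copied out before the destructive call.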
