[Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

Kristján Valur Jónsson kristjan at ccpgames.com
Thu Oct 24 14:34:13 CEST 2013


> -----Original Message-----
> From: Victor Stinner [mailto:victor.stinner at gmail.com]
> Sent: 24. október 2013 01:03
> To: Kristján Valur Jónsson
> Cc: Python Dev
> Subject: Re: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!
> 
> 
> The use case of get_traces() + get_object_trace() is to retrieve the traceback
> of all alive Python objects for tools like Melia, Pympler or Heapy. The only
> motivation is performance.
Well, for me, the use of get_traces() is to get the raw data so that I can perform
my own analysis on it.  I foresee people wanting to analyse this data in novel
ways, as I suggested to you privately.
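A minimal sketch of the kind of custom analysis meant here, written against the snapshot side of the tracemalloc API (take_snapshot() and snapshot.traces expose the same raw per-block data; the grouping by filename is just an illustrative choice):

```python
import tracemalloc
from collections import defaultdict

tracemalloc.start(5)  # keep up to 5 frames per trace

data = [bytearray(1000) for _ in range(100)]  # some allocations to analyse

snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# A "novel" analysis over the raw data: total allocated size per filename,
# using one frame of each stored traceback as the grouping key.
totals = defaultdict(int)
for trace in snapshot.traces:      # each trace carries a traceback and a size
    frame = trace.traceback[0]
    totals[frame.filename] += trace.size

for filename, size in sorted(totals.items(), key=lambda kv: -kv[1])[:3]:
    print(filename, size)
```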


> 
> I wrote a benchmark using 10^6 objects and... get_traces() x 1 +
> get_object_address() x N is 40% *slower* than calling
> get_object_traceback() x N. So get_object_traceback() is faster for this use
> case, especially if you don't want the traceback of all objects, but only a
> few of them.

I understand your desire for things to be fast, but let me just re-iterate my view
that for this kind of job, performance is completely secondary.  Memory
debugging and analysis is an off-line, laboratory task.  In my opinion,
performance should not be driving the design of a module like this.  And in
particular, it should not be the only reason to write code in C that could
just as well be written in .py.
This is a lorry.  A lorry is for moving refrigerators, on those rare occasions when
you need to have refrigerators moved.  It doesn't need go-faster stripes.

Well, I think I've made my point on this amply clear now, in this email and the
previous, so I won't dwell on it further.
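For what it's worth, the per-object lookup that Victor benchmarked can be used roughly like this (a sketch; note that get_object_traceback() has to be called while tracing is still active, and only returns a traceback for blocks allocated after start()):

```python
import tracemalloc

tracemalloc.start(10)

# A large object, allocated while tracing is on, so its block is traced.
obj = bytes(100_000)

tb = tracemalloc.get_object_traceback(obj)  # traceback of obj's allocation

tracemalloc.stop()
print(tb)
```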


> 
> Charles-Francois already asked me to remove everything related to address,
> so let's remove two more functions:
Great.  

> 
> Test 1. With the Python test suite, 467,738 traces limited to 1 frame:
...
> I'm surprised: it's faster than the benchmark I ran some weeks ago.
> Maybe I optimized something? The most critical operation, taking a snapshot
> takes half a second, so it's efficient enough.

Well, to me anything that happens in under a second is fast :)

> 
> Let's remove even more code:
> 
> - remove get_stats()
> - remove Snapshot.stats
> 
Removal of code is always nice :)

> 
> > 3) set_traceback_limit().  Truncating tracebacks is bad.  Particularly if it is
> truncated at the top end of the callstack, because then information loses
> cohesion, namely, the common connection point, the root.  If traceback
> limits are required, I suggest being able to specify that we truncate the
> leaf-end of the tracebacks.
> 
> If the traceback is truncated and 90% of all memory is allocated at the same
> Python line: I prefer to get the most recent frame rather than the n-th
> function from main() which may indirectly call 100 more functions...
> In this case, how do you guess which function allocated the memory? You get
> the same issue as
> Melia/Pympler/Heapy: debug data doesn't help to identify the memory leak.

Debugging memory leaks is not the only use case for your module.  Analysing
memory usage in a non-leaking application is also very important.  In my work, I have
been asked to reduce the memory overhead of a Python application once it has
started up.  To do this, you need a top-down view of the application.  You
need to break it down from the "main" call down towards the leaves.
Now, I would personally not truncate the stack, because I can afford the memory,
but even if I did, for example to hide a bunch of detail, I would want to throw away
the _lower_ details of the stack.  It is unimportant to me to know that memory was
allocated in
...;itertools.py;logging.py;stringutil.py
but more important to know that it was allocated in
main.py;databaseengine.py;enginesettings.py;...

The "main" function here is the one that ties all the different allocations into one tree. 
If you take a tree, say a nice rowan, and truncate it by leaving only X nodes towards
the leaves, you end up with a big heap of small branches.
If on the other hand, you trim it so that you leave X nodes beginning at the root, you
still have something resembling a tree, albeit a much coarser one.

Anyway, this is not so important.  I would run this with full traceback myself and truncate
the tracebacks during the display stage anyway.
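The display-stage truncation described above can be sketched like this: capture with a generous frame limit, then slice each traceback when presenting it, keeping the root ("main") end so the allocation tree stays connected.  (Frame ordering in Traceback has varied across Python versions; this assumes the modern oldest-to-newest order.)

```python
import tracemalloc

tracemalloc.start(25)          # capture deep tracebacks up front
allocs = [bytearray(2048) for _ in range(100)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

top = snapshot.statistics('traceback')[0]   # biggest allocation site
frames = list(top.traceback)

# Truncate only at display time, discarding the leaf end of the stack.
DEPTH = 3
for frame in frames[:DEPTH]:
    print("%s:%s" % (frame.filename, frame.lineno))
```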

> 
> 
> > 4) add_filter().  This is unnecessary. Information can be filtered on the
> python side.  Defining Filter as a C type is not necessary.  Similarly, module
> level filter functions can be dropped.
> 
> Filters for capture are here for efficiency: attaching a trace to each memory
> block is expensive. I tried pybench: when using tracemalloc, Python is 2x
> slower. The memory usage is also doubled. Using filters, the overhead is
> lower. I don't have numbers for the CPU, but for the
> memory: ignored traces are not stored, so the memory usage is immediately
> reduced. Without filters for capture, I'm not sure that it is even possible to
> use tracemalloc with 100 frames on a large application.
> 
> Anyway, you can remove all filters: in this case, the overhead of filters is
> zero.
> 
> 
> 
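The "filter on the Python side" alternative can be sketched with the snapshot-level Filter objects (as opposed to the capture-time add_filter() under discussion); here the filters exclude tracemalloc's own allocations and those made by the import machinery:

```python
import tracemalloc

tracemalloc.start()
junk = [list(range(100)) for _ in range(100)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Post-capture filtering: exclude traces from tracemalloc itself and
# from the frozen import machinery.  No capture-time overhead involved.
filtered = snapshot.filter_traces([
    tracemalloc.Filter(False, tracemalloc.__file__),
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
])
print(len(filtered.traces), "traces kept of", len(snapshot.traces))
```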
> > 6) Snapshot dump/load():  It is unusual to see load and save functions
> > taking filenames in a python module, and a module implementing its own
> > file IO.  I have suggested simply to add Pickle support.
> > Alternatively, support file-like objects or bytes (loads/dumps)
> 
> In the latest implementation, load/dump is trivial:
> 
>     def dump(self, filename):
>         with open(filename, "wb") as fp:
>             pickle.dump(self, fp, pickle.HIGHEST_PROTOCOL)
> 
>     @staticmethod
>     def load(filename, traces=True):
>         with open(filename, "rb") as fp:
>             return pickle.load(fp)
> 
What does the "traces" argument do in the load() function then?

Anyway, in this case, dump and load can be thought of as convenience functions.
That's perfectly fine from my viewpoint.
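The bytes-based alternative suggested earlier (loads/dumps) is equally trivial to layer on top of pickle; the helper names here are hypothetical, not part of the PEP:

```python
import pickle
import tracemalloc

def snapshot_dumps(snapshot):
    # Hypothetical helper: serialize a Snapshot to bytes.
    return pickle.dumps(snapshot, pickle.HIGHEST_PROTOCOL)

def snapshot_loads(data):
    # Hypothetical helper: restore a Snapshot from bytes.
    return pickle.loads(data)

tracemalloc.start()
snap = tracemalloc.take_snapshot()
tracemalloc.stop()

blob = snapshot_dumps(snap)       # works with any file-like sink as well
restored = snapshot_loads(blob)
print(len(restored.traces) == len(snap.traces))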


> 
> > I'd also like to point out (just to say "I told you so" :) ) that this module is
> precisely the reason I suggested we include "const char *file, int lineno" in
> the API for PEP 445, because that would allow us, in debug builds, to get one
> extra stack level, namely the position of the actual C allocation in the python
> source.
> 
> In my experience, C functions allocating memory are wrapped in Python
> objects, it's easy to guess the C function from the Python traceback.

Often, yes.  But there are big black boxes that remain.  The most numerous
of those are the big, mysterious allocations that can happen as a
result of
"import mymodule"

But apart from that, a lot of code can have unforeseen side effects, such as growing
some internal list.  This sort of information helps with understanding that.
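One way to peer into such a black box with tracemalloc itself is to diff two snapshots around the import; a sketch (using `json` as a stand-in for "import mymodule"):

```python
import tracemalloc

tracemalloc.start(10)
before = tracemalloc.take_snapshot()

import json  # stand-in for "import mymodule"

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# compare_to() attributes the import's allocations to source lines,
# giving some visibility into where the "mysterious" memory went.
diff = after.compare_to(before, 'lineno')
for stat in diff[:5]:
    print(stat)
```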

Not that we are likely to change PEP 445 at this stage, but this was the use
case for my suggestion.

Cheers,

Kristján

