[Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

Victor Stinner victor.stinner at gmail.com
Thu Oct 24 03:03:21 CEST 2013


Hi,

2013/10/23 Kristján Valur Jónsson <kristjan at ccpgames.com>:
> This might be a good place to make some comments.
> I have discussed some of this in private with Victor, but wanted to make them here, for the record.

Yes, I prefer to discuss the PEP on python-dev. It's nice to get more
feedback, and I expect to get a better API in the end!

Oh, you have a lot of remarks; I will try to reply to all of them.


> 1) really, all that is required in terms of data is the traceback.get_traces() function.  Further, it _need_ not return addresses since they are not required for analysis.  It is sufficient for it to return a list of (traceback, size, count) tuples.   I understand that the get_stats function is useful for quick information so it can be kept, although it provides no added information, only convenience
> 2) get_object_address() and get_trace(address) functions seem redundant.  All that is required is get_object_traceback(), I think.

The use case of get_traces() + get_object_address() is to retrieve the
traceback of all alive Python objects for tools like Meliae, Pympler or
Heapy. The only motivation is performance.

I wrote a benchmark using 10^6 objects and... get_traces() x 1 +
get_object_address() x N is 40% *slower* than calling
get_object_traceback() x N. So get_object_traceback() is faster for
this use case, especially if you don't want the traceback of all
objects, but only a few of them.

Charles-Francois already asked me to remove everything related to
address, so let's remove two more functions:

- remove get_object_address()
- remove get_trace()
- get_traces() returns a list
- remove 'address' key type of Snapshot.group_by()
- the 'traceback' key type of Snapshot.group_by() groups traces by
traceback, instead of by (address, traceback) => it is closer to what
you suggested to me privately (generate "top stats" but keep the whole
traceback)
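
For illustration, a minimal sketch of how the trimmed-down API would be
used; get_object_traceback() and get_traces() are the names from the
PEP draft, while the setup and the exact trace format are assumptions
on my side:

    import tracemalloc

    # assumption: tracing has already been enabled with a deep enough
    # traceback limit

    obj = [1, 2, 3]

    # fast path for tools like Meliae/Pympler/Heapy: ask directly for
    # the traceback of one object instead of dumping all traces
    print(tracemalloc.get_object_traceback(obj))

    # get_traces() now returns a plain list of traces, without addresses
    traces = tracemalloc.get_traces()
    print(len(traces), "traces")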


> 1) really, all that is required in terms of data is the traceback.get_traces() function.  Further, it _need_ not return addresses since they are not required for analysis.  It is sufficient for it to return a list of (traceback, size, count) tuples.   I understand that the get_stats function is useful for quick information so it can be kept, although it provides no added information, only convenience

For the get_stats() question, the motivation is also performance.
Let's try a benchmark on my laptop.

Test 1. With the Python test suite, 467,738 traces limited to 1 frame:

* take a snapshot with traces (call get_traces()): 293 ms
* write the snapshot on disk: 167 ms
* load the snapshot from disk: 184 ms
* group by filename *using stats*: 24 ms (754 different filenames)
* group by line *using stats*: 28 ms (31,827 different lines)
* group by traceback using traces: 333 ms (31,827 different tracebacks,
the traceback is limited to 1 frame)


Test 2. With the Python test suite, 495,571 traces limited to 25 frames:

* take a snapshot without traces (call get_stats()): 35 ms
* take a snapshot with traces (call get_stats() and get_traces()): 532 ms
* write the snapshot on disk: 565 ms
* load the snapshot from disk: 739 ms
* group by filename *using stats*: 25 ms (906 different filenames)
* group by line *using stats*: 22 ms (36,940 different lines)
* group by traceback using traces: 786 ms (66,314 different tracebacks)


Test 3. tracemalloc modified to no longer use get_stats(), only
traces. With the Python test suite, 884,719 traces limited to 1 frame:

* take a snapshot with traces (call get_traces()): 531 ms
* write the snapshot on disk: 278 ms
* load the snapshot from disk: 298 ms
* group by filename *using traces*: 706 ms (1329 different filenames)
* group by line *using traces*: 724 ms (55,349 different lines)
* group by traceback using traces: 731 ms (55,349 different
tracebacks, the traceback is limited to 1 frame)

I'm surprised: it's faster than the benchmark I ran some weeks ago.
Maybe I optimized something? The most critical operation, taking a
snapshot, takes half a second, so it's efficient enough.
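
(For reference, a rough sketch of how timings like these can be
measured; timed() is a hypothetical helper, and the tracing setup and
workload are omitted:)

    import time
    import tracemalloc

    def timed(label, func):
        # time one step of the benchmark and print it in milliseconds
        start = time.perf_counter()
        result = func()
        print("%s: %.0f ms" % (label, (time.perf_counter() - start) * 1e3))
        return result

    # assumption: tracing was enabled earlier and the workload has run
    traces = timed("take a snapshot with traces (get_traces())",
                   tracemalloc.get_traces)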

Let's remove even more code:

- remove get_stats()
- remove Snapshot.stats

Snapshot.group_by() can easily recompute statistics by filename and
line number from traces.

(To be honest, get_stats() and get_traces() used together have an
issue: they may be inconsistent if some objects are allocated between
the two calls. For example, Snapshot.apply_filters() has to apply
filters on both traces and stats. It's simpler to only manipulate
traces.)
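
As a rough sketch, assuming each trace is a (size, traceback) pair and
each frame is a (filename, lineno) pair (the exact trace format is an
assumption on my side), the per-filename statistics can be recomputed
like this:

    from collections import defaultdict

    def stats_by_filename(traces):
        # recompute (size, count) per filename from raw traces; the
        # most recent frame of each traceback gives the filename
        sizes = defaultdict(int)
        counts = defaultdict(int)
        for size, traceback in traces:
            filename = traceback[0][0]
            sizes[filename] += size
            counts[filename] += 1
        return {name: (sizes[name], counts[name]) for name in sizes}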



> 3) set_traceback_limit().  Truncating tracebacks is bad.  Particularly if it is truncated at the top end of the callstack, because then information looses cohesion, namely, the common connection point, the root.  If traceback limits are required, I suggest being able to specifiy that we truncate the leaf-end of the tracebacks.

If the traceback is truncated and 90% of all memory is allocated at
the same Python line, I prefer to get the most recent frame rather
than the n-th function from main(), which may indirectly call 100
different functions... In that case, how do you guess which function
allocated the memory? You get the same issue as with
Meliae/Pympler/Heapy: the debug data doesn't help to identify the
memory leak.
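
For example, with a limit of 1 frame the traceback still points at the
allocating line (set_traceback_limit() and get_object_traceback() are
the names from the PEP draft; the enable() call and the output format
are assumptions on my side):

    import tracemalloc

    tracemalloc.set_traceback_limit(1)   # keep only the most recent frame
    tracemalloc.enable()                 # assumed name for starting the trace

    data = [object() for _ in range(1000)]
    # the truncated traceback points at the list comprehension above,
    # not at some frame close to main()
    print(tracemalloc.get_object_traceback(data))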


> 4) add_filter().  This is unnecessary. Information can be filtered on the python side.  Defining Filter as a C type is not necessary.  Similarly, module level filter functions can be dropped.

Filters for capture are there for efficiency: attaching a trace to
each memory block is expensive. I tried pybench: when using
tracemalloc, Python is 2x slower. The memory usage is also doubled.
Using filters, the overhead is lower. I don't have numbers for the
CPU, but for the memory: ignored traces are not stored, so the memory
usage is immediately reduced. Without filters for capture, I'm not
sure that it is even possible to use tracemalloc with 100 frames on a
large application.

Anyway, you can remove all filters: in this case, the overhead of
filters is zero.
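
For example, a capture filter excluding tracemalloc's own frames could
be installed like this (add_filter() and Filter are the names from the
PEP draft; the exact Filter signature is an assumption on my side):

    import tracemalloc

    # exclude allocations traced inside tracemalloc.py itself;
    # assumed signature: Filter(include, filename_pattern)
    tracemalloc.add_filter(tracemalloc.Filter(False, tracemalloc.__file__))

    # ignored traces are simply not stored, so the memory overhead
    # drops as soon as the filter is installed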


> 5) Filter, Snapshot, GroupedStats, Statistics:  These classes, if required, can be implemented in a .py module.

Snapshot, GroupedStats and Statistics are implemented in Python.

Filter is implemented in C because I want filters for the capture.


> 6) Snapshot dump/load():  It is unusual to see load and save functions taking filenames in a python module, and a module implementing its own file IO.  I have suggested simply to add Pickle support.  Alternatively, support file-like objects or bytes (loads/dumps)

In the latest implementation, load/dump is trivial:

    def dump(self, filename):
        with open(filename, "wb") as fp:
            pickle.dump(self, fp, pickle.HIGHEST_PROTOCOL)

    @staticmethod
    def load(filename, traces=True):
        with open(filename, "rb") as fp:
            return pickle.load(fp)

http://hg.python.org/features/tracemalloc/file/85c0cefb92cb/Lib/tracemalloc.py#l164

So you can easily reimplement your own serialization function (using
pickle) with your custom file-like object.

I already asked Charles-Francois whether he prefers to accept a
file-like object as input instead of a filename (as open() does), but
he doesn't feel the need.
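
For instance, a minimal sketch of dumping a snapshot to an arbitrary
file-like object with pickle; dump_snapshot() and load_snapshot() are
hypothetical helper names:

    import io
    import pickle

    def dump_snapshot(snapshot, fp):
        # same idea as Snapshot.dump(), but writes to any writable
        # binary stream instead of opening a file by name
        pickle.dump(snapshot, fp, pickle.HIGHEST_PROTOCOL)

    def load_snapshot(fp):
        return pickle.load(fp)

    # example with a plain picklable object standing in for a Snapshot
    buffer = io.BytesIO()
    dump_snapshot({"traces": []}, buffer)
    buffer.seek(0)
    restored = load_snapshot(buffer)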


> I'd also like to point out (just to say "I told you so" :) ) that this module is precisely the reason I suggested we include "const char *file, int lineno" in the API for PEP 445, because that would allow us, in debug builds, to get one extra stack level, namely the position of the actual C allocation in the python source.

In my experience, C functions allocating memory are wrapped in Python
objects, so it's easy to guess the C function from the Python
traceback.

Victor

