[Python-Dev] PEP 454 (tracemalloc): new minimalist version

Victor Stinner victor.stinner at gmail.com
Sat Oct 19 02:03:23 CEST 2013


2013/10/18 Charles-François Natali <cf.natali at gmail.com>:
> I'm happy to see this move forward!

Thanks for your reviews. I had some time during my train trip to
improve the implementation.

I removed the call to pthread_atfork(): now that tasks have been
removed, it makes sense to keep tracemalloc enabled in the child
process. Call disable() explicitly in the child process to disable
tracemalloc there.
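
For example (a minimal sketch, assuming the draft enable()/disable()
API and a Unix os.fork()):

    import os
    import tracemalloc

    tracemalloc.enable()

    pid = os.fork()
    if pid == 0:
        # Child process: tracing stays enabled after fork().
        # Disable it explicitly if the child should not be traced.
        tracemalloc.disable()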

I modified the implementation according to your remarks, here is the
updated doc:
http://www.haypocalc.com/tmp/tracemalloc/library/tracemalloc.html

I will update the PEP if you like the new doc.

>> API
>> ===
>>
>> Main Functions
>> --------------
>>
>> ``clear_traces()`` function:
>>
>>     Clear traces and statistics on Python memory allocations, and reset
>>     the ``get_traced_memory()`` counter.
>
> That's nitpicking, but how about just ``reset()`` (I'm probably biased
> by oprofile's opcontrol --reset)?

=> done

Well, I had already been tempted to rename it to reset(), because
"clear_traces" sounds too specific: the function clears more than
just traces.

>> ``get_stats()`` function:
>>
>>     Get statistics on traced Python memory blocks as a dictionary
>>     ``{filename (str): {line_number (int): stats}}`` where *stats* is a
>>     ``(size: int, count: int)`` tuple; *filename* and *line_number* can
>>     be ``None``.
>
> It's probably obvious, but you might want to say once what *size* and
> *count* represent (and the unit for *size*).

=> done
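
For example, here is a sketch of how the returned structure could be
walked with the draft API (sizes are in bytes):

    import tracemalloc

    tracemalloc.enable()
    data = [str(i) for i in range(1000)]

    # get_stats() returns {filename: {line_number: (size, count)}}
    for filename, line_stats in tracemalloc.get_stats().items():
        for lineno, (size, count) in line_stats.items():
            print("%s:%s: %s bytes in %s blocks"
                  % (filename, lineno, size, count))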

>> ``get_tracemalloc_memory()`` function:
>>
>>     Get the memory usage in bytes of the ``tracemalloc`` module as a
>>     tuple: ``(size: int, free: int)``.
>>
>>     * *size*: total size in bytes allocated by the module,
>>       including *free* bytes
>>     * *free*: number of free bytes available to store data
>
> What's *free* exactly? I assume it's linked to the internal storage
> area used by tracemalloc itself, but that's not clear at all.
>
> Also, is the tracemalloc overhead included in the above stats (I'm
> mainly thinking about get_stats() and get_traced_memory())?
> If yes, I find it somewhat confusing: for example, AFAICT, valgrind's
> memcheck doesn't report the memory overhead, although it can be quite
> large, simply because it's not interesting.

My goal is to be able to explain how *every* byte is allocated in
Python. If you enable tracemalloc, your RSS memory will roughly
double. You can use get_tracemalloc_memory() to add metrics to a
snapshot. It helps to understand how the RSS memory evolves.

Basically, get_tracemalloc_memory() is the memory used to store
traces. It's something internal to the C module (_tracemalloc). This
memory is not traced because it *is* the traces... and so is not
counted in get_traced_memory().
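
A minimal sketch of how it could be read (draft API):

    import tracemalloc

    tracemalloc.enable()
    size, free = tracemalloc.get_tracemalloc_memory()
    print("traces storage: %s bytes allocated, %s bytes free"
          % (size, free))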

The issue is probably the name (or maybe also the doc): would you
prefer get_python_memory() / get_traces_memory() names, instead of
get_traced_memory() / get_tracemalloc_memory()?


FYI, objects allocated in tracemalloc.py (real objects, not traces)
are not counted in get_traced_memory() because of a filter set up by
default (this was not the case in previous versions of the PEP). You
can remove the filter using tracemalloc.clear_filters() to see this
memory, as in the sketch below. There are two exceptions: Python
objects created for the results of get_traces() and get_stats() are
never traced, for efficiency. It *is* possible to trace these
objects, but it's really too slow. get_traces() and get_stats() may
be called outside tracemalloc.py, so another filter would be needed.
It's easier to never trace these objects. Anyway, they are not
interesting when trying to understand where your application leaks
memory.
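
A minimal sketch (draft API):

    import tracemalloc

    # Remove the default filter so that objects allocated in
    # tracemalloc.py itself are traced too.
    tracemalloc.clear_filters()
    tracemalloc.enable()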

>> Trace Functions
>> ---------------
>>
>> ``get_traceback_limit()`` function:
>>
>>     Get the maximum number of frames stored in the traceback of a trace
>>     of a memory block.
>>
>>     Use the ``set_traceback_limit()`` function to change the limit.
>
> I didn't see anywhere the default value for this setting: it would be
> nice to write it somewhere,

=> done

> and also explain the rationale (memory/CPU
> overhead...).

I already explained this partially in the set_traceback_limit()
documentation. I added something to the get_traceback_limit()
documentation to explain the default (1 frame is enough to get
statistics).

(For information, it's possible to set the limit to 0, but it's not
really useful, so it's not mentioned in the doc.)
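
For example (a sketch with the draft API; 10 is an arbitrary limit):

    import tracemalloc

    # Store up to 10 frames per traceback instead of the default of 1;
    # deeper tracebacks cost more memory and CPU.
    tracemalloc.set_traceback_limit(10)
    tracemalloc.enable()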

>> ``get_object_address(obj)`` function:
>>
>>     Get the address of the main memory block of the specified Python object.
>>
>>     A Python object can be composed of multiple memory blocks; the
>>     function only returns the address of the main memory block.
>
> IOW, this should return the same as id() on CPython? If yes, it could
> be an interesting note.

=> done

I modified the doc to mention id(). id() only returns the same value
if the object is not tracked by the garbage collector. Otherwise, the
two values differ by sizeof(PyGC_Head) (12 bytes on x86), as the
sketch below illustrates.
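
A minimal sketch (draft API; the exact offset depends on the platform):

    import tracemalloc

    number = 42          # int: not tracked by the garbage collector
    items = [1, 2, 3]    # list: tracked by the garbage collector

    # Equal for an object not tracked by the GC...
    print(tracemalloc.get_object_address(number) == id(number))  # True
    # ...but offset by sizeof(PyGC_Head) for a GC-tracked object.
    print(tracemalloc.get_object_address(items) == id(items))    # False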

>> ``get_object_trace(obj)`` function:
>>
>>     Get the trace of a Python object *obj* as a ``(size: int,
>>     traceback)`` tuple where *traceback* is a tuple of ``(filename: str,
>>     lineno: int)`` tuples; *filename* and *lineno* can be ``None``.
>
> I find the "trace" word confusing, so it might be interesting to add a
> note somewhere explaining what it is ("callstack leading to the object
> allocation", or whatever).

=> done

Ok, I added a note at the beginning of the section.

> [get_object_trace]
> Also, this function leaves me a mixed feeling: it's called
> get_object_trace(), but you also return the object size - well, a
> vague estimate thereof. I wonder if the size really belongs here,
> especially if the information returned isn't really accurate: it will
> be for an integer, but not for e.g. a list, right? How about just
> using sys.getsizeof(), which would give a more accurate result?

I already modified the doc to add examples: I used set and dict as
examples of types using at least 2 memory blocks.

get_object_trace(obj) is a shortcut for
get_trace(get_object_address(obj)). I agree that the wrong size
information can be surprising.

I can delete get_object_trace(), or rename the function to
get_object_traceback() and modify it to only return the traceback.

I prefer to keep the function (renamed to get_object_traceback() and
returning only the traceback). tracemalloc can be combined with other
tools like Meliae, Heapy or objgraph. When you find an interesting
object with these tools, you may want to know where it was allocated,
as in the sketch below.
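
A sketch of such a combination (objgraph.by_type() is an existing
objgraph helper; 'MyClass' is a placeholder type name, and
get_object_traceback() is the proposed rename):

    import objgraph      # third-party tool, used only to find objects
    import tracemalloc

    tracemalloc.enable()
    # ... run the application ...

    # Ask tracemalloc where each suspicious object was allocated.
    for obj in objgraph.by_type('MyClass'):
        print(tracemalloc.get_object_traceback(obj))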

>> ``get_trace(address)`` function:
>>
>>     Get the trace of a memory block as a ``(size: int, traceback)``
>>     tuple where *traceback* is a tuple of ``(filename: str, lineno:
>>     int)`` tuples; *filename* and *lineno* can be ``None``.
>>
>>     Return ``None`` if the ``tracemalloc`` module did not trace the
>>     allocation of the memory block.
>>
>>     See also ``get_object_trace()``, ``get_stats()`` and
>>     ``get_traces()`` functions.
>
> Do you have example use cases where you want to work with raw addresses?

An address is the unique key identifying a memory block. In Python,
you don't manipulate memory blocks directly; that's why there is a
get_object_address() function (to link objects to traces).

I added get_trace() because get_traces() is very slow. It would be
wasteful to call it if you only need the trace of a single memory
block.
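
A minimal sketch (draft API):

    import tracemalloc

    tracemalloc.enable()
    data = {"key": "value"}

    # Look up the trace of a single block by its address instead of
    # building the full list with the (slow) get_traces().
    address = tracemalloc.get_object_address(data)
    trace = tracemalloc.get_trace(address)
    if trace is not None:
        size, traceback = trace
        print("%s bytes, allocated at %s" % (size, traceback))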

I'm not sure that this function is really useful. I added it to work
around the performance issue, and because I believe that someone will
need it later :-)

What do you suggest for this function?

>> Filter
>> ------
>>
>> ``Filter(include: bool, pattern: str, lineno: int=None, traceback:
>> bool=False)`` class:
>>
>>     Filter to select which memory allocations are traced. Filters can be
>>     used to reduce the memory usage of the ``tracemalloc`` module, which
>>     can be read using the ``get_tracemalloc_memory()`` function.
>>
>> ``match(filename: str, lineno: int)`` method:
>>
>>     Return ``True`` if the filter matches the filename and line number,
>>     ``False`` otherwise.
>>
>> ``match_filename(filename: str)`` method:
>>
>>     Return ``True`` if the filter matches the filename, ``False`` otherwise.
>>
>> ``match_lineno(lineno: int)`` method:
>>
>>     Return ``True`` if the filter matches the line number, ``False``
>>     otherwise.
>>
>> ``match_traceback(traceback)`` method:
>>
>>     Return ``True`` if the filter matches the *traceback*, ``False``
>>     otherwise.
>>
>>     *traceback* is a tuple of ``(filename: str, lineno: int)`` tuples.
>
> Are those ``match`` methods really necessary for the end user, i.e.
> are they worth being exposed as part of the public API?

(Oh, I just realized that match_lineno() may lead to bugs, so I removed it.)

Initially, I exposed the methods for unit tests. Later, I used them in
Snapshot.apply_filters() to factorize the code (before, I had two
implementations for matching a filter: one in C, another in Python).

I see tracemalloc more as a library; I don't know yet how it will be
used by new tools built on it. Snapshot is more a helper (convenience
class) than a mandatory API for using tracemalloc. You might want to
use filters directly to analyze raw data.

Users are supposed to use tracemalloc.add_filter() and
Snapshot.apply_filters(). Would you prefer to keep the match methods
private and undocumented? I don't have a strong opinion on this point.

>> StatsDiff
>> ---------
>>
>> ``StatsDiff(differences, old_stats, new_stats)`` class:
>>
>>     Differences between two ``GroupedStats`` instances.
>>
>>     The ``GroupedStats.compare_to()`` method creates a ``StatsDiff``
>>     instance.
>>
>> ``sort()`` method:
>>
>>     Sort the ``differences`` list from the biggest difference to the
>>     smallest difference. Sort by ``abs(size_diff)``, *size*,
>>     ``abs(count_diff)``, *count* and then by *key*.
>>
>> ``differences`` attribute:
>>
>>     Differences between ``old_stats`` and ``new_stats`` as a list of
>>     ``(size_diff, size, count_diff, count, key)`` tuples. *size_diff*,
>>     *size*, *count_diff* and *count* are ``int``. The key type depends
>>     on the ``GroupedStats.group_by`` attribute of ``new_stats``: see the
>>     ``Snapshot.top_by()`` method.
>>
>> ``old_stats`` attribute:
>>
>>     Old ``GroupedStats`` instance, can be ``None``.
>>
>> ``new_stats`` attribute:
>>
>>     New ``GroupedStats`` instance.
>
> Why keep references to ``old_stats`` and ``new_stats``?
>
> Also, if you sort the difference by default (which is a sensible
> choice), then the StatsDiff becomes pretty much useless, since you
> would just keep its ``differences`` attribute (sorted).

Well, StatsDiff is useless :-) I just removed it.

I modified GroupedStats.compare_to() to sort the differences by
default, but I added a sort parameter to get the list unsorted.
sort=False can be used to sort the differences differently (sorting
the list twice would be inefficient).
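
For example (a sketch; old_stats and new_stats are assumed to be
GroupedStats instances, and each difference is a
(size_diff, size, count_diff, count, key) tuple as described above):

    # Get the raw list and sort by abs(count_diff) instead of the
    # default order.
    differences = new_stats.compare_to(old_stats, sort=False)
    differences.sort(key=lambda diff: abs(diff[2]), reverse=True)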

Another option would be to add sort_key and sort_reverse parameters.

>> Snapshot
>> --------
>>
>> ``Snapshot(timestamp: datetime.datetime, traces: dict=None, stats:
>> dict=None)`` class:
>>
>>     Snapshot of traces and statistics on memory blocks allocated by Python.
>
>
> I'm confused.
> Why are get_trace(), get_object_trace(), get_stats() etc not methods
> of a Snapshot object?

get_stats() returns the current stats: if you call it twice, you get
different results. The principle of a snapshot is to be frozen: stats,
traces and metrics are read once, when the snapshot is created.

To get the stats of a snapshot, just read its stats attribute. To get
a trace, use snapshot.traces[address].

> Is it because you don't store all the necessary information in a
> snapshot, or are they just some sort of shorthands, like:
> stats = get_stats()
> vs
> snapshot = Snapshot.create()
> stats = snapshot.stats

I have already used other tools like Meliae and Heapy, and it's
convenient to have access to the raw data to compute my own view
manually. I don't want to force users to use the high-level API
(Snapshot).

Is it a problem to have two APIs (low-level like get_stats() and
high-level like Snapshot) for similar use cases? What do you suggest?

>> ``write(filename)`` method:
>>
>>     Write the snapshot into a file.
>
> I assume it's in a serialized form, only readable by Snapshot.load()?

Yes.

> BTW, it's a nitpick and debatable, but write()/read() or load()/dump()
> would be more consistent (see e.g. pickle's load/dump).

=> Done, I renamed Snapshot.write() to Snapshot.dump().

By the way, load() and dump() are limited to filenames (strings).
Should they accept file-like objects? isinstance(filename, str) may
be used to check whether the parameter is a filename or an open file
object.
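
A possible sketch of such a dispatch (_dump() is a hypothetical
private helper writing to an open binary file):

    class Snapshot:
        def dump(self, file):
            # Hypothetical dispatch on the argument type.
            if isinstance(file, str):
                # Treat the argument as a filename.
                with open(file, "wb") as fp:
                    self._dump(fp)    # hypothetical private helper
            else:
                # Treat the argument as an open binary file object.
                self._dump(file)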

>> Metric
>> ------
>>
>> ``Metric(name: str, value: int, format: str)`` class:
>>
>>     Value of a metric when a snapshot is created.
>
> Alright, what's a metric again ;-) ?
>
> I don't know if it's customary, but having short examples would IMO be nice.

=> done, I improved the doc

Victor

