Snapshot formats in tracemalloc vs profiler
Hi everyone,

Something occurred to me while trying to analyze code today: profile and cProfile emit their data in pstats format, which various tools and libraries consume. tracemalloc, on the other hand, uses a completely separate format which nonetheless contains similar data. In fact, in many non-Python applications I've worked on, heap and CPU profiles were always emitted in identical formats, which allowed things like visual representations of the stack traces where memory is allocated; these have proven quite useful in practice and allowed lots of tools to be shared across many applications.

Is there a particular design reason why these formats are different in Python right now? Would it make sense to consider allowing them to match, e.g. by adding a tracemalloc.dump_pstats() method?

Yonatan
Hi,

I designed tracemalloc with Charles-François Natali in PEP 454. The API is a lightweight abstraction on top of the internal C structures used by the C _tracemalloc module, which is designed to minimize the memory footprint.

I'm not aware of the pstats format. Adding a new tracemalloc.dump_pstats() function looks like a good idea. Does pstats allow attaching arbitrary data to a traceback? The root structure of tracemalloc is basically the tuple (size: int, traceback) (the trace_t structure in C).

Victor
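[For reference, a minimal sketch of that (size, traceback) shape as seen through tracemalloc's public snapshot API; the Trace objects mirror the C-level trace_t described above. The bytearray allocations are just a stand-in workload.]

    import tracemalloc

    tracemalloc.start()
    data = [bytearray(1000) for _ in range(100)]  # stand-in allocations to trace
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()

    # Each Trace mirrors the C-level trace_t: an allocated size in bytes,
    # plus the traceback of the frames that made the allocation.
    for trace in snapshot.traces[:3]:
        print(trace.size, trace.traceback)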
It's similar, but not quite the same -- I was just trying to see if I could build a neatly Pythonic library to do the conversion. The CPU profilers are basically building a dict from (filename, lineno, funcname) to a tuple (from a comment in profile.py):

[0] = The number of times this function was called, not counting direct or indirect recursion.
[1] = Number of times this function appears on the stack, minus one.
[2] = Total time spent internal to this function.
[3] = Cumulative time that this function was present on the stack. In non-recursive functions, this is the total execution time from start to finish of each invocation of the function, including time spent in all subfunctions.
[4] = A dictionary indicating, for each function name, the number of times it was called by us.

pstats serializes this dict in a particular format which various other tools can read, like gprof2dot. The challenge in translating is that building (4), or handling recursion for (0) and (3), really requires instrumentation at the CPU trace points as well, which is probably a good answer to my original question of why the formats differ. :)

However, there are other profiling formats which are used outside the Python community, have good tooling support, and could be much easier to deal with; for example, there's the pprof format <https://github.com/google/pprof/tree/master/proto>, which is almost ludicrously versatile; it's meant for profiling both compiled and interpreted languages, so it's very flexible as to what constitutes a "line." So if I have the time, and knowing that there's no intrinsic obstacle in all of this, I'll see if I can implement a pprof translator for tracemalloc snapshots.

Although while I have you here, I do have a further question about how tracemalloc works: if I'm reading the code correctly, traces get removed by tracemalloc when objects are freed, which means that at equilibrium (e.g. at the end of a function) the trace would show just the data which leaked. That's very useful in most cases, but I'm trying to hunt down a situation where memory usage is transiently spiking -- which might be due to something being actively used, or to something building up and overwhelming the GC, or to evil elves in the CPU for all I can tell so far. Would it be completely insane for tracemalloc to have a mode where it either records frees separately (e.g. as a malloc of negative space, at the trace where the free is happening), or where it simply ignores frees altogether?
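[As a rough sketch of what such a translation could look like: the pstats on-disk format is just a marshalled dict of exactly the tuples described above, so one can fill it from a snapshot if the "time" fields are repurposed to carry byte counts. Everything here is an assumption for illustration -- the '<allocations>' function name is a placeholder (tracemalloc statistics are per-line, not per-function), and fields (0)/(3) and the callers dict (4) are left degenerate since, as noted, they can't be reconstructed from a snapshot.]

    import marshal
    import tracemalloc

    def dump_pstats(snapshot: tracemalloc.Snapshot, path: str) -> None:
        """Write a tracemalloc snapshot as a marshalled pstats-style dict.

        Call counts become allocation counts, and the two "time" fields
        carry allocated bytes. The callers dict (4) stays empty because a
        snapshot has no call-graph information.
        """
        stats = {}
        for stat in snapshot.statistics('lineno'):
            frame = stat.traceback[0]
            key = (frame.filename, frame.lineno, '<allocations>')
            # (cc, nc, tt, ct, callers)
            stats[key] = (stat.count, stat.count, stat.size, stat.size, {})
        with open(path, 'wb') as f:
            marshal.dump(stats, f)

[pstats.Stats(path) should then be able to load the result, though any tool reading it will report bytes wherever it expects seconds.]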
On Fri, Jun 28, 2019 at 1:03 AM, Yonatan Zunger <zunger@humu.com> wrote:

> Would it be completely insane for tracemalloc to have a mode where it either records frees separately, or where it simply ignores frees altogether?
My very first implementation of tracemalloc produced a log of malloc and free calls. Problem: transferring the log from a slow set-top box to a desktop computer was slow, and parsing the log was very slow. Parsing complexity is O(n) where n is the number of malloc or free calls, knowing that Python calls malloc(), realloc() or free() 270,000 times per second on average: https://www.python.org/dev/peps/pep-0454/#log-calls-to-the-memory-allocator

tracemalloc is built on top of PEP 445 -- Add new APIs to customize Python memory allocators: https://www.python.org/dev/peps/pep-0445/

Using these PEP 445 hooks, you should be able to do whatever you want with Python memory allocations and frees :-)

Example of a toy project that injects memory allocation failures: https://github.com/vstinner/pyfailmalloc

Victor
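[For the transient-spike case there is also a pure-Python workaround that needs no C-level hooks: sample tracemalloc snapshots during the workload and diff consecutive ones. This is only a sketch -- workload_steps is a hypothetical list of callables standing in for the code under investigation -- and since tracemalloc only tracks blocks still alive, it works only if a snapshot lands inside the spike.]

    import tracemalloc

    def profile_steps(workload_steps, group_by='traceback', top=10):
        """Diff consecutive tracemalloc snapshots to localize a transient spike."""
        tracemalloc.start(25)  # keep up to 25 frames per traceback
        previous = tracemalloc.take_snapshot()
        for step in workload_steps:  # hypothetical: one callable per unit of work
            step()
            current = tracemalloc.take_snapshot()
            # get_traced_memory() returns (current, peak) in bytes
            print('peak traced so far:', tracemalloc.get_traced_memory()[1])
            for stat in current.compare_to(previous, group_by)[:top]:
                print('  ', stat)
            previous = current
        tracemalloc.stop()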
Well, then. I think I'm going to have some fun with this. :)

Thank you!
Update: thanks to Victor's advice and the PEP 445 hooks, I put together a pretty comprehensive logging/sampling heap profiler for Python, and it works great. The package is now available via pip (https://pypi.org/project/heapprof/) for anyone who needs it!
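[A hedged usage sketch: the start/stop-style interface below is assumed from the package's description rather than stated in this thread, and run_workload is a placeholder -- check the heapprof README for the actual API.]

    import heapprof

    heapprof.start('spike-profile')  # assumed API: start writing the heap event log
    run_workload()                   # placeholder for the code being profiled
    heapprof.stop()                  # assumed API: stop tracing and flush the log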
That looks pretty cool! I'm really happy that the PEP 445 hooks are being reused for something other than tracemalloc ;-)

Victor