[Python-ideas] Sampling Profiler for Python
t.landschoff at gmx.net
Wed Jan 4 23:32:04 CET 2012
since I moved from Java development to Python development, I really miss
the sampling profiler of jvisualvm (http://visualvm.java.net/). Perhaps
it is only me but especially for long running GUI applications, I would
really like to know why it sometimes slows down to a crawl.
cProfile is not really useful for this as it makes the software
unresponsive alone. Also if using multiple threads (like doing some I/O
in background, when using http://www.lag.net/paramiko/) it will give
I looked for an existing profiler and found statprof, mentioned in this
However, this still fails profiling multiple threads. I don't think one
can do any better without touching the interpreter core. So that's what
My approach implements the profiler from the following parts:
1. Sample Trigger: A timer that triggers taking samples of all
threads at fixed intervals.
2. Sampling function injection: Ensures that a sampling function is
called on each thread at the first opportunity after the sample
3. Sampling function: A function that assigns the current time slice
to the executing code on the thread it is called on.
Only the second part has to be implemented in the Python interpreter
core, the sample trigger and sampling functions can be provided by
external modules. My implementation of (2) can be found at
I also prepared a first implementation of (1) and (3) at
https://bitbucket.org/bluehorn/profiler_util. Using that library, I can
profile pybench with a 3 % slowdown (taking 100 profiling samples per
$ export PYTHON_PATH=$HOME/python-work/profiler_util
$ python3 -m sampling_profiler -- Tools/pybench/pybench.py -c
stock.pybench -f prof.pybench
Totals: 3119ms 3072ms +1.5% 3198ms
This also shows that the string benchmark takes the most time:
Thread MainThread (3842 samples, 38.434401 seconds)
cpu_time (cum) ticks (cum) samples (cum)
0.548 0.548 55 55 55 55
0.448 0.448 45 45 45 45
0.428 0.428 43 43 43 43
0.380 0.380 38 38 38 38
0.324 0.324 34 34 34 34
The output is explained as follows:
cpu_time: cpu time in seconds accounted to that line.
(cum): This is the cpu time including the cumulative time of the
functions called on that line. Gathering that data means that the
sampling function has to walk the stack for each sample, which will
cause a dramatic slowdown for deeply recursive calls. There should be a
switch to disable this.
ticks: Number of sampling profiler ticks accounted to that line. For my
example (where sample ticks are equidistant in wall time) this is a
measure of the time spent in that line, including any blocking calls
(cum): ... including called functions
samples: Number of times the sampling profiler function was called for
that line. This should equal ticks unless execution of that line took
longer than a profiler tick.
The remaining output should be obvious.
I have to admit that this is far from finished, but I hope to get some
feedback and perhaps even help by posting it here.
What do you think?
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Python-ideas