[Python-ideas] Sampling Profiler for Python

Wed Jan 4 23:32:04 CET 2012

Hello world,

since I moved from Java development to Python development, I really miss 
the sampling profiler of jvisualvm (http://visualvm.java.net/). Perhaps 
it is only me but especially for long running GUI applications, I would 
really like to know why it sometimes slows down to a crawl.

cProfile is not really useful for this as it makes the software 
unresponsive alone. Also if using multiple threads (like doing some I/O 
in background, when using http://www.lag.net/paramiko/) it will give 
meaningless results.

I looked for an existing profiler and found statprof, mentioned in this 
blog: http://wingolog.org/archives/2005/10/28/profiling

However, this still fails profiling multiple threads. I don't think one 
can do any better without touching the interpreter core. So that's what 
I tried...

My approach implements the profiler from the following parts:

   1. Sample Trigger: A timer that triggers taking samples of all
      threads at fixed intervals.
   2. Sampling function injection: Ensures that a sampling function is
      called on each thread at the first opportunity after the sample
      trigger fired.
   3. Sampling function: A function that assigns the current time slice
      to the executing code on the thread it is called on.

Only the second part has to be implemented in the Python interpreter 
core, the sample trigger and sampling functions can be provided by 
external modules. My implementation of (2) can be found at 
https://bitbucket.org/bluehorn/sampling_prof

I also prepared a first implementation of (1) and (3) at 
https://bitbucket.org/bluehorn/profiler_util. Using that library, I can 
profile pybench with a 3 % slowdown (taking 100 profiling samples per 
second):

$ export PYTHON_PATH=$HOME/python-work/profiler_util
$ python3 -m sampling_profiler -- Tools/pybench/pybench.py -c 
stock.pybench -f prof.pybench
[...]
Totals:                          3119ms  3072ms   +1.5%  3198ms  
3108ms   +2.9%

This also shows that the string benchmark takes the most time:

Thread MainThread (3842 samples, 38.434401 seconds)
------------------------------------------------------------------------
cpu_time    (cum)  ticks  (cum) samples (cum) 
filename:lineno                          function
    0.548    0.548     55     55     55     55 
Tools/pybench/Strings.py:455             test
    0.448    0.448     45     45     45     45 
Tools/pybench/Strings.py:568             calibrate
    0.428    0.428     43     43     43     43 
Tools/pybench/Constructs.py:484          test
    0.380    0.380     38     38     38     38 
Tools/pybench/Calls.py:445               f
    0.324    0.324     34     34     34     34 
Tools/pybench/Calls.py:122               f

The output is explained as follows:

cpu_time: cpu time in seconds accounted to that line.
(cum): This is the cpu time including the cumulative time of the 
functions called on that line. Gathering that data means that the 
sampling function has to walk the stack for each sample, which will 
cause a dramatic slowdown for deeply recursive calls. There should be a 
switch to disable this.
ticks: Number of sampling profiler ticks accounted to that line. For my 
example (where sample ticks are equidistant in wall time) this is a 
measure of the time spent in that line, including any blocking calls 
(especially I/O).
(cum): ... including called functions
samples: Number of times the sampling profiler function was called for 
that line. This should equal ticks unless execution of that line took 
longer than a profiler tick.

The remaining output should be obvious.

I have to admit that this is far from finished, but I hope to get some 
feedback and perhaps even help by posting it here.

What do you think?

Greetings, Torsten

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20120104/664a8811/attachment.html>