[Python-ideas] Sampling Profiler for Python

Torsten Landschoff t.landschoff at gmx.net
Sun Jan 8 02:01:04 CET 2012


Hi Sümer,

On 01/05/2012 07:45 AM, Sümer Cip wrote:
> Have you looked at yappi? yappi.start() accepts a sampling parameter.
No, I haven't, because I googled for "sampling profiler" and "statistical 
profiler". I compared running pybench with yappi vs. my sampling profiler 
(at 100 samples/sec).

First, the run time increases in both cases compared to an unprofiled run 
of Python. The absolute figures are different because I backported the 
profiler to Python 2 again for this comparison (is yappi available for 
Python 3?).

yappi:
Totals:                          4692ms  3317ms  +41.5%  4831ms  3448ms  +40.1%

sampling_profiler:
Totals:                          3491ms  3269ms   +6.8%  3638ms  3435ms   +5.9%

These are the hot spots as reported by yappi:
name                                 #n       tsub       ttot       tavg
Tools/pybench/Calls.py.test:439      10       1.709853   1.709853   0.170985
Tools/pybench/Strings.py.test:467    10       1.383400   2.133651   0.213365
Tools/pybench/With.py.test:16        10       1.179130   1.614330   0.161433
Tools/pybench/Calls.py.test:176      10       1.088372   1.885379   0.188538
Tools/pybench/Unicode.py.test:388    10       1.062785   1.729250   0.172925
Tools/pybench/Calls.py.test:118      10       1.008680   1.284282   0.128428
Tools/pybench/With.py.test:64        10       0.999503   1.431072   0.143107
Tools/pybench/Lists.py.test:9        10       0.995866   1.394482   0.139448
Tools/pybench/Calls.py.test:9        10       0.979050   1.505686   0.150569
Tools/pybench/Dict.py.test:351       10       0.959278   1.290247   0.129025

Compared with the sampling_profiler:
cpu_time    (cum)  ticks  (cum) samples (cum) filename:lineno                          function
    0.552    4.224     55    419     55    419 Tools/pybench/Calls.py:446               f
    0.512    0.512     52     52     52     52 Tools/pybench/Unicode.py:452             calibrate
    0.436    0.436     44     44     44     44 Tools/pybench/Unicode.py:542             calibrate
    0.436    0.436     44     44     44     44 Tools/pybench/Strings.py:562             calibrate
    0.372    0.372     37     37     37     37 Tools/pybench/Constructs.py:485          test
    0.348    0.348     36     36     36     36 Tools/pybench/Calls.py:122               f
    0.272    0.272     27     27     27     27 Tools/pybench/Lists.py:339               test
    0.248    0.248     25     25     25     25 Tools/pybench/Calls.py:445               f
    0.228    0.228     22     22     22     22 Tools/pybench/Strings.py:247             calibrate
    0.212    0.212     21     21     21     21 Tools/pybench/With.py:14                 __exit__

Let's look at the leading hot spots:
1) yappi reports Calls.py line 439 as the hot spot. This refers to this 
code:

     def test(self):                 # <-- line 439
         global f
         def f(x):
             if x > 1:
                 return f(x-1)       # <-- line 446
             return 1
         for i in xrange(self.rounds):
             f(10)
             f(10)
             ...

Here yappi identifies only the outer function, while the sampling profiler 
points directly to the line where most of the time is spent (a short sketch 
of why sampling can do that follows after these two observations).

2) Most other hot spots reported by the sampling_profiler lie in the 
calibrate functions. I wondered why that might be and noticed that pybench 
calls these 20 times by default. Since the sampling_profiler reports hot 
spots per line while yappi reports them per function, we cannot really 
compare these results directly.
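
To illustrate why sampling naturally attributes time to individual lines, 
here is a minimal sketch of the basic idea (not my actual implementation; 
the names SAMPLE_INTERVAL, start, stop and report are made up for this 
example). A timer signal interrupts the program at a fixed rate and the 
handler records the source line of the interrupted frame:

import signal
from collections import defaultdict

SAMPLE_INTERVAL = 0.01        # 100 samples per second, as in the runs above
samples = defaultdict(int)    # (filename, lineno, funcname) -> sample count

def _record_sample(signum, frame):
    # The interrupted frame tells us exactly which source line was running,
    # so a sampler can report hot lines instead of whole functions.
    code = frame.f_code
    samples[(code.co_filename, frame.f_lineno, code.co_name)] += 1

def start():
    signal.signal(signal.SIGPROF, _record_sample)
    # ITIMER_PROF counts CPU time, so blocked or sleeping code is not charged.
    signal.setitimer(signal.ITIMER_PROF, SAMPLE_INTERVAL, SAMPLE_INTERVAL)

def stop():
    signal.setitimer(signal.ITIMER_PROF, 0)

def report(limit=10):
    for key, count in sorted(samples.items(), key=lambda kv: -kv[1])[:limit]:
        filename, lineno, funcname = key
        print "%6d samples  %s:%d  %s" % (count, filename, lineno, funcname)

Wrapping a workload in start()/stop() and calling report() afterwards gives 
a per-line ranking much like the tables above.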


So let's compare the behaviour of the different profilers on this code:

from time import time

def fib(n):
     if n > 1:
         return fib(n-1) + fib(n-2)
     else:
         return n

def fib2(n):
     stack = [n]
     while stack:
         n = stack.pop()
         if n > 1:
             stack.append(n-1)
             stack.append(n-2)

s = time()
fib(34)
print "fib: %s seconds" % (time() - s)
s = time()
fib2(34)
print "fib2: %s seconds" % (time() - s)

Here are two variants of the Fibonacci function. The first works using 
recursion and returns the correct result. The second simulates the same 
call pattern without recursion by using an explicit stack. Its result is 
not computed since it is not needed here - I just want to compare how this 
behaves with respect to our profilers.

When I run this without any profiler, I get

fib: 2.87187790871 seconds
fib2: 5.26651716232 seconds

With yappi, I get this:
fib: 6.42756795883 seconds
fib2: 14.6631779671 seconds

name                                 #n       tsub       ttot       tavg
demo.py.<module>:1                   1        6.427689   21.090864  21.090864
demo.py.fib2:9                       1        10.353305  14.663173  14.663173
..atprof/Lib/posixpath.py.dirname:11 1        0.000004   0.000005   0.000005
..prof/Lib/threading.py.setprofile:8 1        0.000001   0.000001   0.000001
..n2.6/site-packages/yappi.py.stop:4 1        0.000000   0.000000   0.000000
demo.py.fib:3                        184549.. 0.000000   0.000000   0.000000

The slowdown for fib is about what I expected, but I have no idea why fib2 
is slowed down that much. AFAIK, the usual profiling hooks of Python are 
called on entry and exit of each function, not on each line?!
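
For reference, this is roughly how the standard function-level hook behaves 
(a sketch, not yappi's code): sys.setprofile only delivers call/return 
events (plus c_call/c_return/c_exception for C functions), while per-line 
events are only available through the far more expensive sys.settrace 
interface.

import sys

def profile_hook(frame, event, arg):
    # sys.setprofile fires on function entry and exit (and on C calls),
    # never once per executed line.
    if event in ("call", "return"):
        print event, frame.f_code.co_name

def square(x):
    return x * x

sys.setprofile(profile_hook)
square(3)                # prints "call square" followed by "return square"
sys.setprofile(None)     # uninstall the hook again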

The results are incorrect for the fib function for reasons unknown to me 
- yappi thinks that no time is spent in that function, while the program 
output makes it obvious that more than 6 seconds are spent there.

I tried using the timing_sample parameter of yappi.start. When set to 2, 
all of tsub, ttot and tavg are 0 in the output for this example. With a 
high timing_sample value of 200 or more I can speed up the run to 4.97 
seconds for fib and 9.33 seconds for fib2. That is still quite a lot of 
overhead for a profiler that is meant to stay active at all times.

With the sampling profiler, the output is
fib: 3.10812282562 seconds
fib2: 5.91388201714 seconds
Thread MainThread (902 samples, 9.036564 seconds)
------------------------------------------------------------------------
cpu_time    (cum)  ticks  (cum) samples (cum) filename:lineno                          function
    0.000    9.005      0    902      0    902 ython-work/statprof/Lib/runpy.py:122     _run_module_as_main
    0.000    9.005      0    902      0    902 tatprof/Lib/sampling_profiler.py:203     main
    0.000    9.005      0    902      0    902 tatprof/Lib/sampling_profiler.py:209     <module>
    0.000    9.005      0    902      0    902 ython-work/statprof/Lib/runpy.py:34      _run_code
    0.000    5.912      0    592      0    592 demo.py:21                               <module>
    2.068    3.092    211    310    211    310 demo.py:5                                fib
    0.000    3.092      0    310      0    310 demo.py:18                               <module>
    2.196    2.196    217    217    217    217 demo.py:12                               fib2
    1.576    1.576    157    157    157    157 demo.py:15                               fib2
    0.980    0.980    101    101    101    101 demo.py:14                               fib2
    0.900    0.900     87     87     87     87 demo.py:4                                fib
    0.644    0.644     67     67     67     67 demo.py:11                               fib2
    0.516    0.516     50     50     50     50 demo.py:13                               fib2
    0.124    0.124     12     12     12     12 demo.py:7                                fib

Just a small slowdown here. And as you can see, this even tells you on 
which lines the most CPU time is spent. Looking at the output I do miss 
per-function totals, but those should be easy to add (a sketch of such an 
aggregation follows below). The profiler also correctly detects that lines 
21 and 18 of the demo code take most of the time (these are the 
module-level calls to fib2 and fib).
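
Aggregating per-line samples into per-function totals is only a few lines 
of post-processing. A sketch, assuming the collected data is a mapping from 
(filename, lineno, funcname) to sample counts as in the sampler sketch 
above; the numbers are the demo.py sample counts from the table:

from collections import defaultdict

# Per-line sample counts for demo.py, copied from the output above.
per_line = {
    ("demo.py",  5, "fib"):  211,
    ("demo.py",  4, "fib"):   87,
    ("demo.py",  7, "fib"):   12,
    ("demo.py", 12, "fib2"): 217,
    ("demo.py", 15, "fib2"): 157,
    ("demo.py", 14, "fib2"): 101,
    ("demo.py", 11, "fib2"):  67,
    ("demo.py", 13, "fib2"):  50,
}

# Sum the samples of all lines that belong to the same function.
per_function = defaultdict(int)
for (filename, lineno, funcname), count in per_line.items():
    per_function[(filename, funcname)] += count

for (filename, funcname), total in sorted(per_function.items(),
                                          key=lambda kv: -kv[1]):
    print "%6d samples  %s  %s" % (total, filename, funcname)

For the demo this yields 592 samples for fib2 and 310 for fib, which 
matches the cumulative counts of the two module-level calls.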

For completeness, let's look at what cProfile finds out:
torsten@pulsar:~/python-work/statprof$ ./python run_cprofile.py demo.py
fib: 5.83289813995 seconds
fib2: 11.2586638927 seconds
          55364794 function calls (36909866 primitive calls) in 17.092 CPU seconds

    Ordered by: standard name

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.000    0.000   17.092   17.092 <string>:1(<module>)
         1    0.000    0.000   17.092   17.092 demo.py:1(<module>)
18454929/1    5.833    0.000    5.833    5.833 demo.py:3(fib)
         1    7.831    7.831   11.259   11.259 demo.py:9(fib2)
  18454928    1.421    0.000    1.421    0.000 {method 'append' of 'list' objects}
         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  18454929    2.007    0.000    2.007    0.000 {method 'pop' of 'list' objects}
         4    0.000    0.000    0.000    0.000 {time.time}

I am impressed that cProfile detects the recursive calls of the fib 
function (I guess that is the information in the ncalls column). Anyway, 
the slowdown here is much more noticeable for the same information.
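
For reference, a runner like run_cprofile.py need not be more than a few 
lines; here is a sketch (hypothetical, not necessarily the script I used):

# run_cprofile.py -- run a target script under cProfile and print the stats.
import cProfile
import sys

script = sys.argv[1]
# The default sort is "standard name", matching the "Ordered by" line above.
cProfile.run("execfile(%r)" % script)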


So, why would I prefer the sampling profiler?
a) Less slowdown (and it could still use some optimization).
b) It points directly to the code lines where the hot spots are (based on 
equidistant time samples).
c) It does not distort the run time merely by being active.

Greetings, Torsten



