[Python-ideas] Sampling Profiler for Python
Torsten Landschoff
t.landschoff at gmx.net
Sun Jan 8 02:01:04 CET 2012
Hi Sümer,
On 01/05/2012 07:45 AM, Sümer Cip wrote:
> Have you looked at yappi? yappi.start() accepts a sampling parameter.
no, I haven't, because I only googled for "sampling profiler" and
"statistical profiler". I have now compared running pybench with yappi
vs. my sampling profiler (at 100 samples/sec).
First, the run time increases in both cases compared to plain Python.
The figures differ from before because I backported to Python 2 again
for this comparison (is yappi available for Python 3?):
yappi:
Totals:            4692ms  3317ms  +41.5%  4831ms  3448ms  +40.1%

sampling_profiler:
Totals:            3491ms  3269ms   +6.8%  3638ms  3435ms   +5.9%
These are the hot spots as reported by yappi:
name #n tsub ttot tavg
Tools/pybench/Calls.py.test:439 10 1.709853 1.709853 0.170985
Tools/pybench/Strings.py.test:467 10 1.383400 2.133651 0.213365
Tools/pybench/With.py.test:16 10 1.179130 1.614330 0.161433
Tools/pybench/Calls.py.test:176 10 1.088372 1.885379 0.188538
Tools/pybench/Unicode.py.test:388 10 1.062785 1.729250 0.172925
Tools/pybench/Calls.py.test:118 10 1.008680 1.284282 0.128428
Tools/pybench/With.py.test:64 10 0.999503 1.431072 0.143107
Tools/pybench/Lists.py.test:9 10 0.995866 1.394482 0.139448
Tools/pybench/Calls.py.test:9 10 0.979050 1.505686 0.150569
Tools/pybench/Dict.py.test:351 10 0.959278 1.290247 0.129025
Compared with the sampling_profiler:
cpu_time (cum)  ticks (cum)  samples (cum)  filename:lineno function
0.552    4.224     55   419    55   419     Tools/pybench/Calls.py:446 f
0.512    0.512     52    52    52    52     Tools/pybench/Unicode.py:452 calibrate
0.436    0.436     44    44    44    44     Tools/pybench/Unicode.py:542 calibrate
0.436    0.436     44    44    44    44     Tools/pybench/Strings.py:562 calibrate
0.372    0.372     37    37    37    37     Tools/pybench/Constructs.py:485 test
0.348    0.348     36    36    36    36     Tools/pybench/Calls.py:122 f
0.272    0.272     27    27    27    27     Tools/pybench/Lists.py:339 test
0.248    0.248     25    25    25    25     Tools/pybench/Calls.py:445 f
0.228    0.228     22    22    22    22     Tools/pybench/Strings.py:247 calibrate
0.212    0.212     21    21    21    21     Tools/pybench/With.py:14 __exit__
Let's look at the leading hot spots:
1) yappi reports Calls.py line 439 as the hot spot, which refers to
this code:
    def test(self):                  # <-- line 439
        global f
        def f(x):
            if x > 1:
                return f(x-1)        # <-- line 446
            return 1
        for i in xrange(self.rounds):
            f(10)
            f(10)
            ...
Here yappi identifies the outer function, while the sampling profiler
points directly to the line where most of the time is spent.
2) Most other hot spots reported by the sampling_profiler lie in the
calibrate functions. I wondered why that might be and noticed that
pybench calls these 20 times by default. Since the sampling_profiler
reports hot spots per line while yappi reports them per function, we
cannot really compare these results directly.
So let's compare the behaviour of the different profilers on this code:
from time import time

def fib(n):
    if n > 1:
        return fib(n-1) + fib(n-2)
    else:
        return n

def fib2(n):
    stack = [n]
    while stack:
        n = stack.pop()
        if n > 1:
            stack.append(n-1)
            stack.append(n-2)

s = time()
fib(34)
print "fib: %s seconds" % (time() - s)
s = time()
fib2(34)
print "fib2: %s seconds" % (time() - s)
Here are two variants of the Fibonacci function. The first works
recursively and returns the correct result. The second just simulates
the same behaviour without recursion by using an explicit stack; its
result is not computed, since it is not needed here. I just want to
compare how the two variants behave with respect to our profilers.
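As a quick sanity check on the amount of work involved (this helper is
my own illustration, not part of demo.py): the number of fib()
invocations satisfies C(n) = C(n-1) + C(n-2) + 1 with C(0) = C(1) = 1,
which a small loop can evaluate:

def call_count(n):
    # C(n) = C(n-1) + C(n-2) + 1, with C(0) = C(1) = 1
    a, b = 1, 1
    for _ in xrange(n - 1):
        a, b = b, a + b + 1
    return b

print call_count(34)   # 18454929, matching cProfile's ncalls below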
When I run this without any profiler, I get
fib: 2.87187790871 seconds
fib2: 5.26651716232 seconds
With yappi, I get this:
fib: 6.42756795883 seconds
fib2: 14.6631779671 seconds
name                                    #n        tsub       ttot       tavg
demo.py.<module>:1                      1         6.427689   21.090864  21.090864
demo.py.fib2:9                          1         10.353305  14.663173  14.663173
..atprof/Lib/posixpath.py.dirname:11    1         0.000004   0.000005   0.000005
..prof/Lib/threading.py.setprofile:8    1         0.000001   0.000001   0.000001
..n2.6/site-packages/yappi.py.stop:4    1         0.000000   0.000000   0.000000
demo.py.fib:3                           184549..  0.000000   0.000000   0.000000
The slowdown for fib is about what I expected, but I have no idea why
fib2 is slowed down that much. AFAIK the usual profiling hooks of
Python are called on entry and exit of each function, not on each line?!
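That entry/exit behaviour is easy to check with sys.setprofile (a quick
experiment of mine, unrelated to the profilers above). One thing it
shows: the hook also receives c_call/c_return events for built-in
calls, and fib2 performs millions of list.pop/list.append calls per
run, which may well account for its extra slowdown:

import sys

events = []

def hook(frame, event, arg):
    events.append(event)

def work():
    total = 0
    for i in xrange(1000):   # the loop lines themselves produce no events
        total += i
    return total

sys.setprofile(hook)
work()
sys.setprofile(None)

# Only call/return-style events appear, never 'line' (which sys.settrace
# would deliver) -- but 'c_call'/'c_return' do fire for built-in calls
# such as list.append, which is what fib2 does in its inner loop.
print set(events)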
The results for the fib function are incorrect for reasons unknown to
me: yappi thinks that no time is spent in that function, while the
program output makes it obvious that more than 6 seconds are spent there.
I tried using the timing_sample parameter to yappi.start. When set to
2, tsub, ttot and tavg are all 0 in the output for this example. With a
high timing_sample value of 200 or more I can get the run down to 4.97
seconds for fib and 9.33 seconds for fib2. That is still quite some
overhead for a profiler meant to be kept active at all times.
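For reference, that test amounts to something like this (a sketch only;
the parameter name is as in the yappi version tested here, other API
details may differ between versions):

import yappi
# record timing only on a subset of profile events
# (higher value = faster, but less accurate)
yappi.start(timing_sample=200)
# ... run the workload ...
yappi.stop()
yappi.print_stats()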
With the sampling profiler, the output is
fib: 3.10812282562 seconds
fib2: 5.91388201714 seconds
Thread MainThread (902 samples, 9.036564 seconds)
------------------------------------------------------------------------
cpu_time (cum)  ticks (cum)  samples (cum)  filename:lineno function
0.000    9.005      0   902     0   902     ython-work/statprof/Lib/runpy.py:122 _run_module_as_main
0.000    9.005      0   902     0   902     tatprof/Lib/sampling_profiler.py:203 main
0.000    9.005      0   902     0   902     tatprof/Lib/sampling_profiler.py:209 <module>
0.000    9.005      0   902     0   902     ython-work/statprof/Lib/runpy.py:34 _run_code
0.000    5.912      0   592     0   592     demo.py:21 <module>
2.068    3.092    211   310   211   310     demo.py:5 fib
0.000    3.092      0   310     0   310     demo.py:18 <module>
2.196    2.196    217   217   217   217     demo.py:12 fib2
1.576    1.576    157   157   157   157     demo.py:15 fib2
0.980    0.980    101   101   101   101     demo.py:14 fib2
0.900    0.900     87    87    87    87     demo.py:4 fib
0.644    0.644     67    67    67    67     demo.py:11 fib2
0.516    0.516     50    50    50    50     demo.py:13 fib2
0.124    0.124     12    12    12    12     demo.py:7 fib
Just a small slowdown here (and the 902 samples over 9.04 seconds match
the configured 100 samples/sec). And as you can see, this even tells
you on which lines the most CPU time is spent. Looking at this output,
I miss per-function totals, but those should be easy to add.
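A minimal sketch of such an aggregation, assuming the per-line data is
available as a mapping from (filename, lineno, function) to sample
counts (my assumption about the data layout, not the actual
sampling_profiler internals):

from collections import defaultdict

def function_totals(line_samples):
    # collapse {(filename, lineno, funcname): samples} to per-function sums
    totals = defaultdict(int)
    for (filename, lineno, funcname), count in line_samples.items():
        totals[(filename, funcname)] += count
    # report the biggest consumers first
    return sorted(totals.items(), key=lambda item: -item[1])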
The profiler also correctly detects that lines 21 and 18 of the demo
code take most of the time (these are the module-level calls to fib2
and fib).
For completeness, let's look at what cProfile finds out:
torsten@pulsar:~/python-work/statprof$ ./python run_cprofile.py demo.py
fib: 5.83289813995 seconds
fib2: 11.2586638927 seconds
55364794 function calls (36909866 primitive calls) in 17.092 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 17.092 17.092 <string>:1(<module>)
1 0.000 0.000 17.092 17.092 demo.py:1(<module>)
18454929/1 5.833 0.000 5.833 5.833 demo.py:3(fib)
1 7.831 7.831 11.259 11.259 demo.py:9(fib2)
 18454928    1.421    0.000    1.421    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 18454929    2.007    0.000    2.007    0.000 {method 'pop' of 'list' objects}
4 0.000 0.000 0.000 0.000 {time.time}
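(For reference, a wrapper with the same effect as run_cprofile.py can
be as small as the following sketch; the <string>:1(<module>) row above
is the telltale sign of passing the code to cProfile.run as a string.
This is a reconstruction, not necessarily the exact script used:)

import cProfile
import sys

if __name__ == "__main__":
    # profile the script named on the command line; cProfile.run
    # compiles its string argument, hence the <string>:1(<module>) entry
    cProfile.run("execfile(%r)" % sys.argv[1])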
I am impressed by how cProfile detects the recursive calls of the fib
function: the ncalls entry 18454929/1 means 18454929 total calls, of
which 1 was primitive (non-recursive). Anyway, the slowdown here is
much more noticeable for essentially the same information.
So, why would I prefer the sampling profiler?
a) Less slowdown (though it could still use some optimization).
b) It points directly to the code lines where the hot spots are (based
on equidistant time samples).
c) It does not systematically distort the results: the overhead is
spread evenly over time instead of being added to every function call.
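For anyone who wants to experiment with the approach, the core of a
signal-driven sampler fits in a few lines. This is only an illustrative
sketch (not the actual sampling_profiler source); it relies on
signal.setitimer/SIGPROF and is therefore Unix-only, and it only
observes the main thread, since CPython runs signal handlers there:

import signal
from collections import defaultdict

samples = defaultdict(int)   # (filename, lineno, funcname) -> hit count

def _sample(signum, frame):
    # record where the interrupted code is executing right now;
    # walking frame.f_back would additionally yield cumulative counts
    code = frame.f_code
    samples[(code.co_filename, frame.f_lineno, code.co_name)] += 1

def start(interval=0.01):
    # 0.01 s -> 100 samples/sec, the rate used in the measurements above;
    # ITIMER_PROF counts CPU time, so blocked/sleeping time is not sampled
    signal.signal(signal.SIGPROF, _sample)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)

def stop():
    signal.setitimer(signal.ITIMER_PROF, 0, 0)
    signal.signal(signal.SIGPROF, signal.SIG_DFL)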
Greetings, Torsten