[Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0)
Michael Hudson
mwh21@cam.ac.uk
30 Jan 2001 08:30:15 +0000
In the interest of generating some numbers (and filling up my hard
drive), last night I wrote a script to build lots & lots of versions
of Python and then run pybench with them. Many of the builds turned
out to be redundant: e.g. -O6 didn't seem to do anything different to
-O3, and pybench doesn't work with 1.5.2. Summarised results below;
first a key:
  src-n:        this morning's CVS (with Jeremy's f_localsplus
                optimisation); only built this with -O3
  src:          CVS from yesterday afternoon
  src-obmalloc: CVS from yesterday afternoon with Vladimir's obmalloc
                patch applied. More on this later...
  Python-2.0:   you can guess what this is.
All runs are compared against Python-2.0-O2 (whose own run is not
listed; the percentages are relative to its average round time):
Benchmark: src-n-O3 (rounds=10, warp=20)
Average round time: 49029.00 ms -0.86%
Benchmark: src (rounds=10, warp=20)
Average round time: 67141.00 ms +35.76%
Benchmark: src-O (rounds=10, warp=20)
Average round time: 50167.00 ms +1.44%
Benchmark: src-O2 (rounds=10, warp=20)
Average round time: 49641.00 ms +0.37%
Benchmark: src-O3 (rounds=10, warp=20)
Average round time: 49104.00 ms -0.71%
Benchmark: src-O6 (rounds=10, warp=20)
Average round time: 49131.00 ms -0.66%
Benchmark: src-obmalloc (rounds=10, warp=20)
Average round time: 63276.00 ms +27.94%
Benchmark: src-obmalloc-O (rounds=10, warp=20)
Average round time: 46927.00 ms -5.11%
Benchmark: src-obmalloc-O2 (rounds=10, warp=20)
Average round time: 46146.00 ms -6.69%
Benchmark: src-obmalloc-O3 (rounds=10, warp=20)
Average round time: 46456.00 ms -6.07%
Benchmark: src-obmalloc-O6 (rounds=10, warp=20)
Average round time: 46450.00 ms -6.08%
Benchmark: Python-2.0 (rounds=10, warp=20)
Average round time: 68933.00 ms +39.38%
Benchmark: Python-2.0-O (rounds=10, warp=20)
Average round time: 49542.00 ms +0.17%
Benchmark: Python-2.0-O3 (rounds=10, warp=20)
Average round time: 48262.00 ms -2.41%
Benchmark: Python-2.0-O6 (rounds=10, warp=20)
Average round time: 48273.00 ms -2.39%
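For anyone wanting to check the percentages: the sketch below assumes
the diff column is simply (time / baseline - 1) * 100, which is
consistent with the figures above but is an assumption about pybench,
not something taken from its source. The function names are made up
for illustration. Since the Python-2.0-O2 time itself is not listed,
the baseline is backed out from the Python-2.0 row and is therefore an
inferred value, not a measurement.

```python
def percent_diff(time_ms, baseline_ms):
    """Relative difference of time_ms against baseline_ms, in percent."""
    return (time_ms / baseline_ms - 1.0) * 100.0


def implied_baseline(time_ms, diff_percent):
    """Invert percent_diff: the baseline that would give diff_percent."""
    return time_ms / (1.0 + diff_percent / 100.0)


# Back out the (unlisted) Python-2.0-O2 round time from the Python-2.0 row.
# This is inferred from rounded figures, so it is only approximate.
baseline = implied_baseline(68933.00, 39.38)

# Re-derive two other rows from that inferred baseline.
for label, time_ms in [("src-O3", 49104.00), ("src-obmalloc-O3", 46456.00)]:
    print("%-16s %+6.2f%%" % (label, percent_diff(time_ms, baseline)))
```

The same function reproduces the obmalloc comparison directly, since
both of those times are listed: percent_diff(49104.00, 46456.00).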
My conclusion? Python 2.1 is slower than Python 2.0, but not by
enough to care about (at -O3 it's 49104 ms against 48262 ms per
round, i.e. under 2%).
Interestingly, adding obmalloc speeds things up. Let's take a closer
look:
$ python pybench.py -c src-obmalloc-O3 -s src-O3
PYBENCH 0.7
Benchmark: src-O3 (rounds=10, warp=20)
Tests:                         per run   per oper.      diff *
------------------------------------------------------------------------
BuiltinFunctionCalls:        843.35 ms     6.61 us    +2.93%
BuiltinMethodLookup:         878.70 ms     1.67 us    +0.56%
ConcatStrings:              1068.80 ms     7.13 us    -1.22%
ConcatUnicode:              1373.70 ms     9.16 us    -1.24%
CreateInstances:            1433.55 ms    34.13 us    +9.06%
CreateStringsWithConcat:    1031.75 ms     5.16 us   +10.95%
CreateUnicodeWithConcat:    1277.85 ms     6.39 us    +3.14%
DictCreation:               1275.80 ms     8.51 us   +44.22%
ForLoops:                   1415.90 ms   141.59 us    -0.64%
IfThenElse:                 1152.70 ms     1.71 us    -0.15%
ListSlicing:                 397.40 ms   113.54 us    -0.53%
NestedForLoops:              789.75 ms     2.26 us    -0.37%
NormalClassAttribute:        935.15 ms     1.56 us    -0.41%
NormalInstanceAttribute:     961.15 ms     1.60 us    -0.60%
PythonFunctionCalls:        1079.65 ms     6.54 us    -1.00%
PythonMethodCalls:           908.05 ms    12.11 us    -0.88%
Recursion:                   838.50 ms    67.08 us    -0.00%
SecondImport:                741.20 ms    29.65 us   +25.57%
SecondPackageImport:         744.25 ms    29.77 us   +18.66%
SecondSubmoduleImport:       947.05 ms    37.88 us   +25.60%
SimpleComplexArithmetic:    1129.40 ms     5.13 us  +114.92%
SimpleDictManipulation:     1048.55 ms     3.50 us    -0.00%
SimpleFloatArithmetic:       746.05 ms     1.36 us    -2.75%
SimpleIntFloatArithmetic:    823.35 ms     1.25 us    -0.37%
SimpleIntegerArithmetic:     823.40 ms     1.25 us    -0.37%
SimpleListManipulation:     1004.70 ms     3.72 us    +0.01%
SimpleLongArithmetic:        865.30 ms     5.24 us  +100.65%
SmallLists:                 1657.65 ms     6.50 us    +6.63%
SmallTuples:                1143.95 ms     4.77 us    +2.90%
SpecialClassAttribute:       949.00 ms     1.58 us    -0.22%
SpecialInstanceAttribute:   1353.05 ms     2.26 us    -0.73%
StringMappings:             1161.00 ms     9.21 us    +7.30%
StringPredicates:           1069.65 ms     3.82 us    -5.30%
StringSlicing:               846.30 ms     4.84 us    +8.61%
TryExcept:                  1590.40 ms     1.06 us    -0.49%
TryRaiseExcept:             1104.65 ms    73.64 us   +24.46%
TupleSlicing:                681.10 ms     6.49 us    -3.13%
UnicodeMappings:            1021.70 ms    56.76 us    +0.79%
UnicodePredicates:          1308.45 ms     5.82 us    -4.79%
UnicodeProperties:          1148.45 ms     5.74 us   +13.67%
UnicodeSlicing:              984.15 ms     5.62 us    -0.51%
------------------------------------------------------------------------
Average round time:        49104.00 ms                +5.70%
*) measured against: src-obmalloc-O3 (rounds=10, warp=20)
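Words fail me slightly, but with SimpleLongArithmetic and
SimpleComplexArithmetic taking about twice as long without obmalloc
(+100.65% and +114.92%), maybe some tuning of the memory allocation
of longs & complex numbers would be in order?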
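A quick way to pull the outliers out of a comparison like the one
above is to parse the rows and sort on the diff column. This is only a
sketch: the regular expression is my guess at the row layout based on
the output shown here, the helper names are invented for illustration,
and ROWS holds just a handful of lines copied from the table rather
than the whole thing.

```python
import re

# A few rows copied from the src-O3 vs src-obmalloc-O3 comparison above.
ROWS = """\
CreateInstances: 1433.55 ms 34.13 us +9.06%
DictCreation: 1275.80 ms 8.51 us +44.22%
SecondImport: 741.20 ms 29.65 us +25.57%
SimpleComplexArithmetic: 1129.40 ms 5.13 us +114.92%
SimpleFloatArithmetic: 746.05 ms 1.36 us -2.75%
SimpleLongArithmetic: 865.30 ms 5.24 us +100.65%
TryRaiseExcept: 1104.65 ms 73.64 us +24.46%
"""

# name: <per run> ms <per oper.> us <signed diff>%
# (assumed layout; tolerant of any amount of whitespace between columns)
ROW_RE = re.compile(
    r"^\s*(\w+):\s+([\d.]+)\s+ms\s+([\d.]+)\s+us\s+([+-][\d.]+)%\s*$"
)


def parse_rows(text):
    """Return (test name, diff in percent) for every line that matches."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            rows.append((match.group(1), float(match.group(4))))
    return rows


def largest_diffs(text, n=3):
    """The n tests with the largest positive diff, biggest first."""
    return sorted(parse_rows(text), key=lambda row: row[1], reverse=True)[:n]


for name, diff in largest_diffs(ROWS):
    print("%-26s %+8.2f%%" % (name, diff))
```

A positive diff here means the test took longer in src-O3 than in
src-obmalloc-O3, so the top of this list is where obmalloc helps most.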
Time for lectures - I don't think algebraic geometry is going to make
my head hurt as much as trying to explain benchmarks...
Cheers,
M.
--
ARTHUR: But which is probably incapable of drinking the coffee.
-- The Hitch-Hikers Guide to the Galaxy, Episode 6