Re: [Speed] When CPython performance depends on dead code...

28 Apr 2016


Hi,

2016-04-27 20:30 GMT+02:00 Brett Cannon <brett@python.org>:
...
> My first intuition is some cache somewhere is unhappy w/ the varying sizes.
> Have you tried any of this on another machine to see if the results are
> consistent?

On my laptop, the performance when I add dead code doesn't seem to
change much: the delta is smaller than 1%.

I found a fix for my dead code issue! Use "make profile-opt" rather
than "make". Using PGO, GCC reorders hot functions to place them
closer together. I also read that it records branch statistics so that
it can emit the most frequent branch first.

I also modified bm_call_simple.py to use multiple processes and to use
random hash seeds, rather than using a single process and disabling
hash randomization.

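To illustrate the multi-process approach, here is a minimal sketch, not the actual bm_call_simple.py change. It starts one fresh interpreter per hash seed through the PYTHONHASHSEED environment variable and pools the timings. The workload, the function names (run_worker, run_benchmark, summarize) and the use of distinct fixed seeds instead of truly random ones are all assumptions made for the example.

```python
import os
import statistics
import subprocess
import sys

# Hypothetical workload standing in for the real benchmark body:
# it times a loop of calls to an empty function and prints one
# timing (in seconds) per loop.
WORKLOAD = """
import time

def f():
    pass

for _ in range({loops}):
    t0 = time.perf_counter()
    for _ in range({calls}):
        f()
    print(time.perf_counter() - t0)
"""


def run_worker(seed, loops, calls):
    """Run the workload in a fresh interpreter with the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = WORKLOAD.format(loops=loops, calls=calls)
    out = subprocess.check_output([sys.executable, "-c", code],
                                  env=env, text=True)
    return [float(line) for line in out.split()]


def run_benchmark(processes=15, loops=5, calls=100_000):
    """Collect timings from several processes, one hash seed each."""
    timings = []
    # Seeds start at 1 because PYTHONHASHSEED=0 disables randomization.
    for seed in range(1, processes + 1):
        timings.extend(run_worker(seed, loops, calls))
    return timings


def summarize(timings, processes, loops):
    """Format the pooled timings in milliseconds."""
    ms = [t * 1000.0 for t in timings]
    return ("Average: %.1f ms +/- %.1f ms (min: %.1f ms, max: %.1f ms)"
            " - %d processes x %d loops"
            % (statistics.mean(ms), statistics.stdev(ms),
               min(ms), max(ms), processes, loops))
```

Calling summarize(run_benchmark(15, 5), 15, 5) prints one line in the same shape as the results in this message. Each process gets its own seed, so the effect of one particular hash layout is averaged out rather than baked into the result.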
Comparison reference => fastcall (my whole fork, not just the tiny
patches adding dead code) using make (gcc -O3):

Average: 1183.5 ms +/- 6.1 ms (min: 1173.3 ms, max: 1201.9 ms) - 15 processes x 5 loops
=> Average: 1121.2 ms +/- 7.4 ms (min: 1106.5 ms, max: 1142.0 ms) - 15 processes x 5 loops

Comparison reference => fastcall using make profile-opt (PGO):

Average: 962.7 ms +/- 17.8 ms (min: 952.6 ms, max: 998.6 ms) - 15 processes x 5 loops
=> Average: 961.1 ms +/- 18.6 ms (min: 949.0 ms, max: 1011.3 ms) - 15 processes x 5 loops

Using make, fastcall *seems* to be faster, but in fact it looks more
like the random noise from dead code. Using PGO, fastcall doesn't change
performance at all. I expected fastcall to be faster, but that's the
purpose of benchmarks: get real performance, not expectations :-)
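The arithmetic behind that reading can be sketched as follows, using only the averages and deviations quoted in this message. The "within noise" test, which checks whether the difference of means is smaller than the sum of the two deviations, is a crude heuristic assumed for this example and not a proper statistical test. The reported deviation also only captures run-to-run variation. It does not capture the code placement effect discussed in this thread, which is why a difference above that threshold with plain make can still be an artifact.

```python
# Figures copied from this message: (average in ms, deviation in ms).
results = {
    "make (gcc -O3)": {
        "reference": (1183.5, 6.1),
        "fastcall": (1121.2, 7.4),
    },
    "make profile-opt (PGO)": {
        "reference": (962.7, 17.8),
        "fastcall": (961.1, 18.6),
    },
}


def relative_change(ref_mean, new_mean):
    """Change of new_mean relative to ref_mean, in percent."""
    return (new_mean - ref_mean) / ref_mean * 100.0


def compare(build):
    """Return (percent change, within_noise) for one build."""
    ref_mean, ref_dev = build["reference"]
    new_mean, new_dev = build["fastcall"]
    delta = new_mean - ref_mean
    # Crude heuristic (an assumption, not a statistical test): treat the
    # difference as noise if it is smaller than the two deviations combined.
    within_noise = abs(delta) <= ref_dev + new_dev
    return relative_change(ref_mean, new_mean), within_noise


for name, build in results.items():
    pct, within_noise = compare(build)
    print("%s: %+.2f%% (within noise: %s)" % (name, pct, within_noise))
```

With plain make the change is about -5.3%, while with PGO it is about -0.2%, far below the reported deviation of roughly 18 ms.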

Next step: modify most benchmarks of perf.py to run multiple processes
rather than a single process to test using multiple hash seeds.

Victor

Victor Stinner
