[Python-Dev] Python 3 optimizations, continued, continued again...

stefan brunthaler s.brunthaler at uci.edu
Sat Jan 28 02:28:28 CET 2012


Hi,

On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson <benjamin at python.org> wrote:
> 2011/11/8 stefan brunthaler <s.brunthaler at uci.edu>:
>> How does that sound?
>
> I think I can hear real patches and benchmarks most clearly.
>
I spent the better part of my -20% time implementing the work as
"suggested". Please find the benchmarks attached to this email; I just
ran them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched
off the regular 3.3a0 default tip (changeset 73977) shortly after your
email. I do not have an official patch yet, but will create one if
wanted. Changes to the existing interpreter are minimal; the biggest
chunk is a new interpreter dispatch loop.

Merging dispatch loops eliminates some of my optimizations, but my
inline caching technique enables inlining some functionality, which
results in visible speedups. The results are normalized to the
non-threaded-code version of the CPython interpreter (named
"vanilla"), so that I can relate them to my preceding results. I
anticipate *no* compatibility issues, and the interpreter requires less
than 100 KiB of extra memory at run-time. Since my interpreter uses
215 of a maximum of 255 instructions, there is also room for adding
further derivatives, e.g., for popular Python libraries.
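
To make the idea of instruction derivatives concrete, here is a toy
sketch in Python. It is not the patch itself: it assumes that a
"derivative" is a type-specialized variant of a generic instruction
which the interpreter rewrites in place once it has seen the operand
types. The opcode names (LOAD, ADD, ADD_FLOAT, RET) and the code layout
are hypothetical and do not correspond to CPython's bytecode.

```python
# Toy stack interpreter. Opcodes and layout are hypothetical, not CPython's.
LOAD, ADD, ADD_FLOAT, RET = range(4)


def run(code, consts):
    """Execute `code`, a mutable list of [opcode, arg] pairs.

    The generic ADD rewrites itself into ADD_FLOAT after seeing two floats;
    ADD_FLOAT rewrites itself back to ADD if its type guard fails.
    """
    stack = []
    pc = 0
    while True:
        op, arg = code[pc]
        if op == LOAD:
            stack.append(consts[arg])
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            if type(a) is float and type(b) is float:
                # Cache the observed types by rewriting the instruction itself.
                code[pc][0] = ADD_FLOAT
            stack.append(a + b)
        elif op == ADD_FLOAT:
            b = stack.pop()
            a = stack.pop()
            if type(a) is float and type(b) is float:
                # Specialized path: call float addition directly.
                stack.append(float.__add__(a, b))
            else:
                # Guard failed: fall back to the generic instruction.
                code[pc][0] = ADD
                stack.append(a + b)
        elif op == RET:
            return stack.pop()
        pc += 1


program = [[LOAD, 0], [LOAD, 1], [ADD, None], [RET, None]]
print(run(program, [1.5, 2.0]))  # the ADD at index 2 is now ADD_FLOAT
print(run(program, [1, 2]))      # the guard fails and it reverts to ADD
```

Each derivative takes up one opcode number, which is why the number of
instructions still free in the opcode space matters for adding more of
them.
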


Let me know what python-dev thinks of this and have a nice weekend,
--stefan

PS: AFAIR the version without partial stack frame caching also passes
all regression tests modulo the ones that test against specific
bytecodes.
-------------- next part --------------
currently processing:  bench/binarytrees.py3.py
phd-cpy-3a0-thr-cod-pytho      arg:     10 | time:   0.161876  | stdev:  0.007780 | var:  0.000061 | mem:   6633.60
phd-cpy-3a0-thr-cod-pytho      arg:     12 | time:   0.699243  | stdev:  0.019112 | var:  0.000365 | mem:   8142.67
phd-cpy-3a0-thr-cod-pytho      arg:     14 | time:   3.388344  | stdev:  0.048042 | var:  0.002308 | mem:  13586.93
phd-cpy-pio-sne-pre-pyt-no-psf arg:     10 | time:   0.153875  | stdev:  0.003828 | var:  0.000015 | mem:   6873.73
phd-cpy-pio-sne-pre-pyt-no-psf arg:     12 | time:   0.632572  | stdev:  0.019121 | var:  0.000366 | mem:   8246.27
phd-cpy-pio-sne-pre-pyt-no-psf arg:     14 | time:   3.020988  | stdev:  0.043483 | var:  0.001891 | mem:  13640.27
phd-cpy-pio-sne-pre-pytho      arg:     10 | time:   0.150942  | stdev:  0.005157 | var:  0.000027 | mem:   6901.87
phd-cpy-pio-sne-pre-pytho      arg:     12 | time:   0.660841  | stdev:  0.020538 | var:  0.000422 | mem:   8286.80
phd-cpy-pio-sne-pre-pytho      arg:     14 | time:   3.184198  | stdev:  0.051103 | var:  0.002612 | mem:  13680.40
phd-cpy-3a0-van-pytho          arg:     10 | time:   0.202812  | stdev:  0.005480 | var:  0.000030 | mem:   6633.33
phd-cpy-3a0-van-pytho          arg:     12 | time:   0.908456  | stdev:  0.015744 | var:  0.000248 | mem:   8153.07
phd-cpy-3a0-van-pytho          arg:     14 | time:   4.364805  | stdev:  0.037522 | var:  0.001408 | mem:  13593.60
### phd-cpy-3a0-thr-cod-pytho     :  1.2887 (avg-sum:   1.416488)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.4383 (avg-sum:   1.269145)
### phd-cpy-pio-sne-pre-pytho     :  1.3704 (avg-sum:   1.331994)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   1.825358)
currently processing:  bench/fannkuch.py3.py
phd-cpy-3a0-thr-cod-pytho      arg:      8 | time:   0.172677  | stdev:  0.006620 | var:  0.000044 | mem:   6424.13
phd-cpy-3a0-thr-cod-pytho      arg:      9 | time:   1.426755  | stdev:  0.035545 | var:  0.001263 | mem:   6425.20
phd-cpy-pio-sne-pre-pyt-no-psf arg:      8 | time:   0.168010  | stdev:  0.010277 | var:  0.000106 | mem:   6481.07
phd-cpy-pio-sne-pre-pyt-no-psf arg:      9 | time:   1.345817  | stdev:  0.033127 | var:  0.001097 | mem:   6479.60
phd-cpy-pio-sne-pre-pytho      arg:      8 | time:   0.165876  | stdev:  0.007136 | var:  0.000051 | mem:   6520.00
phd-cpy-pio-sne-pre-pytho      arg:      9 | time:   1.351150  | stdev:  0.028822 | var:  0.000831 | mem:   6519.73
phd-cpy-3a0-van-pytho          arg:      8 | time:   0.216146  | stdev:  0.012879 | var:  0.000166 | mem:   6419.07
phd-cpy-3a0-van-pytho          arg:      9 | time:   1.834247  | stdev:  0.028224 | var:  0.000797 | mem:   6418.67
### phd-cpy-3a0-thr-cod-pytho     :  1.2820 (avg-sum:   0.799716)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.3544 (avg-sum:   0.756913)
### phd-cpy-pio-sne-pre-pytho     :  1.3516 (avg-sum:   0.758513)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   1.025197)
currently processing:  bench/fasta.py3.py
phd-cpy-3a0-thr-cod-pytho      arg:  50000 | time:   0.374023  | stdev:  0.010870 | var:  0.000118 | mem:   6495.07
phd-cpy-3a0-thr-cod-pytho      arg: 100000 | time:   0.714577  | stdev:  0.024713 | var:  0.000611 | mem:   6495.47
phd-cpy-3a0-thr-cod-pytho      arg: 150000 | time:   1.062866  | stdev:  0.040138 | var:  0.001611 | mem:   6496.27
phd-cpy-pio-sne-pre-pyt-no-psf arg:  50000 | time:   0.345621  | stdev:  0.022549 | var:  0.000508 | mem:   6551.87
phd-cpy-pio-sne-pre-pyt-no-psf arg: 100000 | time:   0.656174  | stdev:  0.031608 | var:  0.000999 | mem:   6551.60
phd-cpy-pio-sne-pre-pyt-no-psf arg: 150000 | time:   0.964326  | stdev:  0.046202 | var:  0.002135 | mem:   6552.13
phd-cpy-pio-sne-pre-pytho      arg:  50000 | time:   0.381223  | stdev:  0.015771 | var:  0.000249 | mem:   6592.40
phd-cpy-pio-sne-pre-pytho      arg: 100000 | time:   0.739112  | stdev:  0.035685 | var:  0.001273 | mem:   6591.60
phd-cpy-pio-sne-pre-pytho      arg: 150000 | time:   1.080334  | stdev:  0.035524 | var:  0.001262 | mem:   6591.73
phd-cpy-3a0-van-pytho          arg:  50000 | time:   0.417759  | stdev:  0.016483 | var:  0.000272 | mem:   6490.27
phd-cpy-3a0-van-pytho          arg: 100000 | time:   0.788182  | stdev:  0.019665 | var:  0.000387 | mem:   6492.40
phd-cpy-3a0-van-pytho          arg: 150000 | time:   1.187140  | stdev:  0.035640 | var:  0.001270 | mem:   6491.73
### phd-cpy-3a0-thr-cod-pytho     :  1.1123 (avg-sum:   0.717155)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.2172 (avg-sum:   0.655374)
### phd-cpy-pio-sne-pre-pytho     :  1.0874 (avg-sum:   0.733556)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   0.797694)
currently processing:  mandelbrot.py
phd-cpy-3a0-thr-cod-pytho      arg:    200 | time:   0.244281  | stdev:  0.009795 | var:  0.000096 | mem:   6424.13
phd-cpy-3a0-thr-cod-pytho      arg:    400 | time:   0.861120  | stdev:  0.019812 | var:  0.000393 | mem:   6501.87
phd-cpy-3a0-thr-cod-pytho      arg:    500 | time:   1.338883  | stdev:  0.029741 | var:  0.000885 | mem:   6730.67
phd-cpy-pio-sne-pre-pyt-no-psf arg:    200 | time:   0.220013  | stdev:  0.013307 | var:  0.000177 | mem:   6476.00
phd-cpy-pio-sne-pre-pyt-no-psf arg:    400 | time:   0.789915  | stdev:  0.028319 | var:  0.000802 | mem:   6566.00
phd-cpy-pio-sne-pre-pyt-no-psf arg:    500 | time:   1.180740  | stdev:  0.042762 | var:  0.001829 | mem:   6794.00
phd-cpy-pio-sne-pre-pytho      arg:    200 | time:   0.218946  | stdev:  0.014494 | var:  0.000210 | mem:   6519.47
phd-cpy-pio-sne-pre-pytho      arg:    400 | time:   0.767381  | stdev:  0.042411 | var:  0.001799 | mem:   6614.67
phd-cpy-pio-sne-pre-pytho      arg:    500 | time:   1.162739  | stdev:  0.029852 | var:  0.000891 | mem:   6842.67
phd-cpy-3a0-van-pytho          arg:    200 | time:   0.328553  | stdev:  0.009619 | var:  0.000093 | mem:   6419.60
phd-cpy-3a0-van-pytho          arg:    400 | time:   1.202208  | stdev:  0.018670 | var:  0.000349 | mem:   6514.27
phd-cpy-3a0-van-pytho          arg:    500 | time:   1.860382  | stdev:  0.036647 | var:  0.001343 | mem:   6712.93
### phd-cpy-3a0-thr-cod-pytho     :  1.3874 (avg-sum:   0.814761)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.5480 (avg-sum:   0.730223)
### phd-cpy-pio-sne-pre-pytho     :  1.5780 (avg-sum:   0.716355)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   1.130381)
currently processing:  bench/nbody.py3.py
phd-cpy-3a0-thr-cod-pytho      arg:  50000 | time:   0.907789  | stdev:  0.021787 | var:  0.000475 | mem:   6668.13
phd-cpy-3a0-thr-cod-pytho      arg: 100000 | time:   1.788778  | stdev:  0.042285 | var:  0.001788 | mem:   6674.67
phd-cpy-3a0-thr-cod-pytho      arg: 150000 | time:   2.666433  | stdev:  0.062115 | var:  0.003858 | mem:   6663.20
phd-cpy-pio-sne-pre-pyt-no-psf arg:  50000 | time:   0.789515  | stdev:  0.022475 | var:  0.000505 | mem:   6720.00
phd-cpy-pio-sne-pre-pyt-no-psf arg: 100000 | time:   1.525695  | stdev:  0.039957 | var:  0.001597 | mem:   6735.87
phd-cpy-pio-sne-pre-pyt-no-psf arg: 150000 | time:   2.283342  | stdev:  0.071985 | var:  0.005182 | mem:   6730.93
phd-cpy-pio-sne-pre-pytho      arg:  50000 | time:   0.789915  | stdev:  0.012848 | var:  0.000165 | mem:   6771.47
phd-cpy-pio-sne-pre-pytho      arg: 100000 | time:   1.563297  | stdev:  0.033950 | var:  0.001153 | mem:   6770.00
phd-cpy-pio-sne-pre-pytho      arg: 150000 | time:   2.324945  | stdev:  0.050021 | var:  0.002502 | mem:   6768.93
phd-cpy-3a0-van-pytho          arg:  50000 | time:   1.167939  | stdev:  0.025035 | var:  0.000627 | mem:   6666.80
phd-cpy-3a0-van-pytho          arg: 100000 | time:   2.327478  | stdev:  0.047759 | var:  0.002281 | mem:   6666.93
phd-cpy-3a0-van-pytho          arg: 150000 | time:   3.434881  | stdev:  0.066780 | var:  0.004460 | mem:   6666.67
### phd-cpy-3a0-thr-cod-pytho     :  1.2922 (avg-sum:   1.787667)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.5071 (avg-sum:   1.532851)
### phd-cpy-pio-sne-pre-pytho     :  1.4814 (avg-sum:   1.559386)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   2.310099)
currently processing:  bench/spectralnorm.py3.py
phd-cpy-3a0-thr-cod-pytho      arg:    100 | time:   0.267083  | stdev:  0.010964 | var:  0.000120 | mem:   6548.80
phd-cpy-3a0-thr-cod-pytho      arg:    200 | time:   0.970060  | stdev:  0.023750 | var:  0.000564 | mem:   6539.20
phd-cpy-3a0-thr-cod-pytho      arg:    300 | time:   2.160668  | stdev:  0.044157 | var:  0.001950 | mem:   6528.93
phd-cpy-pio-sne-pre-pyt-no-psf arg:    100 | time:   0.233081  | stdev:  0.007929 | var:  0.000063 | mem:   6611.87
phd-cpy-pio-sne-pre-pyt-no-psf arg:    200 | time:   0.837918  | stdev:  0.019807 | var:  0.000392 | mem:   6596.80
phd-cpy-pio-sne-pre-pyt-no-psf arg:    300 | time:   1.865183  | stdev:  0.028789 | var:  0.000829 | mem:   6616.40
phd-cpy-pio-sne-pre-pytho      arg:    100 | time:   0.241614  | stdev:  0.006662 | var:  0.000044 | mem:   6647.60
phd-cpy-pio-sne-pre-pytho      arg:    200 | time:   0.870454  | stdev:  0.017455 | var:  0.000305 | mem:   6646.53
phd-cpy-pio-sne-pre-pytho      arg:    300 | time:   1.969456  | stdev:  0.052760 | var:  0.002784 | mem:   6651.33
phd-cpy-3a0-van-pytho          arg:    100 | time:   0.355088  | stdev:  0.007057 | var:  0.000050 | mem:   6545.07
phd-cpy-3a0-van-pytho          arg:    200 | time:   1.335549  | stdev:  0.021511 | var:  0.000463 | mem:   6555.47
phd-cpy-3a0-van-pytho          arg:    300 | time:   3.042990  | stdev:  0.032533 | var:  0.001058 | mem:   6599.87
### phd-cpy-3a0-thr-cod-pytho     :  1.3931 (avg-sum:   1.132603)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.6122 (avg-sum:   0.978727)
### phd-cpy-pio-sne-pre-pytho     :  1.5361 (avg-sum:   1.027175)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   1.577876)
Overall performance:
  Interpreter: cpython-3.3a0-threaded-code/python           :  1.129733 (speedup:  1.3004, counts: 510)
  Interpreter: cpython-pio-sneak-preview/python-no-psfc     :  1.000752 (speedup:  1.4680, counts: 510)
  Interpreter: cpython-pio-sneak-preview/python             :  1.036613 (speedup:  1.4172, counts: 510)
  Interpreter: cpython-3.3a0-vanilla/python                 :  1.469095 (speedup:  1.0000, counts: 510)
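
For readers who want to check the normalization, the sketch below
recomputes the "###" summary lines for binarytrees from the per-argument
timings above. It assumes that "avg-sum" is the arithmetic mean of the
listed times and that the normalized figure is the vanilla mean divided
by the interpreter's mean; the numbers in the log are consistent with
that reading. The function name `speedups` is made up for illustration.

```python
from statistics import mean

# Times copied from the binarytrees section of the log (args 10, 12, 14).
times = {
    "phd-cpy-3a0-thr-cod-pytho": [0.161876, 0.699243, 3.388344],
    "phd-cpy-pio-sne-pre-pyt-no-psf": [0.153875, 0.632572, 3.020988],
    "phd-cpy-pio-sne-pre-pytho": [0.150942, 0.660841, 3.184198],
    "phd-cpy-3a0-van-pytho": [0.202812, 0.908456, 4.364805],
}


def speedups(times, baseline="phd-cpy-3a0-van-pytho"):
    """Return {interpreter: baseline mean time / interpreter mean time}."""
    avg = {name: mean(ts) for name, ts in times.items()}
    return {name: avg[baseline] / a for name, a in avg.items()}


for name, factor in speedups(times).items():
    print(f"### {name:30s}: {factor:7.4f} (avg-sum: {mean(times[name]):10.6f})")
```

Rounded to four places this gives 1.2887, 1.4383, 1.3704 and 1.0000,
matching the log. The overall figures appear to be computed the same
way over all 17 (benchmark, argument) timings.
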

