[pypy-commit] extradoc extradoc: Draft blog about PPC
arigo
noreply at buildbot.pypy.org
Fri Oct 16 06:22:09 EDT 2015
Author: Armin Rigo <arigo at tunes.org>
Branch: extradoc
Changeset: r5570:74cf69f6e2ac
Date: 2015-10-16 12:23 +0200
http://bitbucket.org/pypy/extradoc/changeset/74cf69f6e2ac/
Log: Draft blog about PPC
diff --git a/blog/draft/ppc-backend.rst b/blog/draft/ppc-backend.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/ppc-backend.rst
@@ -0,0 +1,118 @@
+Hi all,
+
+PyPy's JIT now supports the 64-bit PowerPC architecture! This is the
+third architecture supported, in addition to x86 (32- and 64-bit) and
+ARM (32-bit only). More precisely, we support both the big- and the
+little-endian variants of ppc64. Thanks to IBM for funding this work!
+
+The new JIT backend has been merged into "default". You should be able
+to translate PPC versions `as usual`__ directly on the machines. For
+the foreseeable future, I will compile and distribute binary versions
+corresponding to the official releases (for Fedora), but of course I'd
+welcome it if someone else could step in and do it. It is not yet
+clear whether we will run a buildbot.
+
+.. __: http://pypy.org/download.html#building-from-source
+
+To check that the result performs well, I logged into a ppc64le machine
+and ran PyPy's usual benchmark suite (minus sqlitesynth: sqlite was not
+installed on that machine). I ran it twice, 12 hours apart, to reduce
+the risk that other users suddenly loading the machine would skew the
+results. Overall, the machine was relatively quiet. Of course, this is
+not scientifically rigorous; it is the best I could come up with given
+the limited resources.
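Running the suite twice only helps if the two runs are actually compared. As a minimal sketch (the helper `flag_noisy` and its 10% threshold are hypothetical choices for illustration, not part of PyPy's benchmark runner), one could flag the benchmarks whose speed-up factor differs a lot between the two runs. The sample values are copied from the two ppc64le columns of the results table below.

```python
# Hypothetical helper (not part of PyPy's benchmark runner): flag benchmarks
# whose speed-up factor differs a lot between two runs of the same suite.
def flag_noisy(run1, run2, threshold=0.10):
    """Return {name: spread} for benchmarks present in both runs whose
    relative spread |a - b| / min(a, b) exceeds `threshold` (0.10 = 10%)."""
    noisy = {}
    for name in sorted(run1.keys() & run2.keys()):
        a, b = run1[name], run2[name]
        spread = abs(a - b) / min(a, b)
        if spread > threshold:
            noisy[name] = round(spread, 3)
    return noisy


# A few speed-up factors copied from the two ppc64le columns of the table.
run1 = {"ai": 16.1659, "go": 11.8226, "json_bench": 29.5022, "twisted_tcp": 1.934}
run2 = {"ai": 14.9091, "go": 15.4183, "json_bench": 28.8897, "twisted_tcp": 5.4931}

print(flag_noisy(run1, run2))  # {'go': 0.304, 'twisted_tcp': 1.84}
```

Dividing by the smaller of the two values is a deliberately conservative choice: it reports the larger of the two possible relative differences.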
+
+Here are the results. Each number is the speed-up factor of the JIT
+version of PyPy over the non-JIT version (higher is better). The first
+numeric column is x86-64, for reference; the second and third are the
+two ppc64le runs. A few benchmarks have no numbers because the runner
+does not execute them on the non-JIT version (apart from sqlitesynth,
+they all ran fine, though).
+
+::
+
+ benchmark                   x86-64  ppc64le #1  ppc64le #2
+ ai                         13.7342     16.1659     14.9091
+ bm_chameleon                8.5944      8.5858        8.66
+ bm_dulwich_log              5.1256      5.4368      5.5928
+ bm_krakatau                 5.5201      2.3915      2.3452
+ bm_mako                     8.4802      6.8937      6.9335
+ bm_mdp                      2.0315      1.7162      1.9131
+ chaos                      56.9705     57.2608     56.2374
+ sphinx
+ crypto_pyaes                62.505      80.149     79.7801
+ deltablue                   3.3403      5.1199      4.7872
+ django                     28.9829      23.206       23.47
+ eparse                      2.3164      2.6281       2.589
+ fannkuch                    9.1242     15.1768     11.3906
+ float                      13.8145     17.2582     17.2451
+ genshi_text                16.4608     13.9398     13.7998
+ genshi_xml                  8.2782      8.0879      9.2315
+ go                          6.7458     11.8226     15.4183
+ hexiom2                    24.3612     34.7991     33.4734
+ html5lib                    5.4515      5.5186       5.365
+ json_bench                 28.8774     29.5022     28.8897
+ meteor-contest              5.1518      5.6567      5.7514
+ nbody_modified             20.6138     22.5466     21.3992
+ pidigits                    1.0118       1.022      1.0829
+ pyflate-fast                9.0684     10.0168     10.3119
+ pypy_interp                 3.3977      3.9307      3.8798
+ raytrace-simple            69.0114    108.8875    127.1518
+ richards                   94.1863    118.1257    102.1906
+ rietveld                    3.2421      3.0126      3.1592
+ scimark_fft
+ scimark_lu
+ scimark_montecarlo
+ scimark_sor
+ scimark_sparsematmul
+ slowspitfire                2.8539      3.3924      3.5541
+ spambayes                   5.0646      6.3446       6.237
+ spectral-norm              41.9148     42.1831     43.2913
+ spitfire                    3.8788      4.8214       4.701
+ spitfire_cstringio           7.606      9.1809      9.1691
+ sqlitesynth
+ sympy_expand                2.9537      2.0705      1.9299
+ sympy_integrate             4.3805      4.3467      4.7052
+ sympy_str                   1.5431      1.6248      1.5825
+ sympy_sum                   6.2519       6.096      5.6643
+ telco                      61.2416     54.7187     55.1705
+ trans2_annotate
+ trans2_rtype
+ trans2_backendopt
+ trans2_database
+ trans2_source
+ twisted_iteration          55.5019     51.5127     63.0592
+ twisted_names               8.2262      9.0062      10.306
+ twisted_pb                 12.1134      13.644     12.1177
+ twisted_tcp                 4.9778       1.934      5.4931
+
+ GEOMETRIC MEAN                9.31        9.70       10.01
+
+The last line reports the geometric mean of each column. We see that
+the goal was reached: PyPy's JIT improves performance by a factor of
+about 9.7 to 10 on ppc64le. By comparison, it "only" improves
+performance by a factor of 9.3 on Intel x86-64. I don't know why, but
+my guess is that it mostly means that a non-JITted PyPy performs
+slightly better on Intel than it does on PowerPC.
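Speed-up factors are ratios, which is why a geometric mean rather than an arithmetic one is the usual way to summarize them. The minimal Python sketch below is independent of PyPy's benchmark runner (the helper name `geomean` and the toy numbers are ours). It shows the property that makes the geometric mean suitable: averaging speed-ups or averaging slow-downs gives consistent answers. It then compares the GEOMETRIC MEAN row of the table.

```python
import math
from statistics import geometric_mean  # standard library, Python 3.8+


def geomean(factors):
    """Geometric mean computed in log space: exp(mean(log x))."""
    return math.exp(sum(math.log(f) for f in factors) / len(factors))


# Toy speed-up factors (not taken from the table), to show the key property.
speedups = [2.0, 8.0, 4.0]
slowdowns = [1 / s for s in speedups]

# The geometric mean of the reciprocals is the reciprocal of the geometric
# mean, so summarizing "speed-ups" or "slow-downs" gives the same verdict.
assert math.isclose(geomean(slowdowns), 1 / geomean(speedups))
assert math.isclose(geomean(speedups), geometric_mean(speedups))

# GEOMETRIC MEAN row of the table: both ppc64le runs relative to x86-64.
x86 = 9.31
for ppc in (9.70, 10.01):
    print(f"ppc64le/x86-64 = {ppc / x86:.3f}")  # prints 1.042, then 1.075
```

With arithmetic means, the toy speed-ups above would average about 4.67, while the slow-downs would average about 0.29 (a speed-up of about 3.43). The two summaries would disagree.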
+
+Why is that? The corresponding numbers are also higher on ARM than on
+Intel. Our guess is that on ARM, running PyPy's whole interpreter takes
+up a lot of resources, e.g. the instruction cache, which the
+JIT-generated assembler no longer needs once the process has warmed up.
+This argument does not carry over to PowerPC, but more subtle variants
+of it might. Notably, Intel does crazy things with branch prediction,
+which likely helps big interpreters (both the non-JITted PyPy and
+CPython), both in the interpreter's main loop and in the numerous
+indirect branches that depend on the types of the objects. Moreover, I
+noticed that gcc itself is not perfect at optimizing for PowerPC: while
+developing this backend, I often looked at the assembler produced by
+gcc and saw a number of inefficiencies. All these factors slow down the
+non-JITted version of PyPy, but do not affect the speed of the
+assembler produced just-in-time.
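The argument can be made a bit more explicit with a deliberately simplified model. This is an assumption for illustration, not something measured here. Write a benchmark's speed-up factor S as the ratio of its non-JIT run time to its JIT run time. Suppose that on one platform the non-JITted interpreter is slower by a factor k relative to the JIT-generated assembler, whose run time T_jit is left unchanged:

```latex
\[
S = \frac{T_{\mathrm{nojit}}}{T_{\mathrm{jit}}},
\qquad
S' = \frac{k \, T_{\mathrm{nojit}}}{T_{\mathrm{jit}}} = k \, S
\quad\Longrightarrow\quad
k = \frac{S'}{S}
\]
\[
k \approx \frac{9.70}{9.31} \approx 1.04
\quad\text{to}\quad
k \approx \frac{10.01}{9.31} \approx 1.08
\]
```

Under this model, the geometric means in the table would correspond to a non-JITted PyPy that is only about 4% to 8% slower on ppc64le than on x86-64, relative to the JIT-generated code.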
+
+Anyway, this is just guessing. The fact remains that PyPy can now
+be used on PowerPC machines. Have fun!
+
+
+See you soon,
+
+Armin.