[pypy-dev] NumPyPy vs NumPy

Papa, Florin florin.papa at intel.com
Thu Jul 28 09:05:34 EDT 2016


Hi Matti,

Thank you for your reply and for pointing me to additional numpy benchmarks.

Please see below the results obtained with vectorization turned on (run times are normalized to CPython NumPy = 1; lower is better). It seems that vectorization significantly improves run times for some of the benchmarks (matrixmul, vectoradd, float2int).

Benchmark       CPython NumPy   PyPy NumPyPy   PyPy NumPy     PyPy NumPyPy (vectorized)
matrixmul       1               5.838852812    4.866947551    3.332052386
pointbypoint    1               4.922654347    0.981008211    4.917323386
numrand         1               2.478997019    1.082185897    2.486596082
rowmean         1               2.512893263    1.062233015    2.531627012
dsums           1               33.58240465    1.013388981    33.73959105
vectsum         1               1.738446611    0.771660704    1.651790546
cauchy          1               2.168377906    0.887388291    1.789566808
polarcoords     1               1.030962402    0.500905427    1.031192576
vectsort        1               2.214586698    0.973727924    2.205043894
arange          1               2.045342386    0.69941044     2.064583705
vectoradd       1               5.447667037    1.513217941    4.838760016
extractint      1               1.655717606    2.671712185    1.633729987
float2int       1               3.1688         0.905406988    2.764488512
insertzeros     1               2.375043445    1.037504453    2.145735211
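
For reference, these benchmarks are pure ufunc kernels. Below is a minimal sketch of a vectoradd-style kernel (illustrative only; the actual code is in the attached numpy_benchmark.zip and the names here are made up). The vectorized column is assumed to correspond to running with PyPy's --jit vec=1 parameter; the exact flag used may differ.

import numpy as np

def vectoradd(n=1000000, iterations=100):
    # Two large float64 arrays added element-wise in a loop, so the
    # measured time is dominated by the numpy looping ufunc.
    a = np.arange(n, dtype=np.float64)
    b = np.arange(n, dtype=np.float64)
    for _ in range(iterations):
        c = a + b
    return c

if __name__ == "__main__":
    vectoradd()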

>I would expect numpypy to shine in cases where there is heavy use of python together
>with numpy. Your benchmarks are at the other extreme; they demonstrate that our
>reimplementation of the numpy looping ufuncs is slower than C, but do not test the
>python-numpy interaction nor how well the JIT can optimize python code using numpy.
>For your tests Richard's suggestion of turning on vectorization may show a large
>improvement, as it brings numpypy's optimizations closer to the ones done by a good
>C compiler. But even so, it is impressive that without vectorization we are only 2-4
>times slower than the heavily vectorized C implementation, and that the cpyext emulation
>layer seems not to matter that much in your benchmarks.
>
>In general, timeit does a bad job for pypy benchmarks since it does not allow for warmup
>time and is geared to measure a minimum. Your data demonstrates some of the pitfalls of
>benchmarking - note that you show two very different results for your "cauchy" benchmark.
>You may want to check out the perf module http://perf.readthedocs.io for a more
>sophisticated way of running benchmarks or read https://arxiv.org/abs/1602.00602, which
>summarizes the problems with benchmarking.

The benchmarks I wrote do indeed stress numpy alone, leaving out the python interaction and the cpyext emulation layer. I realize this is not representative of real-life workloads, which is why I am interested in more representative, high-visibility workloads that can emphasize the advantages of PyPy. I will look at the benchmarking links you indicated to find more suitable workloads and a better benchmarking methodology.
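
As a first step, here is a minimal sketch of how one of these kernels could be wrapped with the perf module mentioned above (this assumes perf's Runner.bench_func API; warmup runs and multiple worker processes are handled by the module itself):

import numpy as np
import perf  # http://perf.readthedocs.io

a = np.arange(1000000, dtype=np.float64)
b = np.arange(1000000, dtype=np.float64)

def vectoradd():
    # The kernel under test: a single numpy ufunc call.
    return a + b

runner = perf.Runner()
# bench_func runs the kernel in several worker processes and separates
# warmup samples from the measured ones, avoiding the timeit pitfalls
# mentioned above.
runner.bench_func('vectoradd', vectoradd)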

Also, I corrected the "cauchy" issue: the first row was actually matrix multiplication.

>In order to continue this discussion, could you create a repository with these benchmarks
>and a set of instructions how to reproduce them? You do not say what platform you use,
>what machine you ran the tests on, whether you used MKL/BLAS, what versions of pypy and
>cpython you used, ... Once we have a conveniently reproducible way to have this conversation
>we may be able to make progress toward reaching some operative conclusions, but I'm not sure
>a mailing list is the best place these days.

Creating a public repository with the benchmarks can be a time-consuming process due to internal
procedures, but please find attached the benchmarks and the Python script used to run them
(num_perf.py, similar to perf.py in CPython's benchmark suite). To obtain a CSV file with the
benchmark results, follow these steps:

unzip numpy_benchmark.zip
cd numpy_benchmark
python num_perf.py -b all /path/to/python1 /path/to/python2

I do not seem to have MKL on my system, but I do have the lapack/blas runtimes installed, as recommended
here [1]. I am running Ubuntu 16.04 LTS on an 18-core Intel(R) Xeon(R) CPU E5-2699 v3 (Haswell) @ 2.30GHz.

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false

OS configuration:   CPU frequency fixed at 2.3GHz by
                        for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq; do echo 2300000 > $f; done
                        for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq; do echo 2300000 > $f; done
                    Address Space Layout Randomization (ASLR) disabled (to reduce run-to-run variation) by
                        echo 0 > /proc/sys/kernel/randomize_va_space

The CPython version is 2.7.11+, the default that comes with Ubuntu 16.04 LTS. The PyPy version is 5.3.1; I
downloaded a pre-built binary (7e8df3df9641, Jun 14 2016, 13:58:02).

We can continue this discussion wherever you consider suitable, if the mailing list is not the right place for it.

[1] https://bitbucket.org/pypy/numpy/overview


Regards,
Florin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: numpy_benchmark.zip
Type: application/x-zip-compressed
Size: 8132 bytes
Desc: numpy_benchmark.zip
URL: <http://mail.python.org/pipermail/pypy-dev/attachments/20160728/04339650/attachment-0001.bin>
