[pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)

Mon Nov 7 19:04:02 CET 2011

Hello Ian,

On 25/07/11 11:00, Ian Ozsvald wrote:
> Dear all, I've published v0.2 of my High Performance Python tutorial
> write-up from the session I ran at EuroPython:
> http://ianozsvald.com/2011/07/25/high-performance-python-tutorial-v0-2-from-europython-2011/

Today Armin and I looked more closely at the performance of the 
mandelbrot algorithm that you wrote for your tutorial.  What we found is very 
interesting :-).

We compared three versions of the code:

- a (slightly modified) pure python one on PyPy
- the Cython one using calculate_z.pyx_2_bettermath
- the shedskin one, using shedskin2.py

The PyPy version looks like this:

def calculate_z_serial_purepython(q, maxiter, z):
     """Pure python with complex datatype, iterating over list of q and z"""
     output = [0] * len(q)
     for i in range(len(q)):
         zi = z[i]
         qi = q[i]
         for iteration in range(maxiter):
             zi = zi * zi + qi
             if (zi.real*zi.real + zi.imag*zi.imag) > 4.0:
                 output[i] = iteration
                 break
     return output

i.e., it is exactly the same as pure_python_2.py, except that we avoid calling 
abs(zi), so that it is comparable with the Cython and Shedskin versions.
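
As a small illustration of why dropping abs() does not change the result: 
comparing the squared magnitude against 4.0 is the same test as comparing the 
magnitude against 2.0, without the square root.  The sketch below assumes the 
original check had the form abs(zi) > 2.0; the helper names are made up for 
this example and are not part of the tutorial code.

```python
def escaped_abs(z):
    # Assumed form of the original check: magnitude via abs(), which takes a square root.
    return abs(z) > 2.0


def escaped_squared(z):
    # Check used in the function above: squared magnitude, no square root needed.
    return (z.real * z.real + z.imag * z.imag) > 4.0


# A few sample points, chosen away from the |z| == 2 boundary to avoid rounding edge cases.
samples = [complex(0.3, 0.5), complex(1.5, 1.5), complex(-2.5, 0.0), complex(0.0, 1.9)]
for z in samples:
    assert escaped_abs(z) == escaped_squared(z)
```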

First, we ran the programs passing "1000 1000" as arguments; these are the 
results:

PyPy: 1.95 secs
Cython: 0.58 secs
Shedskin: 0.42 secs

so, PyPy is ~4.6x slower than Shedskin.

However, we realized that with the default values for x1,x2,y1,y2, the 
innermost loop runs only a few iterations most of the time.  This is one of the 
cases in which PyPy suffers most, because it needs to go through a bridge to 
continue the execution, and at the moment bridges are slower than loops.

So, we changed the values of x1,x2,y1,y2 to compute a different region, in 
which the innermost loop runs for more iterations.  We used these values:
x1, x2 = 0.37865401-0.02, 0.37865401+0.02
y1, y2 = 0.669227668-0.02, 0.669227668+0.02
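
To check how long the innermost loop actually runs for a given region, one 
could count the iterations per point.  This is only a rough sketch: the 
function name, the grid construction, the starting value z = 0 and the small 
width/height/maxiter values are assumptions made for the example, not the 
tutorial's own code.

```python
def iteration_counts(x1, x2, y1, y2, width, height, maxiter):
    """Return, for each grid point, how many times the inner loop body ran."""
    counts = []
    for row in range(height):
        y = y1 + (y2 - y1) * row / float(height)
        for col in range(width):
            x = x1 + (x2 - x1) * col / float(width)
            q = complex(x, y)
            z = 0j  # assumed starting value
            n = 0
            for iteration in range(maxiter):
                n += 1
                z = z * z + q
                if (z.real * z.real + z.imag * z.imag) > 4.0:
                    break
            counts.append(n)
    return counts


# The region quoted above, on a deliberately small grid so that it runs quickly.
counts = iteration_counts(0.37865401 - 0.02, 0.37865401 + 0.02,
                          0.669227668 - 0.02, 0.669227668 + 0.02,
                          width=50, height=50, maxiter=300)
short = sum(1 for n in counts if n < 10)
print("points: %d, mean iterations: %.1f" % (len(counts), sum(counts) / float(len(counts))))
print("points leaving the loop in under 10 iterations: %d" % short)
```

Running the same function on the default region and comparing the two 
outputs would show how many points leave the loop after only a handful of 
iterations.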

Since all the programs compute this image faster, we passed "3000 3000" as 
arguments on the command line.  These are the results:

PyPy: 0.89 secs
Cython: 1.76 secs
Shedskin: 0.26 secs

So, in this case, PyPy is ~2x faster than Cython and ~3.5x slower than Shedskin.
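
For reference, the ratios can be recomputed directly from the timings quoted 
in this mail.  The dictionary layout, the run labels and the ratio() helper 
are just for illustration.

```python
# Timings in seconds, copied from the two result lists in this mail.
timings = {
    "default region": {"PyPy": 1.95, "Cython": 0.58, "Shedskin": 0.42},
    "zoomed region": {"PyPy": 0.89, "Cython": 1.76, "Shedskin": 0.26},
}


def ratio(run, slower, faster):
    """How many times longer `slower` took than `faster` in the given run."""
    return timings[run][slower] / timings[run][faster]


print("default: PyPy vs Shedskin %.1fx" % ratio("default region", "PyPy", "Shedskin"))
print("zoomed:  Cython vs PyPy   %.1fx" % ratio("zoomed region", "Cython", "PyPy"))
print("zoomed:  PyPy vs Shedskin %.1fx" % ratio("zoomed region", "PyPy", "Shedskin"))
```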

In the meantime, Armin wrote a C version of it:
http://paste.pocoo.org/raw/504216/

which took 0.946 seconds to complete. This is in line with PyPy's result, 
but we are still investigating why the Shedskin version is so much faster.

ciao,
Anto