[pypy-dev] Very slow Julia example on PyPy/numpy - can you help me understand why it is slow?

Ian Ozsvald ian at ianozsvald.com
Thu Feb 20 12:53:49 CET 2014


Hello all. I'm talking at PyData London this Saturday on the 'high
performance landscape', pypy and pypy/numpy get discussed. PyPy and
lists works great on my Julia example, the pypy/numpy version is very
poor with respect to other solutions (cython, pythran, numba).

Can you help me understand what's wrong? I'd like to give a sensible
answer to the audience. I'll also note in the talk that this result is
still 10* faster than Python+numpy used the same way.

Sidenote - on a separate example (for my book) using Monte Carlo Pi
estimation with a 1e7 vector, pypy/numpy is 2* slower than
python/numpy (which obviously isn't bad), so I'm fairly confident my
pypy/numpy code is working correctly. This example uses a vectorised
operation.

I'm using a fresh daily build of pypy and a fresh checkout of numpy.

Here's the main function, I'll copy the full example down below:
def calculate_z(maxiter, zs, cs, output):
    """Calculate output list using Julia update rule"""
    for i in range(len(zs)):  # range, xrange - same speed
        n = 0
        z = zs[i]
        c = cs[i]
        # expanding the math make it 2 seconds slower
        #while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
        # using a for loop in place of while makes it 1 second slower
        # using direct array references (removing z,c) does not change speed
        while n < maxiter and abs(z) < 2:
            z = z * z + c
            n += 1
        output[i] = n

I pass in 2 np arrays as complex128 (zs has co-ords, cs is a constant
- this is a consistent example in my book which looks at cpu and
memory costs etc). I iterate with a while loop doing some simple math.

Cython+numpy takes 0.19s. (similar results for numba, pythran)
PyPy+numpy takes 5s.
Python+numpy takes 54s (due to dereference cost on each array access)

Is the problem related to dereferencing? I'm not doing a vectorised
operation for this example. I know that dereferencing each element
using Python+numpy is horribly slow.

Ian.
Full code:
"""Julia set generator without optional PIL-based image drawing"""
import time
import numpy as np

# area of complex space to investigate
x1, x2, y1, y2 = -1.8, 1.8, -1.8, 1.8
c_real, c_imag = -0.62772, -.42193


def calculate_z(maxiter, zs, cs, output):
    """Calculate output list using Julia update rule"""
    for i in range(len(zs)):  # range, xrange - same speed
        n = 0
        z = zs[i]
        c = cs[i]
        # expanding the math make it 2 seconds slower
        #while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
        # using a for loop in place of while makes it 1 second slower
        # using direct array references (removing z,c) does not change speed
        while n < maxiter and abs(z) < 2:
            z = z * z + c
            n += 1
        output[i] = n


def calc_pure_python(desired_width, max_iterations):
    """Create a list of complex co-ordinates (zs) and complex
parameters (cs), build Julia set and display"""
    x_step = (float(x2 - x1) / float(desired_width))
    y_step = (float(y1 - y2) / float(desired_width))
    x = []
    y = []
    ycoord = y2
    while ycoord > y1:
        y.append(ycoord)
        ycoord += y_step
    xcoord = x1
    while xcoord < x2:
        x.append(xcoord)
        xcoord += x_step
    # build a list of co-ordinates and the initial condition for each cell.
    # Note that our initial condition is a constant and could easily be removed,
    # we use it to simulate a real-world scenario with several inputs
to our function
    zs = []
    cs = []
    for ycoord in y:
        for xcoord in x:
            zs.append(complex(xcoord, ycoord))
            cs.append(complex(c_real, c_imag))

    zs_np = np.array(zs, np.complex128)
    cs_np = np.array(cs, np.complex128)

    print "Length of x:", len(x)
    print "Total elements:", len(zs)
    start_time = time.time()
    output = np.empty(len(zs), dtype=np.int64)
    calculate_z(max_iterations, zs_np, cs_np, output)
    end_time = time.time()
    secs = end_time - start_time
    print "Took", secs, "seconds"

    validation_sum = sum(output)
    print "Total sum of elements (for validation):", validation_sum


# Calculate the Julia set using a pure Python solution with
# reasonable defaults for a laptop
calc_pure_python(desired_width=1000, max_iterations=300)


-- 
Ian Ozsvald (A.I. researcher)
ian at IanOzsvald.com

http://IanOzsvald.com
http://MorConsulting.com/
http://Annotate.IO
http://SocialTiesApp.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald
http://ShowMeDo.com


More information about the pypy-dev mailing list