[pypy-dev] Very slow Julia example on PyPy/numpy - can you help me understand why it is slow?

Ian Ozsvald ian at ianozsvald.com
Thu Feb 20 21:40:41 CET 2014


Hi Armin. The point of the question was not to remove numpy but to
understand the behaviour :-) I've already done a set of benchmarks
with lists and with numpy, I've copied the results below. I'm using
the same Julia code throughout (there's a note about the code below).
PyPy on lists is indeed very compelling.

One observation I've made of beginners (and I did the same) is that
iterating over numpy arrays seems natural until you learn it is
horribly slow. Then you learn to vectorise. Some of the current tools
handle the non-vectorised case really well and that's something I want
to mention.
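
To make that concrete, here's a toy snippet (not the benchmark code
from my talk) showing the element-by-element loop that feels natural
next to the vectorised call:

    import numpy as np

    # Toy example: square a million values.
    xs = np.arange(10**6, dtype=np.float64)

    # The "natural" beginner approach: a Python-level loop over the array.
    # Every xs[i] access creates a boxed scalar, so CPython crawls here.
    out_loop = np.empty_like(xs)
    for i in range(len(xs)):
        out_loop[i] = xs[i] * xs[i]

    # The vectorised version: one call, with the loop run in C inside numpy.
    out_vec = xs * xs

    assert np.allclose(out_loop, out_vec)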

For Julia I've used lists and numpy. Using a plain list rather than an
`array.array` makes sense as arrays won't hold a complex type (and
decomposing the complex elements into two float arrays gets even
sillier), and the example stays trivial for a reader to understand.
numpy arrays (and Python arrays) are good because they use much less
RAM than big lists. The reason my example code above built lists and
then turned them into numpy arrays is that I was lazy and hadn't
finished tidying this demo (my bad!).
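
Roughly, the setup looks like this (a simplified sketch; the grid
extent and the Julia constant here are placeholders, not necessarily
the exact values from the demo):

    import numpy as np

    width = 1000
    c = complex(-0.62772, -0.42193)  # placeholder Julia constant

    # Build plain Python lists of complex numbers...
    coords = np.linspace(-1.8, 1.8, width)
    zs = [complex(x, y) for y in coords for x in coords]
    cs = [c] * len(zs)

    # ...then convert them to numpy arrays of complex128 for the numpy runs.
    zs_np = np.array(zs, dtype=np.complex128)
    cs_np = np.array(cs, dtype=np.complex128)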

My results on the Julia code (4-core i7) for reference:

lists:
CPython 12s
Pythran 0.4s  # 1 line of annotation
PyPy 0.3s  # no changes required, my recommendation in my talk
Cython 0.2s  # lots of annotation required
ShedSkin 0.22s  # basically the same result as Cython plus overhead of
a 1e6 element memory copy

numpy:
(OMP = OpenMP with dynamic scheduling)
PyPy 5s
Numba 0.4s (I couldn't find prange support in v0.12)
Cython 0.16s
Pythran OMP 0.1s
Cython OMP 0.05s

I don't mind that my use of numpy is silly; I'm just curious to
understand why pypynumpy diverges from the results of the other
compiler technologies. The simple answer might be 'because pypynumpy
is young' and that'd be fine - at least I'd have an answer if someone
asks the question in my talk. If someone has more details, that'd be
really interesting too. Is there a fundamental reason why pypynumpy
couldn't do the example as fast as cython/numba/pythran?
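
For reference, the inner loop that all of these tools are compiling is
just the Julia update rule applied per point. Roughly (a sketch, not
the exact file from the talk), with both the abs() form and the
expanded form that Armin mentions below:

    def julia_pixel_abs(z, c, maxiter):
        """Update rule using abs(), which costs a square root per test."""
        n = 0
        while n < maxiter and abs(z) < 2:
            z = z * z + c
            n += 1
        return n

    def julia_pixel_expanded(z, c, maxiter):
        """Same rule with the magnitude test expanded to avoid the sqrt."""
        n = 0
        while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
            z = z * z + c
            n += 1
        return n

    # output[i] = julia_pixel_abs(zs[i], cs[i], maxiter) for each point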

Cheers,
i.

On 20 February 2014 18:26, Armin Rigo <arigo at tunes.org> wrote:
> Hi Ian,
>
> On 20 February 2014 12:53, Ian Ozsvald <ian at ianozsvald.com> wrote:
>> def calculate_z(maxiter, zs, cs, output):
>>     """Calculate output list using Julia update rule"""
>
> This particular example uses numpy in a very strange and useless way,
> as I'm sure you know.  It builds a regular list of 1'000'000 items;
> then it converts it to a pair of numpy arrays; then it iterates over
> the array.  It's obviously better to just iterate over the original
> list (also in CPython).  But I know it's not really the point; the
> point is rather that numpy is slow with PyPy, slower than we would
> expect.  This is known, basically, but a good reminder that we need to
> look at it from the performance point of view.  So far, we focused on
> completeness.
>
> [ Just for reference, I killed numpy from your example and got a 4x
> speed-up (down from 5s to 1.25s).  Afterwards, I expanded the math:
>
>>         # expanding the math makes it 2 seconds slower
>>         #while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
>
> which is good in theory because abs() requires a square root.  As it
> turns out, it is very good indeed.  This results in another 5x
> speed-up, to 0.25s.  This is close enough to Cython speed (which is
> probably mostly gcc's speed in this example) that I'd say we are done.
> ]
>
>
> A bientôt,
>
> Armin.



-- 
Ian Ozsvald (A.I. researcher)
ian at IanOzsvald.com

http://IanOzsvald.com
http://MorConsulting.com/
http://Annotate.IO
http://SocialTiesApp.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald
http://ShowMeDo.com

