[pypy-dev] Very slow Julia example on PyPy/numpy - can you help me understand why it is slow?

Sat Feb 22 19:45:43 CET 2014

Hello Ian,

On 20/02/14 20:40, Ian Ozsvald wrote:
> Hi Armin. The point of the question was not to remove numpy but to
> understand the behaviour :-) I've already done a set of benchmarks
> with lists and with numpy, I've copied the results below. I'm using
> the same Julia code throughout (there's a note about the code below).
> PyPy on lists is indeed very compelling.
>
> One observation I've made of beginners (and I did the same) is that
> iterating over numpy arrays seems natural until you learn it is
> horribly slow. Then you learn to vectorise. Some of the current tools
> handle the non-vectorised case really well and that's something I want
> to mention.
>
> For Julia I've used lists and numpy. Using a numpy list rather than an
> `array` makes sense as arrays won't hold a complex type (and messing
> with decomposing the complex elements into two arrays gets even
> sillier) and the example is still trivial for a reader to understand.
> numpy arrays (and Python arrays) are good because they use much less
> RAM than big lists. The reason why my example code above made lists
> and then turned them into numpy arrays...that's because I was lazy and
> hadn't finished tidying this demo (my bad!).

I agree that your code looks rather sensible (at least, to people who 
haven't internalised yet all the "stupid" implementation details 
concerning arrays, lists, iteration and vectorisation). So it's a bit of 
a shame that PyPy doesn't do better.
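
For readers who don't have the original code at hand, the kind of 
non-vectorised loop under discussion can be sketched roughly as below. 
This is an assumed reconstruction for illustration, not the code from 
the benchmark: the function name calc_julia, the escape radius of 2 and 
the sample inputs are all hypothetical. It uses plain lists; the slow 
case discussed in this thread would be the same loop with zs, cs and 
output held in numpy complex arrays instead.

```python
def calc_julia(zs, cs, maxiter):
    """Return, for each (z, c) pair, how many iterations z = z*z + c
    takes before abs(z) reaches 2, capped at maxiter.

    Hypothetical sketch: names and the escape radius are assumptions.
    """
    output = [0] * len(zs)
    # Element-by-element iteration: natural to write, and the pattern
    # that vectorisation would normally replace when using numpy.
    for i in range(len(zs)):
        z = zs[i]
        c = cs[i]
        n = 0
        while n < maxiter and abs(z) < 2:
            z = z * z + c
            n += 1
        output[i] = n
    return output


# Small illustrative inputs (assumed, not taken from the benchmark).
zs = [complex(0, 0), complex(0, 0), complex(2, 2)]
cs = [complex(0, 0), complex(3, 0), complex(0, 0)]
counts = calc_julia(zs, cs, 300)
print(counts)
```

Each pass through the inner while loop performs a handful of scalar 
operations on z and c, so any fixed per-operation overhead is 
multiplied by the total number of iterations.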

> I don't mind that my use of numpy is silly, I'm just curious to
> understand why pypynumpy diverges from the results of the other
> compiler technologies. The simple answer might be 'because pypynumpy
> is young' and that'd be fine - at least I'd have an answer if someone
> asks the question in my talk. If someone has more details, that'd be
> really interesting too. Is there a fundamental reason why pypynumpy
> couldn't do the example as fast as cython/numba/pythran?

To answer such questions, the best way is to use the jitviewer 
(https://bitbucket.org/pypy/jitviewer ). Looking at the trace for the 
inner loop, I can see that every operation on a scalar triggers a dict 
lookup to obtain its dtype. This looks like self-inflicted pain coming 
from the broken objspace abstraction rather than anything fundamental. 
Fixing that should improve speed by about an order of magnitude.

Cheers,
Ronan