Hello Ian, Le 20/02/14 20:40, Ian Ozsvald a écrit :
Hi Armin. The point of the question was not to remove numpy but to understand the behaviour :-) I've already done a set of benchmarks with lists and with numpy, I've copied the results below. I'm using the same Julia code throughout (there's a note about the code below). PyPy on lists is indeed very compelling.
One observation I've made of beginners (and I did the same) is that iterating over numpy arrays seems natural until you learn it is horribly slow. The you learn to vectorise. Some of the current tools handle the non-vectorised case really well and that's something I want to mention.
For Julia I've used lists and numpy. Using a numpy list rather than an `array` makes sense as arrays won't hold a complex type (and messing with decomposing the complex elements into two arrays gets even sillier) and the example is still trivial for a reader to understand. numpy arrays (and Python arrays) are good because they use much less RAM than big lists. The reason why my example code above made lists and then turned them into numpy arrays...that's because I was lazy and hadn't finished tidying this demo (my bad!).
I agree that your code looks rather sensible (at least, to people who haven't internalised yet all the "stupid" implementation details concerning arrays, lists, iteration and vectorisation). So it's a bit of a shame that PyPy doesn't do better.
I don't mind that my use of numpy is silly, I'm just curious to understand why pypynumpy diverges from the results of the other compiler technologies. The simple answer might be 'because pypynumpy is young' and that'd be fine - at least I'd have an answer if someone asks the question in my talk. If someone has more details, that'd be really interesting too. Is there a fundamental reason why pypynumpy couldn't do the example as fast as cython/numba/pythran?
To answer such questions, the best way is to use the jitviewer (https://bitbucket.org/pypy/jitviewer ). Looking at the trace for the inner loop, I can see every operation on a scalar triggers a dict lookup to obtain its dtype. This looks like self-inflicted pain coming the broken objspace abstraction rather than anything fundamental. Fixing that should improve speed by about an order of magnitude. Cheers, Ronan