
Dear list! I stumbled upon a problem and I was wondering if someone would by so kind and explain to me what is going on. The problem is sorting 2 000 000 points by `x` coordinate: from random import random from time import time class point(object): def __init__(self, x, y): self.x, self.y = x, y data = [point(random(), random()) for i in range(2000000)] t = time() data.sort(key=lambda p:p.x) print time() - t on my machine in runs 8.74s under PyPy 5.4.1 on MacOS. I then try to sort the points in JavaScript: var data = []; for (var i=0; i<2000000; i++) { data.push({x:Math.random(), y:Math.random()}); } console.time('sorting'); data.sort(function(a, b) { return a.x - b.x; }); console.timeEnd('sorting'); and it runs in 3.09s under V8 5.3. I was just wondering, why is it nearly 3x slower under PyPy than it is under V8? Is there any way I could make the code run faster? Thank you in advance!

Can you start by having a stable benchmark? Instead of calling random have a reverse counter to create your list in reverse order. That will at least get somewhat similar work on both languages (javascript random is notoriously bad at being random). Maybe the difference in performance is all from the difference in the distribution of the numbers. Also try bigger/smaller lists to see how they behave. (also warming up both jit engines might be necessary) On Mon, Sep 26, 2016 at 10:57 PM, Tuom Larsen <tuom.larsen@gmail.com> wrote:
-- Leonardo Santagada

Thanks for the reply! I don't think a reverse counter is better because TimSort (used in PyPy?) is able to recognise the pattern [1] and do better than other kinds of sorts. And almost for a year [2] now, V8 implements random as XorShift128+ which very good. I also tried different sizes and the timings still show similar difference. [1] http://c2.com/cgi/wiki?TimSort [2] https://bugs.chromium.org/p/chromium/issues/detail?id=559024 On Mon, Oct 3, 2016 at 10:46 AM, Leonardo Santagada <santagada@gmail.com> wrote:

Hi Tuom, The problem is the key=... argument to sort, which isn't optimized well. This kind of code is much faster: wrapped_data = [(p.x, p) for p in data] wrapped_data.sort() data = [it[1] for it in wrapped_data] We should fix this problem. I've created an issue so it doesn't get lost: https://bitbucket.org/pypy/pypy/issues/2410/listsort-key-is-slow Thanks for the report! Cheers, Carl Friedrich On 26/09/16 22:57, Tuom Larsen wrote:

Can you start by having a stable benchmark? Instead of calling random have a reverse counter to create your list in reverse order. That will at least get somewhat similar work on both languages (javascript random is notoriously bad at being random). Maybe the difference in performance is all from the difference in the distribution of the numbers. Also try bigger/smaller lists to see how they behave. (also warming up both jit engines might be necessary) On Mon, Sep 26, 2016 at 10:57 PM, Tuom Larsen <tuom.larsen@gmail.com> wrote:
-- Leonardo Santagada

Thanks for the reply! I don't think a reverse counter is better because TimSort (used in PyPy?) is able to recognise the pattern [1] and do better than other kinds of sorts. And almost for a year [2] now, V8 implements random as XorShift128+ which very good. I also tried different sizes and the timings still show similar difference. [1] http://c2.com/cgi/wiki?TimSort [2] https://bugs.chromium.org/p/chromium/issues/detail?id=559024 On Mon, Oct 3, 2016 at 10:46 AM, Leonardo Santagada <santagada@gmail.com> wrote:

Hi Tuom, The problem is the key=... argument to sort, which isn't optimized well. This kind of code is much faster: wrapped_data = [(p.x, p) for p in data] wrapped_data.sort() data = [it[1] for it in wrapped_data] We should fix this problem. I've created an issue so it doesn't get lost: https://bitbucket.org/pypy/pypy/issues/2410/listsort-key-is-slow Thanks for the report! Cheers, Carl Friedrich On 26/09/16 22:57, Tuom Larsen wrote:
participants (3)
-
Carl Friedrich Bolz
-
Leonardo Santagada
-
Tuom Larsen