Dear list! I stumbled upon a problem and I was wondering if someone would by so kind and explain to me what is going on. The problem is sorting 2 000 000 points by `x` coordinate: from random import random from time import time class point(object): def __init__(self, x, y): self.x, self.y = x, y data = [point(random(), random()) for i in range(2000000)] t = time() data.sort(key=lambda p:p.x) print time() - t on my machine in runs 8.74s under PyPy 5.4.1 on MacOS. I then try to sort the points in JavaScript: var data = []; for (var i=0; i<2000000; i++) { data.push({x:Math.random(), y:Math.random()}); } console.time('sorting'); data.sort(function(a, b) { return a.x - b.x; }); console.timeEnd('sorting'); and it runs in 3.09s under V8 5.3. I was just wondering, why is it nearly 3x slower under PyPy than it is under V8? Is there any way I could make the code run faster? Thank you in advance!
Can you start by having a stable benchmark? Instead of calling random have a reverse counter to create your list in reverse order. That will at least get somewhat similar work on both languages (javascript random is notoriously bad at being random). Maybe the difference in performance is all from the difference in the distribution of the numbers. Also try bigger/smaller lists to see how they behave. (also warming up both jit engines might be necessary) On Mon, Sep 26, 2016 at 10:57 PM, Tuom Larsen <tuom.larsen@gmail.com> wrote:
Dear list!
I stumbled upon a problem and I was wondering if someone would by so kind and explain to me what is going on.
The problem is sorting 2 000 000 points by `x` coordinate:
from random import random from time import time
class point(object): def __init__(self, x, y): self.x, self.y = x, y
data = [point(random(), random()) for i in range(2000000)]
t = time() data.sort(key=lambda p:p.x) print time() - t
on my machine in runs 8.74s under PyPy 5.4.1 on MacOS. I then try to sort the points in JavaScript:
var data = []; for (var i=0; i<2000000; i++) { data.push({x:Math.random(), y:Math.random()}); }
console.time('sorting'); data.sort(function(a, b) { return a.x - b.x; }); console.timeEnd('sorting');
and it runs in 3.09s under V8 5.3.
I was just wondering, why is it nearly 3x slower under PyPy than it is under V8? Is there any way I could make the code run faster?
Thank you in advance! _______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev
-- Leonardo Santagada
Thanks for the reply! I don't think a reverse counter is better because TimSort (used in PyPy?) is able to recognise the pattern [1] and do better than other kinds of sorts. And almost for a year [2] now, V8 implements random as XorShift128+ which very good. I also tried different sizes and the timings still show similar difference. [1] http://c2.com/cgi/wiki?TimSort [2] https://bugs.chromium.org/p/chromium/issues/detail?id=559024 On Mon, Oct 3, 2016 at 10:46 AM, Leonardo Santagada <santagada@gmail.com> wrote:
Can you start by having a stable benchmark? Instead of calling random have a reverse counter to create your list in reverse order. That will at least get somewhat similar work on both languages (javascript random is notoriously bad at being random).
Maybe the difference in performance is all from the difference in the distribution of the numbers. Also try bigger/smaller lists to see how they behave. (also warming up both jit engines might be necessary)
On Mon, Sep 26, 2016 at 10:57 PM, Tuom Larsen <tuom.larsen@gmail.com> wrote:
Dear list!
I stumbled upon a problem and I was wondering if someone would by so kind and explain to me what is going on.
The problem is sorting 2 000 000 points by `x` coordinate:
from random import random from time import time
class point(object): def __init__(self, x, y): self.x, self.y = x, y
data = [point(random(), random()) for i in range(2000000)]
t = time() data.sort(key=lambda p:p.x) print time() - t
on my machine in runs 8.74s under PyPy 5.4.1 on MacOS. I then try to sort the points in JavaScript:
var data = []; for (var i=0; i<2000000; i++) { data.push({x:Math.random(), y:Math.random()}); }
console.time('sorting'); data.sort(function(a, b) { return a.x - b.x; }); console.timeEnd('sorting');
and it runs in 3.09s under V8 5.3.
I was just wondering, why is it nearly 3x slower under PyPy than it is under V8? Is there any way I could make the code run faster?
Thank you in advance! _______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev
--
Leonardo Santagada
Hi Tuom, The problem is the key=... argument to sort, which isn't optimized well. This kind of code is much faster: wrapped_data = [(p.x, p) for p in data] wrapped_data.sort() data = [it[1] for it in wrapped_data] We should fix this problem. I've created an issue so it doesn't get lost: https://bitbucket.org/pypy/pypy/issues/2410/listsort-key-is-slow Thanks for the report! Cheers, Carl Friedrich On 26/09/16 22:57, Tuom Larsen wrote:
Dear list!
I stumbled upon a problem and I was wondering if someone would by so kind and explain to me what is going on.
The problem is sorting 2 000 000 points by `x` coordinate:
from random import random from time import time
class point(object): def __init__(self, x, y): self.x, self.y = x, y
data = [point(random(), random()) for i in range(2000000)]
t = time() data.sort(key=lambda p:p.x) print time() - t
on my machine in runs 8.74s under PyPy 5.4.1 on MacOS. I then try to sort the points in JavaScript:
var data = []; for (var i=0; i<2000000; i++) { data.push({x:Math.random(), y:Math.random()}); }
console.time('sorting'); data.sort(function(a, b) { return a.x - b.x; }); console.timeEnd('sorting');
and it runs in 3.09s under V8 5.3.
I was just wondering, why is it nearly 3x slower under PyPy than it is under V8? Is there any way I could make the code run faster?
Thank you in advance! _______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev
Hello Carl, thanks a lot for the clarification and for creating the ticket! On Mon, Oct 3, 2016 at 7:28 PM, Carl Friedrich Bolz <cfbolz@gmx.de> wrote:
Hi Tuom,
The problem is the key=... argument to sort, which isn't optimized well. This kind of code is much faster:
wrapped_data = [(p.x, p) for p in data] wrapped_data.sort() data = [it[1] for it in wrapped_data]
We should fix this problem. I've created an issue so it doesn't get lost:
https://bitbucket.org/pypy/pypy/issues/2410/listsort-key-is-slow
Thanks for the report!
Cheers,
Carl Friedrich
On 26/09/16 22:57, Tuom Larsen wrote:
Dear list!
I stumbled upon a problem and I was wondering if someone would by so kind and explain to me what is going on.
The problem is sorting 2 000 000 points by `x` coordinate:
from random import random from time import time
class point(object): def __init__(self, x, y): self.x, self.y = x, y
data = [point(random(), random()) for i in range(2000000)]
t = time() data.sort(key=lambda p:p.x) print time() - t
on my machine in runs 8.74s under PyPy 5.4.1 on MacOS. I then try to sort the points in JavaScript:
var data = []; for (var i=0; i<2000000; i++) { data.push({x:Math.random(), y:Math.random()}); }
console.time('sorting'); data.sort(function(a, b) { return a.x - b.x; }); console.timeEnd('sorting');
and it runs in 3.09s under V8 5.3.
I was just wondering, why is it nearly 3x slower under PyPy than it is under V8? Is there any way I could make the code run faster?
Thank you in advance! _______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev
_______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev
participants (3)
-
Carl Friedrich Bolz
-
Leonardo Santagada
-
Tuom Larsen