Mailman 3 Sorting structs - pypy-dev - python.org

newer
improve error message when missing...

Sorting structs

older
NumPy or NumPyPy

Tuom Larsen

Sept. 26, 2016

8:57 p.m.

Dear list! I stumbled upon a problem and I was wondering if someone would by so kind and explain to me what is going on. The problem is sorting 2 000 000 points by `x` coordinate: from random import random from time import time class point(object): def __init__(self, x, y): self.x, self.y = x, y data = [point(random(), random()) for i in range(2000000)] t = time() data.sort(key=lambda p:p.x) print time() - t on my machine in runs 8.74s under PyPy 5.4.1 on MacOS. I then try to sort the points in JavaScript: var data = []; for (var i=0; i<2000000; i++) { data.push({x:Math.random(), y:Math.random()}); } console.time('sorting'); data.sort(function(a, b) { return a.x - b.x; }); console.timeEnd('sorting'); and it runs in 3.09s under V8 5.3. I was just wondering, why is it nearly 3x slower under PyPy than it is under V8? Is there any way I could make the code run faster? Thank you in advance!

Reply

Sign in to reply online Use email software

Show replies by date

Leonardo Santagada

October 2016

8:46 a.m.

Can you start by having a stable benchmark? Instead of calling random have a reverse counter to create your list in reverse order. That will at least get somewhat similar work on both languages (javascript random is notoriously bad at being random). Maybe the difference in performance is all from the difference in the distribution of the numbers. Also try bigger/smaller lists to see how they behave. (also warming up both jit engines might be necessary) On Mon, Sep 26, 2016 at 10:57 PM, Tuom Larsen <tuom.larsen@gmail.com> wrote:

-- Leonardo Santagada

Reply

Sign in to reply online Use email software

Tuom Larsen

1:53 p.m.

Thanks for the reply! I don't think a reverse counter is better because TimSort (used in PyPy?) is able to recognise the pattern [1] and do better than other kinds of sorts. And almost for a year [2] now, V8 implements random as XorShift128+ which very good. I also tried different sizes and the timings still show similar difference. [1] http://c2.com/cgi/wiki?TimSort [2] https://bugs.chromium.org/p/chromium/issues/detail?id=559024 On Mon, Oct 3, 2016 at 10:46 AM, Leonardo Santagada <santagada@gmail.com> wrote:

Reply

Sign in to reply online Use email software

Carl Friedrich Bolz

5:28 p.m.

Hi Tuom, The problem is the key=... argument to sort, which isn't optimized well. This kind of code is much faster: wrapped_data = [(p.x, p) for p in data] wrapped_data.sort() data = [it[1] for it in wrapped_data] We should fix this problem. I've created an issue so it doesn't get lost: https://bitbucket.org/pypy/pypy/issues/2410/listsort-key-is-slow Thanks for the report! Cheers, Carl Friedrich On 26/09/16 22:57, Tuom Larsen wrote:

Reply

Sign in to reply online Use email software

Tuom Larsen

9:49 p.m.

Hello Carl, thanks a lot for the clarification and for creating the ticket! On Mon, Oct 3, 2016 at 7:28 PM, Carl Friedrich Bolz <cfbolz@gmx.de> wrote:

Reply

Sign in to reply online Use email software

Leonardo Santagada

October 2016

8:46 a.m.

Can you start by having a stable benchmark? Instead of calling random have a reverse counter to create your list in reverse order. That will at least get somewhat similar work on both languages (javascript random is notoriously bad at being random). Maybe the difference in performance is all from the difference in the distribution of the numbers. Also try bigger/smaller lists to see how they behave. (also warming up both jit engines might be necessary) On Mon, Sep 26, 2016 at 10:57 PM, Tuom Larsen <tuom.larsen@gmail.com> wrote:

-- Leonardo Santagada

Reply

Sign in to reply online Use email software

Tuom Larsen

1:53 p.m.

Thanks for the reply! I don't think a reverse counter is better because TimSort (used in PyPy?) is able to recognise the pattern [1] and do better than other kinds of sorts. And almost for a year [2] now, V8 implements random as XorShift128+ which very good. I also tried different sizes and the timings still show similar difference. [1] http://c2.com/cgi/wiki?TimSort [2] https://bugs.chromium.org/p/chromium/issues/detail?id=559024 On Mon, Oct 3, 2016 at 10:46 AM, Leonardo Santagada <santagada@gmail.com> wrote:

Reply

Sign in to reply online Use email software

Carl Friedrich Bolz

5:28 p.m.

Hi Tuom, The problem is the key=... argument to sort, which isn't optimized well. This kind of code is much faster: wrapped_data = [(p.x, p) for p in data] wrapped_data.sort() data = [it[1] for it in wrapped_data] We should fix this problem. I've created an issue so it doesn't get lost: https://bitbucket.org/pypy/pypy/issues/2410/listsort-key-is-slow Thanks for the report! Cheers, Carl Friedrich On 26/09/16 22:57, Tuom Larsen wrote:

Reply

Sign in to reply online Use email software

Tuom Larsen

9:49 p.m.

Hello Carl, thanks a lot for the clarification and for creating the ticket! On Mon, Oct 3, 2016 at 7:28 PM, Carl Friedrich Bolz <cfbolz@gmx.de> wrote:

Reply

Sign in to reply online Use email software

3089

Age (days ago)

3096

Last active (days ago)

Download

4 comments

3 participants

tags

participants (3)

Carl Friedrich Bolz
Leonardo Santagada
Tuom Larsen