[pypy-dev] Helping with STM at the PyCon 2013 (Santa Clara) sprints

Taavi Burns taavi.burns at gmail.com
Mon Feb 18 00:38:02 CET 2013


That's great, thanks! I did get it to work when you wrote earlier, but
it's definitely faster now.

I tried a ridiculously simple, conflict-free parallel program and came
up with this, which gave me some questionable performance numbers from
a build of 65ec96e15463:

taavi@pypy:~/pypy/pypy/goal$ ./pypy-c -m timeit -s 'import
transaction; transaction.set_num_threads(1)' '
def foo():
    x = 0
    for y in range(100000):
        x += y
transaction.add(foo)
transaction.add(foo)
transaction.run()'
10 loops, best of 3: 198 msec per loop

taavi@pypy:~/pypy/pypy/goal$ ./pypy-c -m timeit -s 'import
transaction; transaction.set_num_threads(2)' '
def foo():
    x = 0
    for y in range(100000):
        x += y
transaction.add(foo)
transaction.add(foo)
transaction.run()'
10 loops, best of 3: 415 msec per loop


It's entirely possible that this is an effect of running inside a
VMware guest (configured to use 2 cores) on my Core2Duo laptop. If
that's the case, I'll refrain from trying anything remotely like
benchmarking in this environment in the future. :)
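
In case it's useful for rerunning this on real hardware later, here's
roughly the same workload as a standalone script (stm_bench.py is just
a made-up name), with the thread count taken from the command line:

import sys
import time
import transaction   # pypy-stm's transaction module, as above

def foo():
    x = 0
    for y in range(100000):
        x += y

def main(num_threads):
    transaction.set_num_threads(num_threads)
    # two independent, conflict-free tasks, same as the timeit runs
    transaction.add(foo)
    transaction.add(foo)
    start = time.time()
    transaction.run()
    print "%d thread(s): %.3f sec" % (num_threads, time.time() - start)

if __name__ == '__main__':
    main(int(sys.argv[1]) if len(sys.argv) > 1 else 1)

Invoked as e.g. ./pypy-c stm_bench.py 2, that should make it easier to
compare thread counts than quoting everything through timeit.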

Would it be more helpful (if I want to contribute to STM) to use
something like a high-CPU EC2 instance, or should I look at obtaining
something like an 8-real-core AMD X8?

(my venerable X2 has started to disagree with its RAM, so it's prime
for retirement)

Thanks!

On Sun, Feb 17, 2013 at 3:58 AM, Armin Rigo <arigo at tunes.org> wrote:
> Hi Taavi,
>
> I finally fixed pypy-stm with signals.  Now I'm getting again results
> that scale with the number of processors.
>
> Note that it stops scaling up at some point, around 4 or 6 threads, on
> machines I tried it on.  I suspect it's related to the fact that
> physical processors have 4 or 6 cores internally, but the results are
> still a bit inconsistent.  Using the "taskset" command to force the
> threads to run on particular physical sockets seems to help a little
> bit with some numbers.  Fwiw, I got the maximum throughput on a
> 24-core machine by really running 24 threads, but that seems
> wasteful, as it is only 25% better than running 6 threads on one
> physical socket.
>
> The next step will be trying to reduce the overhead, currently
> considerable (about 10x slower than CPython, too much to ever have any
> net benefit).  Also high on the list is fixing the constant memory
> leak (i.e. implementing major garbage collection steps).
>
>
> A bientôt,
>
> Armin.



--
taa
/*eof*/

