[Numpy-discussion] multiprocessing shared arrays and numpy
faltet at pytables.org
Fri Mar 5 08:14:51 EST 2010
On Fri, Mar 05, 2010 at 10:51:12AM +0100, Gael Varoquaux wrote:
> On Fri, Mar 05, 2010 at 09:53:02AM +0100, Francesc Alted wrote:
> > Yeah, a 10% improvement from using multiple cores is an expected figure
> > for memory-bound problems. This is something people must know: if their
> > computations are memory bound (and this is much more common than one
> > may initially think), then they should not expect significant speed-ups
> > from their parallel codes.
> Hey Francesc,
> Any chance this can be different for NUMA (non-uniform memory access)
> architectures? AMD multicores used to be NUMA, when I was still following
> these problems.
As far as I can tell, NUMA architectures are better at accelerating
processes that run independently of each other. In this case, the
hardware is in charge of putting closely related data in memory that
is 'nearer' to each processor. This scenario *could* happen with truly
parallel processes too, but as I said, in general it works best for
independent processes (read: multiuser machines).
> FWIW, I observe very good speedups on my problems (pretty much linear in
> the number of CPUs), and I have data parallel problems on fairly large
> data (~100 MB apiece, doesn't fit in cache), with no synchronisation at
> all between the workers. CPUs are Intel Xeons.
Maybe your processes are not as memory-bound as you think. Do you get
a much better speed-up on NUMA than on a simple multi-core machine
with one single path to memory? I don't think so, but maybe I'm wrong.
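
To make the two situations above concrete, here is a toy model in Python.
It is only a sketch and not a benchmark. It assumes that arithmetic and
memory traffic overlap perfectly, that one core on its own can pull
core_bw_gbs from memory, and that all cores share a single path capped at
bus_bw_gbs. The function names and every number below are hypothetical,
chosen only to show the two regimes.

```python
def run_time(n_procs, compute_s, data_gb, core_bw_gbs, bus_bw_gbs):
    """Toy estimate of wall time (seconds) with n_procs workers.

    compute_s   : single-core arithmetic time if memory access were free
    data_gb     : total data streamed from main memory
    core_bw_gbs : bandwidth a single core can draw on its own (assumed)
    bus_bw_gbs  : ceiling of the shared path to memory (assumed)
    """
    # Arithmetic is split evenly among the workers.
    compute = compute_s / n_procs
    # Workers add bandwidth only until the shared path saturates.
    bandwidth = min(n_procs * core_bw_gbs, bus_bw_gbs)
    transfer = data_gb / bandwidth
    # Assume perfect overlap, so the slower of the two sets the time.
    return max(compute, transfer)


def speedup(n_procs, **kw):
    """Speed-up of n_procs workers relative to a single worker."""
    return run_time(1, **kw) / run_time(n_procs, **kw)


# Hypothetical workloads, not measurements.
memory_bound = dict(compute_s=0.1, data_gb=10.0,
                    core_bw_gbs=9.0, bus_bw_gbs=10.0)
compute_bound = dict(compute_s=8.0, data_gb=1.0,
                     core_bw_gbs=9.0, bus_bw_gbs=10.0)

for n in (1, 2, 4):
    print(n,
          round(speedup(n, **memory_bound), 2),
          round(speedup(n, **compute_bound), 2))
```

With these made-up numbers, the first workload stalls at roughly a 10%
gain because a single core already nearly saturates the path to memory,
while the second scales linearly because the arithmetic dominates. Under
this model, linear scaling on data that does not fit in cache suggests the
job is doing enough work per byte not to be memory bound.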