[Numpy-discussion] multiprocessing shared arrays and numpy
faltet at pytables.org
Fri Mar 5 10:22:07 EST 2010
On Friday 05 March 2010 14:46:00 Gael Varoquaux wrote:
> On Fri, Mar 05, 2010 at 08:14:51AM -0500, Francesc Alted wrote:
> > > FWIW, I observe very good speedups on my problems (pretty much linear
> > > in the number of CPUs), and I have data parallel problems on fairly
> > > large data (~100MB a piece, doesn't fit in cache), with no
> > > synchronisation at all between the workers. CPUs are Intel Xeons.
> > Maybe your processes are not as memory-bound as you think.
> That's the only explanation that I can think of. I have two types of
> bottlenecks. One is BLAS level 3 operations (mainly SVDs) on large
> matrices; the second is resampling, where I repeat the same operation
> many times over almost the same chunk of data. In both cases the data is
> fairly large, so I expected the operations to be memory bound.
Not at all. BLAS 3 operations are mainly CPU-bound: they perform O(n^3)
floating-point operations on O(n^2) data, so their algorithms (if correctly
implemented, of course, but any decent BLAS 3 library will do) can reuse each
element from cache many times. BLAS 1 (and lately BLAS 2 too) operations do
only O(1) work per element loaded, so they are the ones that are memory-bound.
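To put rough numbers on the BLAS 1 vs BLAS 3 distinction, here is a back-of-the-envelope sketch of arithmetic intensity (floating-point operations per byte moved to or from memory) for ddot, dgemv and dgemm on n x n operands. It assumes float64 (8 bytes per element) and an idealized best case in which every operand crosses the memory bus exactly once; real blocked BLAS 3 kernels get somewhat less reuse than this, but the gap between the levels is what matters.

```python
# Idealized arithmetic intensity (flops per byte of memory traffic) for the
# three BLAS levels. Assumes float64 operands and that each operand is moved
# between RAM and the CPU exactly once (the best case for any algorithm).
ITEMSIZE = 8  # bytes per float64 element


def intensity_dot(n):
    """BLAS 1 (ddot): x . y with len(x) == len(y) == n."""
    flops = 2 * n                    # n multiplies + n adds
    traffic = 2 * n * ITEMSIZE       # read x and y once
    return flops / traffic


def intensity_gemv(n):
    """BLAS 2 (dgemv): y = A @ x with A of shape (n, n)."""
    flops = 2 * n * n
    traffic = (n * n + 2 * n) * ITEMSIZE   # read A and x, write y
    return flops / traffic


def intensity_gemm(n):
    """BLAS 3 (dgemm): C = A @ B with A, B, C all of shape (n, n)."""
    flops = 2 * n ** 3
    traffic = 3 * n * n * ITEMSIZE   # read A and B, write C
    return flops / traffic


if __name__ == "__main__":
    for name, f in [("dot", intensity_dot), ("gemv", intensity_gemv),
                    ("gemm", intensity_gemm)]:
        print(f"{name:>4}: {f(1000):8.3f} flops/byte")
```

For n = 1000 this gives 0.125 flops/byte for the dot product, just under 0.25 for gemv and about 83 for gemm. The BLAS 3 figure grows linearly with n, while the BLAS 1 and BLAS 2 figures stay bounded by constants (1/8 and 1/4 here), which is why only the latter end up waiting on memory.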
And in your second case, you are repeating the same operation over the same
chunk of data. If this chunk is small enough to fit in cache, then the
bottleneck is the CPU again (and probably access to the L1/L2 cache), not
access to main memory. But if, as you said, you are seeing periods that are
memory-bound (i.e. the CPUs are starving for data), then it may well be that
this chunk does not fit in cache, in which case memory access is your
bottleneck. Maybe you can get better performance by reducing your chunk size
so that it fits in cache (L1 or L2).
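The reordering suggested above can be sketched with loop blocking: instead of sweeping the whole array once per repetition (streaming it from RAM every time), split it into chunks sized to a fraction of the cache and do all the repetitions on one chunk before moving on. The 256 KiB cache size, the half-cache budget and the `(chunk * (r + 1)).sum()` stand-in for the real resampling step are hypothetical placeholders; check your CPU's actual L1/L2 sizes and substitute your own operation.

```python
import numpy as np

# Hypothetical cache size; look up the real L1/L2 size of your CPU.
CACHE_BYTES = 256 * 1024


def chunk_len(itemsize, cache_bytes=CACHE_BYTES, fraction=0.5):
    """Elements per chunk so that one chunk fills `fraction` of the cache,
    leaving room for temporaries and other data."""
    return max(1, int(cache_bytes * fraction) // itemsize)


def repeated_op_blocked(data, n_repeats, cache_bytes=CACHE_BYTES):
    """Apply a placeholder operation n_repeats times to a 1-D array.

    The repeat loop is *inside* the chunk loop, so each chunk is loaded from
    RAM once and then reused from cache for all n_repeats passes."""
    step = chunk_len(data.itemsize, cache_bytes)
    totals = np.zeros(n_repeats)
    for start in range(0, data.size, step):
        chunk = data[start:start + step]          # a view, no copy
        for r in range(n_repeats):
            # Stand-in for the real per-repeat work (e.g. one resample).
            totals[r] += (chunk * (r + 1)).sum()
    return totals


def repeated_op_naive(data, n_repeats):
    """Same result, but every repeat streams the whole array through memory."""
    return np.array([(data * (r + 1)).sum() for r in range(n_repeats)])
```

Both functions return the same totals (up to floating-point rounding, since the sums are accumulated in a different order). Timing them on an array much larger than the cache is the quickest way to see whether the blocked order helps on a given machine.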
So I do not think that a NUMA architecture would run your current
computations any faster than your current SMP platform (and, as you know, NUMA
architectures are much more complex and expensive than SMP ones). But
experimenting is *always* the best answer to these hairy questions ;-)
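As a starting point for such experiments, here is a rough sketch that compares the floating-point rate NumPy achieves on a BLAS 1 dot product and on a BLAS 3 matrix product. The matrix size and repeat count are arbitrary assumptions, and results depend heavily on the machine, the BLAS build and its thread count. If the dot product's rate stays far below the matrix product's on data larger than cache, that is consistent with the BLAS 1 operation waiting on memory rather than on the CPU.

```python
import time

import numpy as np


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time (seconds) of `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


def gflops(n=1000, repeats=5):
    """Rough GFLOP/s for a float64 dot product (BLAS 1) and matrix product (BLAS 3).

    The dot-product vector has n * n elements, the same footprint as one matrix."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(n * n)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    t_dot = best_time(lambda: np.dot(x, x), repeats)
    t_gemm = best_time(lambda: a @ b, repeats)
    return {
        "dot": 2 * x.size / t_dot / 1e9,      # 2 flops per element
        "gemm": 2 * n ** 3 / t_gemm / 1e9,    # 2 n^3 flops for an n x n product
    }


if __name__ == "__main__":
    for kernel, rate in gflops().items():
        print(f"{kernel:>4}: {rate:7.2f} GFLOP/s")
```

Taking the best of several runs filters out one-off noise such as page faults on first touch; for a fair comparison across machines it is also worth pinning the BLAS thread count.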