[CentralOH] embarrassingly parallel loops question

Joe Shaw joe at joeshaw.org
Fri Jul 29 14:35:36 EDT 2016


Hi,

If you're on Linux, one thing you might want to try is the perf tool.  That
might give you a sense of whether the overhead is in the Python runtime, or
whether page faults or syscalls are the bottleneck.  If you think it might
be in the Python code itself, running with the built-in profiler might be
helpful.
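
For what it's worth, here's a rough, untested sketch of the profiler side.
I'm borrowing the crunchFunctionIter/setN names from your snippet below,
and the script name in the perf comment is just a placeholder:

import cProfile
import pstats

# On Linux, something like `perf stat -d python yourscript.py` at the
# shell reports cache misses, page faults, and context switches.

profiler = cProfile.Profile()
profiler.enable()
for i in xrange(len(setN)):       # one serial pass, no multiprocessing
    crunchFunctionIter(i)
profiler.disable()
profiler.dump_stats("crunch.prof")

# Print the 20 most expensive calls by cumulative time.
pstats.Stats("crunch.prof").sort_stats("cumulative").print_stats(20)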

I have no idea how either of these interact with multiprocessing, however.

Lastly, cache line locality is a big deal.  I don't know to what extent you
can optimize memory layout in Python programs (maybe with numpy?) but if
you can get data in contiguous memory you will greatly improve your L1/L2
cache hit rate and the CPU won't have to go to (comparatively much slower)
RAM.
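
A rough illustration of what I mean, assuming the per-iteration data are
plain numbers that can live in a numpy array:

import numpy as np

# A Python list of floats scatters boxed objects across the heap; a numpy
# array keeps the raw doubles in one contiguous C buffer.
values = [float(i) for i in range(1000000)]
arr = np.asarray(values, dtype=np.float64)

print(arr.flags["C_CONTIGUOUS"])   # True: one flat block of memory
total = arr.sum()                  # sequential walk, cache friendly

Iterating over arr element by element in Python throws that advantage away,
though; the win comes when the heavy math happens inside numpy (or whatever
C extension you're using).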

Joe

On Fri, Jul 29, 2016 at 1:44 PM, Samuel <sriveravi at gmail.com> wrote:

> Hello Group,
>
> So I have this embarrassingly parallel number crunching I'm trying to do
> in a for loop.  In each iteration there is some crunching that is
> independent of all other iterations, so I was able to set this up pretty
> easily using a multiprocessing pool.  (Side detail: each iteration depends
> on some common data structures that I make global, which gives me the
> fastest crunch time versus passing them to each worker explicitly.)  It
> takes about 30 ms to run:
>
>
> import multiprocessing
> pool = multiprocessing.Pool(numCores)
> results = pool.map(crunchFunctionIter, xrange(len(setN)))
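>
> (For what it's worth, the global-data part is roughly like this --
> simplified, with made-up names; the real structures and crunch are more
> involved:)
>
> # Simplified sketch: build the shared data once in the parent, then only
> # read it inside the workers so nothing has to be pickled per task.
> commonData = loadCommonData()   # placeholder for however the data is built
>
> def crunchFunctionIter(i):
>     # reads commonData and setN, never writes them
>     return doCrunch(commonData, setN[i])   # doCrunch stands in for the real work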
>
>
> Running on 1 core, there is a tiny slowdown (~5 ms of overhead, ~35 ms to
> run).
> Running on 2 cores, I get about a 2x speedup, which is great and expected
> (~18 ms to run).
> But the speedup saturates there, and I can't get any more juice even when
> upping to 4 or 6 cores.
>
> The thing is, all iterations are pretty much independent, so I don't see
> why, in theory, I don't get close to a linear speedup, or at least an
> (N-1) speedup.  My guess is that something weird with the memory sharing
> is causing unnecessary overhead.  Another colleague working on a similar
> embarrassingly parallel problem saw the same saturation at about 2 cores.
>
> Any thoughts on what is going on, or what I need to do to make this
> embarrassingly parallel thing speed up linearly?  Should I just use a
> different library and set up my data structures a different way?
>
> Thanks,
> Sam
>