[Chicago] threading is slow
Oren Livne
livne at uchicago.edu
Thu Mar 7 16:43:08 CET 2013
This makes sense, thanks. I would be happy even with a 10x speedup. Now I am
not sure how to pass a read-only global variable into the processing
function (which I map() over a range of values). In my real program, when I
added it to the arguments, my computer hung, so I tried a simpler case.
Even in the example below I still get a speedup from using multiple
processes, but simply passing in the large numpy array made each call 20
times slower. Is that a separate problem, namely that the array is passed
by value (pickled) rather than by reference? Also, for my complex object I
get an exception from multiprocessing: 'NoneType found, str expected'.
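In case it helps to reproduce: below is a minimal sketch (the is_picklable
helper is hypothetical, not from my program) for checking whether an
argument survives the pickling that multiprocessing applies to everything
it sends to a worker:

import cPickle as pickle

def is_picklable(obj):
    # multiprocessing pickles every task argument before sending it to a
    # worker, so an object that fails here will also fail in Pool.map().
    try:
        pickle.dumps(obj)
        return True
    except Exception as e:
        print 'not picklable:', e
        return False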
Thanks!
Oren
'''
============================================================
http://stackoverflow.com/questions/4413821/multiprocessing-pool-example
Created on Mar 6, 2013
@author: Oren Livne <livne at uchicago.edu>
============================================================
'''
from multiprocessing import Pool
from time import time
import numpy as np
K = 200000
def CostlyFunction((z, problem)):
    r = 0
    for k in problem:
        r += z ** (1 / k ** 1.5)
    return r
if __name__ == "__main__":
    currtime = time()
    N = 10
    problem = np.arange(1, K + 2)
    w = sum(map(CostlyFunction, ((i, problem) for i in xrange(N))))
    t = time() - currtime
    print 'Serial : time elapsed: %.2f, result = %f' % (t, w)
    for p in [1, 2, 4]:  # also tried [1, 2, 4, 8, 16, 24, 30]
        currtime = time()
        po = Pool(processes=p)
        res = po.map_async(CostlyFunction, ((i, problem) for i in xrange(N)))
        w = sum(res.get())
        tp = time() - currtime
        print '%2d procs: time elapsed: %.2f (%.1fx), result = %f' % (p, tp, t / tp, w)
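One workaround that should avoid shipping the array with every task is to
hand it to each worker once, via the Pool's initializer hook, and keep it
as a module-level global; then only the small index is pickled per call.
This is just a sketch under that assumption (init_worker and
CostlyFunction2 are made-up names):

from multiprocessing import Pool
import numpy as np

K = 200000
problem = None  # set once per worker by the initializer below

def init_worker(shared_problem):
    # runs once in each worker process; the array is pickled once per
    # worker here instead of once per task in map()
    global problem
    problem = shared_problem

def CostlyFunction2(z):
    r = 0
    for k in problem:  # read-only use of the worker-global array
        r += z ** (1 / k ** 1.5)
    return r

if __name__ == "__main__":
    big = np.arange(1, K + 2)
    po = Pool(processes=4, initializer=init_worker, initargs=(big,))
    print sum(po.map(CostlyFunction2, xrange(10)))

Here each worker unpickles the array once (p times total) instead of once
per task (N times). On a fork-based system such as Linux one could even
assign the global in the parent before creating the Pool, so the children
inherit it with no pickling at all.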
On 3/7/2013 6:34 AM, Martin Maney wrote:
> On Thu, Mar 07, 2013 at 06:16:14AM -0600, Oren Livne wrote:
>> For a purely computational task, multiprocessing seems to give about
>> half the speedup of the number of processors in the machine: 2x on a
>> 4-processor box and 10x on a 24-processor box. Is that normal?
> Hmmmm. If half the "processors" are hyperthreads, then I think that
> would be as expected - they can actually reduce throughput pretty
> easily by increasing contention for shared resources - cache lines,
> memory transfers, ...
>
> For a computationally bound task, you shouldn't expect much improvement
> beyond the number of physical cores in the system. And then there are
> all those twisty, problem-specific things, like arranging the data so
> that work units that may go to different cores don't cause excessive
> cache-line bouncing, etc.
>
> It's not directly relevant, but this article talks about most of these
> issues and, usefully, compares the right and wrong ("1975 programming")
> ways of getting performance out of the hardware:
>
> https://www.varnish-cache.org/trac/wiki/ArchitectNotes
>
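A quick check of what Python reports for this machine (a sketch only:
cpu_count() counts logical CPUs, so with 2-way hyperthreading halving it
is just a rough estimate of the physical cores):

from multiprocessing import cpu_count

n = cpu_count()  # logical CPUs, hyperthreads included
print 'logical CPUs  :', n
print 'physical cores:', n // 2, '(rough guess, assumes 2-way hyperthreading)'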
--
A person is just about as big as the things that make him angry.