[Chicago] threading is slow

Daniel Griffin dgriff1 at gmail.com
Thu Mar 7 16:51:52 CET 2013


You are running into the mechanics of fork. It can't act as a shared,
read-only global variable, because it is being read in another process
with its own address space. When you pass it in as an argument, it is
most likely being pickled and copied for each task.
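
If the workers only need to read the array, one workaround is to let
each worker inherit it once instead of shipping it with every task.
Here is a minimal sketch of the Pool initializer pattern (the names
init_worker and costly are just illustrative, and the no-copy-per-task
behavior assumes a fork-based start, as on Linux):

from multiprocessing import Pool
import numpy as np

problem = None  # set once in each worker by the initializer

def init_worker(shared):
    # Runs once per worker process; with fork the array is
    # inherited rather than pickled for every task.
    global problem
    problem = shared

def costly(z):
    r = 0
    for k in problem:
        r += z ** (1 / k ** 1.5)
    return r

if __name__ == '__main__':
    data = np.arange(1, 200002)
    po = Pool(processes=4, initializer=init_worker, initargs=(data,))
    print sum(po.map(costly, xrange(10)))

The worker then receives only the small, cheap-to-pickle index; the big
array never travels through the task queue.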


On Thu, Mar 7, 2013 at 8:43 AM, Oren Livne <livne at uchicago.edu> wrote:

> This makes sense, thanks. I am happy even with a 10x speedup. Now, I
> am not sure how to pass a read-only global variable into the
> processing function (which I map() over a range of values). In my real
> program, when I added it to the arguments, my computer hung. So I
> tried a simpler case. Even in the example below, I still get a speedup
> by using multiple processes, but simply passing in the large numpy
> array made each call 20 times slower. Is that an unrelated problem,
> that the array is passed by value instead of by reference? Also, for
> my complex object I get an exception from multiprocessing: NoneType
> found, str expected.
>
> Thanks!
> Oren
>
>
> '''
> ============================================================
> http://stackoverflow.com/questions/4413821/multiprocessing-pool-example
>
> Created on Mar 6, 2013
> @author: Oren Livne <livne at uchicago.edu>
> ============================================================
> '''
> from multiprocessing import Pool
> from time import time
> import numpy as np
>
> K = 200000
> def CostlyFunction((z, problem)):
>     r = 0
>     for k in problem:
>         r += z ** (1 / k ** 1.5)
>     return r
>
> if __name__ == "__main__":
>     currtime = time()
>     N = 10
>     problem = np.arange(1, K + 2)
>     w = sum(map(CostlyFunction, ((i, problem) for i in xrange(N))))
>
>     t = time() - currtime
>     print 'Serial  : time elapsed: %.2f, result = %f' % (t, w)
>
>     for p in [1, 2, 4]: #[1, 2, 4, 8, 16, 24, 30]:#2 ** np.arange(4):
>         currtime = time()
>         po = Pool(processes=p)
>         res = po.map_async(CostlyFunction, ((i, problem) for i in xrange(N)))
>         w = sum(res.get())
>         tp = time() - currtime
>         print '%2d procs: time elapsed: %.2f (%.1fx), result = %f' % (p, tp, t / tp, w)
>
> On 3/7/2013 6:34 AM, Martin Maney wrote:
>
>> On Thu, Mar 07, 2013 at 06:16:14AM -0600, Oren Livne wrote:
>>
>>> For a purely computational task, multiprocessing seems to give a
>>> speedup about half the number of processors in the machine: 2x for
>>> 4-proc and 10x for 24-proc. Is that normal?
>>>
>> Hmmmm.  If half the "processors" are hyperthreads, then I think that
>> would be as expected - they can actually reduce throughput pretty
>> easily by increasing contention for shared resources - cache lines,
>> memory transfers, ...
>>
>> For a computationally bound task, you shouldn't expect much improvement
>> beyond the number of physical cores in the system.  And then there are
>> all those twisty, problem-specific things, like arranging the data so
>> that work units that may go to different cores don't cause excessive
>> cache-line bouncing, etc.
>>
>> It's not directly relevant, but this article talks about most of these
>> issues and, usefully, compares the right and wrong ("1975 programming")
>> ways of getting performance out of the hardware:
>>
>>    https://www.varnish-cache.org/trac/wiki/ArchitectNotes
>>
>>
>
> --
> A person is just about as big as the things that make him angry.
>
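
A note on the 20x slowdown Oren sees above: each (i, problem) task
tuple is pickled and sent over a pipe to a worker, and the array here
is roughly 1.6 MB on a 64-bit machine, so a plausible explanation is
that serialization, not arithmetic, dominates each call. A rough way
to check, sketched with Python 2's cPickle:

import cPickle
from time import time
import numpy as np

problem = np.arange(1, 200002)
t0 = time()
for _ in xrange(10):
    # Serialize the array once per task, as Pool.map would.
    cPickle.dumps(problem, cPickle.HIGHEST_PROTOCOL)
print 'pickling the array 10 times took %.3fs' % (time() - t0)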

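And on Martin's hyperthreading point: a quick sanity check is to
compare logical and physical CPU counts. A sketch assuming the
third-party psutil package is installed (recent versions expose
cpu_count(logical=False)):

import multiprocessing
import psutil  # third-party: pip install psutil

print 'logical CPUs  :', multiprocessing.cpu_count()
print 'physical cores:', psutil.cpu_count(logical=False)

If the first number is twice the second, the ~10x ceiling Oren saw on
a 24-proc box is close to what Martin's explanation predicts.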
