[Numpy-discussion] numpy.random and multiprocessing

Sturla Molden sturla at molden.no
Thu Dec 11 13:04:21 EST 2008


On 12/11/2008 6:29 PM, David Cournapeau wrote:

> def task(x):
>     np.random.seed()
>     return np.random.random(x)
> 
> But does this really make sense ?

Hard to say... There is a chance of this producing identical or 
overlapping sequences, albeit an unlikely one: each call reseeds 
independently, and nothing guarantees that the resulting streams are 
disjoint. I would not do this. I'd make one process responsible for 
generating the random numbers and have it write them to a queue. That 
scales as long as generating the deviates is the least costly part of 
the algorithm.
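
To make the failure mode concrete, here is a minimal sketch of what 
happens if two workers ever end up with the same seed. The seed value 
1234 is made up for illustration; in the real case the collision would 
come from the reseeding, not from a hard-coded number.

```python
import numpy as np

# Hypothetical collision: two workers whose generators got the same seed.
worker_a = np.random.RandomState(1234)
worker_b = np.random.RandomState(1234)

draws_a = worker_a.random_sample(5)
draws_b = worker_b.random_sample(5)

# Both workers produce exactly the same "random" numbers, so the second
# worker adds no new information to the simulation.
same = np.array_equal(draws_a, draws_b)
print(same)   # True
```

Nothing fails loudly in that situation; the results are just silently 
correlated, which is why I would rather avoid the possibility.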

Sturla Molden



=== test.py ===
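from test_helper import task, generator
from multiprocessing import Pool, Process, Manager


if __name__ == '__main__':   # needed where child processes re-import this file
    # A manager queue can be passed to Pool tasks as an argument;
    # a plain multiprocessing.Queue cannot.
    m = Manager()
    q = m.Queue(maxsize=32)   # or whatever
    # chunk size: preferably a number much larger than 4!!!
    g = Process(target=generator, args=(4, q))
    g.start()

    p = Pool(4)

    jobs = list()
    for i in range(4):
        jobs.append(p.apply_async(task, (q,)))

    print([j.get() for j in jobs])

    p.close()
    p.join()
    g.terminate()   # the generator loops forever, so stop it explicitly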

=== test_helper.py ===
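import numpy as np

def generator(x, q):
    # Runs in its own process: the only place random numbers are drawn.
    while True:
        item = np.random.random(x)   # one chunk of x uniform deviates
        q.put(item)                  # blocks while the queue is full

def task(q):
    # Worker: take one precomputed chunk off the queue.
    return q.get()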





> Is the goal to parallelize a big sampler into N tasks of M trials, to
> produce the same result as a sequential set of M*N trials? Then it does
> not sound like a trivial task at all. I know there exist libraries
> explicitly designed for parallel random number generation - maybe this
> is where we should look, instead of using heuristics which are likely to
> be bogus, and generate wrong results.
> 
> cheers,
> 
> David
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
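
As a rough illustration of what purpose-built support for parallel 
streams can look like, here is a sketch assuming a NumPy version that 
provides numpy.random.SeedSequence and numpy.random.default_rng (both 
are newer than the numpy.random.seed interface discussed in this 
thread). The helper name make_child_seeds, the root seed 12345 and the 
sizes are made up for the example. Note that this gives N reproducible, 
independent streams; it does not reproduce the output of one sequential 
run of M*N trials.

```python
import numpy as np

def make_child_seeds(root_seed, n_tasks):
    # One root SeedSequence is spawned into n_tasks child sequences,
    # one per task, instead of reseeding blindly in every worker.
    return np.random.SeedSequence(root_seed).spawn(n_tasks)

def task(child_seed, m):
    # Each task builds its own generator from its own child seed.
    rng = np.random.default_rng(child_seed)
    return rng.random(m)

# Run serially here to keep the sketch simple; the (child_seed, m) pairs
# could equally be handed to worker processes.
child_seeds = make_child_seeds(12345, 4)
results = [task(s, 3) for s in child_seeds]
print(len(results), results[0].shape)
```

Because the child seeds are derived deterministically from the root 
seed, rerunning with the same root seed and the same number of tasks 
gives the same per-task numbers, regardless of which worker picks up 
which task.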



