[Numpy-discussion] numpy.random and multiprocessing

Thu Dec 11 14:16:40 EST 2008

I'd just like to add that yet another option would be to use a
manager/proxy object from multiprocessing. In this case
numpy.random.random is called in a single process (the manager's server
process) rather than in each worker, so all workers draw from the same
generator. I have not used this and I am not sure how efficient it is,
but the possibility is there.

Sturla Molden

=== test.py ===
from test_helper import task, RandomManager
from multiprocessing import Pool

rm = RandomManager()
rm.start()
random = rm.Random()

p = Pool(4)

jobs = list()
for i in range(4):
     jobs.append(p.apply_async(task, (4,random)))

print([j.get() for j in jobs])

p.close()
p.join()

rm.shutdown()

=== test_helper.py ===
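import numpy as np
from multiprocessing.managers import BaseManager

class RandomClass(object):
    # Lives in the manager's server process; workers reach it via a proxy.
    def random(self, x):
        return np.random.random(x)

class RandomManager(BaseManager):
    pass

# Expose RandomClass as rm.Random(). register() is the multiprocessing
# replacement for CreatorMethod from the older 'processing' package.
RandomManager.register('Random', RandomClass)

def task(x, random):
    # 'random' is a proxy; the call is forwarded to the manager process.
    return random.random(x)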

On 12/11/2008 4:20 PM, Gael Varoquaux wrote:
> Hi there,
> 
> I have been using the multiprocessing module a lot to do statistical tests
> such as Monte Carlo or resampling, and I have just discovered something
> that makes me wonder if I haven't been accumulating false results. Given
> two files:
> 
> === test.py ===
> from test_helper import task
> from multiprocessing import Pool
> 
> p = Pool(4)
> 
> jobs = list()
> for i in range(4):
>     jobs.append(p.apply_async(task, (4, )))
> 
> print([j.get() for j in jobs])
> 
> p.close()
> p.join()
> 
> === test_helper.py ===
> import numpy as np
> 
> def task(x):
>     return np.random.random(x)
> 
> =======
> 
> If I run test.py, I get:
> 
> [array([ 0.35773964,  0.63945684,  0.50855196,  0.08631373]), array([
> 0.35773964,  0.63945684,  0.50855196,  0.08631373]), array([ 0.35773964,
> 0.63945684,  0.50855196,  0.08631373]), array([ 0.65357725,  0.35649382,
> 0.02203999,  0.7591353 ])]
> 
> In other words, different processes give me exactly the same results
> (three of the four jobs here).
> 
> Now I understand why this is the case: the different instances of the
> random number generator were created by forking from the same process,
> so they all start from exactly the same state. This is however a fairly
> bad trap. I guess other people will fall into it.
> 
> The take home message is:
> **call 'numpy.random.seed()' in each worker process when you are using
> multiprocessing**
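One way to apply this advice is to reseed in a Pool initializer, which runs once in every worker right after it starts. The following is only a minimal sketch under that assumption; the names reseed and run are made up for illustration and do not come from the original message.

```python
import multiprocessing as mp
import numpy as np

def reseed():
    # Runs once in each worker. Called without an argument,
    # numpy.random.seed() draws fresh entropy from the OS, so the
    # generator state inherited from the parent via fork is discarded.
    np.random.seed()

def task(x):
    return np.random.random(x)

def run(n_jobs=4, size=4):
    # 'initializer' is called in every worker process before any task.
    pool = mp.Pool(4, initializer=reseed)
    try:
        jobs = [pool.apply_async(task, (size,)) for _ in range(n_jobs)]
        return [j.get() for j in jobs]
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    for arr in run():
        print(arr)
```

Seeding from OS entropy makes the runs non-reproducible. If reproducibility matters, each worker would need its own distinct, explicitly chosen seed instead.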
> 
> I wonder if we can find a way to make this more user friendly? Would it
> be easy, in the C code, to check if the PID has changed, and if so
> reseed the random number generator? I can open up a ticket for this if
> people think this is desirable (I think so).
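The PID check can be illustrated at the Python level, although the proposal above is about the C code. This is a rough sketch only: ForkSafeRandom is a hypothetical wrapper, not something numpy provides, and it shows just the "reseed when the PID has changed" idea.

```python
import os
import numpy as np

class ForkSafeRandom(object):
    """Hypothetical wrapper: reseed numpy's global generator after a fork."""

    def __init__(self):
        # Remember which process the current generator state belongs to.
        self._pid = os.getpid()

    def random(self, size=None):
        pid = os.getpid()
        if pid != self._pid:
            # First call in a new process: the state was copied from the
            # parent, so replace it with fresh entropy from the OS.
            np.random.seed()
            self._pid = pid
        return np.random.random(size)
```

A worker would then call ForkSafeRandom().random(x) on an instance created in the parent before the fork. The check costs one os.getpid() call per draw.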
> 
> On a side note, there are a score of functions in numpy.random with
> __module__ set to None. This makes them inconvenient to use with
> multiprocessing (for instance it forced the creation of the
> 'test_helper' file here).
> 
> Gaël