Several problems here:

(1) I am sorry I didn't mention this earlier, but looking over your original email, it appears that your single-process code might be very inefficient: it seems to perturb each particle individually in a for-loop rather than working on an array of all the particles. Perhaps you should try to fix that before adding multiprocessing? Basically, you should hopefully be able to write random_fork to work on a number of particles at once using numpy broadcasting, etc. This way, the for-loop that steps through the elements is implemented in compiled C, rather than interpreted python. Check out various numpy tutorials for details, but here's the general gist:

points = numpy.arange(6000).reshape((3000,2))  # 3000 x,y points
perturbations = numpy.random.normal(size=(3000,2))

def perturb_bad(points, perturbations):
    for point, perturbation in zip(points, perturbations):
        point += perturbation

def perturb_good(points, perturbations):
    points += perturbations

timeit perturb_bad(points, perturbations)
# 10 loops, best of 3: 18.7 milliseconds per loop

timeit perturb_good(points, perturbations)
# 10000 loops, best of 3: 161 microseconds per loop

Compare this orders-of-magnitude gain to the at-best-8-fold gain you'd get from multiprocessing the bad code. Also note that "map" is basically just an interpreted for-loop under the hood:

import operator
timeit map(operator.add, points, perturbations)
# 10 loops, best of 3: 18.7 milliseconds per loop

The moral here is to avoid looping constructs in python when working with sets of numbers and instead use numpy operations that operate on lots of numbers with one python command.

(2) From the slowdowns you report, it looks like overhead costs are completely dominating. For each job, the code and data need to be serialized (pickled, I think, is how the multiprocessing library handles it), written to a pipe, unpickled, and executed, and then the results need to be pickled, sent back, and unpickled. Perhaps using memmap to share state might be better? Or you can make sure that the function parameters and results can be very rapidly pickled and unpickled (single numpy arrays, e.g., not lists-of-sub-arrays or something).

Still, tune the single-processor code first. Perhaps you can send more detailed code samples and folks on the list can offer some advice about how to make it numpy-friendly and fast.
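For instance, if the particle states can be kept in a single (N_particles, state_dim) array, the whole perturbation step might look something like the sketch below. This is only a rough illustration under that assumption; the names state_dim, A, states, and random_fork_all are made up, since I haven't seen the real random_fork.

import numpy

N_particles, state_dim = 3000, 2                # made-up sizes, for illustration only
A = numpy.eye(state_dim)                        # stand-in for the real dynamics matrix
states = numpy.zeros((N_particles, state_dim))  # hypothetical array of all particle states

def random_fork_all(states, noise):
    # Perturb every particle at once: one matrix multiply and one add
    # replace N_particles separate Python-level calls.
    return states.dot(A.T) + noise

noise = numpy.random.standard_normal((N_particles, state_dim))
states = random_fork_all(states, noise)

The loop over particles then happens inside numpy's compiled code, so the per-particle Python overhead disappears entirely.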
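And if, once the code is vectorized, multiprocessing still seems worth pursuing, one way to keep the pickling cost down is to hand each worker one contiguous block of the arrays instead of one (particle, noise) pair per job. Again just a sketch with made-up names; perturb_block stands in for whatever the real per-block update would be:

import numpy
from multiprocessing import Pool

def perturb_block(args):
    # Each job carries two large arrays, so only a handful of objects
    # get pickled per map() call instead of one per particle.
    states_block, noise_block = args
    return states_block + noise_block           # stand-in for the real update

def parallel_update(states, noise, n_workers=4):
    # Split the work into n_workers contiguous slices and reassemble the result.
    state_blocks = numpy.array_split(states, n_workers)
    noise_blocks = numpy.array_split(noise, n_workers)
    pool = Pool(n_workers)
    results = pool.map(perturb_block, zip(state_blocks, noise_blocks))
    pool.close()
    pool.join()
    return numpy.concatenate(results)

Whether that ever beats the vectorized single-process version is a separate question: the serialization and process start-up still have to be paid for on every call.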
Zach

On May 27, 2010, at 5:37 PM, Andy Fraser wrote:

Thanks for the replies and pointers. I got multiprocessing.Pool to work, but it eats up memory and time. I append two implementation segments below. The multiprocessing version is about 33 times _slower_ than the single processor version. Unless I use a small number of processors, memory fills up and I kill the job to make the computer usable again. The following segments of code are inside a loop that steps over 115 lines of pixels.
# Wrapper so pool.map can apply random_fork to one (particle, noise) pair.
def func(job):
    return job[0].random_fork(job[1])
. . . . . .
#Multiprocessing version:
noise = numpy.random.standard_normal((N_particles,noise_df))
jobs = zip(self.particles,noise)
self.particles = self.pool.map(func, jobs, self.chunk_size)
return (m,v)
. . . . . .
#Single processing version
noise = numpy.random.standard_normal((N_particles,noise_df))
jobs = zip(self.particles,noise)
self.particles = map(func, jobs)
return (m,v)
--
Andy Fraser         ISR-2 (MS:B244)
afraser@lanl.gov    Los Alamos National Laboratory
505 665 9448        Los Alamos, NM 87545