Multiprocessing Pool and functions with many arguments

Piet van Oostrum piet at cs.uu.nl
Fri May 1 12:46:18 EDT 2009


>>>>> "psaffrey at googlemail.com" <psaffrey at googlemail.com> (P) wrote:

>P> I'm trying to get to grips with the multiprocessing module, having
>P> only used ParallelPython before.

>P> based on this example:

>P> http://docs.python.org/library/multiprocessing.html#using-a-pool-of-workers

>P> what happens if I want my "f" to take more than one argument? I want
>P> to have a list of tuples of arguments and have these correspond the
>P> arguments in f, but it keeps complaining that I only have one argument
>P> (the tuple). Do I have to pass in a tuple and break it up inside f? I
>P> can't use multiple input lists, as I would with regular map.

You pass the arguments for the function as a tuple:

from multiprocessing import Pool

def f(a, b, c):
    return a + b * c

pool = Pool(processes=4)              # start 4 worker processes

result = pool.apply(f, (2, 3, 4))     # evaluate "f(2, 3, 4)"

Or if you have a list:

args = [(2, 3, 4),  # arguments for call 1
        (5, 6, 7)]  # arguments for call 2

print [pool.apply(f, a) for a in args]

However, as each call to apply waits for its result, this will execute
sequentially instead of in parallel.

You can't use map directly, as it passes each item of the list to the
function as a single argument:
>>> print pool.map(f, args) 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py", line 148, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py", line 422, in get
    raise self._value
TypeError: f() takes exactly 3 arguments (1 given)

Is that what you mean?

But you can use a wrapper function:

def wrapf(abc):
    return f(*abc)

[later...]

print pool.map(wrapf, args) 

This is covered in the examples section of the multiprocessing
documentation (see calculate and calculatestar, for example).
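For reference, here is the wrapper approach as a complete script. This is only a sketch and assumes Python 3.3 or later (print function, Pool as a context manager), not the 2.6 used above. The helper names map_with_wrapper and map_with_starmap are made up for illustration. On those later versions Pool.starmap unpacks each tuple itself, so the second helper needs no wrapper.

```python
from multiprocessing import Pool


def f(a, b, c):
    return a + b * c


def wrapf(abc):
    # Unpack one argument tuple into f's three positional parameters.
    return f(*abc)


def map_with_wrapper(args, processes=2):
    # Hypothetical helper: pool.map hands each tuple to wrapf as one argument.
    with Pool(processes=processes) as pool:
        return pool.map(wrapf, args)


def map_with_starmap(args, processes=2):
    # Hypothetical helper: Pool.starmap (Python 3.3+) unpacks each tuple itself.
    with Pool(processes=processes) as pool:
        return pool.starmap(f, args)


if __name__ == "__main__":
    args = [(2, 3, 4), (5, 6, 7)]
    print(map_with_wrapper(args))
    print(map_with_starmap(args))
```

Both helpers should return the same list, in the order of the input tuples.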

Or you can use apply_async and later wait for the results:

results = [pool.apply_async(f, a) for a in args]
print [r.get() for r in results]

Now the calls to f are done in parallel, which you can check by putting
a sleep inside f.
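A rough way to do that check, again as a sketch assuming Python 3.3 or later: slow_f is f with a 0.2 second sleep added, and compare is a hypothetical helper that times the blocking apply loop against the apply_async version on the same pool.

```python
import time
from multiprocessing import Pool


def slow_f(a, b, c):
    time.sleep(0.2)  # stand-in for real work
    return a + b * c


def compare(args, processes=4):
    # Hypothetical helper: time both styles of call on one pool.
    with Pool(processes=processes) as pool:
        # apply blocks until each call has finished, so the sleeps add up.
        start = time.perf_counter()
        sequential = [pool.apply(slow_f, a) for a in args]
        t_apply = time.perf_counter() - start

        # apply_async submits every call first; get() then collects results.
        start = time.perf_counter()
        pending = [pool.apply_async(slow_f, a) for a in args]
        parallel = [r.get() for r in pending]
        t_async = time.perf_counter() - start
    return sequential, t_apply, parallel, t_async


if __name__ == "__main__":
    seq, t_apply, par, t_async = compare([(2, 3, 4), (5, 6, 7)])
    print("apply:       %s in %.2f s" % (seq, t_apply))
    print("apply_async: %s in %.2f s" % (par, t_async))
```

The apply loop takes at least the sum of the sleeps. With enough workers, the apply_async timing should come out closer to a single sleep, though the exact figures depend on the machine.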

-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org


