[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Sean Harrington seanharr11 at gmail.com
Fri Oct 12 08:33:32 EDT 2018

Hi Nathaniel - if this solution can be made performant, then I would
be more than satisfied.

I think this would require removing "func" from the "task tuple", and
storing the "func" "once per worker" somewhere globally (maybe a class
attribute set post-fork?).

This also has the beneficial outcome of increasing general performance of
Pool.map and friends. I've seen MANY folks across the interwebs doing
things like passing instance methods to map, resulting in "big" tasks, and
slower-than-sequential parallelized code. Parallelizing "instance methods"
by passing them to map, w/o needing to wrangle with staticmethods and
globals, would be a GREAT feature! It'd just be as easy as:

    pool.map(self.func, ls)
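
To see why such tasks are "big": pickling a bound method pickles the
instance it is bound to, and map ships the function along with every chunk
of work. A small sketch, assuming a hypothetical Model class with a large
big_cache attribute:

```python
import pickle


class Model:
    """Hypothetical class holding a large attribute."""

    def __init__(self, n):
        self.big_cache = list(range(n))

    def func(self, x):
        return x + len(self.big_cache)


def plain_func(x):
    return x + 1


model = Model(100_000)

# A bound method is pickled together with its instance (and big_cache).
bound_size = len(pickle.dumps(model.func))
# A module-level function is pickled by reference (module + name only).
plain_size = len(pickle.dumps(plain_func))

print(f"bound method: {bound_size} bytes, plain function: {plain_size} bytes")
```

The exact sizes depend on the pickle protocol, but the bound method's
payload grows with big_cache while the plain function's does not.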

What do you think about this idea? This is something I'd be able to take
on, assuming I get a few core dev blessings...

On Thu, Oct 4, 2018 at 6:15 AM Nathaniel Smith <njs at pobox.com> wrote:

> On Wed, Oct 3, 2018 at 6:30 PM, Sean Harrington <seanharr11 at gmail.com>
> wrote:
> > with Pool(func_kwargs={"big_cache": big_cache}) as pool:
> >     pool.map(func, ls)
>
> I feel like it would be nicer to spell this:
>
>     with Pool() as pool:
>         pool.map(functools.partial(func, big_cache=big_cache), ls)
>
> And this might also solve your problem, if pool.map is clever enough
> to only send the function object once to each worker?
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org