[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals
Nathaniel Smith
njs at pobox.com
Fri Oct 12 15:34:17 EDT 2018
On Fri, Oct 12, 2018, 06:09 Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Fri, 12 Oct 2018 08:33:32 -0400
> Sean Harrington <seanharr11 at gmail.com> wrote:
> > Hi Nathaniel - if this solution can be made performant, then I would
> > be more than satisfied.
> >
> > I think this would require removing "func" from the "task tuple", and
> > storing the "func" "once per worker" somewhere globally (maybe a class
> > attribute set post-fork?).
> >
> > This also has the beneficial outcome of increasing general performance of
> > Pool.map and friends. I've seen MANY folks across the interwebs doing
> > things like passing instance methods to map, resulting in "big" tasks, and
> > slower-than-sequential parallelized code. Parallelizing "instance methods"
> > by passing them to map, w/o needing to wrangle with staticmethods and
> > globals, would be a GREAT feature! It'd just be as easy as:
> >
> > Pool.map(self.func, ls)
> >
> > What do you think about this idea? This is something I'd be able to take
> > on, assuming I get a few core dev blessings...
>
> Well, I'm not sure how it would work, so it's difficult to give an
> opinion. How do you plan to avoid passing "self"? By caching (by
> equality? by identity?)? Something else? But what happens if "self"
> changed value (in the case of a mutable object) in the parent? Do you
> keep using the stale version in the child? That would break
> compatibility...
>
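The staleness question above can be made concrete with a small simulation. This is a sketch under stated assumptions, not a proposed implementation: it models a hypothetical per-worker cache keyed by id() (`ship_to_worker` and `_worker_cache` are illustrative names, not multiprocessing APIs) and uses a pickle round-trip to stand in for sending an object to a child process. Once the snapshot is cached, in-place changes made by the parent are never seen by the "worker".

```python
import pickle

# Hypothetical per-worker cache keyed by the parent object's id().
# Illustrative only; multiprocessing has no such cache.
_worker_cache = {}


def ship_to_worker(obj):
    """Simulate sending obj to a worker that caches it by identity."""
    key = id(obj)
    if key not in _worker_cache:
        # First send: the worker unpickles and keeps its own snapshot.
        _worker_cache[key] = pickle.loads(pickle.dumps(obj))
    return _worker_cache[key]


config = {"scale": 2}
first = ship_to_worker(config)
config["scale"] = 3              # the parent mutates the object in place
second = ship_to_worker(config)  # same id(), so the cached snapshot is reused
print(second["scale"])           # prints 2: the worker's copy is stale
```

Keying by id() carries a further hazard: Python only guarantees id() values are unique among objects whose lifetimes overlap, so a new object could later hit an old cache entry.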
I was just suggesting that within a single call to Pool.map, it would be a
reasonable optimization to only send the fn once to each worker. So e.g. if
you have 5 workers and 1000 items, you'd only pickle fn 5 times, rather
than 1000 times like we do now. I wouldn't want to get any fancier than
that with caching data between different map calls or anything.

Of course even this may turn out to be too complicated to implement in a
reasonable way, since it would require managing some extra state on the
workers. But semantically it would be purely an optimization of current
semantics.
-n
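To put rough numbers on the 5-workers/1000-items arithmetic, the sketch below compares how many bytes get pickled when the function travels with every task versus once per worker. It is a back-of-the-envelope model using plain `pickle`, not Pool internals, and it assumes one task per item; the real `Pool.map` groups items into chunks (its `chunksize` argument), so fn is actually pickled once per chunk. The helper names are hypothetical. The function is a bound method, which, like the instance methods mentioned earlier in the thread, pickles its whole `__self__` object.

```python
import pickle


def bytes_shipped_per_item(fn, items):
    # One task per item: fn is pickled together with every item.
    return sum(len(pickle.dumps((fn, x))) for x in items)


def bytes_shipped_once_per_worker(fn, items, workers):
    # fn is pickled once for each worker; tasks then carry only their item.
    fn_size = len(pickle.dumps(fn))
    return workers * fn_size + sum(len(pickle.dumps(x)) for x in items)


# Pickling the bound method data.count also pickles the list it is bound to,
# much like passing self.func drags `self` into every task.
data = list(range(1_000))
fn = data.count
items = range(1_000)

per_item = bytes_shipped_per_item(fn, items)
per_worker = bytes_shipped_once_per_worker(fn, items, workers=5)
print(f"per item: {per_item} bytes, once per worker: {per_worker} bytes")
```

Exact byte counts depend on the pickle protocol, but the per-item total scales with (number of items) x (size of fn), while the once-per-worker total scales with the number of workers plus the items themselves.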