[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals
antoine at python.org
Fri Oct 12 10:54:48 EDT 2018
On 12/10/2018 at 16:49, Sean Harrington wrote:
> Yes - “func” (and “self” which func is bound to) would be copied to each
> child worker process, where they are stored and applied to each element
> of the iterable being mapped over.
Only if it has changed, then, right?
I suspect that would work, but it will break compatibility in some cases
(think of a mutable object that hasn't defined equality - so it defaults
to identity). It's also introducing a complication in the API which
people didn't have to think of before.
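
To make the identity concern concrete, here is a minimal sketch. The
`Resource` class and the `has_changed` check are hypothetical stand-ins,
not part of multiprocessing or of the proposed patch; they only assume
that the pool would compare a cached object against the current one
with `!=`.

```python
import copy


class Resource:
    """Hypothetical mutable object that does not define __eq__."""

    def __init__(self, values):
        self.values = values


def has_changed(cached, current):
    # Hypothetical check a caching pool might run before re-sending.
    # Without __eq__, this falls back to an identity comparison.
    return cached != current


obj = Resource([1, 2, 3])
cached = obj                 # what the parent remembers having sent
obj.values.append(4)         # mutated in place after the first map()

# Same object, so the in-place mutation goes unnoticed.
mutation_detected = has_changed(cached, obj)

# An equal-by-content copy is a different object, so it counts as changed.
snapshot = copy.deepcopy(obj)
copy_reported_changed = has_changed(snapshot, obj)
```

In this sketch an in-place mutation is missed, so workers would keep a
stale copy, while a copy with identical contents would be re-sent anyway.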
The fact that you're doing all this in order to eschew global variables
for global resources doesn't warm me much to the idea. Unless other
core developers are enthusiastic I'm not willing to integrate such a change.
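
For reference, the existing global-based pattern under discussion looks
roughly like the sketch below. `initializer` and `initargs` are existing
Pool parameters; the names `init_worker`, `lookup`, `run` and the dict
used as the "resource" are made up for illustration.

```python
import multiprocessing

# Per-process global: each worker gets its own copy, set once at startup.
_resource = None


def init_worker(resource):
    """Runs once in each worker process when the pool starts."""
    global _resource
    _resource = resource


def lookup(key):
    """Task function: reads the global instead of receiving the resource."""
    return _resource[key]


def run(keys, resource):
    # The resource is sent once per worker via initargs,
    # not once per task.
    with multiprocessing.Pool(
        2, initializer=init_worker, initargs=(resource,)
    ) as pool:
        return pool.map(lookup, keys)
```

A script would call `run(...)` under an `if __name__ == "__main__":`
guard, as usual for multiprocessing code.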
> On Fri, Oct 12, 2018 at 10:41 AM Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Fri, 12 Oct 2018 09:42:50 -0400
> Sean Harrington <seanharr11 at gmail.com> wrote:
> > I would contend that this is much more granular than Dask - this is just an
> > optimization of Pool.map() to avoid redundantly passing the same
> > repeatedly, once per task, to each worker, with the primary goal of
> > eliminating redundant serialization of large-memory-footprinted
> > This is a different use case than Dask - I don't intend to approach the
> > shared memory or distributed computing realms.
> > And the second call to Pool.map would update the cached "self" as a part of
> > its initialization workflow, s.t. "the latest version of self when map() is
> > called is taken into account".
> I still don't understand how that works. If you "updated the cached
> self", then surely you must transmit it to the child, right?