[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Antoine Pitrou antoine at python.org
Fri Oct 12 09:24:18 EDT 2018


On 12/10/2018 at 15:17, Sean Harrington wrote:
> The implementation details need to be fleshed out, but agnostic of
> these, do you believe this is a valid solution to the initial problem? Do
> you also see it as a beneficial optimization to Pool, given that we
> don't need to store funcs/bound-methods/partials on the tasks themselves?

I'm not sure, TBH.  I also think it may be better to leave this to
higher levels (for example Dask will intelligently distribute data on
workers and let you work with a kind of proxy object in the main
process, transferring data only when necessary).

> The latter concern about "what happens if `self` changed value in the
> parent" is the same concern as "what happens if `func` changes in the
> parent?" given the current implementation. This is an assumption that is
> currently made with Pool.map_async(func, ls). If "func" changes in the
> parent, there is no communication with the child. So one just needs to
> be aware that calling "map_async(self.func, ls)" while the state of
> "self" is changing, will not communicate changes to each worker. The
> state is frozen when Pool.map is called, just as is the case now.

If you cache "self" between pool.map calls, then the question is not
"what happens if self changes *during* a map() call" but "what happens
if self changes *between* two map() calls"?  While the former is
intuitively undefined, current users would expect the latter to have a
clear answer, which is: the latest version of self when map() is called
is taken into account.
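
To make the distinction concrete, here is a minimal sketch. It does not
start real worker processes; it only imitates the relevant step of
Pool.map, namely that the callable is pickled when the call is made and
the worker runs the unpickled copy. The names Scaler, run_like_pool_map
and run_with_cached_self are hypothetical, and the caching variant is a
deliberately naive stand-in, not the implementation proposed in
bpo-34837.

```python
import pickle


class Scaler:
    """Hypothetical example class; it does not appear in the thread."""

    def __init__(self, factor):
        self.factor = factor

    def apply(self, x):
        return x * self.factor


def run_like_pool_map(func, items):
    """Imitate current behaviour: serialize the callable on every call."""
    payload = pickle.dumps(func)         # snapshot of `self` at call time
    worker_func = pickle.loads(payload)  # the "worker" gets its own copy
    return [worker_func(x) for x in items]


_cache = {}  # assumption: a naive per-object cache kept between calls


def run_with_cached_self(func, items):
    """Imitate a naive cache: serialize `self` once, then reuse it."""
    key = id(func.__self__)
    if key not in _cache:
        _cache[key] = pickle.dumps(func)
    worker_func = pickle.loads(_cache[key])
    return [worker_func(x) for x in items]


scaler = Scaler(2)
fresh_first = run_like_pool_map(scaler.apply, [1, 2, 3])
cached_first = run_with_cached_self(scaler.apply, [1, 2, 3])

scaler.factor = 10  # `self` changes *between* two map() calls

fresh_second = run_like_pool_map(scaler.apply, [1, 2, 3])
cached_second = run_with_cached_self(scaler.apply, [1, 2, 3])

print(fresh_second)   # [10, 20, 30]: latest state is taken into account
print(cached_second)  # [2, 4, 6]: stale state from the first call
```

With the real API, the uncached case corresponds to calling
pool.map(scaler.apply, data) twice with a mutation in between. Any
caching scheme would need some invalidation rule to keep the answer
users currently expect.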

Regards

Antoine.

