[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Sean Harrington seanharr11 at gmail.com
Fri Oct 12 09:42:50 EDT 2018


I would contend that this is much more granular than Dask - this is just an
optimization of Pool.map() to avoid passing the same `func` redundantly,
once per task, to each worker. The primary goal is to eliminate repeated
serialization of callables with a large memory footprint.
This is a different use case than Dask - I don't intend to approach the
shared memory or distributed computing realms.
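
To make the cost concrete, here is a rough sketch that only measures pickle
sizes and starts no processes. It assumes that Pool.map() ships the callable
together with each chunk of items, roughly as a (func, chunk) pair; the names
`big_table` and `is_known` are made up for illustration.

```python
import operator
import pickle
from functools import partial

# Hypothetical large lookup table bound into the callable via a partial.
big_table = frozenset(range(200_000))
is_known = partial(operator.contains, big_table)  # is_known(x) -> x in big_table

chunk = (1, 2, 3)

# Assumption: each task carries roughly (func, chunk), so the table is
# re-serialized for every chunk of work, not once per map() call.
task_payload = pickle.dumps((is_known, chunk))
data_only = pickle.dumps(chunk)

print("bytes per task with func:", len(task_payload))
print("bytes per task, data only:", len(data_only))
```

Nearly all of the per-task payload here is the bound table, which is the
redundancy I would like to remove.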

And the second call to Pool.map would update the cached "self" as a part of
its initialization workflow, so that "the latest version of self when map() is
called is taken into account".
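
One possible way to get that behavior, sketched as a single-process
simulation rather than an actual Pool change: pickle the callable once per
map() call, and let a worker replace its cached copy only when the digest of
that pickle differs from the one it last saw. `SimulatedWorker` and
`cached_map` are hypothetical names and not part of multiprocessing.

```python
import hashlib
import operator
import pickle
from functools import partial


class SimulatedWorker:
    """Stand-in for a worker process that caches the last callable it received."""

    def __init__(self):
        self.digest = None
        self.func = None
        self.transfers = 0  # how many times the callable was (re)loaded

    def run(self, digest, blob, chunk):
        # Refresh the cache only if the parent's callable changed.
        # A real implementation would send the blob only on a cache miss.
        if digest != self.digest:
            self.func = pickle.loads(blob)
            self.digest = digest
            self.transfers += 1
        return [self.func(x) for x in chunk]


def cached_map(worker, func, chunks):
    # Snapshot the callable once per map() call, not once per task.
    blob = pickle.dumps(func)
    digest = hashlib.sha256(blob).hexdigest()
    results = []
    for chunk in chunks:
        results.extend(worker.run(digest, blob, chunk))
    return results


known = {1, 2}
func = partial(operator.contains, known)  # func(x) -> x in known
worker = SimulatedWorker()

first = cached_map(worker, func, [[1, 2], [3]])
known.add(3)  # state changes *between* two map() calls
second = cached_map(worker, func, [[1, 2], [3]])
```

In this sketch the second call sees the updated state, and the callable is
loaded once per map() call instead of once per chunk. The trade-off is that
the parent still has to pickle the callable on every map() call in order to
notice a change.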

Do you see a difficulty in accomplishing the second behavior?

On Fri, Oct 12, 2018 at 9:25 AM Antoine Pitrou <antoine at python.org> wrote:

>
> Le 12/10/2018 à 15:17, Sean Harrington a écrit :
> > The implementation details need to be fleshed out, but agnostic of
> > these, do you believe this is a valid solution to the initial problem? Do
> > you also see it as a beneficial optimization to Pool, given that we
> > don't need to store funcs/bound-methods/partials on the tasks themselves?
>
> I'm not sure, TBH.  I also think it may be better to leave this to
> higher levels (for example Dask will intelligently distribute data on
> workers and let you work with a kind of proxy object in the main
> process, transferring data only when necessary).
>
> > The latter concern about "what happens if `self` changed value in the
> > parent" is the same concern as "what happens if `func` changes in the
> > parent?" given the current implementation. This is an assumption that is
> > currently made with Pool.map_async(func, ls). If "func" changes in the
> > parent, there is no communication with the child. So one just needs to
> > be aware that calling "map_async(self.func, ls)" while the state of
> > "self" is changing, will not communicate changes to each worker. The
> > state is frozen when Pool.map is called, just as is the case now.
>
> If you cache "self" between pool.map calls, then the question is not
> "what happens if self changes *during* a map() call" but "what happens
> if self changes *between* two map() calls"?  While the former is
> intuitively undefined, current users would expect the latter to have a
> clear answer, which is: the latest version of self when map() is called
> is taken into account.
>
> Regards
>
> Antoine.