[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Fri Sep 28 19:23:06 EDT 2018

Hi Antoine - see inline below for my response...thanks for your time!

On Fri, Sep 28, 2018 at 6:45 PM Antoine Pitrou <solipsis at pitrou.net> wrote:

>
> Hi,
>
> On Fri, 28 Sep 2018 17:07:33 -0400
> Sean Harrington <seanharr11 at gmail.com> wrote:
> >
> > In *short*, the implementation of the feature works as follows:
> >
> >    1. Exposes a kwarg on Pool.__init__ called `expect_initret`, that
> >    defaults to False. When set to True:
> >       1. Capture the return value of the initializer kwarg of Pool
> >       2. Pass this value to the function being applied, as a kwarg.
> >
> > Again, in *short,* the motivation of the feature is to provide an explicit
> > "flow of data" from parent process to worker process, and to avoid being
> > *forced* to use the *global* keyword in the initializer, or being *forced*
> > to create global variables in the parent process.
>
> Thanks for taking the time to explain your use case and write a
> proposal.
>
> My reactions to this are:
>
> 1. The proposed API is ugly.  This basically allows you to pass an
> argument which changes with which arguments another function is later
> called...

Yes, I agree that this is not a perfect contract, but isn't this also a
concern with the current implementation? And isn't this pattern arguably
more explicit than "the function being applied relies on the initializer
to create a global variable from within its lexical scope"?

> 2. A global variable seems like the adequate way to represent a
> process-global object (which is exactly your use case)

There is nothing wrong with using a global variable, especially in nearly
every toy example of multiprocessing.Pool found on the internet (e.g.
optimizing a simple script). But what happens when you have lots of nested
function calls in your applied function? My simple argument is that the
developer should not be constrained to make the passed objects globally
available in the process, as this MAY break encapsulation in large
projects.

> 3. If you don't like globals, you could probably do something like
> lazily-initialize the resource when a function needing it is executed;
> this also avoids creating the resource if the child doesn't use it at
> all.  Would that work for you?
>
I have nothing against globals; my gripe is with being forced to use
them for every Pool use case. Further, if initializing the resource is
expensive, we only want to do this ONE time per worker process. So no, this
will not ~always~ work.
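
For context, one way the lazy-initialization suggestion could look is
sketched below, assuming `functools.lru_cache` is used for the caching. The
names `get_resource` and `work` are hypothetical:

```python
import functools
import multiprocessing
import os


@functools.lru_cache(maxsize=None)
def get_resource():
    # Built on first use, then cached for the lifetime of this process.
    return {"pid": os.getpid()}  # stand-in for an expensive resource


def work(item):
    # Every function that needs the resource must know to call get_resource().
    resource = get_resource()
    return (resource["pid"], item * 2)


def main():
    with multiprocessing.Pool(2) as pool:
        return pool.map(work, [1, 2, 3])


if __name__ == "__main__":
    print(main())
```

The cache keeps construction to at most once per process, but the cached
object is still module-level state reached by name, and this sketch has no
way to receive arguments from the parent process the way `initargs` does.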

> As a more general remark, I understand the desire to make the Pool
> object more flexible, but we can also not pile up features until it
> satisfies all use cases.
>
I understand that this is a legitimate concern, but this is about API
approachability. Python end-users of Pool are forced to declare a global
from within a function's lexical scope. Most Python end-users probably
don't even know this is possible. Sure, this is adding a feature for a use
case that I outlined, but really this is one of the two major use cases of
"initializer" and "initargs" (see my blog post for the 2 generalized use
cases
<https://thelaziestprogrammer.com/python/multiprocessing-pool-expect-initret-proposal>),
not some obscure use case. This is making that *very common* use case more
approachable.

> As another general remark, concurrent.futures is IMHO the preferred API
> for the future, and where feature work should probably concentrate.
>
This is good to know, and I will keep this in mind moving forward!

> Regards
>
> Antoine.