[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Sat Sep 29 08:13:19 EDT 2018

On Sat, Sep 29, 2018 at 6:24 AM Antoine Pitrou <solipsis at pitrou.net> wrote:

>
> Hi Sean,
>
> On Fri, 28 Sep 2018 19:23:06 -0400
> Sean Harrington <seanharr11 at gmail.com> wrote:
> > My simple argument is that the
> > developer should not be constrained to make the objects passed globally
> > available in the process, as this MAY break encapsulation for large
> > projects.
>
> IMHO, global variables don't break encapsulation if they remain private
> to the module where they actually play a role.
>
> Of course, there are also global-like alternatives to globals, such as
> class attributes...  The multiprocessing module itself uses globals (or
> quasi-globals) internally for various implementation details.
>

>>> Yes, class attributes are a viable alternative. I've written about
this here:
<https://thelaziestprogrammer.com/python/multiprocessing-pool-a-global-solution>
Still, the argument is not against global variables, class attributes, or any
close cousins -- it is simply that developers shouldn't be forced to use these.

> > > 3. If you don't like globals, you could probably do something like
> > > lazily-initialize the resource when a function needing it is executed;
> > > this also avoids creating the resource if the child doesn't use it at
> > > all.  Would that work for you?
> > >
> > I have nothing against globals; my gripe is with being forced to use
> > them for every Pool use case. Further, if initializing the resource is
> > expensive, we only want to do this ONE time per worker process.
>
> That's what I meant with lazy initialization: initialize it if not
> already done, otherwise just use the already-initialized resource.
> It's a common pattern.
>
> (you can view it as a 1-element cache if you prefer)
>
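
For readers following along, the lazy-initialization pattern described above could be sketched roughly as below. This is only an illustration: `_make_resource`, `get_resource`, `lookup` and `demo` are hypothetical names, and the dictionary stands in for whatever expensive resource a worker needs. Note that the cache is still a module-private global, which is the point being made in the quoted text.

```python
import os
from multiprocessing import Pool

_resource = None  # module-private cache; each process has its own copy


def _make_resource():
    # Hypothetical stand-in for an expensive setup step
    # (e.g. opening a connection or loading a model).
    return {"pid": os.getpid(), "table": list(range(1000))}


def get_resource():
    """Build the resource on first use, then reuse the cached one."""
    global _resource
    if _resource is None:
        _resource = _make_resource()
    return _resource


def lookup(i):
    # Task function: only the first call in a given process pays the setup cost.
    return get_resource()["table"][i]


def demo():
    # Each worker initializes its own resource the first time it runs lookup().
    with Pool(2) as pool:
        return pool.map(lookup, [1, 2, 3])
```

`demo()` would normally be called under the usual `if __name__ == "__main__":` guard. A worker that never runs `lookup()` never builds the resource at all.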

>>> Sorry - I wasn't following your initial suggestion. This is a valid
solution for ONE of the general use cases (where we initialize objects in
each worker post-fork). However it fails for the other Pool use case of
"initializing a big object in your parent, and passing to each worker,
without using globals."

> > > As a more general remark, I understand the desire to make the Pool
> > > object more flexible, but we can also not pile up features until it
> > > satisfies all use cases.
> > >
> > I understand that this is a legitimate concern, but this is about API
> > approachability.  Python end-users of Pool are forced to declare a global
> > from a lexical scope. Most Python end-users probably don't even know this
> > is possible.
>
> Hmm...  We might have a disagreement on the target audience of the
> multiprocessing module.  multiprocessing isn't very high-level, I would
> expect it to be used by experienced programmers who know how to mutate
> a global variable from a lexical scope.
>

>>> It is one thing to MUTATE a global from a lexical scope. No gripes
there. The specific concept I'm referencing here is "DECLARING a global
variable from within a lexical scope". This is not as intuitive for most
programmers.
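
To make the distinction concrete, here is a minimal sketch of what "declaring a global from within a lexical scope" looks like with the existing `initializer` / `initargs` parameters of `Pool`. The names `init_worker`, `big_table`, `lookup` and `demo` are made up for illustration; the important detail is that `big_table` is never assigned at module level.

```python
from multiprocessing import Pool

# Note: there is no module-level assignment to `big_table` in this file.


def init_worker(table):
    # The `global` statement *declares* a module-level name from inside a
    # function; the name only comes into existence when this function runs.
    global big_table
    big_table = table


def lookup(i):
    # Relies on init_worker() having run in this process beforehand;
    # otherwise this raises NameError.
    return big_table[i]


def demo():
    table = [n * n for n in range(100)]  # built once, in the parent
    # Pool runs init_worker(table) once in each worker process at startup.
    with Pool(2, initializer=init_worker, initargs=(table,)) as pool:
        return pool.map(lookup, [2, 3, 4])
```

As with any Pool example, `demo()` would be called under an `if __name__ == "__main__":` guard. Reading `lookup()` on its own gives no hint of where `big_table` comes from, which is the approachability concern raised here.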

> For non-programmer end-users, such as data scientists, there are
> higher-level libraries such as Celery (http://www.celeryproject.org/)
> and Dask distributed (https://distributed.readthedocs.io/en/latest/).
> Perhaps it would be worth mentioning them in the documentation.
>

>>> We likely do NOT have disagreements on the multiprocessing module.
Multiprocessing is NOT high-level, I agree. But the beauty of the "Pool"
API is that it gives non-programmer end-users (like data scientists) the
ability to leverage multiple cores, without (in most cases) needing to know
implementation details about multiprocessing. All they need to understand
is the higher-order function "map()", which is a very simple concept. (I
even sound over-complicated myself calling it a "higher-order function"...)
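
As a small illustration of that point, the sketch below puts the built-in `map()` next to `Pool.map()`. The function names `square`, `serial_squares` and `parallel_squares` are hypothetical; the only thing it is meant to show is that the two calls have the same shape.

```python
from multiprocessing import Pool


def square(x):
    return x * x


def serial_squares(values):
    # Built-in map: everything runs in the current process.
    return list(map(square, values))


def parallel_squares(values, processes=2):
    # Pool.map: same call shape, with the work spread across worker processes.
    with Pool(processes) as pool:
        return pool.map(square, values)
```

`parallel_squares()` should be called under an `if __name__ == "__main__":` guard, as is usual for multiprocessing code. It is expected to return the same list as `serial_squares()` for the same input.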

> Regards
>
> Antoine.