Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

29 Sep 2018

On Sat, Sep 29, 2018 at 6:24 AM Antoine Pitrou wrote:
...
Hi Sean,
On Fri, 28 Sep 2018 19:23:06 -0400
Sean Harrington wrote:
...
My simple argument is that the
developer should not be constrained to make the objects passed globally
available in the process, as this MAY break encapsulation for large
projects.
IMHO, global variables don't break encapsulation if they remain private
to the module where they actually play a role.
Of course, there are also global-like alternatives to globals, such as
class attributes...  The multiprocessing module itself uses globals (or
quasi-globals) internally for various implementation details.
...
Yes, class attributes are a viable alternative. I've written about
this here:
https://thelaziestprogrammer.com/python/multiprocessing-pool-a-global-soluti...
Still, the argument is not against global variables, class attributes or any
close cousins -- it is simply that developers shouldn't be forced to use these.
...
3. If you don't like globals, you could probably do something like
lazily-initialize the resource when a function needing it is executed;
this also avoids creating the resource if the child doesn't use it at
all.  Would that work for you?
I have nothing against globals, my gripe is with being forced to use
them for every Pool use case. Further, if initializing the resource is
expensive, we only want to do this ONE time per worker process.
That's what I meant with lazy initialization: initialize it if not
already done, otherwise just use the already-initialized resource.
It's a common pattern.
(you can view it as a 1-element cache if you prefer)

...
Sorry - I wasn't following your initial suggestion. This is a valid
solution for ONE of the general use cases (where we initialize objects in
each worker post-fork). However it fails for the other Pool use case of
"initializing a big object in your parent, and passing to each worker,
without using globals."
...
As a more general remark, I understand the desire to make the Pool
object more flexible, but we also cannot pile up features until it
satisfies all use cases.
I understand that this is a legitimate concern, but this is about API
approachability.  Python end-users of Pool are forced to declare a global
from a lexical scope. Most Python end-users probably don't even know this
is possible.
Hmm...  We might have a disagreement on the target audience of the
multiprocessing module.  multiprocessing isn't very high-level, I would
expect it to be used by experienced programmers who know how to mutate
a global variable from a lexical scope.
...
It is one thing to MUTATE a global from a lexical scope. No gripes
there. The specific concept I'm referencing here is "DECLARING a global
variable from within a lexical scope". This is not as intuitive for most
programmers.
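
For concreteness, a rough sketch of the pattern in question, using the
existing Pool "initializer" and "initargs" parameters. The names
init_worker, lookup and _shared are illustrative, not taken from any
real project:

```python
from multiprocessing import Pool


def init_worker(big_object):
    # The "global" statement DECLARES a module-level name here: _shared
    # does not exist anywhere in the module until this function has run.
    global _shared
    _shared = big_object


def lookup(key):
    # Reads the name that init_worker created in this worker process.
    return _shared[key]


if __name__ == "__main__":
    # Built once in the parent, then handed to each worker at startup.
    big = {i: i * i for i in range(1000)}
    with Pool(2, initializer=init_worker, initargs=(big,)) as pool:
        print(pool.map(lookup, [1, 2, 3]))
```

Nothing at module level hints that _shared exists; a reader has to find
the initializer to learn where it comes from.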
...
For non-programmer end-users, such as data scientists, there are
higher-level libraries such as Celery (http://www.celeryproject.org/)
and Dask distributed (https://distributed.readthedocs.io/en/latest/).
Perhaps it would be worth mentioning them in the documentation.

...
We likely do NOT have disagreements on the multiprocessing module.
Multiprocessing is NOT high-level, I agree. But the beauty of the "Pool"
API is that it gives non-programmer end-users (like data scientists) the
ability to leverage multiple cores, without (in most cases) needing to know
implementation details about multiprocessing. All they need to understand
is the higher-order function "map()", which is a very simple concept. (I
even sound over-complicated myself calling it a "higher-order function"...)
...
Regards
Antoine.