On Sat, Sep 29, 2018 at 6:24 AM Antoine Pitrou wrote:
Hi Sean,
On Fri, 28 Sep 2018 19:23:06 -0400 Sean Harrington
wrote: My simple argument is that the developer should not be constrained to make the objects passed globally available in the process, as this MAY break encapsulation for large projects.
IMHO, global variables don't break encapsulation if they remain private to the module where they actually play a role.
Of course, there are also global-like alternatives to globals, such as class attributes... The multiprocessing module itself uses globals (or quasi-globals) internally for various implementation details.
Yes, class attributes are a viable alternative. I've written about this here: https://thelaziestprogrammer.com/python/multiprocessing-pool-a-global-soluti...
Still, the argument is not against global variables, class attributes or any close cousins -- it is simply that developers shouldn't be forced to use these.
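A minimal sketch of the class-attribute alternative mentioned above. The names `SharedState`, `init_worker` and `lookup` are hypothetical and chosen for illustration; this is not necessarily the approach taken in the linked post.

```python
from multiprocessing import Pool


class SharedState:
    # Class attribute used as a per-process slot for the shared object.
    table = None


def init_worker(table):
    # Runs once in each worker process when the Pool starts it.
    SharedState.table = table


def lookup(key):
    # Reads the object stored by init_worker in this process.
    return SharedState.table[key]


if __name__ == "__main__":
    squares = {n: n * n for n in range(5)}
    with Pool(processes=2, initializer=init_worker, initargs=(squares,)) as pool:
        print(pool.map(lookup, range(5)))
```

The state still lives at module level (on the class), so this is a "quasi-global" rather than a way around globals.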
3. If you don't like globals, you could probably do something like lazily-initialize the resource when a function needing it is executed; this also avoids creating the resource if the child doesn't use it at all. Would that work for you?
I have nothing against globals, my gripe is with being forced to use them for every Pool use case. Further, if initializing the resource is expensive, we only want to do this ONE time per worker process.
That's what I meant with lazy initialization: initialize it if not already done, otherwise just use the already-initialized resource. It's a common pattern.
(you can view it as a 1-element cache if you prefer)
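The lazy-initialization pattern described above can be sketched roughly as follows. `_make_resource`, `get_resource`, `work` and `INIT_LOG` are hypothetical names, and the dictionary stands in for whatever expensive resource is actually needed.

```python
import os
from multiprocessing import Pool

_resource = None  # module-private one-element cache, filled at most once per process
INIT_LOG = []     # records each real initialization in this process (illustration only)


def _make_resource():
    # Hypothetical stand-in for an expensive setup step.
    INIT_LOG.append(os.getpid())
    return {"pid": os.getpid()}


def get_resource():
    # Initialize on first use; afterwards return the cached object.
    global _resource
    if _resource is None:
        _resource = _make_resource()
    return _resource


def work(x):
    resource = get_resource()
    return x * 2, resource["pid"]


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        results = pool.map(work, range(8))
    print([value for value, _ in results])
```

A worker that never calls `work` never pays for `_make_resource`, which is the second benefit mentioned above.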
Sorry - I wasn't following your initial suggestion. This is a valid solution for ONE of the general use cases (where we initialize objects in each worker post-fork). However it fails for the other Pool use case of "initializing a big object in your parent, and passing to each worker, without using globals."
As a more general remark, I understand the desire to make the Pool object more flexible, but we also cannot keep piling up features until it satisfies all use cases.
I understand that this is a legitimate concern, but this is about API approachability. Python end-users of Pool are forced to declare a global from a lexical scope. Most Python end-users probably don't even know this is possible.
Hmm... We might have a disagreement on the target audience of the multiprocessing module. multiprocessing isn't very high-level, I would expect it to be used by experienced programmers who know how to mutate a global variable from a lexical scope.
It is one thing to MUTATE a global from a lexical scope. No gripes there. The specific concept I'm referencing here is "DECLARING a global variable from within a lexical scope". This is not as intuitive for most programmers.
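For readers unfamiliar with the idiom being discussed, here is a minimal sketch of declaring a global from inside a Pool initializer. The names `init_worker`, `element_plus_one` and `_shared` are hypothetical; the point is that `_shared` is never assigned at module top level.

```python
from multiprocessing import Pool


def init_worker(big_object):
    # The `global` statement makes the assignment below create a
    # module-level name, even though `_shared` is not defined at top level.
    global _shared
    _shared = big_object


def element_plus_one(i):
    # Works only in a process where init_worker has already run.
    return _shared[i] + 1


if __name__ == "__main__":
    big_object = list(range(1000))
    with Pool(processes=2, initializer=init_worker, initargs=(big_object,)) as pool:
        print(pool.map(element_plus_one, [0, 10, 999]))
```

The object built in the parent reaches each worker once through `initargs`, and the task function finds it only through the name that the initializer declared.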
For non-programmer end-users, such as data scientists, there are higher-level libraries such as Celery (http://www.celeryproject.org/) and Dask distributed (https://distributed.readthedocs.io/en/latest/). Perhaps it would be worth mentioning them in the documentation.
We likely do NOT have disagreements on the multiprocessing module. Multiprocessing is NOT high-level, I agree. But the beauty of the "Pool" API is that it gives non-programmer end-users (like data scientists) the ability to leverage multiple cores, without (in most cases) needing to know implementation details about multiprocessing. All they need to understand is the higher-order-function "map()", which is a very simple concept. (I even sound over-complicated myself calling it a "higher-order-function"...)
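As a small illustration of the approachability point, a basic `Pool.map` call looks like the sketch below. The function `square` and the pool size are arbitrary choices for the example.

```python
from multiprocessing import Pool


def square(x):
    return x * x


if __name__ == "__main__":
    # Same shape as the built-in map(), but the work is spread over 4 processes.
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10)))
```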
Regards
Antoine.