<div dir="ltr"><br><div class="gmail_quote"><div dir="ltr">On Sat, Sep 29, 2018 at 6:24 AM Antoine Pitrou <<a href="mailto:solipsis@pitrou.net">solipsis@pitrou.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

Hi Sean,<br>

<br>

On Fri, 28 Sep 2018 19:23:06 -0400<br>

Sean Harrington <<a href="mailto:seanharr11@gmail.com" target="_blank">seanharr11@gmail.com</a>> wrote:<br>

> My simple argument is that the<br>

> developer should not be constrained to make the objects passed globally<br>

> available in the process, as this MAY break encapsulation for large<br>

> projects.<br>

<br>

IMHO, global variables don't break encapsulation if they remain private<br>

to the module where they actually play a role.<br>

<br>

Of course, there are also global-like alternatives to globals, such as<br>

class attributes...  The multiprocessing module itself uses globals (or<br>

quasi-globals) internally for various implementation details.<br></blockquote><div><br></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div class="gmail_quote"><div>>>>  Yes, class attributes are a viable alternative. <a href="https://thelaziestprogrammer.com/python/multiprocessing-pool-a-global-solution">I've written about this here.</a> Still, the argument is not against global variables, class attributes or any close cousins -- it is simply that developers shouldn't be forced to use these.</div></div></blockquote><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
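<div>A minimal sketch of the class-attribute alternative mentioned above. The names Worker, init, process and run_pool are hypothetical and chosen only for illustration; this is not necessarily the approach taken in the linked post.</div>

```python
import multiprocessing


class Worker:
    # Class attribute acting as a per-process "quasi-global".
    resource = None

    @classmethod
    def init(cls, resource):
        # Runs once in each worker process when passed as the Pool initializer.
        cls.resource = resource

    @classmethod
    def process(cls, key):
        # Reads the resource stored on the class in this process.
        return cls.resource[key]


def run_pool(resource, keys, processes=2):
    # Hypothetical helper: hands `resource` to every worker via initargs.
    with multiprocessing.Pool(
        processes, initializer=Worker.init, initargs=(resource,)
    ) as pool:
        return pool.map(Worker.process, keys)
```

<div>The state still lives at module scope through the class object, so this only moves the global rather than removing it. Call run_pool() from under an if __name__ == "__main__": guard.</div>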

<br>

> 3. If you don't like globals, you could probably do something like<br>

> > lazily-initialize the resource when a function needing it is executed;<br>

> > this also avoids creating the resource if the child doesn't use it at<br>

> > all.  Would that work for you?<br>

> ><br>

> > I have nothing against globals, my gripe is with being forced to use  <br>

> them for every Pool use case. Further, if initializing the resource is<br>

> expensive, we only want to do this ONE time per worker process.<br>

<br>

That's what I meant with lazy initialization: initialize it if not<br>

already done, otherwise just use the already-initialized resource.<br>

It's a common pattern.<br>

<br>

(you can view it as a 1-element cache if you prefer)<br></blockquote><div><br></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div class="gmail_quote"><div>>>> Sorry - I wasn't following your initial suggestion. This is a valid solution for ONE of the general use cases (where we initialize objects in each worker post-fork). However, it fails for the other Pool use case of "initializing a big object in your parent, and passing it to each worker, without using globals."</div><div><br></div></div></blockquote><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
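<div>The lazy-initialization pattern discussed above (the "1-element cache") could be sketched roughly as follows. ExpensiveResource, get_resource and task are hypothetical names, and functools.lru_cache is just one way to get a single cached value.</div>

```python
import functools
import os


class ExpensiveResource:
    """Hypothetical stand-in for something costly to build."""

    instances_created = 0

    def __init__(self):
        type(self).instances_created += 1
        self.owner_pid = os.getpid()


@functools.lru_cache(maxsize=1)
def get_resource():
    # Built on the first call in a process; later calls return the cached object.
    return ExpensiveResource()


def task(x):
    # Any function run by a worker can ask for the resource when it needs it.
    resource = get_resource()
    return x * 2, resource.owner_pid
```

<div>A worker that never calls get_resource() never pays for the resource. Under the fork start method, a resource already built in the parent before the pool is created would be inherited by the children rather than rebuilt.</div>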

> > As a more general remark, I understand the desire to make the Pool<br>

> > object more flexible, but we can also not pile up features until it<br>

> > satisfies all use cases.<br>

> ><br>

> > I understand that this is a legitimate concern, but this is about API  <br>

> approachability.  Python end-users of Pool are forced to declare a global<br>

> from a lexical scope. Most Python end-users probably don't even know this<br>

> is possible.<br>

<br>

Hmm...  We might have a disagreement on the target audience of the<br>

multiprocessing module.  multiprocessing isn't very high-level, I would<br>

expect it to be used by experienced programmers who know how to mutate<br>

a global variable from a lexical scope.<br></blockquote><div><br></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div class="gmail_quote"><div>>>> It is one thing to MUTATE a global from a lexical scope. No gripes there. The specific concept I'm referencing here is "DECLARING a global variable from within a lexical scope". This is not as intuitive for most programmers. </div></div></blockquote><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
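<div>For readers unfamiliar with the idiom being debated, here is a minimal sketch of declaring a global from inside a Pool initializer. The names init_worker, lookup, run_pool and _shared are hypothetical; only the initializer and initargs parameters of multiprocessing.Pool come from the library.</div>

```python
import multiprocessing


def init_worker(big_object):
    # `global` declares the module-level name from inside the function:
    # `_shared` is not defined anywhere in the module before this runs.
    global _shared
    _shared = big_object


def lookup(key):
    # Reads the global that init_worker created in this process.
    return _shared[key]


def run_pool(big_object, keys, processes=2):
    # The object built in the parent is handed to each worker via initargs.
    with multiprocessing.Pool(
        processes, initializer=init_worker, initargs=(big_object,)
    ) as pool:
        return pool.map(lookup, keys)
```

<div>Calling lookup() before init_worker() has run raises NameError, which is part of why the idiom can surprise people. Call run_pool() from under an if __name__ == "__main__": guard.</div>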

<br>

For non-programmer end-users, such as data scientists, there are<br>

higher-level libraries such as Celery (<a href="http://www.celeryproject.org/" rel="noreferrer" target="_blank">http://www.celeryproject.org/</a>)<br>

and Dask distributed (<a href="https://distributed.readthedocs.io/en/latest/" rel="noreferrer" target="_blank">https://distributed.readthedocs.io/en/latest/</a>).<br>

Perhaps it would be worth mentioning them in the documentation.<br></blockquote><div> </div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div class="gmail_quote"><div>>>> We likely do NOT have disagreements on the multiprocessing module. Multiprocessing is NOT high-level, I agree. But the beauty of the "Pool" API is that it gives non-programmer end-users (like data scientists) the ability to leverage multiple cores, without (in most cases) needing to know implementation details about multiprocessing. All they need to understand is the higher-order-function "map()", which is a very simple concept. (I even sound over-complicated myself calling it a "higher-order-function"...)</div></div></blockquote><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Regards<br>

<br>

Antoine.<br>

_______________________________________________<br>

Python-Dev mailing list<br>

<a href="mailto:Python-Dev@python.org" target="_blank">Python-Dev@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/python-dev" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/python-dev</a><br>


</blockquote></div></div>