<div dir="ltr"><div class="gmail_quote"><div>This thread seems more appropriate for python-ideas than python-dev.</div><div><br></div><div dir="ltr"><br></div><div dir="ltr">On Mon, Oct 22, 2018 at 5:28 AM Sean Harrington &lt;<a href="mailto:seanharr11@gmail.com" target="_blank">seanharr11@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Michael - the initializer/globals pattern still might be necessary if you need to create an object AFTER a worker process has been instantiated (e.g. a database connection).</div></blockquote><div><br></div><div>You said you wanted to avoid the initializer/globals pattern and instead keep such things as database connections in the defaults or closure of the task-function, or on the bound instance, no? Did I misunderstand?</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Further, the user may want to access all of the niceties of Pool, like imap, imap_unordered, etc. The goal (IMO) would be to preserve an interface which many Python users have grown accustomed to, and to allow them to access this optimization out of the box.</div></blockquote><div><br></div><div>You just said that the dominant use-case was mapping a single task-function. It sounds like we're talking past each other in some way.
It'll help to have a concrete example of a case that satisfies all the characteristics you've described: (1) no globals used for communication between initializer and task-functions; (2) single task-function, mapped once; (3) an instance-method as task-function, causing a large serialization burden; and (4) did I miss anything?</div><div><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Having talked to folks at the Boston Python meetup and on my dev team, and having perused Stack Overflow, this "instance method parallelization" is a pretty common pattern that is often a negative return on investment for the developer, due to the implicit implementation detail of pickling the function (and object) once per task.<br></div></div></blockquote><div><br></div><div>I believe you.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div>Is anyone open to reviewing a PR concerning this optimization of Pool, delivered as a subclass? This feature restricts the number of unique tasks being executed by workers at once to 1, while allowing aggressive subprocess-level function caching to prevent repeated serialization/deserialization of large functions/closures. The use case is such that the user only ever needs 1 call to Pool.map(func, ls) (or friends) executing at once, when `func` has a non-trivial memory footprint.<br></div></div></blockquote><div><br></div><div>You're quite eager to have this PR merged. I understand that. However, it's reasonable to take some time to discuss the design of what you're proposing.
You don't need it in the stdlib to get your own work done, nor to share it with others.<br></div></div></div>
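<div dir="ltr">For reference, here is a minimal sketch of the serialization burden being discussed in characteristic (3). It starts no worker processes; it only measures what pickle produces for a bound method versus a module-level function, which is the payload a Pool would have to send along with each batch of tasks. The names <code>Model</code>, <code>predict</code>, and <code>plain_predict</code> are hypothetical stand-ins, not anything from the proposed PR.</div>

```python
import pickle


class Model:
    """Hypothetical stand-in for an object with a large memory footprint."""

    def __init__(self, n_weights):
        self.weights = list(range(n_weights))

    def predict(self, x):
        return x + len(self.weights)


def plain_predict(x):
    """Module-level function: pickled by reference to its qualified name."""
    return x + 1


model = Model(100_000)

# A bound method is pickled together with the instance it is bound to,
# so its payload is at least as large as the instance itself.
bound_size = len(pickle.dumps(model.predict))
instance_size = len(pickle.dumps(model))

# A module-level function is pickled as a name lookup, so it stays tiny.
plain_size = len(pickle.dumps(plain_predict))

# Unpickling rebuilds a full copy of the instance, which is the work a
# worker process would repeat for every payload it receives.
restored = pickle.loads(pickle.dumps(model.predict))

print(f"bound method:   {bound_size} bytes")
print(f"instance alone: {instance_size} bytes")
print(f"plain function: {plain_size} bytes")
```

<div dir="ltr">The sizes printed depend on the pickle protocol and on the instance's state, so treat them as illustrative rather than as a benchmark of Pool itself.</div>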