Michael - the initializer/globals pattern still might be necessary if you need to create an object AFTER a worker process has been instantiated (i.e. a database connection). Further, the user may want to access all of the niceties of Pool, like imap, imap_unordered, etc.  The goal (IMO) would be to preserve an interface which many Python users have grown accustomed to, and to allow them to access this optimization out-of-the-bag.

Having talked to folks at the Boston Python meetup, folks on my dev team, and perusing stack overflow, this "instance method parallelization" is a pretty common pattern that is often times a negative return on investment for the developer, due to the implicit implementation detail of pickling the function (and object) once per task.

Is anyone open to reviewing a PR concerning this optimization of Pool, delivered as a subclass? This feature restricts the number of unique tasks being executed by workers at once to 1, while allowing aggressive subprocess-level function cacheing to prevent repeated serialization/deserialization of large functions/closures. The use case is s.t. the user only ever needs 1 call to Pool.map(func, ls) (or friends) executing at once, when `func` has a non-trivial memory footprint.


On Fri, Oct 19, 2018 at 4:06 PM Michael Selik <mike@selik.org> wrote:
On Fri, Oct 19, 2018 at 5:01 AM Sean Harrington <seanharr11@gmail.com> wrote:
I like the idea to extend the Pool class [to optimize the case when only one function is passed to the workers].

Why would this keep the same interface as the Pool class? If its workers are restricted to calling only one function, that should be passed into the Pool constructor. The map and apply methods would then only receive that function's args and not the function itself. You're also trying to avoid the initializer/globals pattern, so you could eliminate that parameter from the Pool constructor. In fact, it sounds more like you'd want a function than a class. You can call it "procmap" or similar. That's code I've written more than once.

    results = poolmap(func, iterable, processes=cpu_count())

The nuance is that, since there's no explicit context manager, you'll want to ensure the pool is shut down after all the tasks are finished, even if the results generator hasn't been fully consumed.