[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Sean Harrington seanharr11 at gmail.com
Mon Oct 22 08:28:08 EDT 2018


Michael - the initializer/globals pattern might still be necessary if you
need to create an object AFTER a worker process has been instantiated (e.g.
a database connection). Further, the user may want to access all of the
niceties of Pool, like imap, imap_unordered, etc.  The goal (IMO) would be
to preserve an interface which many Python users have grown accustomed to,
and to allow them to access this optimization out of the box.

Having talked to folks at the Boston Python meetup and on my dev team, and
having perused Stack Overflow, I've found that this "instance method
parallelization" is a pretty common pattern that is often a negative return
on investment for the developer, due to the implicit implementation detail
of pickling the function (and object) once per task.
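
A rough way to see the cost: pickling a bound method drags the whole
instance along with it, and that payload goes over the pipe with each task
sent to a worker. Scorer and add_one below are hypothetical examples, not
code from any real project.

```python
import pickle


class Scorer:
    """Hypothetical object carrying a large attribute."""

    def __init__(self, n):
        self.table = list(range(n))

    def score(self, x):
        return x + len(self.table)


def add_one(x):
    return x + 1


scorer = Scorer(100_000)

# A bound method pickles as (instance, method name), so the payload
# includes the full `table` list.
method_bytes = len(pickle.dumps(scorer.score))

# A module-level function pickles by reference (module + name only).
func_bytes = len(pickle.dumps(add_one))

print("bound method:", method_bytes, "bytes; plain function:", func_bytes, "bytes")
```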

Is anyone open to reviewing a PR concerning this optimization of Pool,
delivered as a subclass? This feature restricts the number of unique tasks
being executed by workers at once to 1, while allowing aggressive
subprocess-level function caching to prevent repeated
serialization/deserialization of large functions/closures. The use case is
one where the user only ever needs 1 call to Pool.map(func, ls) (or friends)
executing at once, and `func` has a non-trivial memory footprint.


On Fri, Oct 19, 2018 at 4:06 PM Michael Selik <mike at selik.org> wrote:

> On Fri, Oct 19, 2018 at 5:01 AM Sean Harrington <seanharr11 at gmail.com>
> wrote:
>
>> I like the idea to extend the Pool class [to optimize the case when only
>> one function is passed to the workers].
>>
>
> Why would this keep the same interface as the Pool class? If its workers
> are restricted to calling only one function, that should be passed into the
> Pool constructor. The map and apply methods would then only receive that
> function's args and not the function itself. You're also trying to avoid
> the initializer/globals pattern, so you could eliminate that parameter from
> the Pool constructor. In fact, it sounds more like you'd want a function
> than a class. You can call it "procmap" or similar. That's code I've
> written more than once.
>
>     results = poolmap(func, iterable, processes=cpu_count())
>
> The nuance is that, since there's no explicit context manager, you'll want
> to ensure the pool is shut down after all the tasks are finished, even if
> the results generator hasn't been fully consumed.
>