[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Michael Selik mike at selik.org
Mon Oct 22 14:00:42 EDT 2018

This thread seems more appropriate for python-ideas than python-dev.

On Mon, Oct 22, 2018 at 5:28 AM Sean Harrington <seanharr11 at gmail.com> wrote:

> Michael - the initializer/globals pattern still might be necessary if you
> need to create an object AFTER a worker process has been instantiated (i.e.
> a database connection).

You said you wanted to avoid the initializer/globals pattern and have such
things as database connections in the defaults or closure of the
task-function, or the bound instance, no? Did I misunderstand?

> Further, the user may want to access all of the niceties of Pool, like
> imap, imap_unordered, etc.  The goal (IMO) would be to preserve an
> interface which many Python users have grown accustomed to, and to allow
> them to access this optimization out of the box.

You just said that the dominant use-case was mapping a single
task-function. It sounds like we're talking past each other in some way.
It'll help to have a concrete example of a case that satisfies all the
characteristics you've described: (1) no globals used for communication
between initializer and task-functions; (2) single task-function, mapped
once; (3) an instance-method as task-function, causing a large
serialization burden; and (4) did I miss anything?

> Having talked to folks at the Boston Python meetup, folks on my dev team,
> and perusing Stack Overflow, this "instance method parallelization" is a
> pretty common pattern that is oftentimes a negative return on investment
> for the developer, due to the implicit implementation detail of pickling
> the function (and object) once per task.

I believe you.
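
The serialization cost being described can be seen without starting a
pool at all, since pickling a bound method also pickles the instance it is
bound to. The sketch below uses a hypothetical `Model` class and
`plain_predict` function purely for illustration, and measures only the
pickled payload size, not Pool's actual dispatch overhead.

```python
import pickle


class Model:
    """Hypothetical object with a large in-memory state."""

    def __init__(self, n_weights):
        self.weights = list(range(n_weights))

    def predict(self, x):
        return x + len(self.weights)


def plain_predict(x):
    """Module-level function: pickled by reference (module and name)."""
    return x + 1


model = Model(100_000)

# The bound method drags the whole instance, weights included, into the
# pickle; the plain function pickles as little more than its name.
bound_size = len(pickle.dumps(model.predict))
plain_size = len(pickle.dumps(plain_predict))

print(f"bound method: {bound_size} bytes, plain function: {plain_size} bytes")
```

Every time the callable is shipped to a worker, a payload of roughly
`bound_size` bytes has to be serialized in the parent and deserialized in
the child, which is where the negative return on investment comes from.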

> Is anyone open to reviewing a PR concerning this optimization of Pool,
> delivered as a subclass? This feature restricts the number of unique tasks
> being executed by workers at once to 1, while allowing aggressive
> subprocess-level function caching to prevent repeated
> serialization/deserialization of large functions/closures. The use case is
> such that the user only ever needs 1 call to Pool.map(func, ls) (or friends)
> executing at once, when `func` has a non-trivial memory footprint.

You're quite eager to have this PR merged. I understand that. However, it's
reasonable to take some time to discuss the design of what you're
proposing. You don't need it in the stdlib to get your own work done, nor
to share it with others.
