[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Michael Selik michael.selik at gmail.com
Thu Oct 18 00:49:12 EDT 2018


If imap_unordered currently re-pickles and sends func each time it's
called on the worker, I suspect there was a reason for doing that rather
than caching it after the first call. Before treating that as an
opportunity for optimization, I'd want to be certain that caching won't
have negative effects in edge cases.


On Tue, Oct 16, 2018 at 2:53 PM Sean Harrington <seanharr11 at gmail.com>
wrote:

> Is your concern something like the following?
>
> with Pool(8) as p:
>     gen = p.imap_unordered(func, ls)
>     first_elem = next(gen)
>     p.apply_async(long_func, (x,))
>     remaining_elems = [elem for elem in gen]
>

My concern was about passing the same function again (or a different
function with the same qualname). You're suggesting caching functions and
identifying them by qualname, to avoid re-pickling a large stateful object
that's shoved into the function's defaults or closure. Is that a correct
summary?

If so, how would the function cache distinguish between two functions with
the same name? Would it need to examine the defaults and closure as well?
If so, that means it's pickling the second one anyway, so there's no
efficiency gain.

In [1]: def foo(a):
   ...:     def bar():
   ...:         print(a)
   ...:     return bar
In [2]: f = foo(1)
In [3]: g = foo(2)
In [4]: f
Out[4]: <function __main__.foo.<locals>.bar()>
In [5]: g
Out[5]: <function __main__.foo.<locals>.bar()>

If we say pool.apply_async(f) and pool.apply_async(g), would you want the
latter one to avoid serialization, letting the worker make a second call
with the first function object?
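
To make the hazard concrete, here is a minimal sketch in plain Python. It
assumes a hypothetical worker-side cache keyed only by __qualname__;
QualnameCache and resolve are illustrative names, not part of
multiprocessing, and no actual pickling or worker process is involved.

```python
def foo(a):
    def bar():
        return a
    return bar


class QualnameCache:
    """Hypothetical cache that identifies functions only by __qualname__."""

    def __init__(self):
        self._funcs = {}

    def resolve(self, func):
        # The first function seen under a qualname is stored; any later
        # function with the same qualname is assumed to be identical and
        # the stored one is returned instead.
        return self._funcs.setdefault(func.__qualname__, func)


f = foo(1)
g = foo(2)

# Both closures report the same qualname, 'foo.<locals>.bar' ...
same_name = f.__qualname__ == g.__qualname__

# ... but they carry different state in their closure cells.
closure_values = (
    f.__closure__[0].cell_contents,
    g.__closure__[0].cell_contents,
)

cache = QualnameCache()
result_f = cache.resolve(f)()  # runs f, as intended
result_g = cache.resolve(g)()  # silently runs f again instead of g
```

With a cache like this, the second call returns f's result rather than
g's. Telling the two apart means looking at the closure cells (or
defaults), which is exactly the state the cache was meant to avoid
touching.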