[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Joni Orponen j.orponen at 4teamwork.ch
Fri Oct 19 07:31:46 EDT 2018


On Fri, Oct 19, 2018 at 9:09 AM Thomas Moreau <thomas.moreau.2010 at gmail.com>
wrote:

> Hello,
>
> I have been working on the concurrent.futures module lately and I think
> this optimization should be avoided in the context of Python Pools.
>
> This is an interesting idea; however, its implementation will bring many
> complicated issues as it breaks the basic paradigm of a Pool: the tasks are
> independent and you don't know which worker is going to run which task.
>
> The function is serialized with each task because of this paradigm. This
> ensures that any worker picking up the task will be able to perform it
> independently of the tasks it has run before, given that it has been
> initialized correctly at the beginning. This makes it simple to run each
> task.
>

I would not mind a subtype of Pool to which you can only apply one kind of
task. This is a very common usage pattern.

Though the question there is 'should this live in Python itself?' I'd be
fine with a package on PyPI.

> As the Pool comes with no scheduler, with your idea, you would need a
> synchronization step to send the function to all workers before you can
> launch your task. But if there is already one worker performing a long
> running task, does the Pool wait for it to be done before it sends the
> function? If the Pool doesn't wait, how does it ensure that this worker
> will be able to get the definition of the function before running it?
> Also, the multiprocessing.Pool has some features where a worker can shut
> itself down after a given number of tasks or a timeout. How does it ensure
> that the new worker will have the definition of the function?
> It is unsafe to try such a feature (sending an object only once) anywhere
> other than in the initializer, which is guaranteed to run once per worker.
>
> On the other hand, you mentioned an interesting point, namely that making
> globals available in the workers could be made simpler. A possible solution
> would be to add a "globals" argument to the Pool which would instantiate
> global variables in the workers. I have no specific idea about the
> implementation of such a feature, but it would be safer as it would be an
> initialization feature.
>

Would this also mean one could use a Pool in a context where threading is
used? Currently, using threading puts unpicklable objects into the globals
as a side effect.

Also, being able to pass in globals=None would be optimal for a lot of use
cases.

-- 
Joni Orponen