On 12/10/2018 at 15:17, Sean Harrington wrote:
The implementation details need to be fleshed out, but agnostic of these, do you believe this is a valid solution to the initial problem? Do you also see it as a beneficial optimization to Pool, given that we don't need to store funcs/bound-methods/partials on the tasks themselves?
I'm not sure, TBH. I also think it may be better to leave this to higher levels (for example, Dask will intelligently distribute data on workers and let you work with a kind of proxy object in the main process, transferring data only when necessary).
The latter concern about "what happens if `self` changes value in the parent" is the same concern as "what happens if `func` changes in the parent?" given the current implementation. This is an assumption that is currently made with Pool.map_async(func, ls). If "func" changes in the parent, there is no communication with the child. So one just needs to be aware that calling "map_async(self.func, ls)" while the state of "self" is changing will not communicate those changes to each worker. The state is frozen when Pool.map is called, just as is the case now.
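To make the "frozen state" point concrete, here is a minimal sketch. It assumes a hypothetical `Scaler` class and two illustrative helpers, `snapshot` and `map_twice`, none of which are part of multiprocessing. `snapshot` only imitates the pickle round trip that a task's callable goes through on its way to a worker; `map_twice` shows that, with the current implementation, each map() call pickles the bound method again.

```python
import pickle
from multiprocessing import Pool


class Scaler:
    """Hypothetical example class; not part of multiprocessing."""

    def __init__(self, factor):
        self.factor = factor

    def scale(self, x):
        return x * self.factor


def snapshot(bound_method):
    # Imitates what happens to a task's callable: it is pickled in the
    # parent and unpickled in the child, so the child works on a copy of
    # the instance the method is bound to.
    return pickle.loads(pickle.dumps(bound_method))


def map_twice():
    scaler = Scaler(2)
    with Pool(2) as pool:
        first = pool.map(scaler.scale, [1, 2, 3])
        scaler.factor = 10  # mutate in the parent between two map() calls
        second = pool.map(scaler.scale, [1, 2, 3])
    return first, second


if __name__ == "__main__":
    scaler = Scaler(2)
    frozen = snapshot(scaler.scale)
    scaler.factor = 10  # later changes in the parent do not reach the copy
    print(frozen(3), scaler.scale(3))  # prints "6 30"
    print(map_twice())
```

Since nothing is cached between calls today, the second map() in `map_twice` should see `factor == 10`; a scheme that caches "self" in the workers would have to decide whether to preserve that behaviour.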
If you cache "self" between pool.map calls, then the question is not "what happens if self changes *during* a map() call" but "what happens if self changes *between* two map() calls"? While the former is intuitively undefined, current users would expect the latter to have a clear answer, which is: the latest version of self when map() is called is taken into account.

Regards

Antoine.