<div dir="ltr">I would contend that this is much more granular than Dask: it is just an optimization of Pool.map() that avoids re-sending the same `func` to each worker once per task. The primary goal is to eliminate redundant serialization of callables with a large memory footprint. This is a different use case from Dask; I don't intend to venture into shared memory or distributed computing.<div><br></div><div>And the second call to Pool.map would update the cached "self" as part of its initialization workflow, so that "the latest version of self when map() is called is taken into account".</div><div><br></div><div>Do you see a difficulty in accomplishing the second behavior?</div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, Oct 12, 2018 at 9:25 AM Antoine Pitrou &lt;<a href="mailto:antoine@python.org">antoine@python.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
On 12/10/2018 at 15:17, Sean Harrington wrote:<br>
> The implementation details need to be fleshed out, but agnostic of<br>
> these, do you believe this is a valid solution to the initial problem? Do<br>
> you also see it as a beneficial optimization to Pool, given that we<br>
> don't need to store funcs/bound-methods/partials on the tasks themselves?<br>
<br>
I'm not sure, TBH. I also think it may be better to leave this to<br>
higher levels (for example Dask will intelligently distribute data on<br>
workers and let you work with a kind of proxy object in the main<br>
process, transferring data only when necessary).<br>
<br>
> The latter concern about "what happens if `self` changed value in the<br>
> parent" is the same concern as "what happens if `func` changes in the<br>
> parent?" given the current implementation. This is an assumption that is<br>
> currently made with Pool.map_async(func, ls). If "func" changes in the<br>
> parent, there is no communication with the child. So one just needs to<br>
> be aware that calling "map_async(self.func, ls)" while the state of<br>
> "self" is changing, will not communicate changes to each worker. The<br>
> state is frozen when Pool.map is called, just as is the case now.<br>
<br>
If you cache "self" between pool.map calls, then the question is not<br>
"what happens if self changes *during* a map() call" but "what happens<br>
if self changes *between* two map() calls"? While the former is<br>
intuitively undefined, current users would expect the latter to have a<br>
clear answer, which is: the latest version of self when map() is called<br>
is taken into account.<br>
<br>
Regards<br>
<br>
Antoine.<br>
</blockquote></div>