<div class="gmail_quote">On Thu, Feb 25, 2010 at 1:33 AM, Brian Quinlan <span dir="ltr"><<a href="mailto:brian@sweetapp.com">brian@sweetapp.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div style="word-wrap:break-word"><div>The PEP officially lives at:</div><div class="im"><div><a href="http://python.org/dev/peps/pep-3148/" target="_blank">http://python.org/dev/peps/pep-3148</a></div><div><br></div></div>
<div>but this version is the most up-to-date:</div><div><a href="http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pep-3148.txt" target="_blank">http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pep-3148.txt</a></div>
<div><br></div><br><div><div><div></div><div class="h5"><div>On Feb 24, 2010, at 7:04 AM, Jeffrey Yasskin wrote:</div><br><blockquote type="cite"><div class="gmail_quote">On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <span dir="ltr"><<a href="mailto:brian@sweetapp.com" target="_blank">brian@sweetapp.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex"> <div style="word-wrap:break-word">
<br><div><div><div>On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:</div></div></div><div><div><br><blockquote type="cite"><div><blockquote type="cite"><blockquote type="cite"> * Could you specify in what circumstances a pure computational<br>
</blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Future-based program may deadlock? (Ideally, that would be "never".)<br> </blockquote></blockquote><blockquote type="cite"><blockquote type="cite">
Your current implementation includes two such deadlocks, for which<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">I've attached a test.<br> </blockquote></blockquote><blockquote type="cite">
<br></blockquote><blockquote type="cite"><blockquote type="cite">* Do you want to make calling Executor.shutdown(wait=True) from within<br></blockquote></blockquote><blockquote type="cite"> <blockquote type="cite">the same Executor 1) detect the problem and raise an exception, 2)<br>
</blockquote></blockquote><blockquote type="cite"><blockquote type="cite">deadlock, 3) unspecified behavior, or 4) wait for all other threads<br> </blockquote></blockquote><blockquote type="cite"><blockquote type="cite">and then let the current one continue?<br>
</blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">What about a note saying that using any futures functions or methods from<br> </blockquote><blockquote type="cite">inside a scheduled call is likely to lead to deadlock unless care is taken?<br>
</blockquote><br>Jesse pointed out that one of the first things people try to do when<br>using concurrency libraries is to try to use them inside themselves.<br> I've also tried to use a futures library that forbade nested use<br>
('cause I wrote it), and it was a real pain.<br></div></blockquote><div><br></div></div>You can use the API from within Executor-invoked functions - you just have to be careful.</div> </div></blockquote><div><br></div>
<div>It's the job of the PEP (and later the docs) to explain exactly what care is needed. Or were you asking if I was ok with adding that explanation to the PEP? I think that explanation is the minimum requirement (that's what I meant by "Could you specify in what circumstances a pure computational</div>
<div>Future-based program may deadlock?"), but it would be better if it could never deadlock, which is achievable by stealing work.</div></div></blockquote><div><br></div></div></div>I don't think so, see below.</div>
<div><div><div></div><div class="h5"><br><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div><blockquote type="cite"><div>It should be easy enough to detect that the caller of<br>Executor.shutdown is one of the Executor's threads or processes, but I<br>wouldn't mind making the obviously incorrect "wait for my own<br>
completion" deadlock or throw an exception, and it would make sense to<br>give Executor implementors their choice of which to do.<br><br><blockquote type="cite"><blockquote type="cite">* This is a nit, but I think that the parameter names for<br>
</blockquote></blockquote><blockquote type="cite"><blockquote type="cite">ThreadPoolExecutor and ProcessPoolExecutor should be the same so<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">people can parametrize their code on those constructors. Right now<br>
</blockquote></blockquote><blockquote type="cite"><blockquote type="cite">they're "max_threads" and "max_processes", respectively. I might<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">
suggest "max_workers".<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">I'm not sure that I like that. In general consolidating the constructors for<br></blockquote>
<blockquote type="cite">executors is not going to be possible.<br></blockquote><br>In general, yes, but in this case they're the same, and we should try<br>to avoid gratuitous differences.<br></div></blockquote><div>
<br> </div></div>num_threads and num_processes is more explicit than num_workers but I don't really care so I changed it.<div><br></div></div></div></blockquote><div>Thanks. </div><div><br></div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div><div><blockquote type="cite"><div><blockquote type="cite"><blockquote type="cite">* I'd like users to be able to write Executors besides the simple<br></blockquote> </blockquote>
<blockquote type="cite"><blockquote type="cite">ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">that, could you document what the subclassing interface for Executor<br>
</blockquote></blockquote><blockquote type="cite"><blockquote type="cite">looks like? that is, what code do user-written Executors need to<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">include? I don't think it should include direct access to<br>
</blockquote></blockquote><blockquote type="cite"><blockquote type="cite">future._state like ThreadPoolExecutor uses, if at all possible.<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">
One of the difficulties here is:<br></blockquote><blockquote type="cite">1. i don't want to commit to the internal implementation of Futures<br></blockquote><br>Yep, that's why to avoid requiring them to have direct access to the<br>
internal variables.<br><br><blockquote type="cite">2. it might be hard to make it clear which methods are public to users and<br></blockquote><blockquote type="cite">which methods are public to executor implementors<br></blockquote>
<br>One way to do it would be to create another type for implementors and<br>pass it to the Future constructor.<br></div></blockquote><div><br></div></div></div><div>If we change the future interface like so:</div></div>
<div> <br></div><div>class Future(object):</div><div> # Existing public methods</div><div> ...</div><div> # For executors only</div><div> def set_result(self):</div><div> ...</div><div><div> def set_exception(self):</div>
<div> ...</div><div><div> def check_cancel_and_notify(self):</div><div> # returns True if the Future was cancelled and</div><div> # notifies anyone who cares i.e. waiters for</div><div> # wait() and as_completed</div>
</div></div></div></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><div> </div></div></div></div></blockquote>
<blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word">
<div><div></div><div>Then an executor implementor need only implement:</div><div><br> </div><div>def submit(self, fn, *args, **kwargs):</div><div><br></div><div>With the logic to actual execute fn(*args, **kwargs) and update the returned future, of course.</div>
<div><br></div><div>Thoughts?</div><div> <br></div></div></div></blockquote><div>Could you write up the submit() implementations you're thinking of? That kind of interface extension seems right.</div></div></blockquote>
<div><br></div></div></div><div>I mean that submit will implement all of the application-specific logic and call the above methods as it processes the future. I added a note (but not much in the way of details) about that.</div>
<div class="im"><br></div></div></div></blockquote><div>Your process pool still relies on future._condition, but I think you can just delete that line and everything will still work. This seems fine to me. Thanks! </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div style="word-wrap:break-word"><div><div class="im"><blockquote type="cite"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div><blockquote type="cite"><div><blockquote type="cite"><blockquote type="cite">* Could you specify in what circumstances a pure computational<br></blockquote></blockquote> <blockquote type="cite">
<blockquote type="cite">Future-based program may deadlock? (Ideally, that would be "never".)<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Your current implementation includes two such deadlocks, for which<br>
</blockquote></blockquote><blockquote type="cite"><blockquote type="cite">I've attached a test.<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Thanks for the tests but I wasn't planning on changing this behavior. I<br>
</blockquote><blockquote type="cite">don't really like the idea of using the calling thread to perform the wait<br></blockquote><blockquote type="cite">because:<br></blockquote><blockquote type="cite">1. not all executors will be able to implement that behavior<br>
</blockquote><br>Why not?</div></blockquote><div><br></div></div>What if my executor sends the data to a remove cluster for execution and running it locally isn't feasible?</div></div></blockquote><div><br></div><div>
If the executor can't send itself across the network, you're fine since it'll be impossible to create cycles. If the executor can add threads dynamically when it notices that it's not using enough of the CPU, it's also fine since you remove the limited resource. If the executor can be copied and cannot add threads, then it sent the data one way somehow, so it should be able to send the data the other way to execute locally. It _is_ possible to run out of memory or stack space. Is that what you're worried about?</div>
<div><br></div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div><blockquote type="cite"><div> Thread pools can implement it,</div> </blockquote><div><br></div></div><div>Do you have a strategy in mind that would let you detect arbitrary deadlocks in threaded futures?</div>
</div></div></blockquote><div><br></div><div>Yes, AFAIK work stealing suffices for systems made up only of futures and executors. Non-future blocking objects can reintroduce deadlocks, but I believe futures alone can't.</div>
</div></blockquote><div><br></div></div><div>How would work stealing help with this sort of deadlock?</div><div class="im"><div><br></div><div>import time</div><div>def wait_on_b():</div><div> time.sleep(5)</div><div> print(b.result())</div>
</div><div class="im"><div> return 5</div><div><br></div><div>def wait_on_a():</div><div> time.sleep(5)</div><div> print(a.result())</div></div><div> return 6</div><div class="im"><div><br></div><div><br></div>f = ThreadPoolExecutor(max_workers=2)</div>
</div><div>a = f.submit(wait_on_b)</div><div>b = f.submit(wait_on_a)</div><div><br></div></div></blockquote><div><br></div><div>Heh. If you're going to put that in the pep, at least make it correct (sleeping is not synchronization):</div>
<div><br></div>import threading<br>condition = threading.Condition(threading.Lock())<br>a = None<br>b = None<br><br></div><div class="gmail_quote">def wait_on_b():<br> with condition:<br> while b is None:<br> condition.wait()<br>
print(b.result())<br> return 5</div><div class="gmail_quote"><br>def wait_on_a():<br> with condition:<br> while a is None:<br> condition.wait()<br> print(a.result())<br> return 6<br><br>f = ThreadPoolExecutor(max_workers=2)</div>
<div class="gmail_quote">with condition:<br> a = f.submit(wait_on_b)<br> b = f.submit(wait_on_a)</div><div class="gmail_quote"> condition.notifyAll()<br><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div style="word-wrap:break-word"><div></div><div>In any case, I've updated the docs and PEP to indicate that deadlocks are possible.</div><div></div></div></blockquote></div><br>
<div>Thanks. I still disagree, and think users are much more likely to be surprised by occasional deadlocks due to cycles of executors than they are about guaranteed deadlocks from cycles of futures, but I don't want to be the only one holding up the PEP by insisting on this.</div>
<div><br></div><div>I think there are places the names could be improved, and Jesse probably has an opinion on exactly where this should go in the package hierarchy, but I think it will make a good addition to the standard library. Thanks for working on it!</div>
<div><br></div><div>Jeffrey</div>