[Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from)

Mon Oct 22 02:23:52 CEST 2012

On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower <Steve.Dower at microsoft.com> wrote:
>> Did you check the source? That's simply incorrect. It uses locks, of the threading variety.
>
> Yes, I've spent a lot of time in the source for Future while working on this.

Sorry, I should have realized this, since your code example contained
monkey-patching that Future class...

> It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads.

I don't see it that way. Any time you acquire a lock, you may be
blocked for a long time. In a typical event loop that's an absolute
no-no. Typically, to wait for another thread, you give the other
thread a callback that adds a new event for *this* thread.
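
A minimal sketch of that pattern, assuming a toy single-threaded loop
built on queue.Queue. The names `events`, `results`, `worker` and
`run_loop` are made up for illustration and do not come from any real
framework. The worker never touches loop state directly; it only posts
a callback onto the loop's queue. (queue.Queue does take a lock
internally, but only briefly around the put/get itself.)

```python
import queue
import threading

events = queue.Queue()   # hypothetical event queue owned by the loop thread
results = []             # state that only the loop thread ever touches

def worker(n):
    # Runs in another thread: do the work, then post an event
    # (a callback) for the loop thread instead of sharing state.
    value = n * n
    events.put(lambda: results.append(value))

def run_loop(expected):
    # Runs in the loop thread: the only place callbacks are executed.
    handled = 0
    while handled < expected:
        callback = events.get()   # waits only while there is nothing to run
        callback()
        handled += 1

threading.Thread(target=worker, args=(7,)).start()
run_loop(expected=1)
print(results)
```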

Now, it's possible that in Windows, when using IOCP, the philosophy is
different -- I think I've read in
http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that
there can be multiple threads reading events from a single queue.

But AFAIK, in Twisted and Tornado and similar systems, and probably
even in gevent and Stackless, there is a strong culture around having
only a single thread handling events (at least only one thread at a
time), since the assumption is that as long as you don't suspend, you
can trust that the world doesn't change, and that assumption becomes
invalid when other threads may also be handling events from the same
queue. It's possible to design a world where different threads have
their own event queues, and this assumption would only be valid for
events belonging to the same queue; however that seems complicated.
And you still don't want to ever attempt to acquire a *threading*
lock, because you end up blocking the entire event loop.

>> (However one could write an implementation with the same interface that doesn't.)
>
> And this is as simple as replacing threading.Condition() with no-op acquire() and release() functions. Regardless, the big advantage of requiring 'Future' as an interface* is that other implementations can be substituted.

Yes, here I think we are in (possibly violent :-) agreement.

> (Maybe making the implementation of future a property of the active event loop? I don't mind particular event loops banning CPU threads, but the entire API should allow their existence.)

Perhaps. Lots of possibilities in this design space.

> (*I'm inclined to define this as 'result()', 'done()', 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' functions. Maybe more, but I think that's sufficient. The current '_waiters' list is an optimisation for add_done_callback(), and doesn't need to be part of the interface.)

Agreed. I don't see much use for the cancellation stuff and all the
extra complexity that adds to the interface. BTW, I think
concurrent.futures.Future doesn't stop you from calling set_result()
or set_exception() more than once, which I think is a mistake -- I do
enforce that in NDB's Futures.
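
To make the interface concrete, here is a rough sketch of a lock-free,
single-threaded Future with just those six methods, which also refuses
a second set_result() or set_exception(). The class name SimpleFuture
is hypothetical; this is neither concurrent.futures.Future nor NDB's
implementation. One assumption to note: result() and exception() raise
if the future is not done yet, rather than blocking.

```python
class SimpleFuture:
    """Hypothetical single-threaded future; no locks, no cancellation."""

    def __init__(self):
        self._done = False
        self._result = None
        self._exception = None
        self._callbacks = []

    def done(self):
        return self._done

    def add_done_callback(self, fn):
        # Call immediately if already finished, otherwise remember it.
        if self._done:
            fn(self)
        else:
            self._callbacks.append(fn)

    def _finish(self):
        self._done = True
        callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            fn(self)

    def set_result(self, result):
        if self._done:
            raise RuntimeError("result or exception already set")
        self._result = result
        self._finish()

    def set_exception(self, exception):
        if self._done:
            raise RuntimeError("result or exception already set")
        self._exception = exception
        self._finish()

    def result(self):
        # Assumption: never blocks; an unfinished future is an error here.
        if not self._done:
            raise RuntimeError("result is not ready")
        if self._exception is not None:
            raise self._exception
        return self._result

    def exception(self):
        if not self._done:
            raise RuntimeError("result is not ready")
        return self._exception


f = SimpleFuture()
seen = []
f.add_done_callback(lambda fut: seen.append(fut.result()))
f.set_result(42)
print(seen)
```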

[Here you snipped some context. You proposed having public APIs that
use "yield <future>" and leaving "yield from <generator>" as something
the user can use in her own program. To which I replied:]

>> Hm. I think it'll be confusing.
>
> I think the basic case ("just make it work") will be simpler, and the advanced case ("minimise memory/CPU usage") will be more complicated.

Let's agree to disagree on this. I think they are both valid design
choices with different trade-offs. We should explore both directions
further so as to form a better opinion.

>> And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
>
> Personally, I'd prefer developers to get a correct solution without having to understand how the whole thing works (the "pit of success"). I'm also sceptical of any other rule being as portable and composable - I don't think a standard library should have APIs where "you must only call this function with yield-from". ('await' in C# is not compulsory - you can take the Task returned from an async method and do whatever you like with it.)

Surely "whatever you like" is constrained by whatever the Task type
defines. Maybe it looks like a Future and has a blocking method to
wait for the result, like .result() on concurrent.futures.Future? If
you want that functionality for generators you just have to call some
function, passing it the generator as an argument. Remember, Python
doesn't consider that an inferior choice of API design compared to
making something a method of the object itself -- witness len(),
repr() and many others.
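
As a sketch of what I mean, here is a hypothetical run() function that
plays the role for a generator that .result() plays for a Future: it
drives the generator to completion and hands back its return value.
The names run, add_later and task are invented for this example, and
it assumes every yielded value is already "ready" and can be sent
straight back in; a real scheduler would wait for I/O at that point.

```python
def run(gen):
    """Drive `gen` to completion and return its return value."""
    value = None
    while True:
        try:
            # Assumption: whatever the generator yields is already
            # available, so it is sent back as the result of the yield.
            value = gen.send(value)
        except StopIteration as stop:
            return stop.value

def add_later(a, b):
    x = yield a        # stand-in for waiting on some operation
    y = yield b
    return x + y

def task():
    total = yield from add_later(1, 2)
    return total * 10

answer = run(task())
print(answer)
```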

FWIW, even if I may sound antagonistic, I actually think that we're mostly
in violent agreement, and I think we're getting closer to coming up
with a sensible set of requirements and possibly even an API proposal.
Keep it coming!

-- 
--Guido van Rossum (python.org/~guido)