On Sat, Oct 20, 2012 at 12:31 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
- Nit: I don't like calling the event loop context; there are too many things called context (e.g. context managers in Python), so I prefer to call it what it is -- event loop or I/O loop.
The naming collision with context managers has been brought up before, so I'm okay with changing that. We used context mainly because it's close to the terminology used in .NET, where you schedule tasks/continuations in a particular SynchronizationContext. I believe "I/O loop" would be inaccurate, but "event loop" is probably appropriate.
I'm happy to settle on event loop. (Terminology in this area seems fraught with conflicting conventions; Twisted calls it a reactor, after the reactor pattern, but I've been chided by others for using this term without explanation; Tornado calls it I/O loop.)
- You mention a query interface a few times but there are no details in your example code; can you elaborate? (Or was that a typo for queue?)
I think I just changed terminology while writing - this is the 'get_future_for' call, which is not guaranteed to provide a waitable/pollable object for any type.
Then what is the use? What *is* its contract?
The intent is to allow an event loop to optionally provide support for (say) select(), but not to force that upon all implementations. If (when) someone implements a Windows GetMessage() based loop then requiring 'native' select() support is unfair. (Also, an implementation for Windows 8 would not directly involve an event loop, but would pass everything through to the underlying OS.)
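A minimal sketch of what that optional contract could look like (the class names here, SelectEventLoop and MessagePumpEventLoop, are hypothetical and purely illustrative): get_future_for returns a Future for objects the loop knows how to wait on natively, and None for anything it cannot support.

```python
# Hypothetical sketch of the 'get_future_for' contract: a loop may return
# a Future for objects it can wait on natively, and None otherwise.
from concurrent.futures import Future

class SelectEventLoop:
    """A loop that can wait on select()able objects."""
    def get_future_for(self, obj):
        if hasattr(obj, 'fileno'):       # pollable: we can wait on this
            fut = Future()
            # ...a real loop would register obj.fileno() with select()
            # and complete fut when the descriptor becomes ready...
            return fut
        return None                      # unsupported: caller must emulate

class MessagePumpEventLoop:
    """A GetMessage()-style loop with no native select() support."""
    def get_future_for(self, obj):
        return None                      # never waitable natively
```

A caller that gets None back would fall back to the callback-based emulation path instead of polling.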
I'm all for declaring select() an implementation detail. It doesn't scale on any platform; on Windows it only works for sockets; the properly scaling alternative varies per platform. (It is IOCP on Windows, right?)

This *probably* also means that the concept of file descriptor is out the window (even though Tornado apparently cannot do anything without it -- it's probably not used on Windows at all). And I suspect that it means that the implementation of the socket abstraction will vary per platform. The collection of other implementations of the same abstraction, and even the set of other available abstractions, will also vary per platform -- on Unix, there are pseudo-ttys, pipes, named pipes, and Unix domain sockets; I don't recall the set available on Windows, but I'm sure it is different. Then there is SSL/TLS, which feels like it requires special handling but in the end implements an abstraction similar to sockets.

I assume that in many cases it is easy to bridge from the various platform-specific abstractions and implementations to more cross-platform abstractions; this is where the notions of transports and protocols seem most important. I haven't explored those enough, sadly.

One note inspired by my mention of SSL, but also by discussions about GUI event loops in other threads: it is easy to think that everything is reducible to a file descriptor, but often it is not that easy. E.g. with something like SSL, you can't just select on the underlying socket and then, when it's ready, call the read() method of the SSL layer -- it's possible that the read() will still block because the socket didn't have enough bytes to be able to decrypt the next block of data. Similar for sockets associated with e.g. GUI event management (e.g. X).
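The SSL point can be illustrated with a toy stand-in (this BlockReader is not real SSL, just an assumed example of any layered reader that needs whole blocks): raw bytes being "ready" on the transport does not mean the layer's read() can make progress.

```python
# Toy illustration: a layered reader may need more raw bytes than are
# currently buffered, so "the socket is readable" does not imply "the
# layer's read() won't block".

BLOCK = 4  # pretend decryption only works on complete 4-byte blocks

class BlockReader:
    def __init__(self):
        self._buf = b''

    def feed(self, raw):
        """Raw bytes arrived on the underlying 'socket'."""
        self._buf += raw

    def read(self):
        """Return one decoded block, or None if we would have to block."""
        if len(self._buf) < BLOCK:
            return None        # raw data was ready, but not enough of it
        block, self._buf = self._buf[:BLOCK], self._buf[BLOCK:]
        return block
```

After feeding two bytes the underlying channel was "readable", yet read() still cannot return a block -- the same shape as an SSL read that needs more ciphertext before it can decrypt.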
- This is almost completely isomorphic with NDB's tasklets, except that you borrow the Future class implementation from concurrent.futures -- I think that's the wrong building block to start with, because it is linked too closely to threads.
As far as I can see, the only link that futures have with threads is that the ThreadPoolExecutor class is in the same module. `Future` itself is merely an object that can be polled, waited on, or assigned a callback, which means it can represent any asynchronous operation. Some uses are direct (e.g., polling a future that wraps pollable I/O) while others require emulation (adding a callback for pollable I/O), which is partly why the 'get_future_for' function exists - to allow the event loop to use the underlying object directly if it can.
I wish it were true. But the Future class contains a condition variable, and the Waiter class used by the implementation uses an event. Both are directly imported from the threading module, and if you block on either of these, it is a hard block (not even interruptible by a signal).

Don't worry too much about this -- it's just the particular implementation (concurrent.futures.Future). We can define a better Future class for our purposes elsewhere, with the same interface (or a subset -- I don't care much for the whole cancellation feature) but without references to threading.

For those Futures, we'll have to decide what should happen if you call result() when the Future isn't done yet -- raise an error (similar to EWOULDBLOCK), or somehow block, possibly running a recursive event loop? (That's what NDB does, but not everybody likes it.)

I think the normal approach would be to ask the scheduler to suspend the current task until the Future is ready -- it can easily arrange for that by adding a callback. In NDB this is spelled "yield <future>". In the yield-from <generator> world we could spell it that way too (i.e. yield, not yield from), or we could make it so that we can write yield from <future>, or perhaps we need a helper call: yield from wait(<future>), or maybe a method on the Future class (since it is our own): yield from <future>.wait(). These are API design details.

(I also have a need to block for the Futures returned by ThreadPoolExecutor and ProcessPoolExecutor -- those are handy when you really can't run something inline in the event loop -- the simplest example being getaddrinfo(), which may block for DNS.)
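A minimal sketch of such a thread-free Future, under the EWOULDBLOCK-style choice (result() raises rather than blocks) and with __iter__ support so a coroutine can write 'yield from fut'. The class and InvalidStateError name are assumptions for illustration, not an existing implementation:

```python
# Thread-free Future sketch: no condition variables, no threading imports.

class InvalidStateError(Exception):
    pass

class Future:
    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def done(self):
        return self._done

    def set_result(self, result):
        self._result = result
        self._done = True
        for cb in self._callbacks:
            cb(self)

    def add_done_callback(self, cb):
        if self._done:
            cb(self)                    # already complete: call right away
        else:
            self._callbacks.append(cb)

    def result(self):
        if not self._done:
            # the EWOULDBLOCK-style choice: raise instead of blocking
            raise InvalidStateError('result is not ready')
        return self._result

    def __iter__(self):
        # enables 'res = yield from fut' inside a coroutine
        if not self._done:
            yield self                  # hand the Future to the scheduler
        return self._result
```

With this shape, 'yield from <future>' needs no wait() helper at all: the Future yields itself once to the scheduler and then delivers its result as the value of the yield-from expression.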
- There is a big speed difference between yield from <generator> and yield <future>. With yield <future>, the scheduler has to do significant work for each yield at an intermediate level, whereas with yield from, the scheduler is only involved when actual blocking needs to be performed. In my experience, real code has lots of intermediate levels. Therefore I would like to use yield from. You can already do most things with yield from that you can do with Futures; there are a few operations that need a helper (in particular spawning truly concurrent tasks), but the helper code can be much simpler than the Future object, and isn't needed as often, so it's still a net win.
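The transparency of intermediate levels can be seen directly with plain generators (a contrived example; 'WOULD-BLOCK-HERE' stands in for whatever the innermost level would hand to a scheduler):

```python
# With 'yield from', intermediate frames are pass-throughs: only the
# innermost suspension point hands an object out, and the scheduler never
# touches the middle frames.

def inner():
    value = yield 'WOULD-BLOCK-HERE'   # the only real suspension point
    return value * 2

def middle():
    # pure pass-through: the scheduler never sees this frame
    result = yield from inner()
    return result + 1

def outer():
    return (yield from middle())
```

Driving outer() with next()/send() shows exactly one yield reaching the top, no matter how many intermediate levels there are -- whereas in the yield <future> style, every level would manufacture and schedule its own Future.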
I don't believe the scheduler is involved that frequently, but it is true that more Futures than are strictly necessary are created.
IIUC every yield must pass a Future, and every time that happens the scheduler gets it and must arrange for a callback on that Future which resumes the generator. I have code like that in NDB, and you have very similar code in your version (the wrapper in @async, and later _Awaiter._step()).
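The per-yield pattern being described can be sketched as follows (a toy reconstruction of the _step shape, not the actual NDB or _Awaiter code; the Future class here is a minimal stand-in):

```python
# Sketch: every yielded Future gets a done-callback that steps the
# generator again -- this is the per-yield scheduler work in question.

class Future:
    def __init__(self):
        self._done, self._result, self._callbacks = False, None, []
    def done(self):
        return self._done
    def set_result(self, result):
        self._result, self._done = result, True
        for cb in self._callbacks:
            cb(self)
    def add_done_callback(self, cb):
        cb(self) if self._done else self._callbacks.append(cb)
    def result(self):
        return self._result

def step(gen, value, done):
    try:
        fut = gen.send(value)          # run until the next yield
    except StopIteration as e:
        done.set_result(e.value)       # generator finished
        return
    # scheduler work happens on *every* yield: hook up the resume callback
    fut.add_done_callback(lambda f: step(gen, f.result(), done))

def run(gen):
    done = Future()
    step(gen, None, done)
    return done
```

Every Future that flows up through an intermediate level triggers another trip through step(), which is exactly the per-level overhead that yield from avoids.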
The first step (up to a yield) of any @async method is always run immediately - if there is no yield, then the returned future is already completed and has the result. The event loop as implemented could be optimised slightly for this case, but since a Future calls new callbacks immediately if it has already completed, we never 'unschedule' the task.
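A stripped-down sketch of that behaviour (the decorator name async_ and the minimal Future are assumptions for illustration, not the actual implementation): the first step runs synchronously inside the call, so a body with no yield produces an already-completed future.

```python
# Sketch: '@async runs the first step immediately'. If the generator
# finishes without yielding, the caller receives a completed future.

class Future:
    def __init__(self):
        self._done, self._result = False, None
    def done(self):
        return self._done
    def set_result(self, result):
        self._result, self._done = result, True
    def result(self):
        return self._result

def async_(fn):
    def wrapper(*args, **kwargs):
        fut = Future()
        gen = fn(*args, **kwargs)
        try:
            gen.send(None)            # first step runs right now
            # ...a real implementation would arrange for later steps here...
        except StopIteration as e:
            fut.set_result(e.value)   # no yield: already complete
        return fut
    return wrapper

@async_
def no_yield():
    return 'done immediately'
    yield  # never reached; makes this a generator function
```

Calling no_yield() returns a future whose done() is already True, so a callback added afterwards fires at once and nothing ever needs to be unscheduled.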
Interesting that you always run the first step immediately. I don't do this in NDB. Can you explain why you think you need it? (It may simply be an optimization I've overlooked. :-)
yield from can of course be used for the intermediate levels in exactly the same way as it is used for refactoring generators. The difference is that the top level is an @async decorator, at which point a Future is created. So 'read_async' might have @async applied, but it can 'yield from' any other generators that yield futures. Then the person calling 'read_async' is free to use any Future compatible interface rather than being forced into continuing the 'yield from' chain all the way to the top. (In particular, I think this works much better in the interactive scenario - I can write "x = read_async().result()", but how do you implement a 'yield from' approach in a REPL?)
Yeah, this is what I do in NDB, as I mentioned above (the recursive event loop call). But I suspect it would be very easy to write a helper function that you give a generator and which runs it to completion. It would also have to invoke the event loop, but that seems unavoidable, and otherwise the event loop isn't running in interactive mode, right? (Unless it runs in a separate thread, in which case the helper function should just communicate with that thread.)

Final remark: I keep wondering if it's better to try and stay "pure" in the public API and use only yield from, plus some helpers like spawn(), join() and par(), or if a decent, pragmatic public API can offer a combination. I worry that most users will have a hard time remembering when to use yield and when yield from.

--
--Guido van Rossum (python.org/~guido)
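The suggested helper could be as small as this sketch (run_until_complete is a hypothetical name; the part that would really invoke the event loop is stubbed out by pretending each awaited operation completes immediately with the yielded token as its result):

```python
# Sketch of a run-to-completion helper for interactive use.

def run_until_complete(gen):
    value = None
    while True:
        try:
            fut = gen.send(value)      # advance to the next yield
        except StopIteration as e:
            return e.value             # generator finished: its value
        # A real helper would run the event loop here until fut is done,
        # then resume with fut's result. This toy version just echoes the
        # yielded object back as the 'result'.
        value = fut
```

In a REPL this gives the "x = run_until_complete(read_async())" shape without requiring the user to continue the yield-from chain themselves.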