Steve, I don't want to beat around the bush: I think your approach is too slow. In many situations I would be guilty of premature optimization for saying this, but (a) the whole *point* of async I/O is to be blindingly fast (the C10K problem), and (b) the time difference is rather marked.

I wrote a simple program for each version (attached) that times a simple double-recursive function, where each recursive level uses yield. With a depth of 20, wattle takes about 24 seconds on my MacBook Pro, while the same problem in tulip takes 0.7 seconds! That's a factor of about 34, well over an order of magnitude. Now, this demo is obviously geared towards showing the pure overhead of the "one future per level" approach compared to "pure yield from". But that's what you're proposing. And I think allowing the user to mix yield and yield from is just too risky. (I got rid of block_r/w() + bare yield as a public API from tulip; that API is now wrapped up in a generator too. And I can do that without feeling guilty, knowing that an extra level of generators costs me almost nothing.)

Debugging experience: I made the same mistake in each program (I guess I copied it over before fixing the bug :-), which caused an AttributeError to be raised at the time.time() call. In both frameworks this was baffling, because it caused the program to exit immediately without any output. So on this count we're even. :-)

I have to think more about what I'd like to borrow from wattle. I agree that it's nice to mark up async functions with a decorator (it just shouldn't affect call speed), and I like being able to start a task with a single call. Probably more, but my family is calling me to get out of bed. :-)

-- 
--Guido van Rossum (python.org/~guido)
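The attached benchmark programs are not reproduced here. As a rough illustration of the two styles being compared, here is a minimal sketch built on a toy scheduler. It is an assumption throughout: `Future`, `Scheduler`, `tree_yield_from`, and `tree_future_per_level` are hypothetical names, not tulip's or wattle's actual APIs, and the sketch makes no attempt to reproduce the 24 s / 0.7 s figures. In the "pure yield from" version, a child generator runs inline in its caller's frame chain. In the "one future per level" version, every recursive call becomes a separate task whose result travels back through its own Future and the scheduler.

```python
import collections
import time


class Future:
    """Hypothetical minimal future: holds a result plus callbacks run when it is set."""

    def __init__(self):
        self.done = False
        self.result = None
        self.callbacks = []

    def set_result(self, value):
        self.done = True
        self.result = value
        for cb in self.callbacks:
            cb(self)


class Scheduler:
    """Hypothetical round-robin scheduler driving generator-based tasks."""

    def __init__(self):
        self.ready = collections.deque()  # entries: (generator, value to send, task future)

    def spawn(self, gen):
        # Wrap a generator in a task; its completion is signalled through a Future.
        fut = Future()
        self.ready.append((gen, None, fut))
        return fut

    def run(self):
        while self.ready:
            gen, value, fut = self.ready.popleft()
            try:
                yielded = gen.send(value)
            except StopIteration as stop:
                fut.set_result(stop.value)
                continue
            if isinstance(yielded, Future):
                if yielded.done:
                    self.ready.append((gen, yielded.result, fut))
                else:
                    # Park this task until the child's future completes.
                    yielded.callbacks.append(
                        lambda f, g=gen, p=fut: self.ready.append((g, f.result, p)))
            else:
                # A bare yield just means "let others run": reschedule immediately.
                self.ready.append((gen, None, fut))


def tree_yield_from(depth):
    # "Pure yield from": children run inline; only the leaves suspend.
    if depth == 0:
        yield
        return 1
    left = yield from tree_yield_from(depth - 1)
    right = yield from tree_yield_from(depth - 1)
    return left + right


def tree_future_per_level(sched, depth):
    # "One future per level": every recursive call becomes its own task + Future.
    if depth == 0:
        yield
        return 1
    left = yield sched.spawn(tree_future_per_level(sched, depth - 1))
    right = yield sched.spawn(tree_future_per_level(sched, depth - 1))
    return left + right


def run_yield_from(depth):
    sched = Scheduler()
    fut = sched.spawn(tree_yield_from(depth))
    sched.run()
    return fut.result


def run_future_per_level(depth):
    sched = Scheduler()
    fut = sched.spawn(tree_future_per_level(sched, depth))
    sched.run()
    return fut.result


if __name__ == "__main__":
    for runner in (run_yield_from, run_future_per_level):
        start = time.perf_counter()
        result = runner(15)
        print(f"{runner.__name__}: result={result}, {time.perf_counter() - start:.3f}s")
```

Both versions compute the same value (2 ** depth, one per leaf). The difference is that the second one allocates a task and a Future and makes an extra round trip through the scheduler for every call, which is exactly the per-level overhead being objected to. Absolute timings will depend on the machine and Python version.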