To save people scrolling to get to the interesting parts, I'll lead with the links:

Detailed write-up: https://bitbucket.org/stevedower/tulip/wiki/Proposal
Source code: https://bitbucket.org/stevedower/tulip/src

(Yes, I renamed my repo after the code name was selected. That would have been far too much of a coincidence.)

Practically all of the details are in the write-up linked first, so anything that's not there is either something I didn't think of or something I decided is unimportant right now (for example, the optimal way to wait for ten thousand sockets simultaneously on every different platform).

There's a reimplemented Future class in the code which is not essential, but it is drastically simplified from concurrent.futures.Future (CFF). It can't be directly replaced by CFF, but only because CFF requires more state management than the rest of the implementation performs ("set_running_or_notify_cancel"). CFF also includes cancellation, for which I've proposed a different mechanism.

For the sake of a quick example, I've modified Guido's main.doit function (http://code.google.com/p/tulip/source/browse/main.py) to show how it could be written with my proposal (apologies if I've butchered it, but I think it should behave the same):

    @async
    def doit():
        TIMEOUT = 2
        cs = CancellationSource()
        cs.cancel_after(TIMEOUT)

        tasks = set()

        task1 = urlfetch('localhost', 8080, path='/', cancel_source=cs)
        tasks.add(task1)
        task2 = urlfetch('127.0.0.1', 8080, path='/home', cancel_source=cs)
        tasks.add(task2)
        task3 = urlfetch('python.org', 80, path='/', cancel_source=cs)
        tasks.add(task3)
        task4 = urlfetch('xkcd.com', ssl=True, path='/', af=socket.AF_INET,
                         cancel_source=cs)
        tasks.add(task4)

        ## for t in tasks: t.start()
        # tasks start as soon as they are called - this function does not exist

        yield delay(0.2)
        # I believe this is equivalent to scheduling.with_timeout(0.2, ...)?

        winners = [t.result() for t in tasks if t.done()]
        print('And the winners are:', [w for w in winners])

        results = []
        # This 'wait all' loop could easily be a helper function
        for t in tasks:
            # Unfortunately, [(yield t) for t in tasks] does not work :(
            results.append((yield t))

        print('And the players were:', [r for r in results])
        return results

This is untested code, and has a few differences. I don't have task names, so it will print the returned value from urlfetch (a tuple of (host, port, path, status, len(data), time_taken)). The cancellation approach is quite different, but IMO far more likely to avoid the finally-related issues discussed in other threads.

However, I want to emphasise that unless you are already familiar with this exact style, it is near impossible to guess exactly what is going on from this little sample. Please read the write-up before assuming what is or is not possible with this approach.

Cheers,
Steve
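For readers who haven't opened the write-up yet, here is a rough sketch of what a CancellationSource along these lines could look like. This is purely illustrative, inferred from the example above (loosely modeled on .NET's CancellationTokenSource); the names and behaviour are assumptions, not wattle's actual implementation:

```python
import threading

class CancellationSource:
    """Illustrative sketch only: a shared cancellation signal that
    operations can poll or register callbacks with."""

    def __init__(self):
        self._cancelled = threading.Event()
        self._callbacks = []
        self._lock = threading.Lock()

    def is_cancelled(self):
        return self._cancelled.is_set()

    def cancel(self):
        # Fire at most once, then notify everyone who registered.
        with self._lock:
            if self._cancelled.is_set():
                return
            self._cancelled.set()
            callbacks, self._callbacks = self._callbacks, []
        for cb in callbacks:
            cb()

    def cancel_after(self, seconds):
        # As used for TIMEOUT in doit() above: schedule cancel().
        timer = threading.Timer(seconds, self.cancel)
        timer.daemon = True
        timer.start()

    def register(self, callback):
        # Operations register to be told when they should give up.
        with self._lock:
            if not self._cancelled.is_set():
                self._callbacks.append(callback)
                return
        callback()  # already cancelled: fire immediately
```

The key design point is that cancellation is requested on the source, not on individual futures, so one timeout can tear down a whole group of related operations.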
Possibly I should have selected a different code name, now I come to think of it, but we came up with such similar code that I don't think it'll stay separate for too long.

________________________________
From: Python-ideas [python-ideas-bounces+steve.dower=microsoft.com@python.org] on behalf of Steve Dower [Steve.Dower@microsoft.com]
Sent: Monday, October 29, 2012 6:40 PM
To: python-ideas@python.org
Subject: [Python-ideas] Async API: some more code to review
On Monday, October 29, 2012, Steve Dower wrote:
Possibly I should have selected a different code name, now I come to think of it, but we came up with such similar code that I don't think it'll stay separate for too long.
Hm, yes, this felt weird. I figured the code names would be useful to reference the proposals when comparing them, not as the ultimate eventual project name once it's been PEP-ified and put in the stdlib.

Maybe you can call yours "wattle"? That's a Pythonic plant name. :-)

(Sorry, still reading through your docs and code, it's too early for more substantial feedback.)

--Guido

--
--Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
On Monday, October 29, 2012, Steve Dower wrote:
Possibly I should have selected a different code name, now I come to think of it, but we came up with such similar code that I don't think it'll stay separate for too long.
Hm, yes, this felt weird. I figured the code names would be useful to reference the proposals when comparing them, not as the ultimate eventual project name once it's been PEP-ified and put in the stdlib.
Maybe you can call yours "wattle"? That's a Pythonic plant name. :-)
Nice idea. I renamed it and (hopefully) made it so the original links still work.

https://bitbucket.org/stevedower/wattle/src
https://bitbucket.org/stevedower/wattle/wiki/Proposal

I was never expecting the name to last; I just figured you had to make something up to create a project. Eventually it will all just become a boring PEP-xxx number...

Cheers,
Steve
Steve,

I don't want to beat around the bush: I think your approach is too slow. In many situations I would be guilty of premature optimization saying this, but (a) the whole *point* of async I/O is to be blindingly fast (the C10K problem), and (b) the time difference is rather marked.

I wrote a simple program for each version (attached) that times a simple double-recursive function, where each recursive level uses yield.

With a depth of 20, wattle takes about 24 seconds on my MacBook Pro. And the same problem in tulip takes 0.7 seconds! That's close to two orders of magnitude. Now, this demo is obviously geared towards showing the pure overhead of the "one future per level" approach compared to "pure yield from". But that's what you're proposing. And I think allowing the user to mix yield and yield from is just too risky. (I got rid of block_r/w() + bare yield as a public API from tulip -- that API is now wrapped up in a generator too. And I can do that without feeling guilty knowing that an extra level of generators costs me almost nothing.)

Debugging experience: I made the same mistake in each program (I guess I copied it over before fixing the bug :-), which caused an AttributeError to happen at the time.time() call. In both frameworks this was baffling, because it caused the program to exit immediately without any output. So on this count we're even. :-)

I have to think more about what I'd like to borrow from wattle -- I agree that it's nice to mark up async functions with a decorator (it just shouldn't affect call speed), and I like being able to start a task with a single call. Probably more, but my family is calling me to get out of bed. :-)

--
--Guido van Rossum (python.org/~guido)
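The attached benchmark scripts are not preserved in this archive, but the "pure yield from" side described above can be sketched roughly like this (the trivial driver is an assumption: these benchmark generators never actually block, so a bare send() loop runs them to completion):

```python
import time

def binary(n):
    # Double-recursive benchmark: every level is a generator delegated
    # to with 'yield from', so no Future or wrapper object is created
    # per call - this is the overhead being measured against wattle.
    if n <= 0:
        return 1
    l = yield from binary(n - 1)
    r = yield from binary(n - 1)
    return l + 1 + r

def run(gen):
    # Minimal driver standing in for a real event loop.
    try:
        while True:
            gen.send(None)
    except StopIteration as stop:
        return stop.value

depth = 10  # the thread used 20; 10 keeps this example quick
t0 = time.time()
result = run(binary(depth))
print(result, 'in', time.time() - t0, 'seconds')
```

At depth n the function builds 2**(n+1) - 1 generator frames in total, which is why the per-level cost dominates the timing.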
Guido van Rossum wrote:
I don't want to beat around the bush: I think your approach is too slow. In many situations I would be guilty of premature optimization saying this, but (a) the whole *point* of async I/O is to be blindingly fast (the C10K problem), and (b) the time difference is rather marked.
I wrote a simple program for each version (attached) that times a simple double-recursive function, where each recursive level uses yield.
With a depth of 20, wattle takes about 24 seconds on my MacBook Pro. And the same problem in tulip takes 0.7 seconds! That's close to two orders of magnitude. Now, this demo is obviously geared towards showing the pure overhead of the "one future per level" approach compared to "pure yield from". But that's what you're proposing.
I get similar results on my machine with those benchmarks, though the difference was not so significant with my own (100 connections x 100 messages to SocketSpam.py - I included SocketSpamStress.py). The only time there was more than about 5% difference was when the 'yield from' case was behaving completely differently (each connection's routine was not interleaving with the others - my own bug, which I fixed).

Choice of scheduler makes a difference as well. Using my UnthreadedSocketScheduler() instead of SingleThreadedScheduler() halves the time taken, and just using "main(depth).result()" reduces that by about 10% again. It still is not directly comparable to tulip, but there are ways to make them equivalent (discussed below).
And I think allowing the user to mix yield and yield from is just too risky.
The errors involved when you get yield and yield from confused are quite clear in this case. However, if you use 'yield' instead of 'yield from' in tulip, you simply don't ever run that function. Maybe this will give you an error further down the track, but it won't be as immediate.

On the other hand, if you're really after extreme performance (*cough* use C *cough* :) ) we can easily add an "__unwrapped__" attribute to @async that provides access to the internal generator, which you can then 'yield from':

    @async
    def binary(n):
        if n <= 0:
            return 1
        l = yield from binary.__unwrapped__(n-1)
        r = yield from binary.__unwrapped__(n-1)
        return l + 1 + r

With this change the performance is within 5% of tulip (most times are up to 5% slower, but some are faster - I'd say margin of error), regardless of the scheduler. (I've no doubt this could be improved further by modifying _Awaiter and Future to reduce the amount of memory allocations, and a super-optimized library could use C implementations that still fit the API and work with existing code.)

I much prefer treating 'yield from __unwrapped__' as an advanced case, so I'm all for providing ways to optimize async code where necessary, but when I think about how I'd teach this to a class of undergraduates I'd much rather have the simpler @async/yield rule (which doesn't even require an understanding of generators). For me, "get it to work" and "get it to work, fast" comes well before "get it to work fast".
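For illustration only (this is not wattle's actual code), a toy @async decorator can expose the raw generator function like this. 'async_' is used because 'async' is a keyword in current Python, and the synchronous driver stands in for a real scheduler:

```python
def async_(genfunc):
    def wrapper(*args, **kwargs):
        gen = genfunc(*args, **kwargs)
        try:
            while True:
                # A real scheduler would suspend on each yielded value;
                # this toy driver just runs the generator to completion.
                gen.send(None)
        except StopIteration as stop:
            return stop.value
    # The advanced fast path: direct access to the undecorated
    # generator function, so callers can 'yield from' it with no
    # per-call wrapper object.
    wrapper.__unwrapped__ = genfunc
    return wrapper

@async_
def binary(n):
    if n <= 0:
        return 1
    l = yield from binary.__unwrapped__(n - 1)
    r = yield from binary.__unwrapped__(n - 1)
    return l + 1 + r

print(binary(3))  # 15, i.e. 2**(3+1) - 1
```

Only the outermost call pays the wrapper cost; the whole recursion underneath runs as plain generator delegation.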
(I got rid of block_r/w() + bare yield as a public API from tulip -- that API is now wrapped up in a generator too. And I can do that without feeling guilty knowing that an extra level of generators costs me almost nothing.
I don't feel particularly guilty about the extra level... if the operations you're blocking on are that much quicker than the overhead then you probably don't need to block. I'm pretty certain that even with multiple network cards you'll still suffer from bus contention before suffering from generator overhead.
Debugging experience: I made the same mistake in each program (I guess I copied it over before fixing the bug :-), which caused an AttributeError to happen at the time.time() call. In both frameworks this was baffling, because it caused the program to exit immediately without any output. So on this count we're even. :-)
This is my traceback once I misspell time():

    ...>c:\Python33_x64\python.exe wattle_bench.py
    Traceback (most recent call last):
      File "wattle_bench.py", line 27, in <module>
        SingleThreadScheduler().run(main, depth=depth)
      File "SingleThreadScheduler.py", line 106, in run
        raise self._exit_exception
      File "scheduler.py", line 171, in _step
        next_future = self.generator.send(result)
      File "wattle_bench.py", line 22, in main
        t1 = time.tme()
    AttributeError: 'module' object has no attribute 'tme'

Of course, if you do call an @async function and don't yield (or call result()) then you won't ever see an exception. I don't think there's any nice way to propagate these automatically (except maybe through a finalizer... not so keen on that). You can do 'op.add_done_callback(Future.result)' to force the error to be raised somewhere (or better yet, pass it to a logger - this is why we allow multiple callbacks, after all).
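The add_done_callback idea at the end can be sketched with concurrent.futures, whose Future exposes the same callback hook (the logger-based callback and the names here are illustrative, not wattle code):

```python
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger('tasks')
failures = []  # recorded for inspection; a real app would just log

def log_failure(future):
    # Gentler than add_done_callback(Future.result): route the
    # exception to a logger instead of raising inside the callback.
    exc = future.exception()
    if exc is not None:
        failures.append(exc)
        log.error('task failed: %r', exc)

def buggy_task():
    import time
    return time.tme()  # the misspelling from the traceback above

with ThreadPoolExecutor(max_workers=1) as ex:
    op = ex.submit(buggy_task)
    op.add_done_callback(log_failure)

# The error was captured and reported rather than silently lost.
print(failures)
```

Because a future allows multiple callbacks, this logging hook can coexist with whatever callback the scheduler itself attaches.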
I have to think more about what I'd like to borrow from wattle -- I agree that it's nice to mark up async functions with a decorator (it just shouldn't affect call speed), I like being able to start a task with a single call.
You'll probably find (as I did in my early work) that starting the task in the initial call doesn't work with yield from. Because it does the first next() call, you can't send results/exceptions back in. If all the yields (at the deepest level) are blank, this might be okay, but it caused me issues when I was yielding objects to wait for.

I'm also interested in your thoughts on get_future_for(), since that seems to be one of the more unorthodox ideas of wattle. I can clearly see how it works, but I have no idea whether I've expressed it well in the description.

Cheers,
Steve
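The first point is easy to demonstrate with plain generators: once a generator has been primed by an initial next() call, 'yield from' can no longer deliver the caller's first sent value to it (the toy names here are hypothetical):

```python
def sub():
    x = yield 1
    y = yield 2
    return (x, y)

def delegator():
    g = sub()
    next(g)                 # 'start on call': consumes the first yield
    result = yield from g   # delegation now begins at 'yield 2'
    return result

d = delegator()
first = next(d)             # 2, not 1 - the first yielded value is gone
try:
    d.send('a')
    final = None
except StopIteration as stop:
    final = stop.value      # (None, 'a'): x was never reachable

print(first, final)
```

The value 'a' lands in y, while x silently becomes None, because the yield-from machinery's own initial next() resumed the already-primed generator before any sent value could arrive.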
On 2012-10-29, at 9:40 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
To save people scrolling to get to the interesting parts, I'll lead with the links:
Detailed write-up: https://bitbucket.org/stevedower/tulip/wiki/Proposal
Source code: https://bitbucket.org/stevedower/tulip/src
Your design looks very similar to the framework I developed. I'll try to review your code in detail tomorrow.

Couple of things I like already:

1) Use of 'yield from' is completely optional

2) @async decorator. That makes coroutines more visible and allows to add extra methods to them.

3) Tight control over coroutines execution, something that is completely missing when you use yield-from.

I dislike the choice of name for 'async', though. Since @async-decorated functions are going to be yielded most of the time (yield makes them "sync" in that context), I'd stick to plain @coroutine.

P.S. If this approach is viable (optional yield-from, required @async-or-something decorator), I can invest some time and open source the core of my framework (one benefit is that it has lots and lots of unit-tests).

- Yury
On Mon, Oct 29, 2012 at 7:29 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Couple of things I like already:
1) Use of 'yield from' is completely optional
That's actually my biggest gripe...
2) @async decorator. That makes coroutines more visible and allows to add extra methods to them.
Yes on marking them more visibly. No on wrapping each call into an object that slows down the invocation.
3) Tight control over coroutines execution, something that is completely missing when you use yield-from.
This I don't understand. What do you mean by "tight control"? And why would you want it?
I dislike the choice of name for 'async', though. Since @async-decorated functions are going to be yielded most of the time (yield makes them "sync" in that context), I'd stick to plain @coroutine.
Hm. I think of it this way: the "async" (or whatever) function *is* asynchronous, and just calling it does *not* block. However if you then *yield* (or in my tulip proposal *yield from*) it, that suspends the current task until the async function completes, giving the *illusion* of synchronicity or blocking.

(I have to admit I was confused by a comment in Steve's example code saying "does not block" on a line containing a yield, where I have been used to thinking of such lines as blocking.)
P.S. If this approach is viable (optional yield-from, required @async-or-something decorator), I can invest some time and open source the core of my framework (one benefit is that it has lots and lots of unit-tests).
Just open-sourcing the tests would already be useful!! -- --Guido van Rossum (python.org/~guido)
Guido,

Well, with such jaw-dropping benchmark results there is no point in discussing whether it's better to use yield-froms or yields+promises. But let me also share results of my framework:

- Plain coroutines: 24.4
- Coroutines + greenlets: 34.5
- Coroutines + greenlets + many Cython optimizations: 4.79 (still too slow)

Now, with dynamically replacing (opcode magic) 'yield' with 'yield_' to entirely avoid generators, and some other optimizations, I believe it's possible to speed it up even further, probably to times below 1 second. But, again, the price of not using yield-froms is too high (and I don't even mention hard-to-fix tracebacks when you use just yields).

On 2012-10-30, at 10:52 AM, Guido van Rossum <guido@python.org> wrote:
On Mon, Oct 29, 2012 at 7:29 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Couple of things I like already:
1) Use of 'yield from' is completely optional
That's actually my biggest gripe...
Yes, let's use just one thing everywhere.
2) @async decorator. That makes coroutines more visible and allows to add extra methods to them.
Yes on marking them more visibly. No on wrapping each call into an object that slows down the invocation.
3) Tight control over coroutines execution, something that is completely missing when you use yield-from.
This I don't understand. What do you mean by "tight control"? And why would you want it?
Actually, if we make decorating coroutines with a @coro-like decorator strongly recommended (or even required) I can get that tight-control thing. It gives you the following:

- Breakdown of profiling results by individual coroutines
- Blocking code detection
- Hacks to protect finally statements, modify your coroutines' internals, etc. (probably I'm the only one in the world who needs this :()
- Better debugging (just logging individual coroutines sometimes helps)

And a decorator makes code more future-proof as well. Who knows what kind of instruments you'll need later.
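A rough sketch of the per-coroutine profiling and blocking-code detection mentioned above (illustrative only, not Yury's framework; the 0.1-second threshold and all names are assumptions):

```python
import time

stats = {}   # per-coroutine step counts and timings
SLOW = 0.1   # steps longer than this are flagged as probably blocking

def coroutine(genfunc):
    name = genfunc.__name__
    stats[name] = {'steps': 0, 'total': 0.0, 'slow': 0}

    def wrapper(*args, **kwargs):
        gen = genfunc(*args, **kwargs)
        value = None
        try:
            while True:
                t0 = time.time()
                yielded = gen.send(value)   # run one step of the coroutine
                dt = time.time() - t0
                s = stats[name]
                s['steps'] += 1
                s['total'] += dt
                if dt > SLOW:
                    s['slow'] += 1          # crude blocking-code detection
                value = yield yielded       # pass values through unchanged
        except StopIteration:
            return
    return wrapper

@coroutine
def worker():
    yield 'step one'
    time.sleep(0.2)   # deliberately blocking
    yield 'step two'

for _ in worker():
    pass
print(stats['worker'])  # e.g. {'steps': 2, 'total': ..., 'slow': 1}
```

The decorator is a transparent proxy around the generator, which is exactly why requiring it buys this kind of instrumentation for free.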
I dislike the choice of name for 'async', though. Since @async-decorated functions are going to be yielded most of the time (yield makes them "sync" in that context), I'd stick to plain @coroutine.
Hm. I think of it this way: the "async" (or whatever) function *is* asynchronous, and just calling it does *not* block. However if you then *yield* (or in my tulip proposal *yield from*) it, that suspends the current task until the async function completes, giving the *illusion* of synchronicity or blocking. (I have to admit I was confused by a comment in Steve's example code saying "does not block" on a line containing a yield, where I have been used to thinking of such lines as blocking.)
"*illusion* of synchronicity or blocking" -- that's precisely the reason I don't like '@async' used together with yields.
P.S. If this approach is viable (optional yield-from, required @async-or-something decorator), I can invest some time and open source the core of my framework (one benefit is that it has lots and lots of unit-tests).
Just open-sourcing the tests would already be useful!!
When tulip is ready I'll simply start integrating them. - Yury
participants (3):
- Guido van Rossum
- Steve Dower
- Yury Selivanov