Hello,

First of all, sorry for the lengthy email. I've really tried to make it concise, and I hope I didn't fail entirely. I want to begin by describing the framework my company has been working on for several years, and on which we have successfully deployed several web applications that are exposed to thousands of users today. It has survived multiple architecture reviews and redesigns, so I believe its design is worth introducing here.

Secondly, I'm really glad that many advanced Python developers find that use of "yield" is viable for an async API, and that it even may be "The Right Way". Because when we started working on our framework, that sounded nuts (and still does...)

The framework
=============

I'll describe here only the core functionality, not touching message bus & dispatch, protocols design, IO layers, etc. If someone is interested, I can start another thread.

The very core of the system is the Scheduler. I prefer to call it "Scheduler", and not "Reactor" or something else, because it's not just an event loop. It loops over micro-threads, where a micro-thread is a primitive that holds a pointer to the currently running or suspended task. A task can be anything from a coroutine to a Sleep command. A task may be suspended because of IO waiting, a lock primitive, a timeout or something else. You can even write programs that are not IO-bound at all.

To the code. So when you have::

    @coroutine
    def foo():
        bar_value = yield bar()

defined, and then executed, 'foo' will send a Task object (wrapped around 'bar'), so that it will be executed in foo's micro-thread. And because we return a Task, we can also do::

    yield bar().with_timeout(1)

or even (akin to coroutines with Futures)::

    bar_value_promise = yield bar().with_timeout(1).async()
    [some code]
    bar_value = yield bar_value_promise

So far there is nothing new. The need for something "new" emerged when we started to use it in "real world" applications.
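The Task-yielding protocol described above can be sketched with plain generators. This is a minimal illustration under assumed names (`run`, `bar`, `foo` are mine, not the framework's): a tiny trampoline that treats a yielded generator as a sub-task run in the same micro-thread, and sends its final value back to the caller.

```python
import types

def run(coro):
    # Drive a generator-based coroutine; yielded generators are treated
    # as sub-tasks run in the same "micro-thread" (a stand-in for the
    # Task objects described in the email).
    stack = [coro]
    value = None
    while stack:
        try:
            result = stack[-1].send(value)
        except StopIteration as e:
            stack.pop()
            value = e.value      # sub-task finished: pass its result up
            continue
        if isinstance(result, types.GeneratorType):
            stack.append(result)  # suspend the caller, run the sub-task
            value = None
        else:
            value = result        # plain value (e.g. an I/O result) echoed back

def bar():
    yield 'pretend this waits on I/O'
    return 42

def foo(results):
    bar_value = yield bar()       # 'bar' runs in foo's micro-thread
    results.append(bar_value)
```

A real scheduler would multiplex many such stacks and park them on I/O events; this sketch only shows the send/yield plumbing for a single micro-thread.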
Consider you have some ORM and the following piece of code::

    topics = FE.select([
        FE.publication_date,
        FE.body,
        FE.category,
        (FE.creator, [
            (FE.creator.subject, [
                (gpi, [
                    gpi.avatar
                ])
            ])
        ])
    ]).filter(FE.publication_date < FE.publication_date.now(),
              FE.category == self.category)

and later::

    for topic in topics:
        print(topic.avatar.bucket.path, topic.category.name)

Everything is lazily loaded, so a DB query can run at virtually any point: when you iterate and it pre-fetches objects, when you access an attribute that wasn't marked for loading, etc. The thing is that there is no way to express all those semantics with 'yield'. There is no 'for yield' statement, and there is no pretty way of resolving an attribute with 'yield'. So even if you decide to rewrite everything around you from scratch to support 'yields', you still can't make a nice Python API for some problems.

Another problem is that "writing everything from scratch" thing. Nobody wants it. We always want to reuse; nobody wants to write an SMTP client from scratch when there is a decent one available right in the stdlib.

So the solution was simple: incorporate greenlets. With greenlets we got a 'yield_()' function that can be called from any coroutine, and from the framework user's point of view it is the same as a 'yield' statement. Now we were able to create a green-socket object that looks like a plain stdlib socket, and fix our ORM. With its help we were also able to wrap lots and lots of existing libraries in a nice 'yield'-style design, without rewriting their internals.

In the end we have a hybrid approach. For 95% we use explicit 'yields', and for the remaining 5% - well, we know that when we use the ORM it may do some implicit 'yields', but that's OK.

Now, with the adoption of greenlets a whole new set of optimization strategies became available.
For instance, we can substitute 'yield' statements with the 'yield_' command transparently by messing with opcodes, and by tweaking 'yield_' and reusing 'Task' objects we can achieve near regular-python-call performance, but with tight control over our coroutines & micro-threads. And when PyPy finally adds support for Python 3, STM & JIT-able continulets, it will be very interesting to see how we can improve performance even further.

Conclusion
==========

The whole point of this text was to show that a pure 'yield' approach will not work. Moreover, I don't think it's time to pronounce "The Right Way" of 'yielding' and 'yield-fromming'. There are so many ways of doing that: with a @coroutine decorator, plain generators, futures and Tasks, and perhaps more. And I honestly don't know which one is the right one.

What we really need now (and I think Guido has already mentioned that) is a callback-based (Deferreds, Futures, plain callbacks) design that is easy to plug-and-play into any coroutine framework. It has to be low-level and simple. Sort of a WSGI for async frameworks ;)

We also need to work on the stdlib, so that it is easy to inject a custom socket into any object. Ideally, by passing it in the constructor (as everybody hates monkey-patching.)

With all that said, I'd be happy to dedicate a fair amount of my time to help with the design and implementation.

Thank you!
Yury
On 10/23/12 12:33 PM, Yury Selivanov wrote:
The whole point of this text was to show that a pure 'yield' approach will not work. Moreover, I don't think it's time to pronounce "The Right Way" of 'yielding' and 'yield-fromming'. There are so many ways of doing that: with a @coroutine decorator, plain generators, futures and Tasks, and perhaps more. And I honestly don't know which one is the right one.
[Thanks Yury for giving me a convenient place to jump in]

I abandoned the callback-driven approach in 1999, after pushing it as far as I could handle. IMHO you can build single pieces in a relatively clean fashion, but you cannot easily combine those pieces together to build real systems.

Over the past year I've played a little with some generator-based code (tlslite & bluelets for example), and I don't think they're much of an improvement. Whether it's decorated callbacks, generators, whatever, it all reminds me of impenetrable monad code in Haskell. Continuation-passing-style isn't something that humans should be expected to do, it's a trick for compilers. 8^)
What we really need now (and I think Guido has already mentioned that) is a callback-based (Deferreds, Futures, plain callbacks) design that is easy to plug-and-play into any coroutine framework. It has to be low-level and simple. Sort of a WSGI for async frameworks ;)
I've been trying to play catch-up since being told about this thread a couple of days ago. If I understand it correctly, 'yield-from' looks like it can help make generator-based concurrency a little more sane by cutting back on endless chains of 'for x in ...: yield ...', right? That certainly sounds like an improvement, but does the generator nature of the API bubble all the way up to the top? Can you send an email with a function call?
We also need to work on the stdlib, so that it is easy to inject a custom socket into any object. Ideally, by passing it in the constructor (as everybody hates monkey-patching.)
I second this one. Having a way to [optionally] pass in a factory for sockets would help with portability, and would cut down on the temptation to monkey-patch. It'd be really great to use standard 'async' protocol implementations in a performant way... although I'm not sure how/if I can wedge such code into systems like shrapnel*, but it all starts with being able to pass in a socket-like object (or factory).

-Sam

(*) Since no one else has mentioned it yet, a tiny plug here for shrapnel: https://github.com/ironport/shrapnel
Sam,

BTW, kudos for shrapnel!

On 2012-10-23, at 5:25 PM, Sam Rushing <sam-pydeas@rushing.nightmare.com> wrote:
[snip]
On 10/23/12 12:33 PM, Yury Selivanov wrote:
What we really need now (and I think Guido has already mentioned that) is a callback-based (Deferreds, Futures, plain callbacks) design that is easy to plug-and-play into any coroutine framework. It has to be low-level and simple. Sort of a WSGI for async frameworks ;)
I've been trying to play catch-up since being told about this thread a couple of days ago. If I understand it correctly, 'yield-from' looks like it can help make generator-based concurrency a little more sane by cutting back on endless chains of 'for x in ...: yield ...', right? That certainly sounds like an improvement, but does the generator nature of the API bubble all the way up to the top? Can you send an email with a function call?
Well, I guess so. Let's say urllib is rewritten internally in async style, exposing publicly its old API, like::

    def urlopen(*args, **kwargs):
        return run_coro(urlopen_async, args, kwargs)

where 'run_coro' takes care of setting up a Scheduler/event loop and running yield-style or callback-style private code. So 'urllib' is blocking, but there is an option of using 'urlopen_async' for those who need it.

For basic library functions that will work. And that's already a huge win. But developing a complicated library will become twice as hard, as you'll need to maintain two versions of the API - sync & async - all the way through the code.

There is only one way to 'magically' make existing code both sync- and async-friendly: greenlets. But I think there is no chance for them (or stackless) to land in cpython in the foreseeable future (although it would be awesome.)

BTW, why didn't you use greenlets in shrapnel, instead of writing your own implementation?
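The run_coro() idea can be sketched with plain generators. All names here (`run_coro`, `urlopen_async`, `perform_blocking`) follow the email and are hypothetical, not real urllib code: the wrapper drives the async-style coroutine to completion on the spot, satisfying each yielded request with a blocking call.

```python
def run_coro(coro_func, args=(), kwargs=None):
    # Blocking facade: drive the generator coroutine until it returns.
    gen = coro_func(*args, **(kwargs or {}))
    value = None
    while True:
        try:
            request = gen.send(value)
        except StopIteration as e:
            return e.value
        # A real scheduler would park on sockets/timers here; this
        # sketch just satisfies every request immediately.
        value = perform_blocking(request)

def perform_blocking(request):
    # Placeholder for real blocking I/O.
    return 'response to %r' % (request,)

def urlopen_async(url):
    data = yield ('GET', url)   # would be a real async network read
    return data

def urlopen(url):
    # The old blocking API, preserved on top of the async internals.
    return run_coro(urlopen_async, (url,))
```

The point of the sketch is the shape of the facade: the sync entry point owns a throwaway loop, while async callers can drive `urlopen_async` on a shared scheduler instead.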
We also need to work on the stdlib, so that it is easy to inject a custom socket into any object. Ideally, by passing it in the constructor (as everybody hates monkey-patching.)
I second this one. Having a way to [optionally] pass in a factory for sockets would help with portability, and would cut down on the temptation to monkey-patch.
Great. Let's see - if nobody is opposed to this, we can start submitting patches :) Or is there a need for a separate small PEP?

Thanks,
Yury
On 10/23/12 3:05 PM, Yury Selivanov wrote:
Sam,
BTW, kudos for shrapnel!

Thanks!
For basic library functions that will work. And that's already a huge win. But developing a complicated library will become twice as hard, as you'll need to maintain two versions of API - sync & async all the way through the code.
This is really difficult. If you want to see a great example of trying to make all parties happy, look at Pika (an AMQP implementation).

Actually, this reminds me: it would be really great if there was a standardized with_timeout() API. It's better than adding timeout args to all the functions. I'm sure that systems like Twisted & gevent could also implement it (if they don't already have it).

In shrapnel it is simply:

    coro.with_timeout (<seconds>, <fun>, *args, **kwargs)

Timeouts are caught thus:

    try:
        coro.with_timeout (...)
    except coro.TimeoutError:
        ...
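For generator-based coroutines, a with_timeout() of this shape could be sketched as follows. This is an illustration, not shrapnel's actual implementation: step the coroutine, and once the deadline has passed, throw the builtin TimeoutError in at its next suspension point.

```python
import time

def with_timeout(seconds, gen):
    # Drive a generator coroutine; deliver TimeoutError at a yield
    # point once the deadline passes. A real scheduler would do this
    # with timer events instead of polling the clock.
    deadline = time.monotonic() + seconds
    value = None
    while True:
        try:
            if time.monotonic() > deadline:
                gen.throw(TimeoutError)   # interrupt at the current yield
            value = gen.send(value)
        except StopIteration as e:
            return e.value                # coroutine finished in time

def fast_task():
    yield                                 # one suspension point, then done
    return 'done'

def slow_task():
    for _ in range(1000):
        yield time.sleep(0.001)           # stand-in for slow async work
    return 'done'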
There is only one way to 'magically' make existing code both sync- & async- friendly--greenlets, but I think there is no chance for them (or stackless) to land in cpython in the foreseeable future (although it would be awesome.)
BTW, why didn't you use greenlets in shrapnel and ended up with your own implementation?

I think shrapnel predates greenlets... some of the core asm code for greenlets may have come from one of shrapnel's precursors at ironport... Unfortunately it took many years to get shrapnel open-sourced - I remember talking with Guido about it over lunch in ~2006.
-Sam
On 2012-10-23, at 7:00 PM, Sam Rushing <sam-pydeas@rushing.nightmare.com> wrote:
On 10/23/12 3:05 PM, Yury Selivanov wrote:
Sam,
BTW, kudos for shrapnel!

Thanks!
For basic library functions that will work. And that's already a huge win. But developing a complicated library will become twice as hard, as you'll need to maintain two versions of API - sync & async all the way through the code.
This is really difficult, if you want to see a great example of trying to make all parties happy, look at Pika (an AMQP implementation).
Thanks, will take a look!
Actually, this reminds me: it would be really great if there was a standardized with_timeout() API. It's better than adding timeout args to all the functions. I'm sure that systems like Twisted & gevent could also implement it (if they don't already have it):
In shrapnel it is simply:
coro.with_timeout (<seconds>, <fun>, *args, **kwargs)
Timeouts are caught thus:
    try:
        coro.with_timeout (...)
    except coro.TimeoutError:
        ...
You're right - if we want to ship some "standard" async API in Python, an API for timeouts is a must. We will at least need to handle timeouts in async code in the stdlib, won't we...

A question: how do you protect finally statements in shrapnel? If we have the following coroutine (greenlet style):

    def foo():
        connection = open_connection()
        try:
            spam()
        finally:
            [some code]
            connection.close()

What happens if you run 'foo.with_timeout(1)' and a timeout occurs at the "[some code]" point? Will you just abort 'foo', possibly preventing 'connection' from being closed?

- Yury
Yury Selivanov wrote:
    def foo():
        connection = open_connection()
        try:
            spam()
        finally:
            [some code]
            connection.close()

What happens if you run 'foo.with_timeout(1)' and a timeout occurs at the "[some code]" point?
I would say that vital cleanup code probably shouldn't do anything that could block. If you really need to do that, it should be protected by a finally clause of its own:

    def foo():
        connection = open_connection()
        try:
            spam()
        finally:
            try:
                [some code]
            finally:
                connection.close()

-- Greg
Hi Greg,

On 2012-10-23, at 8:24 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yury Selivanov wrote:
    def foo():
        connection = open_connection()
        try:
            spam()
        finally:
            [some code]
            connection.close()

What happens if you run 'foo.with_timeout(1)' and timeout occurs at "[some code]" point?
I would say that vital cleanup code probably shouldn't do anything that could block. If you really need to do that, it should be protected by a finally clause of its own:
    def foo():
        connection = open_connection()
        try:
            spam()
        finally:
            try:
                [some code]
            finally:
                connection.close()
Please take a look at the problem definition in PEP 419. It's not about try..finally nesting; it's about the Scheduler being aware that a coroutine is in its 'finally' block and thus shouldn't be interrupted at the moment (a problem that doesn't exist in a non-coroutine world).

Speaking of your solution: imagine you have three connections to close. What will you write?

    finally:
        try:
            c1.close()  # coroutine call
        finally:
            try:
                c2.close()  # coroutine call
            finally:
                c3.close()  # coroutine call

But if you somehow make the scheduler aware of the 'finally' block - through PEP 419 (which I don't like), or as in my framework, where we inline special code in the finally statement by modifying coroutine opcodes (which I don't like either) - you can simply write::

    finally:
        c1.close()
        c2.close()
        c3.close()

And the scheduler will gladly wait until the finally is over. The code snippet above is something that is familiar to every user of Python - nobody expects code in the finally section to be interrupted from the *outside* world.

If we fail to guarantee 'finally' block safety, then coroutine-style programming is going to be much tougher. Or we have to abandon timeouts and coroutine interruptions. So eventually we'll need to figure out the best mechanism/approach for this.

Now, I don't think it's the right moment to shift the discussion to this particular problem, but I would like to bring up the point that implementing 'yield'-style coroutines is a very hard thing, and I'm not sure that we should implement them in 3.4. Setting guidelines and standard protocols, and adding socket-factory support where necessary in the stdlib, is a better approach (in my humble opinion.)

- Yury
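The point about nesting can be demonstrated with plain generators. In this sketch (all names are illustrative), a single timeout interrupts the coroutine while it waits on c1's close, but the nested finally blocks still drive c2's and c3's closes to completion, assuming the scheduler throws only once and keeps resuming the task:

```python
def cleanup_nested(log):
    # Each yield stands in for a coroutine call such as c1.close(),
    # i.e. a suspension point where a timeout may be thrown in.
    try:
        yield 'c1.close'          # <- the timeout lands here
        log.append('c1 closed')
    finally:
        try:
            yield 'c2.close'
            log.append('c2 closed')
        finally:
            yield 'c3.close'
            log.append('c3 closed')

log = []
gen = cleanup_nested(log)
gen.send(None)                    # now waiting on c1.close
try:
    gen.throw(TimeoutError)       # scheduler delivers the timeout
    while True:
        gen.send(None)            # scheduler keeps resuming the finallys
except (StopIteration, TimeoutError):
    pass                          # the original TimeoutError resurfaces
```

c1's close is lost (the timeout arrived mid-call), but c2 and c3 are still closed; the flat version would lose all three, which is exactly why the finally-aware scheduler is attractive.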
Yury Selivanov wrote:
It's not about try..finally nesting, it's about Scheduler being aware that a coroutine is in its 'finally' block and thus shouldn't be interrupted at the moment
It would be a bad idea to make a close() method, or anything else that might be needed for cleanup purposes, be a 'yield from' call. If it's an ordinary function, it can't be interrupted in the world we're talking about, so the PEP 419 problem doesn't apply.

If I were feeling in a radical mood, I might go as far as suggesting that 'yield' and 'yield from' be syntactically forbidden inside a finally clause. That would force you to design your cleanup code to be safe from interruptions.

-- Greg
On Wed, Oct 24, 2012 at 2:30 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yury Selivanov wrote:
It's not about try..finally nesting, it's about Scheduler being aware that a coroutine is in its 'finally' block and thus shouldn't be interrupted at the moment
It would be a bad idea to make a close() method, or anything else that might be needed for cleanup purposes, be a 'yield from' call. If it's an ordinary function, it can't be interrupted in the world we're talking about, so the PEP 419 problem doesn't apply.
If I were feeling in a radical mood, I might go as far as suggesting that 'yield' and 'yield from' be syntactically forbidden inside a finally clause. That would force you to design your cleanup code to be safe from interruptions.
What's the problem with just letting the cleanup take as long as it wants to and do whatever it wants? That's how try/finally works in regular Python code. -- --Guido van Rossum (python.org/~guido)
Hi Guido,

On 2012-10-24, at 6:43 PM, Guido van Rossum <guido@python.org> wrote:
On Wed, Oct 24, 2012 at 2:30 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yury Selivanov wrote:
It's not about try..finally nesting, it's about Scheduler being aware that a coroutine is in its 'finally' block and thus shouldn't be interrupted at the moment
It would be a bad idea to make a close() method, or anything else that might be needed for cleanup purposes, be a 'yield from' call. If it's an ordinary function, it can't be interrupted in the world we're talking about, so the PEP 419 problem doesn't apply.
If I were feeling in a radical mood, I might go as far as suggesting that 'yield' and 'yield from' be syntactically forbidden inside a finally clause. That would force you to design your cleanup code to be safe from interruptions.
What's the problem with just letting the cleanup take as long as it wants to and do whatever it wants? That's how try/finally works in regular Python code.
The problem appears when you add timeout support.

Let me show you an abstract example (I won't use yield_froms, but I'm sure that the problem is the same with them):

    @coroutine
    def fetch_comments(app):
        session = yield app.new_session()
        try:
            return (yield session.query(...))
        finally:
            yield session.close()

and now we execute that with:

    #: Get a list of comments; throw a TimeoutError if it
    #: takes more than 1 second
    comments = yield fetch_comments(app).with_timeout(1.0)

Now, the scheduler starts with 'fetch_comments', then executes 'new_session', then executes 'session.query' in a round-robin fashion.

Imagine that the database query took a bit less than a second to execute; the scheduler pushes the result into the coroutine, and then a timeout event occurs. So the scheduler throws a 'TimeoutError' into the coroutine, thus preventing 'session.close' from being executed. There is no way for the scheduler to understand that there is no need to push the exception right now, as the coroutine is in its finally block.

And this situation is pretty common when you have such a timeout mechanism in place and widely used.

- Yury
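The skipped-cleanup scenario can be reproduced with plain generators (the names here are illustrative stand-ins for the session calls above): the first timeout lands the coroutine in its finally block, and a second one, arriving at the yield inside that block, kills the cleanup before the close ever completes.

```python
def fetch(log):
    # Mirrors fetch_comments above; 'query' and 'session.close' stand
    # in for the real coroutine calls (each yield is a suspension point).
    try:
        result = yield 'query'
        return result
    finally:
        yield 'session.close'     # another suspension point!
        log.append('session closed')

log = []
gen = fetch(log)
gen.send(None)                    # coroutine is now waiting on the query
try:
    gen.throw(TimeoutError)       # timeout fires; the finally starts and
                                  # suspends on 'session.close'...
    gen.throw(TimeoutError)       # ...and a second timeout kills the cleanup
except TimeoutError:
    pass
```

After this, `log` is empty: the close request was issued but never completed, which is precisely the unclosed-connection bug class described above.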
On Wed, Oct 24, 2012 at 4:03 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Hi Guido,
On 2012-10-24, at 6:43 PM, Guido van Rossum <guido@python.org> wrote:
What's the problem with just letting the cleanup take as long as it wants to and do whatever it wants? That's how try/finally works in regular Python code.
The problem appears when you add timeout support.

Let me show you an abstract example (I won't use yield_froms, but I'm sure that the problem is the same with them):

    @coroutine
    def fetch_comments(app):
        session = yield app.new_session()
        try:
            return (yield session.query(...))
        finally:
            yield session.close()

and now we execute that with:

    #: Get a list of comments; throw a TimeoutError if it
    #: takes more than 1 second
    comments = yield fetch_comments(app).with_timeout(1.0)

Now, the scheduler starts with 'fetch_comments', then executes 'new_session', then executes 'session.query' in a round-robin fashion.

Imagine that the database query took a bit less than a second to execute; the scheduler pushes the result into the coroutine, and then a timeout event occurs. So the scheduler throws a 'TimeoutError' into the coroutine, thus preventing 'session.close' from being executed. There is no way for the scheduler to understand that there is no need to push the exception right now, as the coroutine is in its finally block.

And this situation is pretty common when you have such a timeout mechanism in place and widely used.
Ok, I can understand. But still, this is a problem with timeouts in general, not just with timeouts in a yield-based environment. How does e.g. Twisted deal with this?

As a work-around, I could imagine some kind of with-statement that tells the scheduler we're already in the finally clause (it could still send you a timeout if your cleanup takes way too long):

    try:
        yield <regular code>
    finally:
        with protect_finally():
            yield <cleanup code>

Of course this could be abused, but at your own risk -- the scheduler only gives you a fixed amount of extra time and then it's quits.

--
--Guido van Rossum (python.org/~guido)
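A sketch of how such a protect_finally() might cooperate with a scheduler (MiniScheduler, deliver_timeout and the task body are all hypothetical): the context manager sets a flag, and the scheduler checks it before throwing, postponing the timeout until the cleanup finishes.

```python
import contextlib

class MiniScheduler:
    # Tracks whether the running coroutine is inside a protected
    # finally block; timeouts arriving then are postponed.
    def __init__(self):
        self.in_cleanup = False
        self.postponed = 0

    def deliver_timeout(self, gen):
        if self.in_cleanup:
            self.postponed += 1       # don't interrupt the cleanup
            return gen.send(None)     # keep driving it instead
        return gen.throw(TimeoutError)

sched = MiniScheduler()

@contextlib.contextmanager
def protect_finally():
    sched.in_cleanup = True
    try:
        yield
    finally:
        sched.in_cleanup = False

def task(log):
    try:
        yield 'work'
    finally:
        with protect_finally():
            yield 'cleanup-step'      # safe: scheduler won't throw here
            log.append('cleaned up')

log = []
gen = task(log)
gen.send(None)                        # task is suspended at 'work'
try:
    gen.throw(TimeoutError)           # first timeout: lands in the finally
    sched.deliver_timeout(gen)        # second timeout: postponed; cleanup runs
except TimeoutError:
    pass                              # the original timeout still surfaces
```

The extra-time budget Guido mentions would go in deliver_timeout: postpone only until a hard deadline, then throw regardless.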
On Wed, Oct 24, 2012 at 4:03 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Hi Guido,
On 2012-10-24, at 6:43 PM, Guido van Rossum <guido@python.org> wrote:
What's the problem with just letting the cleanup take as long as it wants to and do whatever it wants? That's how try/finally works in regular Python code.
The problem appears when you add timeout support.

Let me show you an abstract example (I won't use yield_froms, but I'm sure that the problem is the same with them):

    @coroutine
    def fetch_comments(app):
        session = yield app.new_session()
        try:
            return (yield session.query(...))
        finally:
            yield session.close()

and now we execute that with:

    #: Get a list of comments; throw a TimeoutError if it
    #: takes more than 1 second
    comments = yield fetch_comments(app).with_timeout(1.0)

Now, the scheduler starts with 'fetch_comments', then executes 'new_session', then executes 'session.query' in a round-robin fashion.

Imagine that the database query took a bit less than a second to execute; the scheduler pushes the result into the coroutine, and then a timeout event occurs. So the scheduler throws a 'TimeoutError' into the coroutine, thus preventing 'session.close' from being executed. There is no way for the scheduler to understand that there is no need to push the exception right now, as the coroutine is in its finally block.

And this situation is pretty common when you have such a timeout mechanism in place and widely used.
Ok, I can understand. But still, this is a problem with timeouts in general, not just with timeouts in a yield-based environment. How does e.g. Twisted deal with this?
As a work-around, I could imagine some kind of with-statement that tells the scheduler we're already in the finally clause (it could still send you a timeout if your cleanup takes way too long):
    try:
        yield <regular code>
    finally:
        with protect_finally():
            yield <cleanup code>
Of course this could be abused, but at your own risk -- the scheduler only gives you a fixed amount of extra time and then it's quits.
Could another workaround be to spawn the cleanup code without yielding - in effect saying "go and do this, but don't come back"? Then there is nowhere for the scheduler to throw the exception.

I ask because this falls out naturally with my implementation (code is coming, but work is taking priority right now): "do_cleanup()" instead of "yield do_cleanup()". I haven't tried it in this context yet, so no idea whether it works, but I don't see why it wouldn't. In a system without the @async decorator you'd need a "scheduler.current.spawn(do_cleanup)" instead of yield [from]s, but it can still be done.

Cheers,
Steve
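Steve's "go and do this, but don't come back" idea can be sketched like so (spawn, run_spawned and the task body are illustrative): the finally block hands the cleanup coroutine to the scheduler with a plain call, leaving no suspension point for a timeout to land on.

```python
spawned = []

def spawn(coro):
    # Hand the coroutine to the scheduler as an independent task;
    # the caller does NOT wait for it ("don't come back").
    spawned.append(coro)

def do_cleanup(log):
    yield 'closing'               # the cleanup may itself suspend...
    log.append('closed')

def task(log):
    try:
        yield 'work'
    finally:
        spawn(do_cleanup(log))    # plain call, no yield: nothing to interrupt

def run_spawned():
    # The scheduler eventually drives spawned cleanups to completion.
    while spawned:
        for _ in spawned.pop():
            pass

log = []
gen = task(log)
gen.send(None)                    # task is suspended at 'work'
try:
    gen.throw(TimeoutError)       # the finally runs instantly, then re-raises
except TimeoutError:
    pass
run_spawned()                     # cleanup completes later, unharmed
```

The timeout propagates out of the task immediately, yet the close still happens, because the cleanup now lives in a task the timeout cannot reach.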
On 2012-10-24, at 7:25 PM, Steve Dower <Steve.Dower@microsoft.com> wrote: [snip]
Could another workaround be to spawn the cleanup code without yielding - in effect saying "go and do this, but don't come back"? Then there is nowhere for the scheduler to throw the exception.
I ask because this falls out naturally with my implementation (code is coming, but work is taking priority right now): "do_cleanup()" instead of "yield do_cleanup()". I haven't tried it in this context yet, so no idea whether it works, but I don't see why it wouldn't. In a system without the @async decorator you'd need a "scheduler.current.spawn(do_cleanup)" instead of yield [from]s, but it can still be done.
Well, yes, this will work. If we have the following:

    # "async()" is a way to launch coroutines in my framework without
    # "coming back"; with it they just return a promise/future that needs
    # to be yielded again
    finally:
        yield c.close().async()

The solution is very limited, though. Imagine if you have lots of cleanup code:

    finally:
        yield c1.close().async()  # go and do this, but don't come back
        yield c2.close().async()

The above won't work, as the scheduler would have an opportunity to break everything on the second 'yield'. You may solve it by grouping the cleanup code in a separate inner coroutine, like:

    @coroutine
    def do_stuff():
        try:
            ...
        finally:
            @coroutine
            def cleanup():
                yield c1.close()
                yield c2.close()

            yield cleanup().async()  # go and do this, but don't come back

But that looks even worse than using 'with protect_finally()'.

- Yury
On 2012-10-24, at 7:12 PM, Guido van Rossum <guido@python.org> wrote:
On Wed, Oct 24, 2012 at 4:03 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Hi Guido,
On 2012-10-24, at 6:43 PM, Guido van Rossum <guido@python.org> wrote:
What's the problem with just letting the cleanup take as long as it wants to and do whatever it wants? That's how try/finally works in regular Python code.
The problem appears when you add timeout support.

Let me show you an abstract example (I won't use yield_froms, but I'm sure that the problem is the same with them):

    @coroutine
    def fetch_comments(app):
        session = yield app.new_session()
        try:
            return (yield session.query(...))
        finally:
            yield session.close()

and now we execute that with:

    #: Get a list of comments; throw a TimeoutError if it
    #: takes more than 1 second
    comments = yield fetch_comments(app).with_timeout(1.0)

Now, the scheduler starts with 'fetch_comments', then executes 'new_session', then executes 'session.query' in a round-robin fashion.

Imagine that the database query took a bit less than a second to execute; the scheduler pushes the result into the coroutine, and then a timeout event occurs. So the scheduler throws a 'TimeoutError' into the coroutine, thus preventing 'session.close' from being executed. There is no way for the scheduler to understand that there is no need to push the exception right now, as the coroutine is in its finally block.

And this situation is pretty common when you have such a timeout mechanism in place and widely used.
Ok, I can understand. But still, this is a problem with timeouts in general, not just with timeouts in a yield-based environment. How does e.g. Twisted deal with this?
I don't know; I hope someone with expertise in Twisted can tell us.

But I would imagine that they don't have this particular problem, as it should be related only to coroutines and the schedulers that run them. I.e. it's a problem when you run some code and may interrupt it. And you can't interrupt plain Python code that uses callbacks without yields and greenlets.
As a work-around, I could imagine some kind of with-statement that tells the scheduler we're already in the finally clause (it could still send you a timeout if your cleanup takes way too long):
    try:
        yield <regular code>
    finally:
        with protect_finally():
            yield <cleanup code>
Of course this could be abused, but at your own risk -- the scheduler only gives you a fixed amount of extra time and then it's quits.
Right, that's the basic approach. But it also gives you the feeling of a "broken" language feature. I.e. we have coroutines, but we cannot implement timeouts on top of them without making 'finally' blocks look ugly. And if we assume that you can run any coroutine with a timeout, you'll need to use 'protect_finally' in virtually every 'finally' statement.

I solved the problem by dynamically inlining 'with protect_finally()' code in the @coroutine decorator (something that I would never suggest putting in the stdlib, btw). There is also PEP 419, but I don't like it either, as it is tied to frames - too low-level (and I'm not sure how it will work with future CPython optimizations and PyPy's JIT.)

BUT, the concept is nice. I've implemented a number of protocols with yield-coroutines, and managing timeouts with a simple '.with_timeout()' call is a very handy and readable feature. So I hope that we can all brainstorm this problem to make coroutines "complete", if we decide to start using them widely.

- Yury
On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-24, at 7:12 PM, Guido van Rossum <guido@python.org> wrote:
Ok, I can understand. But still, this is a problem with timeouts in general, not just with timeouts in a yield-based environment. How does e.g. Twisted deal with this?
I don't know; I hope someone with expertise in Twisted can tell us.

But I would imagine that they don't have this particular problem, as it should be related only to coroutines and the schedulers that run them. I.e. it's a problem when you run some code and may interrupt it. And you can't interrupt plain Python code that uses callbacks without yields and greenlets.
Well, but in the Twisted world, if a cleanup callback requires more blocking calls, it has to spawn more deferred callbacks. So I think they *do* have the problem, unless they don't have a way at all to constrain the total running time of an action involving cascading callbacks. Also, they have inlineCallbacks which does use yield.
As a work-around, I could imagine some kind of with-statement that tells the scheduler we're already in the finally clause (it could still send you a timeout if your cleanup takes way too long):
    try:
        yield <regular code>
    finally:
        with protect_finally():
            yield <cleanup code>
Of course this could be abused, but at your own risk -- the scheduler only gives you a fixed amount of extra time and then it's quits.
Right, that's the basic approach. But it also gives you the feeling of a "broken" language feature. I.e. we have coroutines, but we cannot implement timeouts on top of them without making 'finally' blocks look ugly. And if we assume that you can run any coroutine with a timeout, you'll need to use 'protect_finally' in virtually every 'finally' statement.
I think the problem may be with timeouts, or with doing blocking I/O in cleanup clauses. I suspect that any system implementing timeouts has subtle bugs.
I solved the problem by dynamically inlining 'with protect_finally()' code in the @coroutine decorator (something that I would never suggest putting in the stdlib, btw). There is also PEP 419, but I don't like it either, as it is tied to frames - too low-level (and I'm not sure how it will work with future CPython optimizations and PyPy's JIT.)

BUT, the concept is nice. I've implemented a number of protocols with yield-coroutines, and managing timeouts with a simple '.with_timeout()' call is a very handy and readable feature. So I hope that we can all brainstorm this problem to make coroutines "complete", if we decide to start using them widely.
I think the with-clause is the solution.

Note that in a world with only blocking calls this *can* be a problem (despite your repeated claims that it's not a problem there) -- a common approach to giving operations a timeout is sending it a SIGTERM (which you can easily catch with a signal handler in Python) when the deadline is over, then sending more SIGTERM signals every few seconds until it dies, and sending SIGKILL (which can't be caught) if it takes too long to die.

--
--Guido van Rossum (python.org/~guido)
On 2012-10-24, at 7:43 PM, Guido van Rossum <guido@python.org> wrote:
On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-24, at 7:12 PM, Guido van Rossum <guido@python.org> wrote:
Ok, I can understand. But still, this is a problem with timeouts in general, not just with timeouts in a yield-based environment. How does e.g. Twisted deal with this?
I don't know, I hope someone with an expertise in Twisted can tell us.
But I would imagine that they don't have this particular problem, as it should be related only to coroutines and the schedulers that run them. I.e. it's a problem when you run some code and may interrupt it. And you can't interrupt plain Python code that uses callbacks without yields and greenlets.
Well, but in the Twisted world, if a cleanup callback requires more blocking calls, it has to spawn more deferred callbacks. So I think they *do* have the problem, unless they don't have a way at all to constrain the total running time of an action involving cascading callbacks. Also, they have inlineCallbacks which does use yield.
Right. I was under the impression that you don't just use a 'finally' stmt but rather set up a Deferred with a cleanup callback. Anyway, I'm now curious enough so I'll take a look...
Note that in a world with only blocking calls this *can* be a problem (despite your repeated claims that it's not a problem there) -- a common approach to giving operations a timeout is sending it a SIGTERM (which you can easily call with a signal handler in Python) when the deadline is over, then sending it more SIGTERM signals every few seconds until it dies, and sending SIGKILL (which can't be caught) if it takes too long to die.
Yes, you're right. I guess I've just never seen anybody trying to protect their 'finally' statements from being interrupted by a signal. Whereas with coroutines we needed to protect lots of them, as otherwise we had many, many bugs with unclosed database connections etc. So 'protect_finally' is going to be a very common thing to use. - Yury
On 2012-10-24, at 8:00 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-24, at 7:43 PM, Guido van Rossum <guido@python.org> wrote:
On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-24, at 7:12 PM, Guido van Rossum <guido@python.org> wrote:
Ok, I can understand. But still, this is a problem with timeouts in general, not just with timeouts in a yield-based environment. How does e.g. Twisted deal with this?
I don't know, I hope someone with an expertise in Twisted can tell us.
But I would imagine that they don't have this particular problem, as it should be related only to coroutines and schedulers that run them. I.e. it's a problem when you run some code and may interrupt it. And you can't interrupt a plain python code that uses callbacks without yields and greenlets.
Well, but in the Twisted world, if a cleanup callback requires more blocking calls, it has to spawn more deferred callbacks. So I think they *do* have the problem, unless they don't have a way at all to constrain the total running time of an action involving cascading callbacks. Also, they have inlineCallbacks which does use yield.
Right.
I was under impression that you don't just use 'finally' stmt but rather setup a Deferred with a cleanup callback. Anyways, I'm now curious enough so I'll take a look...
Well, that wasn't too hard to find: Timeouts: http://stackoverflow.com/questions/221745/is-it-possible-to-set-a-timeout-on... - Yury
Maybe our approach to timeouts should be based on running two tasks in parallel, where the second delays for the timeout period and then cancels the first (I believe this is what they're doing in Twisted). My vision for cancellation involves the worker task polling (or whatever is appropriate for low-level tasks), rather than an exception being forced in by the scheduler, so this avoids the finally issue - it's too late to cancel the task at that point. It also strengthens the case for including a cancellation protocol, which I was keen on anyway. Cheers, Steve
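The polling-based cancellation vision above could be sketched like this (CancelToken and worker are hypothetical names, not a proposed API):

```python
class CancelToken:
    """Minimal cancellation protocol: the worker polls, nobody throws
    an exception into it, so finally clauses are never interrupted."""
    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True

def worker(token, chunks):
    """A cooperative task that checks for cancellation between steps."""
    done = []
    for chunk in chunks:
        if token.cancelled:     # polling point: a safe place to stop
            break
        done.append(chunk)
    return done
```

A timeout would then be a second task that calls token.cancel() when the deadline passes; the worker stops at its next polling point instead of having an exception forced into a finally block.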
On 25/10/12 12:43, Guido van Rossum wrote:
Note that in a world with only blocking calls this *can* be a problem... a common approach to giving operations a timeout is sending it a SIGTERM
Well, yes, if you have preemptive interruptions of some kind, then things are a lot trickier. But I'm assuming we're using cooperative scheduling *instead* of things like that. (Note that in the face of preemption, I don't think it's possible to solve this problem completely without language support, because there will always be a small window of opportunity between entering the finally clause and getting into the with-statement or whatever that you're using to block asynchronous signals.) -- Greg
On 2012-10-24, at 9:34 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: [...]
(Note that in the face of preemption, I don't think it's possible to solve this problem completely without language support, because there will always be a small window of opportunity between entering the finally clause and getting into the with-statement or whatever that you're using to block asynchronous signals.)
Agree. In my experience, though, broken finally blocks due to interruption by a signal are a very rare thing (again, that may be different for someone else.) - Yury
On Wed, Oct 24, 2012 at 7:28 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-24, at 9:34 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: [...]
(Note that in the face of preemption, I don't think it's possible to solve this problem completely without language support, because there will always be a small window of opportunity between entering the finally clause and getting into the with-statement or whatever that you're using to block asynchronous signals.)
Agree.
In my experience, though, broken finally blocks due to interruption by a signal is a very rare thing (again, that maybe different for someone else.)
We're far from our starting point: in the yield-from (or yield) world, there are no truly async interrupts, but anything that yields may be interrupted, if we decide to implement timeouts by throwing an exception into the generator (which seems the logical thing to do). The with-statement can deal with this fine (there's no yield between entering the finally and entering the with-block) but making the cleanup into its own task (like Steve proposed) sounds fine too. In any case this sounds like something that each framework should decide for itself. -- --Guido van Rossum (python.org/~guido)
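Throwing a timeout into a suspended generator, as described above, can be demonstrated in a few lines; note how the finally clause still runs before the exception escapes:

```python
log = []

def task():
    try:
        yield "working"            # first suspension point
        yield "still working"      # never reached: the timeout lands first
    finally:
        log.append("cleanup ran")  # 'finally' still runs on interruption

gen = task()
next(gen)                          # advance to the first yield
try:
    gen.throw(TimeoutError)        # scheduler-style interruption at a yield
except TimeoutError:
    log.append("timeout propagated")
```

This is exactly where the trouble starts: if the cleanup in the finally clause itself needs to yield, the scheduler must decide whether a second timeout may interrupt it.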
This could also be another application for extension options on futures:

    try:
        ...
    finally:
        yield do_cleanup_1().set_options(never_raise=True)
        yield do_cleanup_2().set_options(never_raise=True)

The scheduler can then ignore exceptions (including CancelledError) instead of raising them. ('set_scheduler_hint' may be a better name than 'set_options', now I come to think of it. I like the extensibility of this, since I don't think anyone can predict what advanced options every scheduler may want - the function takes **params and updates a (lazily created) dict on the future.)

Of course, this will also work (and is pretty much equivalent):

    try:
        ...
    finally:
        try:
            yield do_cleanup_1()
        except:
            pass
        try:
            yield do_cleanup_2()
        except:
            pass

We'll probably need/want some form of 'atomic' primitive anyway, which might work like this:

    yield atomically(do_cleanup_1, do_cleanup_2, ...)

Though the behaviour of this when exceptions are involved gets complicated - do we abort all of them? Pass the exception on? Continue anyway? Which exception gets reported?

Cheers, Steve

On Wed, Oct 24, 2012 at 7:28 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-24, at 9:34 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: [...]
(Note that in the face of preemption, I don't think it's possible to solve this problem completely without language support, because there will always be a small window of opportunity between entering the finally clause and getting into the with-statement or whatever that you're using to block asynchronous signals.)
Agree.
In my experience, though, broken finally blocks due to interruption by a signal is a very rare thing (again, that maybe different for someone else.)
We're far from our starting point: in the yield-from (or yield) world, there are no truly async interrupts, but anything that yields may be interrupted, if we decide to implement timeouts by throwing an exception into the generator (which seems the logical thing to do). The with-statement can deal with this fine (there's no yield between entering the finally and entering the with-block) but making the cleanup into its own task (like Steve proposed) sounds fine too. In any case this sounds like something that each framework should decide for itself. -- --Guido van Rossum (python.org/~guido)
On 2012-10-24, at 10:51 PM, Guido van Rossum <guido@python.org> wrote:
On Wed, Oct 24, 2012 at 7:28 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-24, at 9:34 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: [...]
(Note that in the face of preemption, I don't think it's possible to solve this problem completely without language support, because there will always be a small window of opportunity between entering the finally clause and getting into the with-statement or whatever that you're using to block asynchronous signals.)
Agree.
In my experience, though, broken finally blocks due to interruption by a signal is a very rare thing (again, that maybe different for someone else.)
We're far from our starting point: in a the yield-from (or yield) world, there are no truly async interrupts, but anything that yields may be interrupted, if we decide to implement timeouts by throwing an exception into the generator (which seems the logical thing to do). The with-statement can deal with this fine (there's no yield between entering the finally and entering the with-block) but making the cleanup into its own task (like Steve proposed) sounds fine too.
In any case this sounds like something that each framework should decide for itself.
BTW, is there a way of adding a read-only property to generator objects - 'in_finally'? Will it actually slow things down? - Yury
On 2012-10-25, at 12:37 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-24, at 10:51 PM, Guido van Rossum <guido@python.org> wrote:
On Wed, Oct 24, 2012 at 7:28 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-24, at 9:34 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: [...]
(Note that in the face of preemption, I don't think it's possible to solve this problem completely without language support, because there will always be a small window of opportunity between entering the finally clause and getting into the with-statement or whatever that you're using to block asynchronous signals.)
Agree.
In my experience, though, broken finally blocks due to interruption by a signal is a very rare thing (again, that maybe different for someone else.)
We're far from our starting point: in a the yield-from (or yield) world, there are no truly async interrupts, but anything that yields may be interrupted, if we decide to implement timeouts by throwing an exception into the generator (which seems the logical thing to do). The with-statement can deal with this fine (there's no yield between entering the finally and entering the with-block) but making the cleanup into its own task (like Steve proposed) sounds fine too.
In any case this sounds like something that each framework should decide for itself.
BTW, is there a way of adding a read-only property to generator objects - 'in_finally'? Will it actually slow down things?
Well, I couldn't resist and just implemented a *proof of concept* myself. The patch is here: https://dl.dropbox.com/u/21052/gen_in_finally.patch The patch adds 'gi_in_finally' read-only property to generator objects. There is no observable difference between patched & unpatched python (latest master) in pybench. Some small demo:
    >>> def spam():
    ...     try:
    ...         yield 1
    ...     finally:
    ...         yield 2
    ...     yield 3
    ...
    >>> gen = spam()
    >>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally
    (0, 1, 0)
    >>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally
    (0, 2, 1)
    >>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally
    (1, 3, 0)
    >>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
If we decide to merge this into CPython, then this whole problem with 'finally' statements can be solved (at least for generator-based coroutines.) What do you think? - Yury
Hi Yury, On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Well, I couldn't resist and just implemented a *proof of concept* myself. The patch is here: https://dl.dropbox.com/u/21052/gen_in_finally.patch
The patch adds 'gi_in_finally' read-only property to generator objects.
Why haven't you used my implementation? http://bugs.python.org/issue14730 -- Paul
On 2012-10-25, at 3:49 AM, Paul Colomiets <paul@colomiets.name> wrote:
Hi Yury,
On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Well, I couldn't resist and just implemented a *proof of concept* myself. The patch is here: https://dl.dropbox.com/u/21052/gen_in_finally.patch
The patch adds 'gi_in_finally' read-only property to generator objects.
Why haven't you used my implementation?
Because it's a different thing. Yours is a PEP 419 implementation -- 'sys.setcleanuphook'. Mine is a quick hack to add 'gi_in_finally' property to generators and see how good/bad it is. - Yury
On Thu, Oct 25, 2012 at 6:37 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-25, at 3:49 AM, Paul Colomiets <paul@colomiets.name> wrote:
Hi Yury,
On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Well, I couldn't resist and just implemented a *proof of concept* myself. The patch is here: https://dl.dropbox.com/u/21052/gen_in_finally.patch
The patch adds 'gi_in_finally' read-only property to generator objects.
Why haven't you used my implementation?
Because it's a different thing. Yours is a PEP 419 implementation -- 'sys.setcleanuphook'. Mine is a quick hack to add 'gi_in_finally' property to generators and see how good/bad it is.
I feel it's a code smell if you need to use this feature a lot. If you need it rarely, well, use one of the existing work-arounds. -- --Guido van Rossum (python.org/~guido)
On 2012-10-25, at 10:44 AM, Guido van Rossum <guido@python.org> wrote:
On Thu, Oct 25, 2012 at 6:37 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-25, at 3:49 AM, Paul Colomiets <paul@colomiets.name> wrote:
Hi Yury,
On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Well, I couldn't resist and just implemented a *proof of concept* myself. The patch is here: https://dl.dropbox.com/u/21052/gen_in_finally.patch
The patch adds 'gi_in_finally' read-only property to generator objects.
Why haven't you used my implementation?
Because it's a different thing. Yours is a PEP 419 implementation -- 'sys.setcleanuphook'. Mine is a quick hack to add 'gi_in_finally' property to generators and see how good/bad it is.
I feel it's a code smell if you need to use this feature a lot. If you need it rarely, well, use one of the existing work-arounds.
But the feature isn't going to be used by users directly. It will be used only in scheduler implementations. Users will just write 'finally' blocks and they will work as expected. This just makes coroutines look and behave more like ordinary functions. Isn't it one of our goals--to make it convenient and reliable? - Yury
Mine is a quick hack to add 'gi_in_finally' property to generators and see how good/bad it is.
I feel it's a code smell if you need to use this feature a lot. If you need it rarely, well, use one of the existing work-arounds. But the feature isn't going to be used by users directly. It will be used only in scheduler implementations. Users will just write 'finally' blocks and they will work as expected. This just makes coroutines look and behave more like ordinary functions. Isn't it one of our goals--to make it convenient and reliable?
I agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case?

    try:
        try:
            yield some_op()
        finally:
            yield cleanup_that_raises_network_error()
    except NetworkError:
        # will we ever see this?

Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to.

Cheers, Steve
On 2012-10-25, at 11:28 AM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Mine is a quick hack to add 'gi_in_finally' property to generators and see how good/bad it is.
I feel it's a code smell if you need to use this feature a lot. If you need it rarely, well, use one of the existing work-arounds. But the feature isn't going to be used by users directly. It will be used only in scheduler implementations. Users will just write 'finally' blocks and they will work as expected. This just makes coroutines look and behave more like ordinary functions. Isn't it one of our goals--to make it convenient and reliable?
I'm agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case?
    try:
        try:
            yield some_op()
        finally:
            yield cleanup_that_raises_network_error()
    except NetworkError:
        # will we ever see this?
Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to.
We can. You can experiment with the approach--I've implemented it a bit differently and it proved to work. Now we're just talking about making this feature supported at the interpreter level.

As for your example - I'm not sure what NetworkError is and how it relates to TimeoutError... But if you have something like this:

    try:
        try:
            yield some_op().with_timeout(0.1)
        finally:
            yield something_else()
    except TimeoutError:
        # Then everything would be just fine here.

Look, it's all the same as if you just dropped the yields. Generators already support the 'finally' clause perfectly.

- Yury
Mine is a quick hack to add 'gi_in_finally' property to generators and see how good/bad it is.
I feel it's a code smell if you need to use this feature a lot. If you need it rarely, well, use one of the existing work-arounds. But the feature isn't going to be used by users directly. It will be used only in scheduler implementations. Users will just write 'finally' blocks and they will work as expected. This just makes coroutines look and behave more like ordinary functions. Isn't it one of our goals--to make it convenient and reliable?
I'm agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case?
    try:
        try:
            yield some_op()
        finally:
            yield cleanup_that_raises_network_error()
    except NetworkError:
        # will we ever see this?
Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to.
We can. You can experiment with the approach--I've implemented it a bit differently and it proved to work. Now we're just talking about making this feature supported on the interpreter level.
As for your example - I'm not sure what's the NetworkError is and how it relates to TimeoutError...
But if you have something like this:
    try:
        try:
            yield some_op().with_timeout(0.1)
        finally:
            yield something_else()
    except TimeoutError:
        # Then everything would be just fine here.
Look, it all the same as if you just drop yields. Generators already support 'finally' clause perfectly.
The type of the error is irrelevant - if something_else() might raise an exception that is expected, it won't be passed in because the scheduler is suppressing exceptions inside finally blocks. Or perhaps I've misunderstood the point of gi_in_finally? Cheers, Steve
On 2012-10-25, at 11:43 AM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Mine is a quick hack to add 'gi_in_finally' property to generators and see how good/bad it is.
I feel it's a code smell if you need to use this feature a lot. If you need it rarely, well, use one of the existing work-arounds. But the feature isn't going to be used by users directly. It will be used only in scheduler implementations. Users will just write 'finally' blocks and they will work as expected. This just makes coroutines look and behave more like ordinary functions. Isn't it one of our goals--to make it convenient and reliable?
I'm agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case?
    try:
        try:
            yield some_op()
        finally:
            yield cleanup_that_raises_network_error()
    except NetworkError:
        # will we ever see this?
Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to.
We can. You can experiment with the approach--I've implemented it a bit differently and it proved to work. Now we're just talking about making this feature supported on the interpreter level.
As for your example - I'm not sure what's the NetworkError is and how it relates to TimeoutError...
But if you have something like this:
    try:
        try:
            yield some_op().with_timeout(0.1)
        finally:
            yield something_else()
    except TimeoutError:
        # Then everything would be just fine here.
Look, it all the same as if you just drop yields. Generators already support 'finally' clause perfectly.
The type of the error is irrelevant - if something_else() might raise an exception that is expected, it won't be passed in because the scheduler is suppressing exceptions inside finally blocks. Or perhaps I've misunderstood the point of gi_in_finally?
The only thing the scheduler will ever suppress is its *own* intent to *interrupt* something (until `gi_in_finally` drops back to 0.) Every other exception must be propagated as usual, without even checking the `gi_in_finally` flag. - Yury
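Since gi_in_finally never landed in CPython, the suppress-own-interrupt semantics can be emulated with an explicit flag the coroutine raises around its cleanup. This is a toy sketch of the idea, not the framework's real scheduler; all names are illustrative:

```python
events = []

class Coro:
    """Toy coroutine whose cleanup must not be interrupted."""
    def __init__(self):
        self.in_finally = False            # stand-in for gi_in_finally
        self.gen = self._task()

    def _task(self):
        try:
            yield "working"
        finally:
            self.in_finally = True
            try:
                yield "cleanup I/O"        # e.g. closing a DB connection
                events.append("cleanup finished")
            finally:
                self.in_finally = False
        yield "more work"                  # the deferred timeout lands here
        events.append("never reached")

def run(coro):
    """Drive the generator; defer TimeoutError while cleanup is running."""
    coro.gen.send(None)                    # -> "working"
    coro.gen.send(None)                    # -> "cleanup I/O": now in finally
    pending = True                         # the deadline expires right here
    while True:
        try:
            if pending and not coro.in_finally:
                pending = False
                coro.gen.throw(TimeoutError)  # deliver the deferred timeout
            else:
                coro.gen.send(None)        # cleanup running: don't interrupt
        except StopIteration:
            events.append("finished")
            return
        except TimeoutError:
            events.append("timed out")
            return

run(Coro())
```

The cleanup completes before the scheduler's own interruption is delivered, while any exception the cleanup itself raised would still propagate normally.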
Yury, please give up this particular issue (trying to patch CPython to record whether a generator is in a finally clause). I have failed to explain my reasons why I think it is a bad idea, but you haven't convinced me it's a good idea, and we have at least two decent work-arounds. So let me just use the release cycle as an argument: your patch is a new feature, 3.3 just came out, so it cannot be introduced until 3.4. I don't want to wait for that. -- --Guido van Rossum (python.org/~guido)
On 2012-10-25, at 11:58 AM, Guido van Rossum <guido@python.org> wrote:
Yuri, please give up this particular issue (trying to patch CPython to record whether a generator is in a finally clause). I have failed to explain my reasons why I think it is a bad idea, but you haven't convinced me it's a good idea, and we have at least two decent work-arounds. So let me just use the release cycle as an argument: your patch is a new feature, 3.3 just came out, so it cannot be introduced until 3.4. I don't want to wait for that.
OK, NP. One question: what do we actually want to get? What're the goals? - A specification (PEP?) of how to make stdlib more async-friendly? - To develop a separate library that may be included in the stdlib one day? - And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects? I'm (and I think it's not just me) a bit lost here, after reading 100s of emails on python-ideas. And I just want to know where to channel my energy and expertise ;) - Yury
One question: what do we actually want to get? What're the goals?
- A specification (PEP?) of how to make stdlib more async-friendly?
- To develop a separate library that may be included in the stdlib one day?
- And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects?
I'm (and I think it's not just me) a bit lost here, after reading 100s of emails on python-ideas. And I just want to know where to channel my energy and expertise ;)
It's not just you, I'm not entirely clear on what we expect to end up with either. My current view is that we'll get a PEP that defines a convention for user code and an interface for schedulers. Adding *_async() methods to the entire standard library could take a long time and should probably be divided up so we can have really experienced devs on particular areas (e.g. someone on Windows sockets, someone else on Linux sockets, etc.) and may need individual PEPs. My hope is that the first PEP provides a protocol for users to defer the rest of a task until after some/any operation has completed - I don't really want sockets/networking/files/threads/etc. to leak through at all, though these are all important use cases that need to be tried. This is the way I'm approaching it, so please let me know if I'm off the mark :) Cheers, Steve
On Thu, Oct 25, 2012 at 9:10 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
One question: what do we actually want to get? What're the goals?
Good question. I'm still in the requirements gathering phase myself.
- A specification (PEP?) of how to make stdlib more async-friendly?
That's one of the hopeful goals, but a lot of things need to be decided before we can start adapting the stdlib. It is also likely that this will be a process that takes several releases (and may never finish completely).
- To develop a separate library that may be included in the stdlib one day?
That's one way I am pursuing and I hope others will too.
- And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects?
That sounds like it might be jumping to a specific solution. I agree that the stdlib often, unfortunately, couples classes too tightly, where a class that needs an instance of another class just instantiates that other class rather than having an instance passed in (at least as an option). We're doing better with files these days -- most APIs (that I can think of) that work with streams let you pass one in. So maybe you're on to something. Perhaps, as a step towards the exploration of this PEP, you could come up with a concrete list of modules and classes (or other API elements) that you think would benefit from being able to pass in a socket? Please start another thread -- python-ideas is fine. I will read it.
I'm (and I think it's not just me) a bit lost here, after reading 100s of emails on python-ideas. And I just want to know where to channel my energy and expertise ;)
Totally understood. I'm overwhelmed myself by the vast array of options. Still, I have been writing some experimental code myself, and I am beginning to understand in which direction I'd like to move. I am thinking of having a strict separation between an event loop, a task scheduler, specific transports, and protocol implementations.

- The event loop in turn separates into a component that knows how to poll for I/O (or other) events using the best mechanism available on the platform, and a part that manages callback functions -- these are closely tied together, but the idea is that the callback management part does not have to vary by platform, so only the I/O polling needs to be platform-specific. Details subject to bikeshedding (I've only got something working on Linux and OSX so far). One of the requirements for this event loop is that it should be possible to run frameworks like Twisted or Tornado using an adapter to it, and it should also be possible for Twisted/Tornado/etc. to provide their own event loop (again via some kind of adaptation) to replace the default one.

- For the task scheduler I am piling all my hopes on PEP-380, i.e. yield from. I have not found a single thing that is harder to do using this style than using the PEP-342 yield <future> style, and I really don't like mixing the two up (despite what Steve Dower says :-). But I don't want the event loop interface to know about this at all -- however the scheduler has to know about the event loop (at least its interface). I am currently refactoring my ideas in this area; I think I'll end up with a Task object that smells a bit like a Future, but represents a whole stack of generator invocations linked via yield-from, and which allows suspension of the entire stack at once; user code only needs to use Tasks when it wants to schedule multiple activities concurrently, not when it just wants to be able to yield. (This may be the core insight in favor of PEP 380.)

- Transports (e.g. TCP): I feel like a newbie here. I know sockets pretty well, but the key is to introduce abstractions that let you easily replace a transport with a different one -- e.g. TCP vs. pipes vs. SSL. Twisted clearly has paved the way here -- even if we end up slicing the abstractions somewhat differently, the road to the optimal interface has to take the same road that Twisted took -- implement a simple transport using sockets, then add another transport, refactor the abstractions to share the commonalities and separate the differences, then try adding yet another transport, rinse and repeat. We should provide a bunch of common transports but also let people build new ones; however, there will probably be way fewer transport implementations than protocol implementations.

- Protocols (e.g. HTTP): A protocol should ideally be able to work with any transport (though obviously some protocols require certain transport extensions -- hopefully we'll have a small hierarchy of abstract classes defining different transport styles and capabilities). We should provide a bunch of common protocols (e.g. a good HTTP client and server) but this is where users will most often be writing their own -- so the APIs used by protocol implementations must be documented especially well, the standard protocol implementations must be examples of excellent coding style, and the transport implementations should not let protocol implementations get away with undefined behavior. It would be useful to have explicit testing support too -- just like there's a WSGI validator, we could have a protocol validator that acts like a particularly picky transport. (I found this idea in a library written by Jim Fulton for Zope, I think it's zope.ngi. It's a valuable idea.)

I think it's inevitable that the choice of using PEP-380 will be reflected in the abstract classes defining transports and protocols. Hopefully we will be able to bridge between the PEP-380 world and Twisted's world of Deferreds somehow -- the event loop is one interface layer, but I think we can build adapters for the other levels as well (at least for transports).

One final thought: async WSGI anyone?

-- --Guido van Rossum (python.org/~guido)
Guido, Thank you for such a detailed and deep response. Lots of good thoughts to digest. One idea: the scope of the problem is enormously big. It may take months/years to synchronize all ideas and thoughts just by communicating over the mailing list, without a concrete thing to discuss. How about you/we create a repository with a draft implementation of the scheduler/IO loop/coroutines engine, and we simply start tweaking and discussing that particular design? That way people will see where to start the discussion, what's done, and some will even participate. The goal is not to write production-quality software, but rather to have a common place to discuss/try things/benchmark etc. I'm not sure, but maybe a place like Bitbucket, where you can have a wiki, issues, and the actual code, is a better fit than a mailing list. I also think that there's a need to move concurrency-related discussions to a separate mailing list, as everything else on python-ideas is lost now. On 2012-10-25, at 1:58 PM, Guido van Rossum <guido@python.org> wrote: [...]
- And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects?
That sounds like it might be jumping to a specific solution. I agree that the stdlib often, unfortunately, couples classes too tightly, where a class that needs an instance of another class just instantiates that other class rather than having an instance passed in (at least as an option). We're doing better with files these days -- most APIs (that I can think of) that work with streams let you pass one in. So maybe you're on to something. Perhaps, as a step towards the exploration of this PEP, you could come up with a concrete list of modules and classes (or other API elements) that you think would benefit from being able to pass in a socket? Please start another thread -- python-ideas is fine. I will read it.
OK, I will, in a week or two. I need some time for research. [...]
- For the task scheduler I am piling all my hopes on PEP-380, i.e. yield from. I have not found a single thing that is harder to do using this style than using the PEP-342 yield <future> style, and I really don't like mixing the two up (despite what Steve Dower says :-). But I don't want the event loop interface to know about this at all -- however the scheduler has to know about the event loop (at least its interface). I am currently refactoring my ideas in this area; I think I'll end up with a Task object that smells a bit like a Future, but represents a whole stack of generator invocations linked via yield-from, and which allows suspension of the entire stack at once; user code only needs to use Tasks when it wants to schedule multiple activities concurrently, not when it just wants to be able to yield. (This may be the core insight in favor of PEP 380.)
The only problem I have with PEP-380 is that, to me, it's not entirely clear when you should use 'yield' and when 'yield from' (please correct me if I am wrong). I'll try to demonstrate it by example:

    class Socket:
        def sendall(self, payload):
            f = Future()
            IOLoop.sendall(payload, future=f)
            return f

    class SMTP:
        def send(self, s):
            ...
            # yield the returned future to the scheduler
            yield self.sock.sendall(s)
            ...

    # And later:
    s = SMTP()
    yield from s.send('spam')

Is this (roughly) how you want it all to look? I.e. using 'yield' to send a future/task to the scheduler, and 'yield from' to delegate? If I guessed correctly, and that's how you envision it, I have a question: what if you decide to refactor 'Socket.sendall' to be a coroutine? In that case you'd want users to call it as 'yield from Socket.sendall', and not 'yield Socket.sendall'. Thank you, Yury
On Thu, Oct 25, 2012 at 11:39 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Thank you for such a detailed and deep response. Lots of good thoughts to digest.
You're welcome.
One idea: the scope of the problem is enormously big. It may take months/years to synchronize all ideas and thoughts just by communicating over the mailing list, without a concrete thing to discuss. How about you/we create a repository with a draft implementation of the scheduler/IO loop/coroutines engine, and we simply start tweaking and discussing that particular design? That way people will see where to start the discussion, what's done, and some will even participate. The goal is not to write production-quality software, but rather to have a common place to discuss/try things/benchmark etc. I'm not sure, but maybe a place like Bitbucket, where you can have a wiki, issues, and the actual code, is a better fit than a mailing list.
I am currently working on code. Steve Dower has also said he's going to write some code. I'm just not quite ready to show my code (I need to do a few more iterations on each component). As long as I can use Mercurial I'm happy; bitbucket or Google Code Hosting both work fine for me.
I also think that there's a need to move concurrency-related discussions to a separate mailing list, as everything else on python-ideas is lost now.
I don't have that problem. You are the one who started a new thread. :-) If you really want a new mailing list, you can set it up; I'd be happy to join, but my preference would be to stick it out here; I've seen too many specialized lists and SIGs dwindle after an initial burst of activity. [...]
The only problem I have with PEP-380, is that to me it's not entirely clear when you should use 'yield' or 'yield from' (please correct me if I am wrong). I'll try to demonstrate it by example:
    class Socket:
        def sendall(self, payload):
            f = Future()
            IOLoop.sendall(payload, future=f)
            return f

    class SMTP:
        def send(self, s):
            ...
            # yield the returned future to the scheduler
            yield self.sock.sendall(s)
            ...

    # And later:
    s = SMTP()
    yield from s.send('spam')
Is it (roughly) how you want it all to look like? I.e. using 'yield' to send a future/task to the scheduler, and 'yield from' to delegate?
I think that's the style that Steve Dower prefers. Greg Ewing would rather see all public APIs use yield from, and reserve plain yield exclusively as an implementation detail of the scheduler. In my own experimental code I am using Greg's style and it is working out great. My main reason for taking a hard stance on this is that it would otherwise be too confusing for users -- should they use yield, yield from, or a plain call? I'd like to tell them "if it blocks, use yield from". BTW, if you haven't read Greg's introduction to this style, here it is -- worth reading! http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/SimpleScheduler.htm...
If I guessed correctly, and that's how you envision it, I have a question: What if you decide to refactor 'Socket.sendall' to be a coroutine? In that case you'd want users to call it 'yield from Socket.sendall', and not 'yield Socket.sendall'.
That's why using yield from all the way is better! -- --Guido van Rossum (python.org/~guido)
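Greg's "yield from all the way" style can be condensed into a toy round-robin scheduler. This is a hedged sketch with made-up names (Scheduler, block, worker), not anyone's actual implementation from the thread: the only bare yield lives inside the scheduler's trap function, and every public suspension point is reached via yield from.

```python
from collections import deque

class Scheduler:
    def __init__(self):
        self.ready = deque()

    def new_task(self, coro):
        self.ready.append(coro)

    def run(self):
        while self.ready:
            coro = self.ready.popleft()
            try:
                coro.send(None)          # run until the next bare yield
            except StopIteration:
                continue                 # task finished
            self.ready.append(coro)      # round-robin reschedule

def block():
    # the one and only bare yield: a scheduler-internal trap
    yield

def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield from block()               # public suspension point

log = []
sched = Scheduler()
sched.new_task(worker("a", 2, log))
sched.new_task(worker("b", 2, log))
sched.run()
# log interleaves: [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

The user-facing rule is exactly Guido's "if it blocks, use yield from": callers never see a bare yield, so there is only one way to invoke a blocking API.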
On 2012-10-25, at 3:25 PM, Guido van Rossum <guido@python.org> wrote:
On Thu, Oct 25, 2012 at 11:39 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
[...]
One idea: the scope of the problem is enormously big. It may take months/years to synchronize all ideas and thoughts just by communicating over the mailing list, without a concrete thing to discuss. How about you/we create a repository with a draft implementation of the scheduler/IO loop/coroutines engine, and we simply start tweaking and discussing that particular design? That way people will see where to start the discussion, what's done, and some will even participate. The goal is not to write production-quality software, but rather to have a common place to discuss/try things/benchmark etc. I'm not sure, but maybe a place like Bitbucket, where you can have a wiki, issues, and the actual code, is a better fit than a mailing list.
I am currently working on code. Steve Dower has also said he's going to write some code. I'm just not quite ready to show my code (I need to do a few more iterations on each component). As long as I can use Mercurial I'm happy; bitbucket or Google Code Hosting both work fine for me.
OK. Let's wait until we have a somewhat stable platform to work with. [...]
Is it (roughly) how you want it all to look like? I.e. using 'yield' to send a future/task to the scheduler, and 'yield from' to delegate?
I think that's the style that Steve Dower prefers. Greg Ewing would rather see all public APIs use yield from, and reserve plain yield exclusively as an implementation detail of the scheduler. In my own experimental code I am using Greg's style and it is working out great. My main reason for taking a hard stance on this is that it would otherwise be too confusing for users -- should they use yield, yield from, or a plain call? I'd like to tell them "if it blocks, use yield from".
BTW, if you haven't read Greg's introduction to this style, here it is -- worth reading! http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/SimpleScheduler.htm...
If I guessed correctly, and that's how you envision it, I have a question: What if you decide to refactor 'Socket.sendall' to be a coroutine? In that case you'd want users to call it 'yield from Socket.sendall', and not 'yield Socket.sendall'.
That's why using yield from all the way is better!
Yes, that now makes sense! I'll definitely take a look at Greg's article. Thanks, Yury
On 10/25/2012 12:10 PM, Yury Selivanov wrote:
- And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects?
I think this is probably a good idea quite aside from async issues. For one thing, it would make testing with a mock-socket class easier. Issues to decide: the name of the parameter (should be the same for all socket-using classes); keyword-only? (ditto). I am not sure this needs a PEP. Most parameter additions are just tracker issues. But it would be worthwhile to decide on the details here first. -- Terry Jan Reedy
On 2012-10-25, at 4:39 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 10/25/2012 12:10 PM, Yury Selivanov wrote:
- And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects?
I think this is probably a good idea quite aside from async issues. For one thing, it would make testing with a mock-socket class easier. Issues to decide: the name of the parameter (should be the same for all socket-using classes); keyword-only? (ditto).
Right, good catch on mocking sockets! As for the issues: I think that the parameter name should be the same/very consistent, and surely keyword-only.
I am not sure this needs a PEP. Most parameter additions are just tracker issues. But it would be worthwhile to decide on the details here first.
We'll see. I'll start with a detailed post on python-ideas, and if the PEP looks like overkill - I'd be glad to skip the PEP step. Thanks, Yury
On 10/25/2012 4:51 PM, Yury Selivanov wrote:
On 2012-10-25, at 4:39 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 10/25/2012 12:10 PM, Yury Selivanov wrote:
- And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects?
I think this is probably a good idea quite aside from async issues. For one thing, it would make testing with a mock-socket class easier. Issues to decide: the name of the parameter (should be the same for all socket-using classes); keyword-only? (ditto).
Right, good catch on mocking sockets!
As for the issues: I think that the parameter name should be the same/very consistent, and surely keyword-only.
I left out the following issue: should the argument be a socket-returning callable (a 'socket-factory' as you called it above) or an opened socket? For files, we variously pass file names to be used with the default opener, opened files, and file descriptors, but never an alternate opener (such as StringIO). One reason is that the user typically needs a handle on the file object in order to later retrieve the contents. I am not sure that the same applies to sockets. If I ask the ftp module to get or send a file, I should not ever need to see the socket used for the transport. -- Terry Jan Reedy
Please start a new thread for this sub-topic. Note that for mocking, you won't need to pass in a socket object; you can just mock out socket.socket() directly using Michael Foord's all-singing all-dancing unittest.mock module (now in the Python 3 stdlib). On Thu, Oct 25, 2012 at 2:06 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 10/25/2012 4:51 PM, Yury Selivanov wrote:
On 2012-10-25, at 4:39 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 10/25/2012 12:10 PM, Yury Selivanov wrote:
- And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects?
I think this is probably a good idea quite aside from async issues. For one thing, it would make testing with a mock-socket class easier. Issues to decide: the name of the parameter (should be the same for all socket-using classes); keyword-only? (ditto).
Right, good catch on mocking sockets!
As for the issues: I think that the parameter name should be the same/very consistent, and surely keyword-only.
I left out the following issue: should the argument be a socket-returning callable (a 'socket-factory' as you called it above) or an opened socket?
For files, we variously pass file names to be used with the default opener, opened files, and file descriptors, but never an alternate opener (such as StringIO). One reason is that the user typically needs a handle on the file object in order to later retrieve the contents.
I am not sure that the same applies to sockets. If I ask the ftp module to get or send a file, I should not ever need to see the socket used for the transport.
-- Terry Jan Reedy
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
-- --Guido van Rossum (python.org/~guido)
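Mocking socket.socket directly, as Guido suggests, might look roughly like the sketch below. fetch_greeting is a made-up toy client (not stdlib code); only unittest.mock.patch and its return_value/recv stubbing are real library APIs.

```python
import socket
from unittest import mock

def fetch_greeting(host, port):
    # toy client code that creates its own socket internally --
    # exactly the tight coupling discussed above
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect((host, port))
        return s.recv(1024)
    finally:
        s.close()

with mock.patch("socket.socket") as socket_cls:
    # every socket.socket(...) call now returns this mock instance
    fake = socket_cls.return_value
    fake.recv.return_value = b"220 hello"
    data = fetch_greeting("example.org", 25)

# data == b"220 hello"; no real network traffic happened
```

No factory parameter needed for testing, which is Guido's point; the factory argument would still matter for swapping in genuinely different socket implementations.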
Le Thu, 25 Oct 2012 16:39:40 -0400, Terry Reedy <tjreedy@udel.edu> a écrit :
On 10/25/2012 12:10 PM, Yury Selivanov wrote:
- And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects?
I think this is probably a good idea quite aside from async issues.
I think it's a rather bad idea. It does not correspond to any real use case and will clutter the API with an additional parameter. Regards Antoine.
Steve Dower wrote:
The type of the error is irrelevant - if something_else() might raise an exception that is expected, it won't be passed in because the scheduler is suppressing exceptions inside finally blocks. Or perhaps I've misunderstood the point of gi_in_finally?
IIUC, it's only *asynchronous* exceptions that would be blocked -- i.e. ones thrown in from a different task, or arising from an external event such as a timeout. An exception raised explicitly by the task's own code would be unaffected. -- Greg
If the main concern in all of this is timeouts, it should be possible to address that without adding any more interpreter machinery. For example, when a timeout exception is thrown, whatever is responsible for that can flag the task as being in the process of handling a timeout, and refrain from initiating any more timeouts until that flag is cleared. -- Greg
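Greg's flag idea might be sketched like so. Task, fire_timeout, and the exact semantics are illustrative assumptions, not an existing API: the point is only that whatever fires timeouts checks a per-task flag before throwing another one in.

```python
class Timeout(Exception):
    pass

class Task:
    def __init__(self, gen):
        self.gen = gen
        self.in_timeout = False   # set while a timeout is being handled

def fire_timeout(task):
    if task.in_timeout:
        return False              # refrain: one timeout at a time
    task.in_timeout = True
    try:
        task.gen.throw(Timeout)
    except (Timeout, StopIteration):
        pass                      # task unwound; fine for this sketch
    finally:
        task.in_timeout = False   # cleared once handling is done
    return True

def job():
    try:
        yield
    finally:
        pass                      # blocking cleanup would run here, protected

t = Task(job())
t.gen.send(None)                  # suspend the task at its yield
t.in_timeout = True               # pretend a first timeout is in flight
blocked = fire_timeout(t)         # second timeout is suppressed
t.in_timeout = False
delivered = fire_timeout(t)       # now it goes through
```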
Hi Guido, On Thu, Oct 25, 2012 at 2:43 AM, Guido van Rossum <guido@python.org> wrote:
On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-24, at 7:12 PM, Guido van Rossum <guido@python.org> wrote:
Ok, I can understand. But still, this is a problem with timeouts in general, not just with timeouts in a yield-based environment. How does e.g. Twisted deal with this?
I don't know, I hope someone with an expertise in Twisted can tell us.
But I would imagine that they don't have this particular problem, as it should be related only to coroutines and schedulers that run them. I.e. it's a problem when you run some code and may interrupt it. And you can't interrupt plain Python code that uses callbacks without yields and greenlets.
Well, but in the Twisted world, if a cleanup callback requires more blocking calls, it has to spawn more deferred callbacks. So I think they *do* have the problem, unless they don't have a way at all to constrain the total running time of an action involving cascading callbacks. Also, they have inlineCallbacks which does use yield.
AFAIR, in Twisted there is no timeout on a coroutine, there is a timeout on a request, which is usually just a socket timeout. So there is no problem of interrupting the code in arbitrary places. Another Twisted thing is that it does all writes asynchronously with respect to user code, so if you want to write something and close a connection for finalization you just call:

    transport.write('something')
    transport.loseConnection()

And they do not return Deferreds, so it returns immediately even if the socket is not writable at the moment. (IIRC, it never writes right away, but rather from a reactor callback) -- Paul
Sorry, working really long hours these days; just wanted to chime in that yes, you can call transport.write with large strings, and the reactor will do the right thing under the hood: loseConnection is the polite way of dropping a connection, which should wait for all pending writes to finish etc. cheers lvh
On Thu, Oct 25, 2012 at 4:46 AM, Laurens Van Houtven <_@lvh.cc> wrote:
Sorry, working really long hours these days; just wanted to chime in that yes, you can call transport.write with large strings, and the reactor will do the right thing under the hood: loseConnection is the polite way of dropping a connection, which should wait for all pending writes to finish etc.
This seems a decent enough pattern. It also makes it possible to use one of these things as a substitute for a writable file object, so you can e.g. use it as sys.stdout or the stream for a logging.StreamHandler. Still, I wonder what happens if the socket/pipe/whatever that is written to is very slow and the program produces too much data. Does memory just balloon up, or is there some kind of throttling of the writer? Or a buffer overflow exception? For a totally general solution I would at least like to have the *option* of doing synchronous writes. (I'm asking these questions because I'd like to copy this useful pattern -- but I want to get the end cases right.) -- --Guido van Rossum (python.org/~guido)
On Thu, Oct 25, 2012 at 10:43 AM, Guido van Rossum <guido@python.org> wrote:
On Thu, Oct 25, 2012 at 4:46 AM, Laurens Van Houtven <_@lvh.cc> wrote:
Sorry, working really long hours these days; just wanted to chime in that yes, you can call transport.write with large strings, and the reactor will do the right thing under the hood: loseConnection is the polite way of dropping a connection, which should wait for all pending writes to finish etc.
This seems a decent enough pattern. It also makes it possible to use one of these things as a substitute for a writable file object, so you can e.g. use it as sys.stdout or the stream for a logging.StreamHandler.
Still, I wonder what happens if the socket/pipe/whatever that is written to is very slow and the program produces too much data. Does memory just balloon up, or is there some kind of throttling of the writer? Or a buffer overflow exception? For a totally general solution I would at least like to have the *option* of doing synchronous writes.
(I'm asking these questions because I'd like to copy this useful pattern -- but I want to get the end cases right.)
There's a callback that gets called saying "your buffer is too full". This is the producer/consumer API people have referred to. It's not the best API in the world, and Glyph is working on an improvement, but that's the basic idea. The general move is towards a push API - push as much data as you can until you're told to stop. Tornado has a "tell me when this write is removed from the buffer and actually written to the socket" callback. This is more of a pull approach; you write some data, and get notified when you should write some more. --Itamar
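The push-style flow control Itamar describes could be sketched like this. The class and threshold names are illustrative, loosely modeled on the producer/consumer idea rather than copied from Twisted's actual API: write as much as you like; the transport tells the producer to pause when its buffer crosses a high-water mark and to resume once it drains below a low-water mark.

```python
class BufferedTransport:
    def __init__(self, producer, high=64, low=16):
        self.buffer = bytearray()
        self.producer = producer
        self.high, self.low = high, low
        self.paused = False

    def write(self, data):
        self.buffer.extend(data)
        if not self.paused and len(self.buffer) >= self.high:
            self.paused = True
            self.producer.pause_producing()   # "your buffer is too full"

    def drain(self, n):
        # called by the event loop when the socket accepted n bytes
        del self.buffer[:n]
        if self.paused and len(self.buffer) <= self.low:
            self.paused = False
            self.producer.resume_producing()

class Producer:
    def __init__(self):
        self.events = []
    def pause_producing(self):
        self.events.append("pause")
    def resume_producing(self):
        self.events.append("resume")

p = Producer()
t = BufferedTransport(p)
t.write(b"x" * 100)   # overshoots the high-water mark -> pause
t.drain(90)           # buffer drops to 10 bytes -> resume
# p.events == ["pause", "resume"]
```

This answers Guido's memory question: the buffer can still balloon if the producer ignores pause_producing, but a cooperating producer is throttled rather than raising an overflow.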
On Wed, Oct 24, 2012 at 7:43 PM, Guido van Rossum <guido@python.org> wrote:
I don't know, I hope someone with an expertise in Twisted can tell us.
But I would imagine that they don't have this particular problem, as it should be related only to coroutines and schedulers that run them. I.e. it's a problem when you run some code and may interrupt it. And you can't interrupt plain Python code that uses callbacks without yields and greenlets.
Well, but in the Twisted world, if a cleanup callback requires more blocking calls, it has to spawn more deferred callbacks. So I think they *do* have the problem, unless they don't have a way at all to constrain the total running time of an action involving cascading callbacks. Also, they have inlineCallbacks which does use yield.
Deferreds don't do anything to prevent blocking. They're just a nice abstraction for callbacks. And yes, if you call 1000 functions that do lots of CPU in a row, that will keep other stuff from happening. However, consider how a timeout works: the event loop notices enough time has passed, and so calls some code that tells the Deferred to cancel its operation. So you're *not* adding the cancellation operations to the stack of the original operation, you're starting from the event loop. And so timeouts are just normal event loop world, where you need to be careful not to do too much CPU-intensive processing in any given call, and you can't call blocking system calls (except using a thread). Of course, you can't timeout a function that's just looping using CPU, or a blocking system call, and so code needs to be structured to deal with this, but that's a different issue.
On Fri, Oct 26, 2012 at 7:12 AM, Itamar Turner-Trauring <itamar@futurefoundries.com> wrote:
On Wed, Oct 24, 2012 at 7:43 PM, Guido van Rossum <guido@python.org> wrote:
I don't know, I hope someone with an expertise in Twisted can tell us.
But I would imagine that they don't have this particular problem, as it should be related only to coroutines and schedulers that run them. I.e. it's a problem when you run some code and may interrupt it. And you can't interrupt plain Python code that uses callbacks without yields and greenlets.
Well, but in the Twisted world, if a cleanup callback requires more blocking calls, it has to spawn more deferred callbacks. So I think they *do* have the problem, unless they don't have a way at all to constrain the total running time of an action involving cascading callbacks. Also, they have inlineCallbacks which does use yield.
Deferreds don't do anything to prevent blocking. They're just a nice abstraction for callbacks. And yes, if you call 1000 functions that do lots of CPU in a row, that will keep other stuff from happening.
However, consider how a timeout works: the event loop notices enough time has passed, and so calls some code that tells the Deferred to cancel its operation. So you're *not* adding the cancellation operations to the stack of the original operation, you're starting from the event loop. And so timeouts are just normal event loop world, where you need to be careful not to do too much CPU-intensive processing in any given call, and you can't call blocking system calls (except using a thread).
Of course, you can't timeout a function that's just looping using CPU, or a blocking system call, and so code needs to be structured to deal with this, but that's a different issue.
So, basically, it's just "after T seconds you get this second callback and it's up to you to deal with it"? I guess the timeout callback can inspect the state of the operation, and cancel any pending operations? Do you have a way to translate timeouts into exceptions in inlineCallbacks? If so, how is that working out? -- --Guido van Rossum (python.org/~guido)
There's an exception for "a deferred has been cancelled". Cancelling a deferred fires that down its errback chain just like any exception. Since @inlineCallbacks works on top of deferreds, it magically works:
    >>> from twisted.internet import defer
    >>> d = defer.Deferred()
    >>> @defer.inlineCallbacks
    ... def f():
    ...     yield d
    ...
    >>> r = f()
    >>> r
    <Deferred at 0x1019df950>
    >>> d.cancel()
    >>> r
    <Deferred at 0x1019df950 current result: <twisted.python.failure.Failure <class 'twisted.internet.defer.CancelledError'>>>
On Fri, Oct 26, 2012 at 5:25 PM, Guido van Rossum <guido@python.org> wrote:
On Fri, Oct 26, 2012 at 7:12 AM, Itamar Turner-Trauring <itamar@futurefoundries.com> wrote:
On Wed, Oct 24, 2012 at 7:43 PM, Guido van Rossum <guido@python.org> wrote:
I don't know, I hope someone with an expertise in Twisted can tell us.
But I would imagine that they don't have this particular problem, as it should be related only to coroutines and schedulers that run them. I.e. it's a problem when you run some code and may interrupt it. And you can't interrupt plain Python code that uses callbacks without yields and greenlets.
Well, but in the Twisted world, if a cleanup callback requires more blocking calls, it has to spawn more deferred callbacks. So I think they *do* have the problem, unless they don't have a way at all to constrain the total running time of an action involving cascading callbacks. Also, they have inlineCallbacks which does use yield.
Deferreds don't do anything to prevent blocking. They're just a nice abstraction for callbacks. And yes, if you call 1000 functions that do lots of CPU in a row, that will keep other stuff from happening.
However, consider how a timeout works: the event loop notices enough time has passed, and so calls some code that tells the Deferred to cancel its operation. So you're *not* adding the cancellation operations to the stack of the original operation, you're starting from the event loop. And so timeouts are just normal event loop world, where you need to be careful not to do too much CPU-intensive processing in any given call, and you can't call blocking system calls (except using a thread).
Of course, you can't timeout a function that's just looping using CPU, or a blocking system call, and so code needs to be structured to deal with this, but that's a different issue.
So, basically, it's just "after T seconds you get this second callback and it's up to you to deal with it"? I guess the timeout callback can inspect the state of the operation, and cancel any pending operations?
Do you have a way to translate timeouts into exceptions in inlineCallbacks? If so, how is that working out?
-- --Guido van Rossum (python.org/~guido)
-- cheers lvh
err, I suppose the missing bit there is that you'll probably want to:

    reactor.callLater(timeout, d.cancel)

As opposed to calling d.cancel() directly. (That snippet was in bpython-urwid with the reactor running in the background, but I doubt it'd work well anywhere else outside of manholes :)) cheers lvh
On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_@lvh.cc> wrote:
err, I suppose the missing bit there is that you'll probably want to:
reactor.callLater(timeout, d.cancel)
As opposed to calling d.cancel() directly. (That snippet was in bpython-urwid with the reactor running in the background, but I doubt it'd work well anywhere else outside of manholes :))
So I think that Yuri's original problem statement, transformed to Twisted+Deferred, might still apply, depending on how you implement it. Yuri essentially did this:

    def foobar():  # a task
        try:
            yield <blocking action>
        finally:
            # must clean up regardless of whether action succeeded or failed:
            yield <blocking cleanup>

He then calls this with a timeout, with the semantics that if the generator is blocked in a yield when the timeout arrives, that yield raises a Timeout exception (and at no other time is Timeout raised). The problem with this is that if the action succeeds within the timeout, but barely, there's a chance that the cleanup of a *successful* action receives the Timeout exception. Apparently this bit Yuri. I'm not sure how you'd model that using just Deferreds, but using inlineCallbacks it seems the same thing might happen. Using Deferreds, I assume there's a common pattern to implement this that doesn't have this problem. Of course, using coroutines, there is too -- spawn the cleanup as an independent task. -- --Guido van Rossum (python.org/~guido)
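The hazard Guido describes can be reproduced with bare generators, no scheduler required. The names below (Timeout, blocking_action, blocking_cleanup) are illustrative stand-ins, not from any real framework: the action succeeds, but a Timeout thrown a moment later lands inside the cleanup code in the finally block.

```python
class Timeout(Exception):
    pass

def blocking_action():
    yield "waiting"            # stand-in for an I/O suspension

def blocking_cleanup():
    yield "waiting"

def foobar():
    try:
        yield from blocking_action()
    finally:
        yield from blocking_cleanup()   # cleanup also blocks

g = foobar()
g.send(None)       # suspended inside blocking_action
g.send(None)       # action finished; now suspended in the cleanup
try:
    g.throw(Timeout)            # the timeout arrives "just too late"
    hit_cleanup = False
except Timeout:
    hit_cleanup = True          # a *successful* action's cleanup got it
```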
On Fri, Oct 26, 2012 at 12:36 PM, Guido van Rossum <guido@python.org> wrote:
On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_@lvh.cc> wrote:
err, I suppose the missing bit there is that you'll probably want to:
reactor.callLater(timeout, d.cancel)
As opposed to calling d.cancel() directly. (That snippet was in bpython-urwid with the reactor running in the background, but I doubt it'd work well anywhere else outside of manholes :))
So I think that Yuri's original problem statement, transformed to Twisted+Deferred, might still apply, depending on how you implement it. Yuri essentially did this:
    def foobar():  # a task
        try:
            yield <blocking action>
        finally:
            # must clean up regardless of whether action succeeded or failed:
            yield <blocking cleanup>
He then calls this with a timeout, with the semantics that if the generator is blocked in a yield when the timeout arrives, that yield raises a Timeout exception (and at no other time is Timeout raised). The problem with this is that if the action succeeds within the timeout, but barely, there's a chance that the cleanup of a *successful* action receives the Timeout exception. Apparently this bit Yuri. I'm not sure how you'd model that using just Deferreds, but using inlineCallbacks it seems the same thing might happen. Using Deferreds, I assume there's a common pattern to implement this that doesn't have this problem. Of course, using coroutines, there is too -- spawn the cleanup as an independent task.
If you call cancel() on a Deferred that already has a result, nothing happens. So you don't get a TimeoutError if the operation has succeeded (or failed some other way). This would also be true when using inlineCallbacks, so there's no issue. In general I'm not clear why this is a problem: in a single-threaded program only one thing happens at a time. Your code for triggering a timeout always has the option to check if the operation has succeeded, without worrying about race conditions.
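Itamar's point can be illustrated with a toy future (TinyFuture is a made-up name, not Twisted's Deferred): cancelling after a result has been delivered is a no-op, so a late timeout can never turn a success into a cancellation error.

```python
class TinyFuture:
    def __init__(self):
        self.result = None
        self.done = False
        self.cancelled = False

    def set_result(self, value):
        if not self.done:
            self.done = True
            self.result = value

    def cancel(self):
        if self.done:            # already succeeded (or failed): ignore
            return False
        self.done = True
        self.cancelled = True
        return True

f = TinyFuture()
f.set_result(42)
f.cancel()                       # the "timeout" fires just after success
# f.cancelled is False and f.result == 42
```

In a single-threaded loop the set_result/cancel race resolves deterministically, which is exactly why the Deferred version of Yuri's problem does not arise.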
On 2012-10-26, at 12:57 PM, Itamar Turner-Trauring <itamar@futurefoundries.com> wrote:
On Fri, Oct 26, 2012 at 12:36 PM, Guido van Rossum <guido@python.org> wrote: On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_@lvh.cc> wrote:
err, I suppose the missing bit there is that you'll probably want to:
reactor.callLater(timeout, d.cancel)
As opposed to calling d.cancel() directly. (That snippet was in bpython-urwid with the reactor running in the background, but I doubt it'd work well anywhere else outside of manholes :))
So I think that Yuri's original problem statement, transformed to Twisted+Deferred, might still apply, depending on how you implement it. Yuri essentially did this:
    def foobar():  # a task
        try:
            yield <blocking action>
        finally:
            # must clean up regardless of whether action succeeded or failed:
            yield <blocking cleanup>
He then calls this with a timeout, with the semantics that if the generator is blocked in a yield when the timeout arrives, that yield raises a Timeout exception (and at no other time is Timeout raised). The problem with this is that if the action succeeds within the timeout, but barely, there's a chance that the cleanup of a *successful* action receives the Timeout exception. Apparently this bit Yuri. I'm not sure how you'd model that using just Deferreds, but using inlineCallbacks it seems the same thing might happen. Using Deferreds, I assume there's a common pattern to implement this that doesn't have this problem. Of course, using coroutines, there is too -- spawn the cleanup as an independent task.
If you call cancel() on a Deferred that already has a result, nothing happens. So you don't get a TimeoutError if the operation has succeeded (or failed some other way). This would also be true when using inlineCallbacks, so there's no issue.
In general I'm not clear why this is a problem: in a single-threaded program only one thing happens at a time. Your code for triggering a timeout always has the option to check if the operation has succeeded, without worrying about race conditions.
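The race Guido describes above, where a timeout that fires just after the action succeeded lands on the cleanup's yield instead, can be sketched with plain generators (toy names, no Twisted or Deferreds involved; the "scheduler" here is just a driver that throws a timeout into whatever yield the task is suspended at):

```python
# Toy illustration: a task whose cleanup also yields, and a deadline
# that fires after the action already completed.

class Timeout(Exception):
    pass

def task(log):
    try:
        yield "action"          # may succeed just before the deadline
    finally:
        try:
            yield "cleanup"     # the timeout can land *here* instead
        except Timeout:
            log.append("cleanup interrupted")

def run_with_late_timeout():
    """Drive the task; deliver the timeout after the action completed."""
    log = []
    g = task(log)
    next(g)              # suspended at "action"
    g.send(None)         # action succeeds; now suspended at "cleanup"
    try:
        g.throw(Timeout)   # deadline fires anyway: hits the cleanup yield
    except StopIteration:
        pass
    return log
```

Driving the task this way shows the *successful* action's cleanup receiving the Timeout, which is exactly the hazard under discussion.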
Let me ask you a question that may help me and others to understand how inlineCallbacks works. If you write the following:

def func():
    try:
        yield one_thing()
        yield and_another()
    finally:
        yield and_finally()

Then each of those yields will create a separate Deferred object, that 'inlineCallbacks' transparently dispatches via generator send/throw, right? And if you 'yield func()' the same will happen--'inlineCallbacks' will return a Deferred that will hold the result of 'func's execution?

Thanks, Yury
On Fri, Oct 26, 2012 at 1:06 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote: ... snip ...
Let me ask you a question that may help me and others to understand how inlineCallbacks works.
If you write the following:
def func():
    try:
        yield one_thing()
        yield and_another()
    finally:
        yield and_finally()
Then each of those yields will create a separate Deferred object, that 'inlineCallbacks' transparently dispatches via generator send/throw, right?
one_thing() and and_another() and and_finally() should return Deferreds. inlineCallbacks gets those Deferreds, adds callbacks for completion/error, and resumes the generator at the appropriate time. You don't use the results from any of those Deferreds, so the values will just be thrown out. The yield/trampoline doesn't create any Deferreds for those operations itself.
And if you 'yield func()' the same will happen--'inlineCallbacks' will return a Deferred that will hold the result of 'func's execution?
You didn't decorate func with inlineCallbacks, but if you do, func() will give you a Deferred. Note that func itself doesn't return any value. In Twisted land, this is done by defer.returnValue(), which uses exceptions to return a value to the trampoline. This maps well to the new sugar in 3.3.
Thanks, Yury _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
-- Jasper
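The dispatch Jasper describes can be modeled with a toy trampoline (this is not Twisted's actual code; Deferred here is a minimal stand-in with invented method names): each Deferred the generator yields gets a callback attached, and when it fires, the generator is resumed with the result via send().

```python
# Minimal stand-in for a Deferred: holds callbacks until fired.
class Deferred:
    def __init__(self):
        self.callbacks = []
        self.called = False
        self.result = None

    def add_callback(self, cb):
        if self.called:
            cb(self.result)          # already fired: run immediately
        else:
            self.callbacks.append(cb)

    def fire(self, result):
        self.called = True
        self.result = result
        for cb in self.callbacks:
            cb(result)

def inline_callbacks(gen):
    """Drive `gen`, resuming it each time a yielded Deferred fires."""
    result = Deferred()

    def step(value):
        try:
            d = gen.send(value)      # run until the next yielded Deferred
        except StopIteration as e:
            result.fire(e.value)     # mirrors returnValue / PEP 380 return
            return
        d.add_callback(step)         # resume when that Deferred fires

    step(None)
    return result
```

The trampoline itself creates only the one outer Deferred; the yielded ones come from the operations, matching the explanation above.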
On 2012-10-26, at 1:14 PM, Jasper St. Pierre <jstpierre@mecheye.net> wrote:
On Fri, Oct 26, 2012 at 1:06 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
... snip ...
Let me ask you a question that may help me and others to understand how inlineCallbacks works.
If you write the following:
def func():
    try:
        yield one_thing()
        yield and_another()
    finally:
        yield and_finally()
Then each of those yields will create a separate Deferred object, that 'inlineCallbacks' transparently dispatches via generator send/throw, right?
one_thing() and and_another() and and_finally() should return Deferreds. inlineCallbacks gets those Deferreds, adds callbacks for completion/error, and resumes the generator at the appropriate time. You don't use the results from any of those Deferreds, so the values will just be thrown out. The yield/trampoline doesn't create any Deferreds for those operations itself.
And if you 'yield func()' the same will happen--'inlineCallbacks' will return a Deferred that will hold the result of 'func's execution?
You didn't decorate func with inlineCallbacks, but if you do, func() will give you a Deferred. Note that func itself doesn't return any value. In Twisted land, this is done by defer.returnValue(), which uses exceptions to return a value to the trampoline. This maps well to the new sugar in 3.3.
Right, I forgot to decorate the 'func' with 'inlineCallbacks'. If it is decorated, though, how can I invoke it with a timeout? - Yury
On Fri, Oct 26, 2012 at 7:49 PM, Yury Selivanov <yselivanov.ml@gmail.com>wrote:
If it is decorated, though, how can I invoke it with a timeout?
The important thing to remember is that the fundamental abstraction at play here is the deferred. Calling such a decorated function gives you a deferred. So, you call it with a timeout the same way you time out (cancel) any deferred:

d = deferred_returning_expression
reactor.callLater(timeout, d.cancel)

Where deferred_returning_expression can be anything, including calling your @inlineCallbacks-decorated function. The way it fits in with all existing stuff, making it look an awful lot like a lot of existing stuff, is probably why deferred cancellation is one of the more recent features to make it into Twisted: a lot of people did similar things using the tools that were already there.
- Yury
-- cheers lvh
On Fri, Oct 26, 2012 at 9:57 AM, Itamar Turner-Trauring <itamar@futurefoundries.com> wrote:
On Fri, Oct 26, 2012 at 12:36 PM, Guido van Rossum <guido@python.org> wrote:
On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_@lvh.cc> wrote:
err, I suppose the missing bit there is that you'll probably want to:
reactor.callLater(timeout, d.cancel)
As opposed to calling d.cancel() directly. (That snippet was in bpython-urwid with the reactor running in the background, but I doubt it'd work well anywhere else outside of manholes :))
So I think that Yuri's original problem statement, transformed to Twisted+Deferred, might still apply, depending on how you implement it. Yuri essentially did this:
def foobar():  # a task
    try:
        yield <blocking action>
    finally:
        # must clean up regardless of whether action succeeded or failed:
        yield <blocking cleanup>
He then calls this with a timeout, with the semantics that if the generator is blocked in a yield when the timeout arrives, that yield raises a Timeout exception (and at no other time is Timeout raised). The problem with this is that if the action succeeds within the timeout, but barely, there's a chance that the cleanup of a *successful* action receives the Timeout exception. Apparently this bit Yuri. I'm not sure how you'd model that using just Deferreds, but using inlineCallbacks it seems the same thing might happen. Using Deferreds, I assume there's a common pattern to implement this that doesn't have this problem. Of course, using coroutines, there is too -- spawn the cleanup as an independent task.
If you call cancel() on a Deferred that already has a result, nothing happens. So you don't get a TimeoutError if the operation has succeeded (or failed some other way). This would also be true when using inlineCallbacks, so there's no issue.
In general I'm not clear why this is a problem: in a single-threaded program only one thing happens at a time. Your code for triggering a timeout always has the option to check if the operation has succeeded, without worrying about race conditions.
But the example is not single-threaded (in the informal sense that you use it here). Each yield is a suspension point where other things can happen, and one of those things could be a cancellation of *this* task (because of a timeout or otherwise). The example would have to set some flag indicating it has a result after the first yield (i.e. before entering the finally, or at least before yielding in the finally clause). And the timeout callback would have to check this flag. This makes it slightly awkward to design a general-purpose timeout mechanism for tasks written in this style -- if you expect a timeout or cancellation you must protect your cleanup code from it by using some API. Anyway, no need to respond: I think I understand how Twisted deals with this, and translating that into the world of PEP 380 is not your job. -- --Guido van Rossum (python.org/~guido)
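A toy sketch of the flag scheme Guido outlines (hypothetical names; a real scheduler would hide this behind an API): the task records that it has a result before yielding inside its finally clause, and the timeout callback consults that flag instead of firing unconditionally.

```python
class Timeout(Exception):
    pass

def task(state, log):
    try:
        yield "action"
        state["done"] = True      # result obtained; cleanup must not be hit
    finally:
        yield "cleanup"
        log.append("cleaned up")

def fire_timeout(gen, state):
    """Timeout callback: only interrupt if the task has no result yet."""
    if state["done"]:
        return False              # too late to cancel; let cleanup finish
    gen.throw(Timeout)
    return True
```

As the paragraph notes, the awkward part is that every task written in this style must cooperate by setting the flag before entering its cleanup.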
Guido van Rossum wrote:
The example would have to set some flag indicating it has a result after the first yield (i.e. before entering the finally, or at least before yielding in the finally clause). And the timeout callback would have to check this flag. This makes it slightly awkward to design a general-purpose timeout mechanism for tasks written in this style -- if you expect a timeout or cancellation you must protect your cleanup code from it by using some API.
This is where having a way to find out whether a generator is in a finally clause would help. It would allow the scheduler to take care of this transparently. -- Greg
On 2012-10-27, at 7:21 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
The example would have to set some flag indicating it has a result after the first yield (i.e. before entering the finally, or at least before yielding in the finally clause). And the timeout callback would have to check this flag. This makes it slightly awkward to design a general-purpose timeout mechanism for tasks written in this style -- if you expect a timeout or cancellation you must protect your cleanup code from it by using some API.
This is where having a way to find out whether a generator is in a finally clause would help. It would allow the scheduler to take care of this transparently.
Right. But now I'm not sure this approach will work with yield-from. When you use yield-from, the scheduler knows nothing about the chain of generators; it's all hidden in the yield-from implementation. - Yury
Yury Selivanov wrote:
But now I'm not sure this approach will work with yield-from. When you use yield-from, the scheduler knows nothing about the chain of generators; it's all hidden in the yield-from implementation.
I think this just means that the implementation would involve more than looking at a single bit. Something like an in_finally() method that looks along the yield-from chain and returns true if any of the generators are in a finally section. -- Greg
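Greg's in_finally() walk can be sketched with plain objects (no such attribute exists on CPython generators; here the yield-from chain and the per-frame flag are modeled explicitly, with invented names):

```python
class Frame:
    """Stand-in for a generator frame with a hypothetical finally flag."""
    def __init__(self, in_finally=False, yield_from=None):
        self.in_finally = in_finally
        self.yield_from = yield_from   # sub-generator being delegated to

def in_finally(frame):
    """True if any frame along the yield-from chain is in a finally."""
    while frame is not None:
        if frame.in_finally:
            return True
        frame = frame.yield_from
    return False
```

The point is that the check is a walk down the delegation chain rather than a single bit on the outermost generator.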
On 2012-10-27, at 7:52 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yury Selivanov wrote:
But now I'm not sure this approach will work with yield-from. When you use yield-from, the scheduler knows nothing about the chain of generators; it's all hidden in the yield-from implementation.
I think this just means that the implementation would involve more than looking at a single bit. Something like an in_finally() method that looks along the yield-from chain and returns true if any of the generators are in a finally section.
That would not be a solution either. Imagine that we have two coroutines:

@coroutine
def c1():
    try:
        yield c2().with_timeout(1.0)  # p1
    finally:
        try:
            yield c2().with_timeout(1.0)  # p2
        except TimeoutError:
            pass

@coroutine
def c2():
    try:
        yield c3().with_timeout(2.0)  # p3
    finally:
        yield c4()  # p4

In the above example the scheduler *can* safely interrupt "c2" when it is invoked from "c1" at "p2". I.e. the scheduler can't interrupt a coroutine when it is itself in its finally statement, but it's fine to interrupt it when it is not, even if it is invoked from another coroutine's finally block. If you translate this example into yield-from form, then checking 'in_finally()' on "c1" when it is at "p2" will prevent you from raising TimeoutError, but you clearly should raise it. In other words, we want coroutine behaviour to be closer to regular python code. - Yury
Yury Selivanov wrote:
In the above example the scheduler *can* safely interrupt "c2" when it is invoked from "c1" at "p2". I.e. the scheduler can't interrupt a coroutine when it is itself in its finally statement, but it's fine to interrupt it when it is not, even if it is invoked from another coroutine's finally block.
I'm confused about the relationship between c1 and c2 here, and what you mean by one coroutine "invoking" another. Can you post a version that uses yield-from instead of yielding objects with unknown (to me) semantics? -- Greg
On 2012-10-28, at 1:55 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yury Selivanov wrote:
In the above example the scheduler *can* safely interrupt "c2" when it is invoked from "c1" at "p2". I.e. the scheduler can't interrupt a coroutine when it is itself in its finally statement, but it's fine to interrupt it when it is not, even if it is invoked from another coroutine's finally block.
I'm confused about the relationship between c1 and c2 here, and what you mean by one coroutine "invoking" another.
Can you post a version that uses yield-from instead of yielding objects with unknown (to me) semantics?
The reason I kept using my version is that I'm not sure how we will set timeouts for yield-from style coroutines. But let's assume that we can do that with a context manager. Let's also assume that the generator object has an 'in_finally()' method, as you defined: "Something like an in_finally() method that looks along the yield-from chain and returns true if any of the generators are in a finally section."

def coro1():
    try:
        with timeout(1.0):
            yield from coro2()  # 1
    finally:
        try:
            with timeout(1.0):
                yield from coro2()  # 2
        except TimeoutError:
            pass

def coro2():
    try:
        block()
        yield  # 3
        action()
    finally:
        block()
        yield  # 4
        another_action()

Now, if "coro2" is suspended at #4 -- it shouldn't be interrupted with TimeoutError. If, however, "coro2" is at #3 -- it can be, and it doesn't matter whether it was called from #1 or #2.

IIUC, a yield-from-supporting scheduler won't know about "coro2". All it will have is a generator for "coro1". All dispatching will be handled by the "yield from" statement automatically. In this case, you can't rely on "coro1.in_finally()", because it will return:

- True, when "coro1" is at #1 & "coro2" is at #4 (unsafe to interrupt)
- True, when "coro1" is at #2 & "coro2" is at #3 (safe to interrupt)

The fundamental problem here is that the scheduler knows nothing about the coroutine call chain. It doesn't even know in which generator 'with timeout' was called. - Yury
Yury Selivanov wrote:
def coro1():
    try:
        with timeout(1.0):
            yield from coro2()  # 1
    finally:
        try:
            with timeout(1.0):
                yield from coro2()  # 2
        except TimeoutError:
            pass

def coro2():
    try:
        block()
        yield  # 3
        action()
    finally:
        block()
        yield  # 4
        another_action()
Now, if "coro2" is suspended at #4 -- it shouldn't be interrupted with TimeoutError.
If, however, "coro2" is at #3 -- it can be, and it doesn't matter whether it was called from #1 or #2.
What is your reasoning behind asserting this? Because it's inside a try block of its own? Because it's subject to a nested timeout? Something else? -- Greg
On 2012-10-29, at 1:05 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yury Selivanov wrote:
def coro1():
    try:
        with timeout(1.0):
            yield from coro2()  # 1
    finally:
        try:
            with timeout(1.0):
                yield from coro2()  # 2
        except TimeoutError:
            pass

def coro2():
    try:
        block()
        yield  # 3
        action()
    finally:
        block()
        yield  # 4
        another_action()

Now, if "coro2" is suspended at #4 -- it shouldn't be interrupted with TimeoutError. If, however, "coro2" is at #3 -- it can be, and it doesn't matter whether it was called from #1 or #2.
What is your reasoning behind asserting this? Because it's inside a try block of its own? Because it's subject to a nested timeout? Something else?
Because the scheduler, when deciding whether to interrupt a coroutine, should only ask whether that particular coroutine is in its finally block, not whether its caller is. - Yury
Yury Selivanov wrote:
Because the scheduler, when deciding whether to interrupt a coroutine, should only ask whether that particular coroutine is in its finally block, not whether its caller is.
So given this:

def c1():
    try:
        something()
    finally:
        yield from c2()
        very_important_cleanup()

def c2():
    yield from block()  # 1

it should be okay to interrupt at point 1, even though it will prevent very_important_cleanup() from being done? That doesn't seem right to me. -- Greg
On 2012-10-29, at 8:06 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yury Selivanov wrote:
Because the scheduler, when deciding whether to interrupt a coroutine, should only ask whether that particular coroutine is in its finally block, not whether its caller is.
So given this:
def c1():
    try:
        something()
    finally:
        yield from c2()
        very_important_cleanup()

def c2():
    yield from block()  # 1
it should be okay to interrupt at point 1, even though it will prevent very_important_cleanup() from being done?
That doesn't seem right to me.
-- Greg
Oh... I'm sorry for the empty reply. On 2012-10-29, at 8:06 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yury Selivanov wrote:
Because the scheduler, when deciding whether to interrupt a coroutine, should only ask whether that particular coroutine is in its finally block, not whether its caller is.
So given this:
def c1():
    try:
        something()
    finally:
        yield from c2()
        very_important_cleanup()

def c2():
    yield from block()  # 1
it should be okay to interrupt at point 1, even though it will prevent very_important_cleanup() from being done?
That doesn't seem right to me.
But you don't just randomly interrupt coroutines. You interrupt them when you have *explicitly stated*, for instance, that this particular coroutine is executed with a timeout. And it's your responsibility to handle a TimeoutError when you call it with such a restriction. That's the *main* thing here. Again, when you explicitly execute something with a timeout, then that very something shouldn't be interrupted uncontrollably by the scheduler. It's that particular something whose 'finally' should be protected.

So in your example the scheduler would never face the question of interrupting c2(), because it wasn't called with any restriction/timeout. There is simply no reason to ever interrupt it. But if you want to make c2() interruptible, you would write:

def c1():
    try:
        something()
    finally:
        yield from with_timeout(2.0, c2())
        very_important_cleanup()

And that way, c2() actually may be (and at some point will be) interrupted by the scheduler. And it's your responsibility to catch TimeoutError. So you would write your code in the following way to protect c1's finally statement:

def c1():
    try:
        something()
    finally:
        try:
            yield from with_timeout(2.0, c2())
        except TimeoutError:
            ...
        very_important_cleanup()

Now, the problem is that when you call c2() with a timeout, the scheduler should not interrupt c2's finally statement (if there is any). And it has nothing to do with c1 at all. So if c2's code is like the following:

def c2():
    try:
        something()
    finally:
        yield from someotherthing()
        important_cleanup()

then you need the scheduler to know whether it is in its finally block or not. Because it's c2() which was run with a timeout; it's c2's code that may be subject to aborting. And it doesn't matter from where c2() was called; the only thing that matters is that, if it was called with a timeout, its finally block should be protected from interruption. That's all. - Yury
Yury Selivanov wrote:
So in your example the scheduler would never face the question of interrupting c2(), because it wasn't called with any restriction/timeout. There is simply no reason to ever interrupt it.
But there's nothing to stop someone writing

def c3():
    try:
        yield from with_timeout(10.0, c1())
    except TimeoutError:
        print("That's cool, I can cope with that")

Also, it's not just TimeoutErrors that are a potential problem, it's any asynchronous exception. For example, the task calling c1() might get cancelled by another task while c2() is blocked. If cancelling is implemented by throwing in an exception, you have the same problem.
Then you need the scheduler to know whether it is in its finally block or not. Because it's c2() which was run with a timeout; it's c2's code that may be subject to aborting.
I'm really not following your reasoning here. You seem to be speaking as if with_timeout() calls only have an effect one level deep. But that's not the case -- the frame that a TimeoutError gets thrown into by with_timeout() can be nested any number of yield-from calls deep. -- Greg
On 2012-10-30, at 1:53 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yury Selivanov wrote:
So in your example the scheduler would never face the question of interrupting c2(), because it wasn't called with any restriction/timeout. There is simply no reason to ever interrupt it.
But there's nothing to stop someone writing
def c3():
    try:
        yield from with_timeout(10.0, c1())
    except TimeoutError:
        print("That's cool, I can cope with that")
Also, it's not just TimeoutErrors that are a potential problem, it's any asynchronous exception. For example, the task calling c1() might get cancelled by another task while c2() is blocked. If cancelling is implemented by throwing in an exception, you have the same problem.
Then you need the scheduler to know whether it is in its finally block or not. Because it's c2() which was run with a timeout; it's c2's code that may be subject to aborting.
I'm really not following your reasoning here. You seem to be speaking as if with_timeout() calls only have an effect one level deep. But that's not the case -- the frame that a TimeoutError gets thrown into by with_timeout() can be nested any number of yield-from calls deep.
Greg, Looks like I'm failing to explain my point of view (which is maybe wrong). The problem is tough, and without shared code to debug and test ideas on, it's just hard to communicate. Let's get back to this issue once we have a framework/library to work on. Thanks, Yury
Guido, Greg, On 2012-10-27, at 7:45 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Right. But now I'm not sure this approach will work with yield-from. When you use yield-from, the scheduler knows nothing about the chain of generators; it's all hidden in the yield-from implementation.
I think I've come up with a solution that should work for yield-froms too (if we accept my in_finally idea in 3.4). And there should be a way of writing a 'protect_finally' context manager too. I'll illustrate the approach on Guido's tulip micro-framework (consider it pseudo-code to illustrate the idea):

class Interrupt(BaseException):
    """Should penetrate all try..excepts"""

def call_with_timeout(timeout, gen):
    context.current_task._add_timeout(timeout, gen)
    try:
        return (yield from gen)
    except Interrupt:
        raise TimeoutError() from None

class Task:
    def _add_timeout(self, timeout, gen):
        self.eventloop.call_later(
            timeout, partial(self._interrupt, gen))

    def _interrupt(self, gen):
        if not gen.in_finally:
            gen.throw(Interrupt, Interrupt(), None)
        else:
            # So we set a flag to watch for gen's in_finally value
            # on each 'step' call. And when it's 0 - Task.step
            # will call '_interrupt' again.
            self._watch_finally(gen)

I defined a new function 'call_with_timeout', because tulip's 'with_timeout' starts a new Task, whereas the former works in any generator inside the task. So, after that you'd be able to do the following:

yield from call_with_timeout(1.0, something())

And something's 'finally' won't ever be aborted. - Yury
On 2012-10-29, at 9:59 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Guido, Greg,
On 2012-10-27, at 7:45 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Right. But now I'm not sure this approach will work with yield-from. When you use yield-from, the scheduler knows nothing about the chain of generators; it's all hidden in the yield-from implementation.
I think I've come up with a solution that should work for yield-froms too (if we accept my in_finally idea in 3.4). And there should be a way of writing a 'protect_finally' context manager too.
I'll illustrate the approach on Guido's tulip micro-framework (consider it a pseudo code to illustrate the idea):
class Interrupt(BaseException):
    """Should penetrate all try..excepts"""

def call_with_timeout(timeout, gen):
    context.current_task._add_timeout(timeout, gen)
    try:
        return (yield from gen)
    except Interrupt:
        raise TimeoutError() from None

class Task:
    def _add_timeout(self, timeout, gen):
        self.eventloop.call_later(
            timeout, partial(self._interrupt, gen))

    def _interrupt(self, gen):
        if not gen.in_finally:
            gen.throw(Interrupt, Interrupt(), None)
        else:
            # So we set a flag to watch for gen's in_finally value
            # on each 'step' call. And when it's 0 - Task.step
            # will call '_interrupt' again.
            self._watch_finally(gen)
I defined a new function 'call_with_timeout', because tulip's 'with_timeout' starts a new Task, whereas the former works in any generator inside the task.
So, after that you'd be able to do the following:
yield from call_with_timeout(1.0, something())
And something's 'finally' won't ever be aborted.
Ah, the solution is wrong; I've tricked myself. The right code would be something like this:

class Interrupt(BaseException):
    """Should penetrate all try..excepts"""

def call_with_timeout(timeout, gen):
    context.current_task._add_timeout(timeout, gen)
    try:
        return (yield from gen)
    except Interrupt:
        raise TimeoutError() from None

class Task:
    def _add_timeout(self, timeout, gen):
        # XXX The following line is the key. We need a reference
        # to the generator object that is yield-fromming our 'gen'
        # ('caller' for 'gen')
        current_yield_from = self.gen.yield_from
        self.eventloop.call_later(
            timeout, partial(self._interrupt, gen, current_yield_from))

    def _interrupt(self, gen, yf):
        if not yf.in_finally:
            # If gen's caller is not in its finally block - it's
            # safe for us to interrupt gen.
            gen.throw(Interrupt, Interrupt(), None)
        else:
            # So we set a flag to watch for yf's in_finally value
            # on each 'step' call. And when it's 0 - Task.step
            # will call '_interrupt' again.
            self._watch_finally(yf, gen)

IOW, besides just 'in_finally', we also need to add a 'yield_from' property to the generator object. The latter will hold a reference to the sub-generator that the current generator is yielding from. The logic is pretty twisted, but I'm sure that the problem is solvable.

P.S. I'm not proposing to add anything. It's more about finding *any* way to actually solve the problem correctly. Once we find that way, we *maybe* start thinking about language support for it. - Yury
On 25/10/12 12:12, Guido van Rossum wrote:
Of course this could be abused, but at your own risk -- the scheduler only gives you a fixed amount of extra time and then it's quits.
Which is another good reason to design your cleanup code so that it can't take an arbitrarily long time. -- Greg
On 25/10/12 11:43, Guido van Rossum wrote:
What's the problem with just letting the cleanup take as long as it wants to and do whatever it wants?
IIUC, the worry is not about time, it's that either 1) another task could run during the cleanup and mess something up, or 2) an exception could be thrown into the task during the cleanup and prevent it being completed. From a correctness standpoint, it doesn't matter if the cleanup takes a long time, as long as it doesn't yield. -- Greg
Greg, On 2012-10-24, at 5:30 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yury Selivanov wrote:
It's not about try..finally nesting, it's about Scheduler being aware that a coroutine is in its 'finally' block and thus shouldn't be interrupted at the moment
It would be a bad idea to make a close() method, or anything else that might be needed for cleanup purposes, be a 'yield from' call. If it's an ordinary function, it can't be interrupted in the world we're talking about, so the PEP 419 problem doesn't apply.
If I were feeling in a radical mood, I might go as far as suggesting that 'yield' and 'yield from' be syntactically forbidden inside a finally clause. That would force you to design your cleanup code to be safe from interruptions.
I'm not sure it would be a good idea... Cleanup code for a DB connection *will* need to run queries to the database (at least in some circumstances). And we can't make them blocking. - Yury
On 25/10/12 11:47, Yury Selivanov wrote:
Cleanup code for a DB connection *will* need to run queries to the database (at least in some circumstances).
That smells like a design problem to me. If something goes wrong, the most you should have to do is roll back any transactions you were in the middle of. Trying to perform further queries is just inviting more trouble. -- Greg
On 2012-10-24, at 8:52 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 25/10/12 11:47, Yury Selivanov wrote:
Cleanup code for a DB connection *will* need to run queries to the database (at least in some circumstances).
That smells like a design problem to me. If something goes wrong, the most you should have to do is roll back any transactions you were in the middle of. Trying to perform further queries is just inviting more trouble.
Right. And that rolling back - a tiny db query "rollback" - is async code, and where there is async code, no matter how tiny and fast, the scheduler has an opportunity to screw it up. Guido's 'with protected_finally' should work, although it will probably look weird to people unfamiliar with coroutines and this particular problem. - Yury
On 25/10/12 14:07, Yury Selivanov wrote:
Right. And that rolling back - a tiny db query "rollback" - is async code,
Only if we implement it as a blocking operation as far as our task scheduler is concerned. I wouldn't do it that way -- I'd perform it synchronously and assume it'll be fast enough for that not to be a problem. BTW, we seem to be using different definitions for the term "query". To my way of thinking, a rollback is *not* a query, even if it happens to be triggered by sending a "rollback" command to the SQL interpreter. At the Python API level, it should appear as a distinct operation with its own method. -- Greg
Greg, On 2012-10-24, at 9:29 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 25/10/12 14:07, Yury Selivanov wrote:
Right. And that rolling back - a tiny db query "rollback" - is an async code,
Only if we implement it as a blocking operation as far as our task scheduler is concerned. I wouldn't do it that way -- I'd perform it synchronously and assume it'll be fast enough for that not to be a problem.
In a non-blocking application there is no way of running blocking code, even if it's anticipated to block for a mere millisecond. Because if something gets out of control and it blocks for a longer period of time, everything just stops, right? Or did you mean something else by "synchronously" (perhaps Steve Dower's approach)?
BTW, we seem to be using different definitions for the term "query". To my way of thinking, a rollback is *not* a query, even if it happens to be triggered by sending a "rollback" command to the SQL interpreter. At the Python API level, it should appear as a distinct operation with its own method.
Right. I meant the "sending a rollback command to the SQL interpreter" part--this should be done through a non-blocking socket. To invoke an operation on a non-blocking socket we have to do it through 'yield' or 'yield from', hence give the scheduler a chance to interrupt the coroutine. Even though we know that clean-up code should be simple and fast, real-world clean-up code still contains coroutine context switches, be it due to the need to send some information via a socket, or just a call to some other coroutine. If you write a single 'yield' in your finally block, and that (or the calling) coroutine is invoked with a timeout, there is a chance that its 'finally' block's execution will be aborted by the scheduler. Writing this yield/non-blocking type of code in finally blocks is a necessity, unfortunately. And even if that cleanup code is incredibly fast, if you have a webserver that runs for days/weeks/months, bad things will happen. So if we decide to adopt Guido's approach of explicitly marking critical finally blocks (well, they are all critical) with 'with protected_finally()' - all right. If we somehow invent a mechanism that would allow us to hide all this from the user and protect finally blocks implicitly in the scheduler - that's even better. Or we should design a totally different approach to handling timeouts, and try not to interrupt coroutines at all. - Yury
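The 'protected_finally' idea mentioned in this exchange can be sketched as a simple context manager (the name comes from the thread; TaskState and may_interrupt are invented here, and a real scheduler would consult the flag before throwing a timeout into the task):

```python
from contextlib import contextmanager

class TaskState:
    """Toy per-task state the scheduler could consult."""
    def __init__(self):
        self.protected = 0   # nesting depth of protected sections

@contextmanager
def protected_finally(state):
    state.protected += 1     # scheduler must not interrupt in here
    try:
        yield
    finally:
        state.protected -= 1

def may_interrupt(state):
    """Would the scheduler be allowed to throw into this task now?"""
    return state.protected == 0
```

A counter rather than a boolean keeps nested protected sections correct: the task becomes interruptible again only when the outermost section exits.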
On 10/23/12 4:30 PM, Yury Selivanov wrote:
How do you protect finally statements in shrapnel? If we have a following coroutine (greenlet style):
def foo():
    connection = open_connection()
    try:
        spam()
    finally:
        [some code]
        connection.close()
What happens if you run 'foo.with_timeout(1)' and timeout occurs at "[some code]" point? Will you just abort 'foo', possibly preventing 'connection' from being closed?
Timeouts are raised as normal exceptions, for exactly this reason. The interesting part of the implementation is keeping each with_timeout() call separate: if you have nested with_timeout() calls and the outer timeout goes off, it will skip the inner exception handler and fire only the outer one. In other words, the code for with_timeout() verifies that any timeouts propagating through it belong to it.

https://github.com/ironport/shrapnel/blob/master/coro/_coro.pyx#L1126-1142

-Sam
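The "timeout ownership" check Sam describes can be illustrated with a toy. The names below are loosely modeled on shrapnel's with_timeout/TimeoutError but this is not its real implementation (see the linked _coro.pyx for that): each with_timeout frame tags its timer with a unique token and re-raises any timeout it does not own, so an outer timeout skips inner handlers.

```python
# Toy illustration of per-frame timeout ownership (hypothetical names).

class ScheduledTimeout(Exception):
    """A timeout exception carrying the token of the frame that owns it."""
    def __init__(self, token):
        self.token = token

def with_timeout(token, fn, *args):
    try:
        return fn(*args)
    except ScheduledTimeout as e:
        if e.token is not token:
            raise  # not ours: let it propagate to the owning outer frame
        return "timed out"

outer_token, inner_token = object(), object()

def inner_work():
    # Pretend the *outer* timer fired while we were inside the inner frame.
    raise ScheduledTimeout(outer_token)

def outer_work():
    # The inner handler sees a foreign token and must not catch it.
    return with_timeout(inner_token, inner_work)

result = with_timeout(outer_token, outer_work)
# result == "timed out": only the outer frame handled the outer timeout
```

The key design point is that the exception is matched by identity against the frame that scheduled it, not merely by type, so nested handlers cannot accidentally swallow someone else's timeout.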
On 24.10.12 01:00, Sam Rushing wrote:
On 10/23/12 3:05 PM, Yury Selivanov wrote: ...
There is only one way to 'magically' make existing code both sync- and async-friendly: greenlets. But I think there is no chance for them (or Stackless) to land in CPython in the foreseeable future (although it would be awesome).
> BTW, why didn't you use greenlets in shrapnel and ended up with your own implementation?

I think shrapnel predates greenlets... some of the core asm code for greenlets may have come from one of shrapnel's precursors at IronPort... Unfortunately it took many years to get shrapnel open-sourced - I remember talking with Guido about it over lunch in ~2006.
Hi Sam,

greenlets were developed in 2004 by Armin Rigo, on the first (and maybe only) Stackless sprint here in Berlin. The greenlet asm code was ripped out of Stackless and slightly improved, but has the same old stack-slicing idea.

cheers - chris

--
Christian Tismer :^) <mailto:tismer@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/
On 10/23/12 5:43 PM, Christian Tismer wrote:
greenlets were developed in 2004 by Armin Rigo, on the first (and maybe only) Stackless sprint here in Berlin. The greenlet asm code was ripped out of Stackless and slightly improved, but has the same old stack-slicing idea.
Ah, ok. I remember talking with you at the 2005 PyCon about my two-stack solution*, but don't remember if anything came of it. Do greenlets use a single stack? -Sam (*) nothing to do with Israel
-----Original Message-----
From: Python-ideas [mailto:python-ideas-bounces+kristjan=ccpgames.com@python.org] On Behalf Of Sam Rushing
Sent: 23. október 2012 23:01
To: Yury Selivanov
Cc: python-ideas@python.org
Subject: Re: [Python-ideas] Async API
In shrapnel it is simply:
coro.with_timeout (<seconds>, <fun>, *args, **kwargs)
Timeouts are caught thus:
    try:
        coro.with_timeout (...)
    except coro.TimeoutError:
        ...
Hi Sam (I remember our talk about Shrapnel here at CCP some years back), others:

Jumping in here with some random stuff, in case anyone cares. A few years ago, I started trying to create a standard library for Stackless Python. We use it internally at CCP and it is open source, at https://bitbucket.org/krisvale/stacklesslib

What it provides is:
1) some utility classes for Stackless (context managers mostly), but also synchronization primitives
2) basic "main" functionality: a main loop and an event scheduler
3) a set of replacement modules for threading/socket, etc.
4) monkeypatching tools, to monkeypatch in the replacements, and even run monkeypatched scripts

On the basis of the event scheduler, I also implemented timeouts for socket.receive() functions. These are used to allow e.g. timeouts for locking operations. Timeouts are indeed implemented as raised exceptions. There are some minor race issues to think about, but that's it.

Notice the need for a stacklesslib.main module. The issue I have found with this sort of event-driven model is that composability suffers when everyone has their own idea about what a "main" loop should be. In threaded programming, the OS provides the main loop and the event scheduler. For something like Python, a whole application has to agree on what the main loop is, and how to schedule future events. Hopefully this discussion is an attempt to settle that in a standard manner.

Cheers,
Kristján

p.s. stacklesslib is in a state of protracted and procrastinated development. I promised that I would fix it up at last PyCon. Mostly I'm working on restructuring and making the main loop work more "out of the box".
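The "one main loop plus event scheduler" shape described above can be sketched with a heap of (deadline, callback) pairs that every component shares, so the application agrees on a single notion of "the main loop". This is a minimal illustration, not stacklesslib's actual API, and the clock is simulated so the example runs instantly:

```python
# Minimal shared event scheduler sketch (not stacklesslib's real API).

import heapq

class EventScheduler:
    def __init__(self):
        self._events = []  # heap of (deadline, seq, callback)
        self._seq = 0      # tie-breaker so callbacks are never compared
        self._now = 0.0    # simulated clock; a real loop would use time.monotonic()

    def call_later(self, delay, callback):
        """Any component (socket timeout, lock timeout, timer) uses this."""
        heapq.heappush(self._events, (self._now + delay, self._seq, callback))
        self._seq += 1

    def run_until_complete(self):
        """The single 'main loop': pop due events in deadline order."""
        while self._events:
            deadline, _, callback = heapq.heappop(self._events)
            self._now = max(self._now, deadline)
            callback()

fired = []
sched = EventScheduler()
sched.call_later(2.0, lambda: fired.append("timeout"))
sched.call_later(1.0, lambda: fired.append("io-ready"))
sched.run_until_complete()
# fired == ["io-ready", "timeout"]: events fire in deadline order
```

Because both the socket wrappers and the synchronization primitives would schedule through the same heap, timeouts compose: there is exactly one authority on what happens next, which is the composability property the email argues for.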
On 23.10.2012 21:33, Yury Selivanov wrote:
    topics = FE.select([
        FE.publication_date,
        FE.body,
        FE.category,
        (FE.creator, [
            (FE.creator.subject, [
                (gpi, [
                    gpi.avatar
                ])
            ])
        ])
    ]).filter(FE.publication_date < FE.publication_date.now(),
              FE.category == self.category)
Why use Python when you clearly want Java? Sturla
On 2012-10-26, at 6:27 AM, Sturla Molden <sturla@molden.no> wrote:
On 23.10.2012 21:33, Yury Selivanov wrote:
    topics = FE.select([
        FE.publication_date,
        FE.body,
        FE.category,
        (FE.creator, [
            (FE.creator.subject, [
                (gpi, [
                    gpi.avatar
                ])
            ])
        ])
    ]).filter(FE.publication_date < FE.publication_date.now(),
              FE.category == self.category)
Why use Python when you clearly want Java?
And why do you think so? ;) - Yury
participants (14)

- Antoine Pitrou
- Christian Tismer
- Greg Ewing
- Guido van Rossum
- Itamar Turner-Trauring
- Jasper St. Pierre
- Kristján Valur Jónsson
- Laurens Van Houtven
- Paul Colomiets
- Sam Rushing
- Steve Dower
- Sturla Molden
- Terry Reedy
- Yury Selivanov