On Thu, Oct 11, 2012 at 3:28 PM, Guido van Rossum <guido@python.org> wrote:
On Mon, Oct 8, 2012 at 10:12 PM, Ben Darnell <ben@bendarnell.com> wrote:
On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum <guido@python.org> wrote:
It's a Future constructor, a (conditional) add_done_callback, plus the calls to set_result or set_exception and the with statement for error handling. In full:
# (assumed imports: functools; a Future class such as
# concurrent.futures.Future; ExceptionStackContext from
# tornado.stack_context)
def future_wrap(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        future = Future()
        # If the caller passed its own callback, run it when the
        # future resolves.
        if kwargs.get('callback') is not None:
            future.add_done_callback(kwargs.pop('callback'))
        # The wrapped function reports its result through this callback.
        kwargs['callback'] = future.set_result
        def handle_error(typ, value, tb):
            future.set_exception(value)
            return True
        with ExceptionStackContext(handle_error):
            f(*args, **kwargs)
        return future
    return wrapper
Hmm... I *think* it automatically adds a special keyword 'callback' to the *call* site so that you can do things like
fut = some_wrapped_func(blah, callback=my_callback)
and then instead of using yield to wait for the callback, put the continuation of your code in the my_callback() function.
Yes. Note that if you're passing in a callback you're probably going to just ignore the return value. The callback argument and the future return value are essentially two alternative interfaces; it probably doesn't make sense to use both at once (but as a library author it's useful to provide both).
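For concreteness, the two call styles for a function wrapped with future_wrap might look like this (fetch_async and handle_response are hypothetical names):

    # Future style: ignore the callback argument, use the returned Future.
    future = fetch_async('http://example.com')
    future.add_done_callback(lambda f: handle_response(f.result()))

    # Callback style: pass callback= and ignore the returned Future.
    fetch_async('http://example.com', callback=handle_response)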
Definitely sounds like something that could be simplified if you didn't have backward compatibility baggage...
Probably, although I still feel like callback-passing has its place. For example, I think the Tornado chat demo (https://github.com/facebook/tornado/blob/master/demos/chat/chatdemo.py) would be less clear with coroutines and Futures than it is now (although it would fit better into Greg's schedule/unschedule style). That doesn't mean that every method has to take a callback, but I'd be reluctant to get rid of them until we have more experience with the generator/future-focused style.
In Tornado the Future is created by a decorator and hidden from the asynchronous function (it just sees the callback),
Hm, interesting. NDB goes the other way, the callbacks are mostly used to make Futures work, and most code (including large swaths of internal code) uses Futures. I think NDB is similar to monocle here. In NDB, you can do
f = <some function returning a Future>
r = yield f
where "yield f" is mostly equivalent to f.result(), except it gives better opportunity for concurrency.
Yes, tornado's gen.engine does the same thing here. However, the stakes are higher than "better opportunity for concurrency" - in an event loop if you call future.result() without yielding, you'll deadlock if that Future's task needs to run on the same event loop.
That would depend on the semantics of the event loop implementation. In NDB's event loop, such a .result() call would just recursively enter the event loop, and you'd only deadlock if you actually have two pieces of code waiting for each other's completion.
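A crude sketch of such a reentrant result(), where eventloop.run0() is a hypothetical "run one iteration of the event loop" call (not necessarily NDB's actual internals):

    def result(self):
        # Re-enter the event loop until this Future has resolved.
        while not self.done():
            eventloop.run0()
        if self._exception is not None:
            raise self._exception
        return self._result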
Hmm, I think I'd rather deadlock. :) If the event loop is reentrant then the application code has to be coded defensively as if it were preemptively multithreaded, which introduces the possibility of deadlock or (probably) more subtle/less frequent errors. Reentrancy has been a significant problem in my experience, so I've been moving towards a policy where methods in Tornado that take a callback never run it immediately; callbacks are always scheduled on the next iteration of the IOLoop with IOLoop.add_callback.
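In practice that policy looks like this (callback and result are illustrative free names; IOLoop.instance() and add_callback are real Tornado APIs):

    import functools
    from tornado.ioloop import IOLoop

    # Instead of running the callback immediately (and reentrantly):
    #     callback(result)
    # schedule it for the next iteration of the loop:
    IOLoop.instance().add_callback(functools.partial(callback, result))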
The latter is a good tactic and I'm also using it. (Except for some reason we had to add the concept of "immediate callbacks" to our Future class, and those are run inside the set_result() call. But most callbacks don't use that feature.)
I don't have a choice about making the event loop reentrant -- App Engine's underlying RPC multiplexing implementation *is* reentrant, and there is a large set of "classic" APIs that I cannot stop the user from calling that reenter it. But even if my hand wasn't forced, I'm not sure if I would make your choice. In NDB, there is a full complement of synchronous APIs that exactly matches the async APIs, and users are free to use the synchronous APIs in parts of their code where they don't need concurrency. Hence, every synchronous API just calls its async sibling and immediately waits for its result, which implicitly invokes the event loop.
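The pattern is roughly this (a sketch with hypothetical names, though NDB's Future really does have a get_result() that waits by running the event loop):

    def get(key):
        # The sync API is just the async sibling plus an immediate wait.
        return get_async(key).get_result()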
Tornado has a synchronous HTTPClient that does the same thing, although each fetch creates and runs its own IOLoop rather than spinning the top-level IOLoop. (This means it doesn't really make sense to run it when there is a top-level IOLoop; it's provided as a convenience for scripts and multi-threaded apps that want an HTTPRequest interface consistent with the async version).
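Simplified, the synchronous client does something like this (a sketch, not Tornado's exact code):

    from tornado.httpclient import AsyncHTTPClient
    from tornado.ioloop import IOLoop

    def synchronous_fetch(request):
        io_loop = IOLoop()                         # private loop, not IOLoop.instance()
        client = AsyncHTTPClient(io_loop=io_loop)
        result = [None]
        def callback(response):
            result[0] = response
            io_loop.stop()                         # unwinds io_loop.start() below
        client.fetch(request, callback)
        io_loop.start()                            # run the private loop until stopped
        return result[0]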
Of course, I have it easy -- multiple incoming requests are dispatched to separate threads by the App Engine runtime, so I don't have to worry about multiplexing at that level at all -- just end user code that is essentially single-threaded unless they go out of their way.
I did end up debugging one user's problem where they were making a synchronous call inside an async handler, and -- very rarely! -- the recursive event loop calls kept stacking up until they hit a StackOverflowError. So I would agree that async code shouldn't make synchronous API calls; but I haven't heard yet from anyone who was otherwise hurt by the recursive event loop invocations -- in particular, nobody has requested locks.
I think that's because you don't have file descriptor support. In a (level-triggered) event loop if you don't drain the socket before reentering the loop then your read handler will be called again, which generally makes a mess. I suppose with coroutines you'd want edge-triggered instead of level-triggered though, which might make this problem go away.
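For example, a level-triggered read handler has to drain the socket before control returns to (or re-enters) the loop, along these lines (process() is an illustrative application hook):

    import errno
    import socket

    def on_readable(sock):
        # Read until the socket is drained; with level triggering the loop
        # would otherwise report the fd as readable again immediately.
        while True:
            try:
                chunk = sock.recv(4096)
            except socket.error as e:
                if e.args[0] in (errno.EAGAIN, errno.EWOULDBLOCK):
                    return  # drained; safe to go back to the loop
                raise
            if not chunk:
                return  # connection closed
            process(chunk)  # hypothetical application handler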
For concreteness, here's a crude sketch of what the APIs I'm talking about would look like in use (in a hypothetical future version of tornado).
@future_wrap
@gen.engine
def async_http_client(url, callback):
    parsed_url = urlparse.urlsplit(url)
    # works the same whether the future comes from a thread pool or @future_wrap
And you need the thread pool because there's no async version of getaddrinfo(), right?
Right.
    addrinfo = yield g_thread_pool.submit(
        socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
    stream = IOStream(socket.socket())
    yield stream.connect(addrinfo[0][-1])
    stream.write('GET %s HTTP/1.0' % parsed_url.path)
Why no yield in front of the write() call?
Because we don't need to wait for the write to complete before we continue to the next statement. write() doesn't return anything; it just succeeds or fails, and if it fails the next read_until will fail too. (although in this case it wouldn't hurt to have the yield either)
I guess you have a certain kind of buffering built in to your stream? So if you make two write() calls without waiting in quick succession, does the system collapse these into one, or does it end up making two system calls, or what? In NDB, there's a similar issue with multiple RPCs that can be batched. I ended up writing an abstraction that automatically combines these; the call isn't actually made until there are no other runnable tasks. I've had to explain this a few times to users who try to get away with overlapping CPU work and I/O, but otherwise it's worked quite well.
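Conceptually, the auto-batcher queues calls and flushes only when the scheduler has nothing else runnable, something like this (a sketch of the idea, not NDB's actual implementation; Future and eventloop.add_idle are assumed from the surrounding framework, and NDB's event loop does have an add_idle hook):

    class AutoBatcher(object):
        """Sketch: collect calls, make one batched RPC when the loop is idle."""
        def __init__(self, batch_rpc):
            self._batch_rpc = batch_rpc   # one RPC taking a list of keys
            self._todo = []               # pending (key, Future) pairs

        def add(self, key):
            if not self._todo:
                # Flush only once the event loop runs out of runnable tasks.
                eventloop.add_idle(self._flush)
            future = Future()
            self._todo.append((key, future))
            return future

        def _flush(self):
            todo, self._todo = self._todo, []
            results = self._batch_rpc([key for key, _ in todo])
            for (_, future), value in zip(todo, results):
                future.set_result(value)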
Yes, IOStream does buffering for you. Each IOStream.write() call will generally result in a syscall, but once the outgoing socket buffer is full subsequent writes will be buffered in the IOStream and written when the IOLoop says the socket is writable. (The callback argument to write() can be used for flow control in this case.) I used to defer the syscall until the IOLoop was idle to batch things up, but it turns out to be more efficient in practice to just write things out each time and let the higher level do its own buffering when appropriate.
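That flow-control use of the write() callback looks like this (produce_chunk() is an illustrative source of data; stream is an IOStream):

    def write_next():
        chunk = produce_chunk()
        if chunk:
            # Don't generate the next chunk until this one has been
            # flushed to the socket.
            stream.write(chunk, callback=write_next)

    write_next()

-Ben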