On Thu, Oct 11, 2012 at 3:28 PM, Guido van Rossum <guido@python.org> wrote:
On Mon, Oct 8, 2012 at 10:12 PM, Ben Darnell <ben@bendarnell.com> wrote:
On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum <guido@python.org> wrote:
It's a Future constructor, a (conditional) add_done_callback, plus the calls to set_result or set_exception and the with statement for error handling. In full:
# (assumed imports: functools; a Future class such as
# concurrent.futures.Future; ExceptionStackContext from
# tornado.stack_context)
def future_wrap(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        future = Future()
        # If the caller passed its own callback, run it when the
        # future resolves.
        if kwargs.get('callback') is not None:
            future.add_done_callback(kwargs.pop('callback'))
        # The wrapped function reports its result through this callback.
        kwargs['callback'] = future.set_result
        def handle_error(typ, value, tb):
            future.set_exception(value)
            return True
        with ExceptionStackContext(handle_error):
            f(*args, **kwargs)
        return future
    return wrapper
Hmm... I *think* it automatically adds a special keyword 'callback' to the *call* site so that you can do things like
fut = some_wrapped_func(blah, callback=my_callback)
and then instead of using yield to wait for the callback, put the continuation of your code in the my_callback() function.
Yes. Note that if you're passing in a callback you're probably going to just ignore the return value. The callback argument and the future return value are essentially two alternative interfaces; it probably doesn't make sense to use both at once (but as a library author it's useful to provide both).
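For concreteness, the two call styles for a function wrapped with future_wrap might look like this (fetch_async and handle_response are hypothetical names):

    # Future style: ignore the callback argument, use the returned Future.
    future = fetch_async('http://example.com')
    future.add_done_callback(lambda f: handle_response(f.result()))

    # Callback style: pass callback= and ignore the returned Future.
    fetch_async('http://example.com', callback=handle_response)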
Definitely sounds like something that could be simplified if you didn't have backward compatibility baggage...
Probably, although I still feel like callback-passing has its place. For example, I think the Tornado chat demo (https://github.com/facebook/tornado/blob/master/demos/chat/chatdemo.py) would be less clear with coroutines and Futures than it is now (although it would fit better into Greg's schedule/unschedule style). That doesn't mean that every method has to take a callback, but I'd be reluctant to get rid of them until we have more experience with the generator/future-focused style.
In Tornado the Future is created by a decorator and hidden from the asynchronous function (it just sees the callback),
Hm, interesting. NDB goes the other way, the callbacks are mostly used to make Futures work, and most code (including large swaths of internal code) uses Futures. I think NDB is similar to monocle here. In NDB, you can do
f = <some function returning a Future>
r = yield f
where "yield f" is mostly equivalent to f.result(), except it gives better opportunity for concurrency.
Yes, tornado's gen.engine does the same thing here. However, the stakes are higher than "better opportunity for concurrency" - in an event loop if you call future.result() without yielding, you'll deadlock if that Future's task needs to run on the same event loop.
That would depend on the semantics of the event loop implementation. In NDB's event loop, such a .result() call would just recursively enter the event loop, and you'd only deadlock if you actually have two pieces of code waiting for each other's completion.
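A crude sketch of such a reentrant result(), where eventloop.run0() is a hypothetical "run one iteration of the event loop" call (not necessarily NDB's actual internals):

    def result(self):
        # Re-enter the event loop until this Future has resolved.
        while not self.done():
            eventloop.run0()
        if self._exception is not None:
            raise self._exception
        return self._result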
Hmm, I think I'd rather deadlock. :) If the event loop is reentrant then the application code has to be coded defensively as if it were preemptively multithreaded, which introduces the possibility of deadlock or (probably) more subtle/less frequent errors. Reentrancy has been a significant problem in my experience, so I've been moving towards a policy where methods in Tornado that take a callback never run it immediately; callbacks are always scheduled on the next iteration of the IOLoop with IOLoop.add_callback.
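In practice that policy looks like this (callback and result are illustrative free names; IOLoop.instance() and add_callback are real Tornado APIs):

    import functools
    from tornado.ioloop import IOLoop

    # Instead of running the callback immediately (and reentrantly):
    #     callback(result)
    # schedule it for the next iteration of the loop:
    IOLoop.instance().add_callback(functools.partial(callback, result))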
The latter is a good tactic and I'm also using it. (Except for some reason we had to add the concept of "immediate callbacks" to our Future class, and those are run inside the set_result() call. But most callbacks don't use that feature.)
I don't have a choice about making the event loop reentrant -- App Engine's underlying RPC multiplexing implementation *is* reentrant, and there is a large set of "classic" APIs that I cannot stop the user from calling that reenter it. But even if my hand wasn't forced, I'm not sure if I would make your choice. In NDB, there is a full complement of synchronous APIs that exactly matches the async APIs, and users are free to use the synchronous APIs in parts of their code where they don't need concurrency. Hence, every synchronous API just calls its async sibling and immediately waits for its result, which implicitly invokes the event loop.
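The pattern is roughly this (a sketch with hypothetical names, though NDB's Future really does have a get_result() that waits by running the event loop):

    def get(key):
        # The sync API is just the async sibling plus an immediate wait.
        return get_async(key).get_result()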
Tornado has a synchronous HTTPClient that does the same thing, although each fetch creates and runs its own IOLoop rather than spinning the top-level IOLoop. (This means it doesn't really make sense to run it when there is a top-level IOLoop; it's provided as a convenience for scripts and multi-threaded apps that want an HTTPRequest interface consistent with the async version).
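Simplified, the synchronous client does something like this (a sketch, not Tornado's exact code):

    from tornado.httpclient import AsyncHTTPClient
    from tornado.ioloop import IOLoop

    def synchronous_fetch(request):
        io_loop = IOLoop()                         # private loop, not IOLoop.instance()
        client = AsyncHTTPClient(io_loop=io_loop)
        result = [None]
        def callback(response):
            result[0] = response
            io_loop.stop()                         # unwinds io_loop.start() below
        client.fetch(request, callback)
        io_loop.start()                            # run the private loop until stopped
        return result[0]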
Of course, I have it easy -- multiple incoming requests are dispatched to separate threads by the App Engine runtime, so I don't have to worry about multiplexing at that level at all -- just end user code that is essentially single-threaded unless they go out of their way.
I did end up debugging one user's problem where they were making a synchronous call inside an async handler, and -- very rarely! -- the recursive event loop calls kept stacking up until they hit a StackOverflowError. So I would agree that async code shouldn't make synchronous API calls; but I haven't heard yet from anyone who was otherwise hurt by the recursive event loop invocations -- in particular, nobody has requested locks.
I think that's because you don't have file descriptor support. In a (level-triggered) event loop if you don't drain the socket before reentering the loop then your read handler will be called again, which generally makes a mess. I suppose with coroutines you'd want edge-triggered instead of level-triggered though, which might make this problem go away.
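For example, a level-triggered read handler has to drain the socket before control returns to (or re-enters) the loop, along these lines (process() is an illustrative application hook):

    import errno
    import socket

    def on_readable(sock):
        # Read until the socket is drained; with level triggering the loop
        # would otherwise report the fd as readable again immediately.
        while True:
            try:
                chunk = sock.recv(4096)
            except socket.error as e:
                if e.args[0] in (errno.EAGAIN, errno.EWOULDBLOCK):
                    return  # drained; safe to go back to the loop
                raise
            if not chunk:
                return  # connection closed
            process(chunk)  # hypothetical application handler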
For concreteness, here's a crude sketch of what the APIs I'm talking about would look like in use (in a hypothetical future version of tornado).
@future_wrap
@gen.engine
def async_http_client(url, callback):
    parsed_url = urlparse.urlsplit(url)
    # works the same whether the future comes from a thread pool or @future_wrap
And you need the thread pool because there's no async version of getaddrinfo(), right?
Right.
    addrinfo = yield g_thread_pool.submit(
        socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
    stream = IOStream(socket.socket())
    yield stream.connect(addrinfo[0][-1])
    stream.write('GET %s HTTP/1.0' % parsed_url.path)
Why no yield in front of the write() call?
Because we don't need to wait for the write to complete before we continue to the next statement. write() doesn't return anything; it just succeeds or fails, and if it fails the next read_until will fail too. (although in this case it wouldn't hurt to have the yield either)
I guess you have a certain kind of buffering built in to your stream? So if you make two write() calls without waiting in quick succession, does the system collapse these into one, or does it end up making two system calls, or what? In NDB, there's a similar issue with multiple RPCs that can be batched. I ended up writing an abstraction that automatically combines these; the call isn't actually made until there are no other runnable tasks. I've had to explain this a few times to users who try to get away with overlapping CPU work and I/O, but otherwise it's worked quite well.
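Conceptually, the auto-batcher queues calls and flushes only when the scheduler has nothing else runnable, something like this (a sketch of the idea, not NDB's actual implementation; Future and eventloop.add_idle are assumed from the surrounding framework, and NDB's event loop does have an add_idle hook):

    class AutoBatcher(object):
        """Sketch: collect calls, make one batched RPC when the loop is idle."""
        def __init__(self, batch_rpc):
            self._batch_rpc = batch_rpc   # one RPC taking a list of keys
            self._todo = []               # pending (key, Future) pairs

        def add(self, key):
            if not self._todo:
                # Flush only once the event loop runs out of runnable tasks.
                eventloop.add_idle(self._flush)
            future = Future()
            self._todo.append((key, future))
            return future

        def _flush(self):
            todo, self._todo = self._todo, []
            results = self._batch_rpc([key for key, _ in todo])
            for (_, future), value in zip(todo, results):
                future.set_result(value)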
Yes, IOStream does buffering for you. Each IOStream.write() call will generally result in a syscall, but once the outgoing socket buffer is full subsequent writes will be buffered in the IOStream and written when the IOLoop says the socket is writable. (The callback argument to write() can be used for flow control in this case.) I used to defer the syscall until the IOLoop was idle to batch things up, but it turns out to be more efficient in practice to just write things out each time and let the higher level do its own buffering when appropriate.
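That flow-control use of the write() callback looks like this (produce_chunk() is an illustrative source of data; stream is an IOStream):

    def write_next():
        chunk = produce_chunk()
        if chunk:
            # Don't generate the next chunk until this one has been
            # flushed to the socket.
            stream.write(chunk, callback=write_next)

    write_next()

-Ben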