[Python-ideas] asyncore: included batteries don't fit

Ben Darnell ben at bendarnell.com
Mon Oct 8 06:44:27 CEST 2012


On Sun, Oct 7, 2012 at 7:01 PM, Guido van Rossum <guido at python.org> wrote:
> As long as it's not so low-level that other people shy away from it.

That depends on the target audience.  The low-level IOLoop and Reactor
are pretty similar -- you can implement one in terms of the other --
but as you move up the stack cross-compatibility becomes harder.  For
example, if I wanted to implement tornado's IOStreams in twisted, I
wouldn't start with the analogous class in twisted (Protocol?), I'd go
down to the Reactor and build from there, so putting something like
IOStream or Protocol in asyncore2 wouldn't do much to unify the two
worlds.  (It would help people build async stuff with the stdlib
alone, but at that point it becomes more of a peer or competitor to
Tornado and Twisted than a bridge between them.)
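
As a crude illustration of what I mean by implementing one in terms
of the other, here's a minimal reactor-ish shim over tornado's IOLoop
(timers and thread wakeups only; a real bridge would also need to
cover readers and writers):

import time
from tornado.ioloop import IOLoop

class ReactorOverIOLoop(object):
    """Minimal sketch: two reactor methods expressed via IOLoop."""
    def __init__(self, io_loop=None):
        self.io_loop = io_loop or IOLoop.instance()

    def callLater(self, delay, f, *args, **kw):
        # IOLoop.add_timeout takes an absolute deadline
        return self.io_loop.add_timeout(time.time() + delay,
                                        lambda: f(*args, **kw))

    def callFromThread(self, f, *args, **kw):
        # add_callback is the IOLoop's thread-safe entry point
        self.io_loop.add_callback(lambda: f(*args, **kw))

    def run(self):
        self.io_loop.start()

    def stop(self):
        self.io_loop.stop()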

>
> I also have a feeling that one way or another this will require
> cooperation between the Twisted and Tornado developers in order to
> come up with a compromise that both are willing to conform to in a
> meaningful way. (Unfortunately I don't know how to define "meaningful
> way" more precisely here. I guess the idea is that almost all things
> *using* an event loop use the standardized abstract API without caring
> whether underneath it's Tornado, Twisted, or some simpler thing in the
> stdlib.

I'd phrase the goal as being able to run both Tornado and Twisted in
the same thread without any piece of code needing to know about both
systems.  I think that's achievable as far as core functionality goes.
I expect both sides have some lesser-used functionality that might
not make it into the stdlib version, but as long as it's possible to
plug in a "real" IOLoop or Reactor when needed it should be OK.
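
Concretely, I imagine the pluggability being no more than a
module-level slot, something like this (all names hypothetical):

class SelectEventLoop(object):
    """Placeholder for a simple select()-based stdlib default."""

_event_loop = None

def get_event_loop():
    # lazily create the simple stdlib default
    global _event_loop
    if _event_loop is None:
        _event_loop = SelectEventLoop()
    return _event_loop

def set_event_loop(loop):
    # tornado or twisted would call this at startup to install a
    # "real" IOLoop or Reactor behind the same interface
    global _event_loop
    _event_loop = loop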

>
>> As for the higher-level question of what asynchronous code should look
>> like, there's a lot more room for spirited debate, and I don't think
>> there's enough consensus to declare a One True Way.  Personally, I'm
>> -1 on greenlets as a general solution (what if you have to call
>> MySQLdb or getaddrinfo?), although they can be useful in particular
>> cases to convert well-behaved synchronous code into async (as in
>> Motor: http://emptysquare.net/blog/introducing-motor-an-asynchronous-mongodb-driver-for-python-and-tornado/).
>
> Agreed on both counts.
>
>>  I like Futures, though, and I find that they work well in
>> asynchronous code.  The use of the result() method to encapsulate both
>> successful responses and exceptions is especially nice with generator
>> coroutines.
>
> Yay!
>
>> FWIW, here's the interface I'm moving towards for async code.  From
>> the caller's perspective, asynchronous functions return a Future (the
>> future has to be constructed by hand since there is no Executor
>> involved),
>
> Ditto for NDB (though there's a decorator that often takes care of the
> future construction).
>
>> and also take an optional callback argument (mainly for
>> consistency with currently-prevailing patterns for async code; if the
>> callback is given it is simply added to the Future with
>> add_done_callback).
>
> That's interesting. I haven't found the need for this yet. Is it
> really so common that you can't write this as a Future() constructor
> plus a call to add_done_callback()? Or is there some subtle semantic
> difference?

It's a Future constructor, a (conditional) add_done_callback, plus the
calls to set_result or set_exception and the with statement for error
handling.  In full:

# Assumes concurrent.futures.Future (the "futures" backport on 2.x)
# and Tornado's ExceptionStackContext.
import functools

from concurrent.futures import Future
from tornado.stackcontext import ExceptionStackContext

def future_wrap(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        future = Future()
        # an explicit callback from the caller just becomes a done-callback
        if kwargs.get('callback') is not None:
            future.add_done_callback(kwargs.pop('callback'))
        # the wrapped function reports its result through the future
        kwargs['callback'] = future.set_result
        def handle_error(typ, value, tb):
            future.set_exception(value)
            return True  # swallow the exception; it's now in the future
        with ExceptionStackContext(handle_error):
            f(*args, **kwargs)
        return future
    return wrapper
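
Which makes the same function usable in both styles (double_async is
just a stand-in for real async code):

@future_wrap
def double_async(x, callback):
    # pretend this is asynchronous; a real implementation would arrange
    # for callback(x * 2) to run later, from the event loop
    callback(x * 2)

f = double_async(21)             # future-style caller
print(f.result())                # 42; raises if set_exception was called

def show(r):
    print(r)

double_async(21, callback=show)  # callback-style caller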



>
>> In Tornado the Future is created by a decorator
>> and hidden from the asynchronous function (it just sees the callback),
>
> Hm, interesting. NDB goes the other way, the callbacks are mostly used
> to make Futures work, and most code (including large swaths of
> internal code) uses Futures. I think NDB is similar to monocle here.
> In NDB, you can do
>
>   f = <some function returning a Future>
>   r = yield f
>
> where "yield f" is mostly equivalent to f.result(), except it gives
> better opportunity for concurrency.

Yes, tornado's gen.engine does the same thing here.  However, the
stakes are higher than "better opportunity for concurrency": in an
event loop, if you call future.result() without yielding, you'll
deadlock if that Future's task needs to run on the same event loop.
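
For example (some_async_op is a hypothetical function returning a
Future that is completed by this same loop):

@gen.engine
def bad(callback):
    f = some_async_op()
    body = f.result()    # blocks the only thread; the loop that would
                         # complete f never runs again: deadlock

@gen.engine
def good(callback):
    f = some_async_op()
    body = yield f       # suspends the coroutine; the loop keeps running
    callback(body)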

>
>> although this relies on some Tornado-specific magic for exception
>> handling.  In a coroutine, the decorator recognizes Futures and
>> resumes execution when the future is done.  With these decorators
>> asynchronous code looks almost like synchronous code, except for the
>> "yield" keyword before each asynchronous call.
>
> Yes! Same here.
>
> I am currently trying to understand if using "yield from" (and
> returning a value from a generator) will simplify things. For example
> maybe the need for a special decorator might go away. But I keep
> getting headaches -- perhaps there's a Monad involved. :-)

I think if you build generator handling directly into the event loop
and use "yield from" for calls from one async function to another then
you can get by without any decorators.  But I'm not sure if you can do
that and maintain any compatibility with existing non-generator async
code.
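
A rough sketch of what I mean ("loop" is any object with a
Tornado-style thread-safe add_callback, and the yield-from part needs
Python 3.3):

import functools

def schedule(loop, gen):
    # Step a generator-based coroutine; resume it whenever the Future
    # it yields becomes done.
    def step(value=None, exc=None):
        try:
            if exc is not None:
                future = gen.throw(exc)
            else:
                future = gen.send(value)
        except StopIteration:
            return                      # coroutine finished
        def on_done(f):
            try:
                result = f.result()
            except Exception as e:
                loop.add_callback(functools.partial(step, None, e))
            else:
                loop.add_callback(functools.partial(step, result))
        future.add_done_callback(on_done)
    loop.add_callback(step)

# Nested async calls are then plain "yield from", with no decorator
# anywhere:
#
#   def fetch_two(url1, url2):
#       body1 = yield from fetch(url1)
#       body2 = yield from fetch(url2)
#       return (body1, body2)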

I think the ability to return from a generator is actually a bigger
deal than "yield from" (and I only learned about it from another
python-ideas thread today).  The only reason a generator decorated
with @tornado.gen.engine needs a callback passed in to it is to act as
a pseudo-return, and a real return would prevent the common mistake of
running the callback and then falling through to the rest of the function.
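
Concretely (db.query_async is a hypothetical async call; the second
version requires Python 3.3):

# today: the callback acts as a pseudo-return
@gen.engine
def get_user(user_id, callback):
    row = yield db.query_async('SELECT ... WHERE id=%s', user_id)
    callback(row)
    # execution falls through here unless you remember a bare "return"

# with return-from-generator: no fall-through is possible
def get_user(user_id):
    row = yield db.query_async('SELECT ... WHERE id=%s', user_id)
    return row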

For concreteness, here's a crude sketch of what the APIs I'm talking
about would look like in use (in a hypothetical future version of
tornado).

@future_wrap
@gen.engine
def async_http_client(url, callback):
    parsed_url = urlparse.urlsplit(url)
    # works the same whether the future comes from a thread pool or @future_wrap
    addrinfo = yield g_thread_pool.submit(socket.getaddrinfo,
                                          parsed_url.hostname, parsed_url.port)
    stream = IOStream(socket.socket())
    yield stream.connect(addrinfo[0][-1])
    stream.write('GET %s HTTP/1.0\r\n\r\n' % parsed_url.path)
    header_data = yield stream.read_until('\r\n\r\n')
    headers = parse_headers(header_data)
    body_data = yield stream.read_bytes(int(headers['Content-Length']))
    stream.close()
    callback(body_data)

# another function to demonstrate composability
@future_wrap
@gen.engine
def fetch_some_urls(url1, url2, url3, callback):
    body1 = yield async_http_client(url1)
    # start the next two fetches without yielding, then yield a list
    # of futures so they run concurrently
    future2 = async_http_client(url2)
    future3 = async_http_client(url3)
    body2, body3 = yield [future2, future3]
    callback((body1, body2, body3))
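
And from the outermost, non-coroutine caller (sketch, using tornado's
IOLoop):

from tornado.ioloop import IOLoop

def on_bodies(future):
    body1, body2, body3 = future.result()
    IOLoop.instance().stop()

future = fetch_some_urls('http://a/', 'http://b/', 'http://c/')
future.add_done_callback(on_bodies)
IOLoop.instance().start()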

One hole in this design is how to deal with callbacks that are run
multiple times.  For example, the IOStream read methods take both a
regular callback and an optional streaming_callback (which is called
with each chunk of data as it arrives).  I think this needs to be
modeled as something like an iterator of Futures, but I haven't worked
out the details yet.
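
One possible shape, entirely speculative: a small buffer that hands
out one Future per chunk, so a coroutine could drain the stream with
repeated yields:

import collections
from concurrent.futures import Future

class FutureStream(object):
    def __init__(self):
        self._chunks = collections.deque()
        self._waiter = None

    def feed(self, chunk):
        # called as the streaming_callback, once per chunk
        if self._waiter is not None:
            waiter, self._waiter = self._waiter, None
            waiter.set_result(chunk)
        else:
            self._chunks.append(chunk)

    def next_chunk(self):
        # returns a Future for the next chunk; yield it from a
        # coroutine (end-of-stream signalling omitted)
        f = Future()
        if self._chunks:
            f.set_result(self._chunks.popleft())
        else:
            self._waiter = f
        return f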

-Ben

>
> --
> --Guido van Rossum (python.org/~guido)


