[Python-ideas] asyncore: included batteries don't fit

Guido van Rossum guido at python.org
Mon Oct 8 17:30:12 CEST 2012


On Sun, Oct 7, 2012 at 9:44 PM, Ben Darnell <ben at bendarnell.com> wrote:
> On Sun, Oct 7, 2012 at 7:01 PM, Guido van Rossum <guido at python.org> wrote:
>> As long as it's not so low-level that other people shy away from it.
>
> That depends on the target audience.  The low-level IOLoop and Reactor
> are pretty similar -- you can implement one in terms of the other --
> but as you move up the stack cross-compatibility becomes harder.  For
> example, if I wanted to implement tornado's IOStreams in twisted, I
> wouldn't start with the analogous class in twisted (Protocol?), I'd go
> down to the Reactor and build from there, so putting something like
> IOStream or Protocol in asyncore2 wouldn't do much to unify the two
> worlds.  (it would help people build async stuff with the stdlib
> alone, but at that point it becomes more like a peer or competitor to
> tornado and twisted instead of a bridge between them)

Sure. And of course we can't expect Twisted and Tornado to just merge
projects. They each have different strengths and weaknesses and they
each have strong opinions on how things should be done. I do get your
point that none of that is incompatible with a shared reactor
specification.
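
To make that concrete: I imagine the shared piece being as small as an
abstract class that both projects could implement and consume. Purely
a hypothetical sketch -- the method names here are invented for
illustration, not a proposal:

  class AbstractEventLoop:
      # Invented names; the real spec would need agreement from
      # both projects.
      def add_reader(self, fd, callback):     # fd became readable
          raise NotImplementedError
      def add_writer(self, fd, callback):     # fd became writable
          raise NotImplementedError
      def call_later(self, delay, callback):  # schedule a timer
          raise NotImplementedError
      def run(self):                          # run until stopped
          raise NotImplementedError

Tornado's IOLoop and Twisted's reactor could then each be adapted to
such an interface without knowing about each other.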

>> I also have a feeling that one way or another this will require
>> cooperation between the Twisted and Tornado developers in order to
>> come up with a compromise that both are willing to conform to in a
>> meaningful way. (Unfortunately I don't know how to define "meaningful
>> way" more precisely here. I guess the idea is that almost all things
>> *using* an event loop use the standardized abstract API without caring
>> whether underneath it's Tornado, Twisted, or some simpler thing in the
>> stdlib.)
>
> I'd phrase the goal as being able to run both Tornado and Twisted in
> the same thread without any piece of code needing to know about both
> systems.  I think that's achievable as far as core functionality goes.
>  I expect both sides have some lesser-used functionality that might
> not make it into the stdlib version, but as long as it's possible to
> plug in a "real" IOLoop or Reactor when needed it should be OK.

Sounds good. I think a reactor is always going to be an extension of
the shared spec.

[...]
>> That's interesting. I haven't found the need for this yet. Is it
>> really so common that you can't write this as a Future() constructor
>> plus a call to add_done_callback()? Or is there some subtle semantic
>> difference?
>
> It's a Future constructor, a (conditional) add_done_callback, plus the
> calls to set_result or set_exception and the with statement for error
> handling.  In full:
>
> import functools
> from concurrent.futures import Future
> from tornado.stack_context import ExceptionStackContext
>
> def future_wrap(f):
>     @functools.wraps(f)
>     def wrapper(*args, **kwargs):
>         future = Future()
>         if kwargs.get('callback') is not None:
>             # add_done_callback passes the Future itself; unwrap the
>             # result before handing it to the caller's callback.
>             callback = kwargs.pop('callback')
>             future.add_done_callback(
>                 lambda fut: callback(fut.result()))
>         kwargs['callback'] = future.set_result
>         def handle_error(typ, value, tb):
>             future.set_exception(value)
>             return True  # exception handled; it now lives on the Future
>         with ExceptionStackContext(handle_error):
>             f(*args, **kwargs)
>         return future
>     return wrapper

Hmm... I *think* it automatically adds a special keyword 'callback' to
the *call* site so that you can do things like

  fut = some_wrapped_func(blah, callback=my_callback)

and then instead of using yield to wait for the callback, put the
continuation of your code in the my_callback() function. But it also
seems like it passes callback=future.set_result as the callback to the
wrapped function, which suggests the wrapped function was written
before Futures were widely used. This seems pretty impure to me, and
I'd like to propose a "future" where such functions are either given
the Future in which the result is expected or (more commonly) create
the Future themselves.

Unless I'm totally missing the programming model here.
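
To show what I mean, the style I'd prefer looks roughly like this
(a rough sketch; _fetch_async is a made-up callback-style helper):

  def fetch(url):
      future = Future()
      def on_response(response):
          future.set_result(response)
      _fetch_async(url, on_response)  # hypothetical low-level API
      return future

i.e. the Future is created and owned by the function itself, and the
caller just writes "result = yield fetch(url)" -- no 'callback'
keyword needed.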

PS. I'd like to learn more about ExceptionStackContext() -- I've
struggled somewhat with getting decent tracebacks in NDB.

>>> In Tornado the Future is created by a decorator
>>> and hidden from the asynchronous function (it just sees the callback),
>>
>> Hm, interesting. NDB goes the other way, the callbacks are mostly used
>> to make Futures work, and most code (including large swaths of
>> internal code) uses Futures. I think NDB is similar to monocle here.
>> In NDB, you can do
>>
>>   f = <some function returning a Future>
>>   r = yield f
>>
>> where "yield f" is mostly equivalent to f.result(), except it gives
>> better opportunity for concurrency.
>
> Yes, tornado's gen.engine does the same thing here.  However, the
> stakes are higher than "better opportunity for concurrency" - in an
> event loop if you call future.result() without yielding, you'll
> deadlock if that Future's task needs to run on the same event loop.

That would depend on the semantics of the event loop implementation.
In NDB's event loop, such a .result() call would just recursively
enter the event loop, and you'd only deadlock if you actually have two
pieces of code waiting for each other's completion.
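
I.e., schematically (not NDB's actual code; run_one() stands in for
whatever primitive runs a single iteration of the event loop):

  def result(self):
      while not self.done():
          event_loop.run_one()  # one loop iteration; this may
                                # eventually complete this Future
      if self._exception is not None:
          raise self._exception
      return self._result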

[...]
>> I am currently trying to understand if using "yield from" (and
>> returning a value from a generator) will simplify things. For example
>> maybe the need for a special decorator might go away. But I keep
>> getting headaches -- perhaps there's a Monad involved. :-)
>
> I think if you build generator handling directly into the event loop
> and use "yield from" for calls from one async function to another then
> you can get by without any decorators.  But I'm not sure if you can do
> that and maintain any compatibility with existing non-generator async
> code.
>
> I think the ability to return from a generator is actually a bigger
> deal than "yield from" (and I only learned about it from another
> python-ideas thread today).  The only reason a generator decorated
> with @tornado.gen.engine needs a callback passed in to it is to act as
> a pseudo-return, and a real return would prevent the common mistake of
> running the callback then falling through to the rest of the function.

Ah, so you didn't come up with the clever hack of raising an exception
to signify the return value. In NDB, you raise StopIteration (though
it is given the alias 'Return' for clarity) with an argument, and the
wrapper code that is responsible for the Future takes the value from
the StopIteration exception and passes it to the Future's
set_result().
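
Schematically, the mechanism looks like this (simplified, not NDB's
exact code):

  class Return(StopIteration):
      """Alias so that 'raise Return(value)' reads like a return."""

  def run_to_completion(gen):
      # Crude driver: resume the generator until it "returns".
      # (The real wrapper would yield to the event loop between steps
      # and wait for each yielded Future instead of looping inline.)
      try:
          value = None
          while True:
              value = gen.send(value)
      except StopIteration as err:
          return err.args[0] if err.args else None

so a tasklet ends with "raise Return(record.user)" where a plain
function would have written "return record.user".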

> For concreteness, here's a crude sketch of what the APIs I'm talking
> about would look like in use (in a hypothetical future version of
> tornado).
>
> @future_wrap
> @gen.engine
> def async_http_client(url, callback):
>     parsed_url = urlparse.urlsplit(url)
>     # works the same whether the future comes from a thread pool or @future_wrap

And you need the thread pool because there's no async version of
getaddrinfo(), right?

>     addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
>     stream = IOStream(socket.socket())
>     yield stream.connect((addrinfo[0][-1]))
>     stream.write('GET %s HTTP/1.0' % parsed_url.path)

Why no yield in front of the write() call?

>     header_data = yield stream.read_until('\r\n\r\n')
>     headers = parse_headers(header_data)
>     body_data = yield stream.read_bytes(int(headers['Content-Length']))
>     stream.close()
>     callback(body_data)
>
> # another function to demonstrate composability
> @future_wrap
> @gen.engine
> def fetch_some_urls(url1, url2, url3, callback):
>     body1 = yield async_http_client(url1)
>     # yield a list of futures for concurrency
>     future2 = yield async_http_client(url2)
>     future3 = yield async_http_client(url3)
>     body2, body3 = yield [future2, future3]
>     callback((body1, body2, body3))

This second one is nearly identical to the way it's done in NDB.
However I think you have a typo -- I doubt that there should be yields
on the lines creating future2 and future3.

> One hole in this design is how to deal with callbacks that are run
> multiple times.  For example, the IOStream read methods take both a
> regular callback and an optional streaming_callback (which is called
> with each chunk of data as it arrives).  I think this needs to be
> modeled as something like an iterator of Futures, but I haven't worked
> out the details yet.

Ah. Yes, that's a completely different kind of thing, and probably
needs to be handled in a totally different way. I think it probably
needs to be modeled more like an infinite loop where at the blocking
point (e.g. a low-level read() or accept() call) you yield a Future.
Although I can see that this doesn't work well with the IOLoop's
concept of file descriptor (or other event source) registration.
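
Rough sketch of the loop I have in mind (read_some() and process()
are placeholders, not a real API):

  def read_stream(stream):
      while True:
          chunk = yield stream.read_some()  # a Future for the next chunk
          if not chunk:
              break  # connection closed
          process(chunk)

Each pass around the loop consumes one Future, so the
called-multiple-times callback turns into an ordinary loop. But as I
said, this doesn't mesh cleanly with registering a file descriptor
once and getting repeated callbacks.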

-- 
--Guido van Rossum (python.org/~guido)


