[Python-ideas] The async API of the future: Some thoughts from an ignorant Tornado user

Daniel McDougall daniel.mcdougall at liftoffsoftware.com
Sun Oct 14 00:27:22 CEST 2012


(This is a response to GVR's Google+ post asking for ideas; I
apologize in advance if I come off as an ignorant programming newbie)

I am the author of Gate One (https://github.com/liftoff/GateOne/)
which makes extensive use of Tornado's asynchronous capabilities.  It
also uses multiprocessing and threading to a lesser extent.  The
biggest issue I've had trying to write asynchronous code for Gate One
is complexity.  Complexity hurts expressiveness, which results in code
that, to me, feels un-Pythonic.  As evidence I present the
retrieve_log_playback() function:  http://bit.ly/W532m6 (link goes to GitHub)

All the function does is generate and return (to the client browser)
an HTML playback of their terminal session recording.  To do it
efficiently without blocking the event loop or slowing down all other
connected clients required loads of complexity (or maybe I'm just
ignorant of "a better way"--feel free to enlighten me).  In an ideal
world I could have just done something like this:

import async  # The API of the future ;)
async.async_call(retrieve_log_playback, settings, tws,
                 mechanism=multiprocessing)
# tws == instance of tornado.web.WebSocketHandler that holds the open connection

...but instead I had to create an entirely separate function to serve
as the multiprocessing.Process() target, create a multiprocessing.Queue() to
shuffle data back and forth, watch a special file descriptor for
updates (so I can tell when the task is complete), and also create a
closure because the connection instance (aka 'tws') isn't pickleable.
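For concreteness, here is a stripped-down sketch of that pattern (the
names and the blocking wait are illustrative, not Gate One's actual
code; the real version watches a file descriptor on Tornado's IOLoop
instead of blocking):

```python
import multiprocessing

def _render_playback(settings, queue):
    # Runs in the child process; everything passed in must be pickleable.
    queue.put("<html>playback for %s</html>" % settings["user"])

def retrieve_log_playback(settings, write_message):
    # 'write_message' stands in for the unpickleable WebSocketHandler
    # ('tws'); a closure keeps it on this side of the process boundary.
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_render_playback,
                                   args=(settings, queue))
    proc.start()
    # In the real code, Tornado's IOLoop watches the queue's underlying
    # file descriptor; here we simply block until the child is done.
    result = queue.get()
    proc.join()
    write_message(result)

if __name__ == "__main__":
    sent = []
    retrieve_log_playback({"user": "dan"}, sent.append)
    print(sent[0])
```

Even in this toy form you can see the bookkeeping: an extra function, a
Queue, a closure, and process lifecycle management, all for one call.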

After reading through these threads I feel much of the discussion is
over my head, but as someone who will ultimately become a *user* of the
"async API of the future" I would like to share my thoughts...

My opinion is that the goals of any async module that winds up in
Python's standard library should be simplicity and portability.  In
terms of features, here's my 'async wishlist':

* I should not have to worry about what is and isn't pickleable when I
decide that a task should be performed asynchronously.
* I should be able to choose the type of event loop/async mechanism
that is appropriate for the task:  For CPU-bound tasks I'll probably
want to use multiprocessing.  For IO-bound tasks I might want to use
threading.  For a multitude of tasks that "just need to be async" (by
nature) I'll want to use an event loop.
* Any async module should support 'basics' like calling functions at
an interval and calling functions after a timeout occurs (with the
ability to cancel).
* Asynchronous tasks should be able to access the same namespace as
everything else.  Maybe that's wishful thinking.
* It should support publish/subscribe-style events (i.e. an event
dispatcher).  For example, the ability to watch a file descriptor or
socket for changes in state and call a function when that happens.
Preferably with the flexibility to define custom events (i.e. don't
have it tied to kqueue/epoll-specific events).
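To make the timer and publish/subscribe items concrete, here is a
stdlib-only sketch of a cancellable repeating timer and a tiny event
dispatcher (all names here are mine, not a proposed API; a real async
module would multiplex these on one event loop rather than spawning a
thread per timer):

```python
import threading

class Interval(object):
    """Call 'func' every 'seconds' seconds until cancel() is called."""
    def __init__(self, seconds, func):
        self.seconds = seconds
        self.func = func
        self._cancelled = False
        self._timer = None
        self._schedule()

    def _schedule(self):
        if not self._cancelled:
            self._timer = threading.Timer(self.seconds, self._run)
            self._timer.daemon = True
            self._timer.start()

    def _run(self):
        self.func()
        self._schedule()  # re-arm for the next tick

    def cancel(self):
        self._cancelled = True
        if self._timer is not None:
            self._timer.cancel()

class EventDispatcher(object):
    """Publish/subscribe with arbitrary (custom) event names."""
    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event, func):
        self._subscribers.setdefault(event, []).append(func)

    def publish(self, event, *args, **kwargs):
        for func in self._subscribers.get(event, []):
            func(*args, **kwargs)
```

The point of the dispatcher is decoupling: whatever watches
kqueue/epoll could publish a generic "fd_readable" event, while
subscribers never need to know which mechanism fired it.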

Thanks for your consideration; and thanks for the awesome language.

--
Dan McDougall - Chief Executive Officer and Developer
Liftoff Software ✈ Your flight to the cloud is now boarding.
904-446-8323


