[Python-ideas] Async API: some code to review

Thu Nov 1 16:44:48 CET 2012

On Wed, Oct 31, 2012 at 3:36 PM, Steve Dower <Steve.Dower at microsoft.com> wrote:
> Guido van Rossum wrote:
> There is only one reason to use 'yield from' and that is for the performance optimisation, which I do acknowledge and did observe in my own benchmarks.

Actually, it is not just optimization. The logic of the scheduler also
becomes much simpler.
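A toy illustration of the scheduler point (not tulip's actual scheduler; all names are hypothetical). With `yield from`, the interpreter flattens nested coroutine calls, so the scheduler only resumes the outermost generator. With plain `yield` of sub-generators, the scheduler must keep an explicit call stack per task and pass return values back by hand. Exceptions are ignored here for brevity; the plain-yield version would also have to route those manually.

```python
import types
from collections import deque


def run_yield_from(tasks):
    """Round-robin scheduler for coroutines written with `yield from`."""
    ready = deque(tasks)
    while ready:
        gen = ready.popleft()
        try:
            next(gen)           # run until the next bare `yield`
        except StopIteration:
            continue            # task finished
        ready.append(gen)       # suspended: reschedule


def run_plain_yield(tasks):
    """Same scheduler for coroutines that `yield` sub-generators instead."""
    ready = deque(([t], None) for t in tasks)  # (generator stack, value to send)
    while ready:
        stack, value = ready.popleft()
        while True:
            try:
                result = stack[-1].send(value)
            except StopIteration as exc:
                stack.pop()
                if not stack:
                    break                   # outermost generator finished
                value = exc.value           # deliver return value to caller
                continue
            if isinstance(result, types.GeneratorType):
                stack.append(result)        # "call" the sub-coroutine
                value = None
                continue
            ready.append((stack, None))     # bare `yield`: reschedule task
            break


def child(log, name):
    log.append(f"{name}: child start")
    yield                                   # suspension point
    return 42


def parent_yf(log, name):
    x = yield from child(log, name)         # interpreter handles the nesting
    log.append(f"{name}: got {x}")


def parent_py(log, name):
    x = yield child(log, name)              # scheduler handles the nesting
    log.append(f"{name}: got {x}")


log_a, log_b = [], []
run_yield_from([parent_yf(log_a, "A"), parent_yf(log_a, "B")])
run_plain_yield([parent_py(log_b, "A"), parent_py(log_b, "B")])
```

Both runs interleave the two tasks identically. Only the second scheduler needs the stack bookkeeping.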

> I know I've been vague about our intended application (deliberately so, to try and keep the discussion neutral), but I'll lay out some details.

Actually I wish you'd written this sooner. I don't know about you, but
my brain has a hard time understanding abstractions that are presented
without concrete use cases and implementations alongside; OTOH I
delight in taking a concrete mess and extracting abstractions from it.
(The Twisted guys are also masters at this.)

So far I didn't really "get" the reasons you brought up for some of
the complications you introduced (like multiple Future implementations).
Now I think I'm glimpsing your reasons.

> We're working on adding support for Windows 8 apps (formerly known as Metro) written in Python. These will use the new API (WinRT) which is highly asynchronous - even operations such as opening a file are only* available as an asynchronous function. The intention is to never block on the UI thread.

Interesting. The lack of synchronous wrappers does seem a step back,
but is probably useful as a forcing function given the desire to keep
the UI responsive at all times.

> (* Some synchronous Win32 APIs are still available from C++, but these are actively discouraged and restricted in many ways. Most of Win32 is not usable.)
>
> The model used for these async APIs is future-based: every *Async() function returns a future for a task that is already running. The caller is not allowed to wait on this future - the only option is to attach a callback. C# and VB use their async/await keywords (good 8 min intro video on those: http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778) while JavaScript and C++ have multi-line lambda support.

Erik Meijer introduced me to async/await on Elba two months ago. I was
very excited to recognize exactly what I'd done for NDB with @tasklet
and yield, supported by the type checking.

> For Python, we are aiming for closer to the async/await model (which is also how we chose the names).

If we weren't so reluctant to introduce new keywords in Python, we
might introduce await as an alias for yield from in the future.

> Incidentally, our early designs used yield from exclusively. It was only when we started discovering edge-cases where things broke, as well as the impact on code 'cleanliness', that we switched to yield.

Very interesting. I'd love to see a much longer narrative on this.
(You can send it to me directly if you feel it would distract the list
or if you feel it's inappropriate to share widely. I'll keep it under
my hat as long as you say so.)

> There are three aspects of this that work better and result in cleaner code with wattle than with tulip:
>
>  - event handlers can be "async-void", such that when the event is raised by the OS/GUI/device/whatever the handler can use asynchronous tasks without blocking the main thread.

I think this is "fire-and-forget"? I.e. you initiate an action and
then just let it run until completion without ever checking the
result? In tulip you currently do that by wrapping it in a Task and
calling its start() method. (BTW I think I'm going to get rid of
start() -- creating a Task should just start it.)

> In this case, the caller receives a future but ignores it because it does not care about the final result. (We could achieve this under 'yield from' by requiring a decorator, which would then probably prevent other Python code from calling the handler directly. There is very limited opportunity for us to reliably intercept this case.)

Are you saying that this property (you don't wait for the result) is
required by the operation rather than an option for the user? I'm only
familiar with the latter -- e.g. I can imagine firing off an operation
that writes a log entry somewhere but not caring about whether it
succeeded -- but I would still make it *possible* to check on the
operation if the caller cares (what if it's a very important log
message?).

If there's no option for the caller, the API should present itself as
a regular function/method and the task-spawning part should be hidden
inside it -- I see no need for the caller to know about this.

What exactly do you mean by "reliably intercept this case"? A
concrete example would help.

>  - the event loop is implemented by the OS. Our Scheduler implementation does not need to provide an event loop, since we can submit() calls to the OS-level loop. This pattern also allows wattle to 'sit on top of' any other event loop, probably including Twisted and 0MQ, though I have not tried it (except with Tcl).

Ok, so what is the API offered by the OS event loop? I really want to
make sure that tulip can interface with strange event loops, and this
may be the most concrete example so far -- and it may be an important
one.
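To make the "OS owns the loop" case concrete, here is a toy sketch in which the foreign loop offers nothing but a hypothetical `submit(callback, *args)`. `FakeOSLoop`, `SimpleFuture` and `Scheduler` are made-up names, not wattle, tulip or WinRT APIs. The scheduler never runs a loop of its own. Each time the future a generator task is waiting on completes, the next step of that task is submitted back to the foreign loop.

```python
from collections import deque


class FakeOSLoop:
    """Hypothetical stand-in for an OS-owned loop whose only API is submit()."""

    def __init__(self):
        self._queue = deque()

    def submit(self, callback, *args):
        self._queue.append((callback, args))

    def run(self):  # in real life the OS runs this; here we only simulate it
        while self._queue:
            callback, args = self._queue.popleft()
            callback(*args)


class SimpleFuture:
    """Minimal future: a result plus completion callbacks."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def set_result(self, value):
        self._done, self._result = True, value
        for cb in self._callbacks:
            cb(self)

    def result(self):
        return self._result

    def add_done_callback(self, cb):
        if self._done:
            cb(self)
        else:
            self._callbacks.append(cb)


class Scheduler:
    """Drives generator tasks without owning a loop: every step is submitted."""

    def __init__(self, loop):
        self.loop = loop

    def spawn(self, gen):
        self.loop.submit(self._step, gen, None)

    def _step(self, gen, value):
        try:
            fut = gen.send(value)       # run the task until it waits on a future
        except StopIteration:
            return                      # task finished
        # Resume the task on the foreign loop once that future completes.
        fut.add_done_callback(
            lambda f: self.loop.submit(self._step, gen, f.result()))


def fake_async_read(loop, data):
    """Hypothetical async operation that 'the OS' completes on a later turn."""
    fut = SimpleFuture()
    loop.submit(fut.set_result, data)
    return fut


def reader_task(loop, out):
    a = yield fake_async_read(loop, "hello")
    b = yield fake_async_read(loop, "world")
    out.append(f"{a} {b}")


loop = FakeOSLoop()
out = []
Scheduler(loop).spawn(reader_task(loop, out))
loop.run()
```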

>  - Future objects can be marshalled directly from Python into Windows, completing the interop story.

What do you mean by marshalled here? Surely not the stdlib marshal
module. Do you just mean that Future objects can be recognized by the
foreign-function interface and wrapped by / copied into native Windows
8 datatypes?

I take it your event loop understands Futures? All of them? Or only
the ones of the specific type that it also returns?
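Present-day `asyncio` answers the "all Futures or only its own type" question with the second option, converting at the boundary. A `concurrent.futures.Future` cannot be awaited directly, but `asyncio.wrap_future()` bridges it into an `asyncio.Future`. Here is a minimal sketch, where `blocking_work` is a placeholder.

```python
import asyncio
import concurrent.futures


def blocking_work(x):
    return x * 2                # placeholder for work done outside the event loop


async def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        foreign = pool.submit(blocking_work, 21)   # concurrent.futures.Future
        native = asyncio.wrap_future(foreign)      # asyncio.Future chained to it
        return await native


result = asyncio.run(main())
```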

> Even with tulip, we would probably still require a decorator for this case so that we can marshal regular generators as iterables (for which there is a specific type).

I can't quite follow you here, probably due to lack of imagination on
my part. Can you help me with a (somewhat) concrete example?

> Without a decorator, we would probably have to ban both cases to prevent subtly misbehaving programs.

Concrete example?

> At least with wattle, the user does not have to do anything different from any of their other @async functions.

This is because you can put type checks inside @async, which sees the
function object before it's called, rather than the scheduler, which
only sees what it returned, right? That's a trick I use in NDB as well
and I think tulip will end up requiring a decorator too -- but it will
just "mark" the function rather than wrap it in another one, unless
the function is not a generator (in which case it will probably have
to wrap it in something that is a generator). I could imagine a debug
version of the decorator that added wrappers in all cases though.
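The following is a rough sketch of the "mark, don't wrap" decorator described above. It assumes that tasks are plain generator functions driven by a scheduler. The names `task`, `debug_task`, `_is_task`, `TRACE` and `run` are hypothetical, not NDB's or tulip's actual API. The decorator inspects the function object at decoration time, marks generator functions in place, and wraps only non-generators. The debug variant always wraps, so each call can be instrumented.

```python
import functools
import inspect


def task(func):
    """Mark generator functions as tasks; wrap anything else in a generator."""
    if inspect.isgeneratorfunction(func):
        func._is_task = True      # mark only: no wrapper, no per-call overhead
        return func

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        yield from ()             # yields nothing, but makes this a generator function
        return func(*args, **kwargs)

    wrapper._is_task = True
    return wrapper


TRACE = []  # debug log of task starts and finishes


def debug_task(func):
    """Debug variant: always adds a wrapper so every call can be instrumented."""
    inner = task(func)            # guaranteed to be a generator function

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append(("start", func.__name__))
        result = yield from inner(*args, **kwargs)
        TRACE.append(("done", func.__name__))
        return result

    wrapper._is_task = True
    return wrapper


def run(gen):
    """Drive a generator task to completion and return its result."""
    while True:
        try:
            next(gen)
        except StopIteration as exc:
            return exc.value
```

Because the check happens on the function object rather than on what it returns, a non-generator function decorated with `@task` still hands the scheduler a generator.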

> Despite this intended application, I have tried to approach this design task independently to produce an API that will work for many cases, especially given the narrow focus on sockets. If people decide to get hung up on "the Microsoft way" or similar rubbish then I will feel vindicated for not mentioning it earlier :-) - it has not had any more influence on wattle than any of my other past experience has.

No worries about that. I agree that we need concrete examples that
take us beyond the world of sockets; it's just that sockets are where
most of the interest lies (Tornado is a webserver, Twisted is often
admired because of its implementations of many internet protocols,
people benchmark async frameworks on how many HTTP requests per second
they can serve) and I haven't worked with any type of GUI framework in
a very long time. (Kudos for trying your way with Tk!)

-- 
--Guido van Rossum (python.org/~guido)