Re: [Python-ideas] Async API: some code to review

31 Oct 2012

Ok, this is a good point: the more you can do without having to go
through the main loop again the better.

I already took this to heart in my recent rewrites of recv() and
send() -- they try to read/write the underlying socket first, and if
it works, the task isn't suspended; only if they receive EAGAIN or
something similar do they block the task and go back to the top.

In fact, Listener.accept() does the same thing -- meaning the loop can
go around many times without blocking a single time. (The listening
socket is in non-blocking mode so accept() will raise EAGAIN when
there *isn't* another client connection ready immediately.)
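The try-first pattern described above for recv(), send() and accept() can be sketched roughly as follows. This is only an illustration, not the actual tulip code: the `("wait_readable", sock)` request tuple is a made-up convention for telling a scheduler what the task is waiting for, and the socket is assumed to be in non-blocking mode already.

```python
import socket


def recv(sock, nbytes):
    """Generator-based recv(): try the socket first, suspend only if needed.

    Assumes `sock` is already non-blocking. The yielded tuple is a
    hypothetical request format, not tulip's real scheduler protocol.
    """
    while True:
        try:
            # Fast path: data is already there, so the task never suspends.
            return sock.recv(nbytes)
        except BlockingIOError:  # EAGAIN / EWOULDBLOCK
            # Slow path: hand control back to whoever drives this generator,
            # to be resumed once the socket becomes readable.
            yield ("wait_readable", sock)
```

When data is already buffered, driving `recv(sock, n)` finishes on the first `next()` call with the bytes as the generator's return value; only an empty socket makes it yield.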

This is also one of the advantages of yield-from; you *never* go back
to the end of the ready queue just to invoke another layer of
abstraction. (Steve tries to approximate this by running the generator
immediately until the first yield, but the caller still ends up
suspending to the scheduler, because they are using yield which
doesn't avoid the suspension, unlike yield-from.)
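To make the difference concrete, here is a toy comparison. It is a sketch under stated assumptions: `run()` is a made-up trampoline in which yielding a generator means "please call this for me"; it is neither tulip's scheduler nor Steve's. It counts how often control returns to the scheduler just to enter another layer.

```python
def layer_yield_from(depth):
    """Nested layers chained with yield from: no scheduler involvement."""
    if depth == 0:
        return "done"
    result = yield from layer_yield_from(depth - 1)
    return result


def layer_yield(depth):
    """Nested layers chained with plain yield: the scheduler makes each call."""
    if depth == 0:
        return "done"
    result = yield layer_yield(depth - 1)
    return result


def run(gen):
    """Toy trampoline. Returns (result, trips back to the scheduler)."""
    stack = [gen]
    trips = 0
    value = None
    while stack:
        try:
            request = stack[-1].send(value)
        except StopIteration as stop:
            # This layer finished: pass its result to the calling layer.
            stack.pop()
            value = stop.value
            continue
        # The task suspended only to ask us to run another layer.
        trips += 1
        stack.append(request)
        value = None
    return value, trips
```

In this model, three layers of abstraction cost three trips through the scheduler with plain yield and none with yield from.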

--Guido

On Wed, Oct 31, 2012 at 3:07 AM, Kristján Valur Jónsson
<kristjan@ccpgames.com> wrote:
...
...
-----Original Message-----
From: gvanrossum@gmail.com [mailto:gvanrossum@gmail.com] On Behalf
Of Guido van Rossum
Sent: 30. október 2012 16:40
To: Kristján Valur Jónsson
Cc: Richard Oudkerk; python-ideas@python.org
Subject: Re: [Python-ideas] Async API: some code to review
What kind of time savings are we talking about? I imagine that the
accept() loop I put in tulip/echosvr.py is fast enough in terms of response
time (latency) -- throughput would seem the more important measure (and I
have no idea of this yet).
http://code.google.com/p/tulip/source/browse/echosvr.py#37
To be honest, it isn't serious for applications that serve few connections, but for things like web servers, it becomes important.
Looking at your code:
a) will always "block", causing the main thread (using the term loosely here) to go once through the event loop, possibly doing other housekeeping, even if a connection was available.  I don't think there is a way to selectively do completion-based IO, i.e. do immediate mode if possible.  You either go for one or the other, on Windows at least.  In select-based mechanisms it could be possible to do a select here first and avoid that extra loop, but for the sake of the application it might be confusing.  It might be best to stick to one system.
b) will either switch to the new task immediately (possible in stackless) or cause the start of t to wait until the next round in the event loop.
In this case, t will not start executing until after going around the loop twice.  Only one new connection can be accepted per pass through the loop.  Imagine two HTTP requests coming in simultaneously, at t=0.
The sequence of operations will then be this (assuming FIFO scheduling)
main loop runs
accept 1 returns. task 1 created.  accept 2 scheduled
main loop runs, making task 1 and accept 2 runnable
task 1 runs.  does processing. performs send, and blocks
accept2 returns, task2 created
main loop runs, making task2 runnable
task2 runs, does processing, performs send.
Contributing to latency in this scenario are all the "main loop" runs.  Note that I may misunderstand the way your architecture works, perhaps there is no main loop, perhaps everything is interleaved.
An alternative is something like this:
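def loop():
    # Each task accepts a connection and handles it inline.
    while True:
        conn, addr = yield from listener.accept()
        handler(conn, addr)

# Start several acceptor tasks so that several accepts can be pending at once.
for i in range(n_handlers):
    t = scheduling.Task(loop)
    t.start()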
Here, events will be different:
main loop runs, accept 1 and accept 2 runnable
accept 1 returns, starting handler, processing and blocking on send
accept 2 returns, starting handler, processing, and blocking on send
As you see, there is only one initial housekeeping run needed to make both tasklets runnable and ready to run without interruption, giving the lowest possible total latency to the client.
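The two sequences above can be reproduced with a small toy model. This is only a sketch under the assumptions stated in the scenario: FIFO scheduling, two connections arriving at t=0, accept always parking the task until the next loop pass, and a newly started task first running on the following pass. `ToyLoop` and its methods are invented for the illustration and have nothing to do with tulip's real scheduler.

```python
from collections import deque


class ToyLoop:
    """Toy FIFO main loop that counts its passes (hypothetical model)."""

    def __init__(self, n_connections):
        self.pending = deque(range(1, n_connections + 1))  # all arrive at t=0
        self.total = n_connections
        self.parked = deque()  # (task, what it is waiting for)
        self.passes = 0
        self.log = []          # (connection, pass in which processing began)

    def start(self, task):
        # A newly started task first runs on the next pass.
        self.parked.append((task, "start"))

    def park_on_accept(self, task):
        # Advance an acceptor to its first accept and park it there.
        self.parked.append((task, next(task)))

    def run(self):
        while len(self.log) < self.total:
            self.passes += 1
            runnable, self.parked = self.parked, deque()
            for task, waiting_for in runnable:
                value = None
                if waiting_for == "accept":
                    if not self.pending:
                        self.parked.append((task, "accept"))
                        continue
                    value = self.pending.popleft()
                try:
                    request = task.send(value)
                except StopIteration:
                    continue
                self.parked.append((task, request))
        return self.passes, self.log


def handler(loop, conn):
    loop.log.append((conn, loop.passes))  # processing starts
    yield "send"                          # then blocks on send


def spawning_acceptor(loop):
    while True:
        conn = yield "accept"
        loop.start(handler(loop, conn))   # one new task per connection


def inline_acceptor(loop):
    while True:
        conn = yield "accept"
        yield from handler(loop, conn)    # handle in the accepting task


def simulate(acceptor, n_acceptors, n_connections=2):
    loop = ToyLoop(n_connections)
    for _ in range(n_acceptors):
        loop.park_on_accept(acceptor(loop))
    return loop.run()
```

In this model `simulate(spawning_acceptor, 1)` needs three loop passes before both requests are being processed, while `simulate(inline_acceptor, 2)` needs one.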
In my experience with RPC systems based on this kind of asynchronous Python IO, lowering the response time between when user space is made aware of the request and when Python actually starts _processing_ it is critical to responsiveness.
Cheers
-- 
--Guido van Rossum (python.org/~guido)