[Python-ideas] The async API of the future

Sturla Molden sturla at molden.no
Mon Nov 5 15:19:31 CET 2012

On 03.11.2012 18:22, Antoine Pitrou wrote:

 >> With IOCP on Windows there is a thread-pool that continuously polls
 >> the i/o tasks for completion. So I think IOCPs might approach O(n) at
 >> some point.
 > Well, I don't know about the IOCP implementation, but "continuously
 > polling the I/O tasks" sounds like a costly way to do it (what system
 > call would that use?).

The polling uses the system call GetOverlappedResult; if the task is 
unfinished, it calls Sleep(0) to release the time-slice and polls again.

Specifically, if the last argument to GetOverlappedResult (bWait) is 
FALSE, and the return value is FALSE, we must call GetLastError to 
retrieve an error code. If GetLastError returns ERROR_IO_INCOMPLETE, we 
know that the task has not finished yet.
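
A rough sketch of that polling loop, in Python rather than C. It does not 
call the real Win32 API: FakeOverlappedTask and its get_result method are 
hypothetical stand-ins for an overlapped operation and for 
GetOverlappedResult with bWait set to FALSE, and time.sleep(0) stands in 
for Sleep(0).

```python
import time

ERROR_IO_INCOMPLETE = 996  # Win32 error code value, used here only as a label


class FakeOverlappedTask:
    """Hypothetical stand-in for an overlapped I/O operation."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done
        self.last_error = 0

    def get_result(self):
        """Mimic GetOverlappedResult(..., bWait=FALSE): True once finished."""
        if self._remaining > 0:
            self._remaining -= 1
            self.last_error = ERROR_IO_INCOMPLETE
            return False
        self.last_error = 0
        return True


def poll_until_complete(task):
    """Busy-poll one task, giving up the time slice between attempts."""
    attempts = 1
    while not task.get_result():
        if task.last_error != ERROR_IO_INCOMPLETE:
            # Any other error code means the operation itself failed.
            raise OSError(task.last_error, "I/O failed")
        time.sleep(0)  # stand-in for Sleep(0)
        attempts += 1
    return attempts
```

The return value is the number of polls it took, which makes the cost of 
the busy loop visible.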

A bit more sophisticated: Put all these asynchronous i/o tasks in a fifo 
queue, and set up a thread-pool that pops tasks off the queue and polls 
with GetOverlappedResult and GetLastError. A task that is unfinished 
goes back into the queue. If a task is complete, the thread that popped 
it off the queue executes a callback. A thread-pool that operates like 
this will reduce or prevent the excessive context switches in the kernel 
that multiple threads hammering on Sleep(0) would incur. Then 
invent a fancy name for this scheme, e.g. call it "I/O Completion Ports".
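
The queue-plus-thread-pool scheme described above can be sketched in 
Python as follows. This only models the scheme as described in this 
message; it is not how the Windows kernel implements completion ports. 
FakeTask, its poll method, and run_polling_pool are hypothetical names 
made up for the illustration.

```python
import queue
import threading


class FakeTask:
    """Hypothetical i/o task that reports completion after a few polls."""

    def __init__(self, name, polls_until_done):
        self.name = name
        self._remaining = polls_until_done

    def poll(self):
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


def run_polling_pool(tasks, callback, num_threads=2):
    """Poll tasks from a FIFO queue until every one has completed."""
    fifo = queue.Queue()
    for task in tasks:
        fifo.put(task)

    def worker():
        while True:
            task = fifo.get()
            if task is None:  # sentinel: shut this worker down
                fifo.task_done()
                return
            if task.poll():
                callback(task)  # the popping thread runs the callback
            else:
                fifo.put(task)  # unfinished: back to the end of the queue
            fifo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    fifo.join()  # returns once no task is queued or being polled
    for _ in threads:
        fifo.put(None)
    for t in threads:
        t.join()
```

An unfinished task is put back before task_done is called, so the 
queue's unfinished count cannot reach zero while work remains.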

Then you notice that, due to the queue, the latency is O(n), with n the 
number of pending i/o tasks in the "I/O Completion 
Port". To avoid this affecting the latency, you patch your program by 
setting up multiple "I/O Completion Ports", and reinvent the load 
balancer to distribute i/o tasks to multiple "ports". With a bit of 
work, the server will remain responsive and "rather scalable" as long as 
the server is still i/o bound. The moment the number of i/o tasks makes 
the server go CPU bound, which will happen rather soon because of the 
way IOCPs operate, the computer overheats and goes up in smoke. And 
that is when the MBA manager starts to curse Windows as well, and 
finally agrees to use Linux or *BSD/Apple instead ;-)
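
To make the O(n) latency claim concrete, here is a small single-poller 
model of the queue described above. It assumes the scheme works exactly 
as sketched in this message; polls_until_noticed is a hypothetical helper 
that counts how many polls pass before a task that has already completed, 
but sits behind n pending tasks, is noticed.

```python
from collections import deque


def polls_until_noticed(num_pending):
    """Polls needed to reach a completed task behind num_pending others."""
    fifo = deque(["pending"] * num_pending + ["complete"])
    polls = 0
    while True:
        state = fifo.popleft()
        polls += 1
        if state == "complete":
            return polls
        fifo.append(state)  # still pending: goes back into the queue
```

In this model the count grows linearly with the number of pending tasks.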

 > If the kernel cooperates, no continuous polling
 > should be required.



My main problem with IOCPs is that they provide the "wrong" signal. They 
tell us when I/O is completed. But then the work is already done, and 
how did we know when to start?

The asynch i/o mechanisms select, poll, epoll, kqueue, /dev/poll, etc. 
do the opposite. They inform us when to start an i/o task, which makes 
more sense to me at least.
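
A minimal sketch of the readiness model, using select.select on a local 
socket pair. read_when_ready and demo are hypothetical helper names; the 
point is only that select tells us when a read can start, and the read 
itself happens afterwards.

```python
import select
import socket


def read_when_ready(sock, timeout=1.0):
    """Wait until sock is readable, then start the read ourselves."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # nothing to read yet, so no i/o was started
    return sock.recv(4096)


def demo():
    a, b = socket.socketpair()
    try:
        before = read_when_ready(a, timeout=0.0)  # no data has been sent
        b.sendall(b"ping")
        after = read_when_ready(a, timeout=1.0)
        return before, after
    finally:
        a.close()
        b.close()
```

With a completion-based API the order is reversed: the read is issued 
first and the notification arrives once the data is already in the buffer.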

Typically, programs that use IOCP must invent their own means of 
signalling "i/o ready to start", which might kill any advantage of using 
IOCPs over simpler means (e.g. blocking i/o).

This by the way makes me wonder what Windows SUA does? It is OpenBSD 
based. Does it have kqueue or /dev/poll? If so, there must be support 
for it in ntdll.dll, and we might use those functions instead of pesky 

