[Python-ideas] The async API of the future
Sturla Molden
sturla at molden.no
Mon Nov 5 15:19:31 CET 2012
On 03.11.2012 18:22, Antoine Pitrou wrote:
>> With IOCP on Windows there is a thread-pool that continuously polls
>> the i/o tasks for completion. So I think IOCPs might approach O(n) at
>> some point.
>
> Well, I don't know about the IOCP implementation, but "continuously
> polling the I/O tasks" sounds like a costly way to do it (what system
> call would that use?).
The polling uses the system call GetOverlappedResult and, if the task is
unfinished, calls Sleep(0) to release the time-slice before polling again.
Specifically, if the last argument to GetOverlappedResult is FALSE, and
the return value is FALSE, we must call GetLastError to retrieve an
error code. If GetLastError returns ERROR_IO_INCOMPLETE, we know that
the task was not finished.
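
To make the control flow concrete, here is a minimal sketch of that poll-and-yield loop in Python. It does not call the Win32 API: SimulatedOverlapped and get_overlapped_result are hypothetical stand-ins I made up so the loop runs anywhere, and time.sleep(0) plays the role of Sleep(0). Only the error code value (ERROR_IO_INCOMPLETE is 996) is taken from Windows.

```python
import time

ERROR_IO_INCOMPLETE = 996  # Win32 error code: overlapped operation not finished yet


class SimulatedOverlapped:
    """Hypothetical stand-in for a pending overlapped operation.

    It reports "incomplete" for a fixed number of polls, then completes.
    """

    def __init__(self, polls_until_done):
        self.polls_until_done = polls_until_done
        self.last_error = 0


def get_overlapped_result(op, wait=False):
    """Simulated non-blocking check, modelled on GetOverlappedResult(..., FALSE).

    Returns True when the operation is done. On False, op.last_error holds
    the code that GetLastError would have returned.
    """
    if op.polls_until_done > 0:
        op.polls_until_done -= 1
        op.last_error = ERROR_IO_INCOMPLETE
        return False
    op.last_error = 0
    return True


def poll_until_complete(op):
    """Poll one operation until it completes; return the number of polls."""
    polls = 0
    while True:
        polls += 1
        if get_overlapped_result(op, wait=False):
            return polls
        if op.last_error != ERROR_IO_INCOMPLETE:
            # Any other error code means a real failure, not "still pending".
            raise OSError(op.last_error, "overlapped operation failed")
        time.sleep(0)  # yield the rest of the time slice, like Sleep(0)


polls = poll_until_complete(SimulatedOverlapped(polls_until_done=3))
```

The point of the sketch is only the shape of the loop: check, inspect the error code, yield, repeat.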
A bit more sophisticated: Put all these asynchronous i/o tasks in a fifo
queue, and set up a thread-pool that pops tasks off the queue and polls
with GetOverlappedResult and GetLastError. A task that is unfinished
goes back into the queue. If a task is complete, the thread that popped
it off the queue executes a callback. A thread-pool that operates like
this will reduce/prevent the excessive number of context switches in the
kernel that multiple threads hammering on Sleep(0) would incur. Then
invent a fancy name for this scheme, e.g. call it "I/O Completion Ports".
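
The scheme described above can be sketched in a few lines of Python. This is an illustration of the queue-and-requeue idea only, not a claim about how Windows implements it: SimulatedTask and run_polling_pool are hypothetical names, and "completion" is faked by counting polls instead of calling GetOverlappedResult.

```python
import queue
import threading


class SimulatedTask:
    """Hypothetical i/o task that completes after a fixed number of polls."""

    def __init__(self, name, polls_needed):
        self.name = name
        self._remaining = polls_needed
        self._lock = threading.Lock()

    def is_complete(self):
        # Stand-in for GetOverlappedResult + GetLastError.
        with self._lock:
            if self._remaining > 0:
                self._remaining -= 1
                return False
            return True


def run_polling_pool(tasks, callback, num_threads=4):
    """Pop tasks off a fifo queue, poll them, requeue the unfinished ones."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    def worker():
        while True:
            task = pending.get()
            if task is None:  # sentinel: shut this worker down
                pending.task_done()
                return
            if task.is_complete():
                callback(task)  # run by the thread that popped the task
            else:
                pending.put(task)  # unfinished: back of the queue
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    pending.join()  # returns once every task has completed
    for _ in threads:
        pending.put(None)
    for t in threads:
        t.join()


completed = []
tasks = [SimulatedTask("task-%d" % i, polls_needed=i) for i in range(10)]
run_polling_pool(tasks, lambda task: completed.append(task.name))
```

An unfinished task is put back before task_done() is called, so pending.join() cannot return while anything is still outstanding.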
Then you notice that, due to the queue, the latency is O(n), with n the
number of pending i/o tasks in the "I/O Completion
Port". To avoid this affecting the latency, you patch your program by
setting up multiple "I/O Completion Ports", and reinvent the load
balancer to distribute i/o tasks to multiple "ports". With a bit of
work, the server will remain responsive and "rather scalable" as long as
the server is still i/o bound. Once the number of i/o tasks makes the
server go CPU bound, which will happen rather soon because of the way
IOCPs operate, the computer overheats and goes up in smoke. And
that is when the MBA manager starts to curse Windows as well, and
finally agrees to use Linux or *BSD/Apple instead ;-)
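
As a rough illustration of the load-balancing argument, under the queue model described above (my assumption: a task waits for roughly one pass over its own queue before it is polled again), spreading n tasks round-robin over k queues cuts the worst-case queue length from n to about n/k. distribute_round_robin is a hypothetical helper, and the "ports" are plain Python lists.

```python
def distribute_round_robin(tasks, num_ports):
    """Assign tasks to num_ports queues in round-robin order."""
    ports = [[] for _ in range(num_ports)]
    for i, task in enumerate(tasks):
        ports[i % num_ports].append(task)
    return ports


tasks = list(range(1000))

# One queue: a task may wait behind every other pending task.
worst_single = max(len(p) for p in distribute_round_robin(tasks, 1))

# Eight queues: the longest queue is about n / k.
worst_spread = max(len(p) for p in distribute_round_robin(tasks, 8))
```

This only moves the constant: the per-queue wait is still linear in the number of tasks on that queue.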
> If the kernel cooperates, no continuous polling
> should be required.
Indeed.
However:
My main problem with IOCPs is that they provide the "wrong" signal. They
tell us when I/O is completed. But then the work is already done, and
how did we know when to start?
The asynchronous i/o mechanisms select, poll, epoll, kqueue, /dev/poll,
etc. do the opposite. They inform us when to start an i/o task, which
makes more sense to me at least.
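
A small Python sketch of the readiness model, using select.select on a socket pair: the call tells us when a read can start without blocking, and only then do we issue recv. read_when_ready is a hypothetical helper name, and the 4096-byte buffer size is arbitrary.

```python
import select
import socket


def read_when_ready(sock, timeout=1.0):
    """Wait until sock is readable, then start the read.

    Returns the received bytes, or None if nothing became readable
    within the timeout.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if sock in readable:
        # select said "ready to start", so this recv will not block.
        return sock.recv(4096)
    return None


a, b = socket.socketpair()
try:
    nothing_yet = read_when_ready(b, timeout=0.0)  # no data sent so far
    a.sendall(b"ping")
    data = read_when_ready(b)
finally:
    a.close()
    b.close()
```

With a completion-based API the order is reversed: the read is issued first, and the notification arrives once the data is already in the buffer.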
Typically, programs that use IOCP must invent their own means of
signalling "i/o ready to start", which might kill any advantage of using
IOCPs over simpler means (e.g. blocking i/o).
This, by the way, makes me wonder what Windows SUA does. It is OpenBSD
based. Does it have kqueue or /dev/poll? If so, there must be support
for it in ntdll.dll, and we might use those functions instead of pesky
IOCPs.
Sturla