[Python-ideas] An alternate approach to async IO
Trent Nelson
trent at snakebite.org
Tue Nov 27 23:19:53 CET 2012
On Tue, Nov 27, 2012 at 01:42:33PM -0800, Richard Oudkerk wrote:
> On 27/11/2012 8:19pm, Trent Nelson wrote:
> > Got it. So what about the "no processing that can be usefully done
> > by a C level thread" bit? I'm trying to discern whether or not you're
> > highlighting a fundamental flaw in the theory/idea;-)
> >
> > (That it's going to be more optimal to have background threads service
> > IO without the need to acquire the GIL, basically.)
>
> I mean that I don't understand what sort of "servicing" you expect the
> background threads to do.
>
> If you just mean consuming packets from GetQueuedCompletionStatus() and
> pushing them on an interlocked stack then why bother?
Theoretically: lower latency, higher throughput and better
scalability (additional cores improve both) than alternate
approaches when under load.
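To make the pattern concrete, here's a rough Python sketch of what those background threads would do. In the real proposal the consumers would be C-level threads calling GetQueuedCompletionStatus() and pushing onto an interlocked (lock-free) stack without touching the GIL; here queue.Queue stands in for the completion port and collections.deque (whose append/pop are atomic in CPython) for the stack. All names are illustrative, not part of any actual API:

```python
# Sketch: background threads drain a "completion port" and push
# results onto a shared stack that the interpreter thread consumes
# in batches. queue.Queue simulates the IOCP handle; deque simulates
# the interlocked stack.
import queue
import threading
from collections import deque

completion_port = queue.Queue()   # stands in for the IOCP handle
completed = deque()               # stands in for the interlocked stack

def consumer():
    """Background thread: move completions from the port to the stack."""
    while True:
        item = completion_port.get()
        if item is None:          # sentinel: shut down
            break
        completed.append(item)

t = threading.Thread(target=consumer, daemon=True)
t.start()

# Simulate IO completions arriving.
for i in range(5):
    completion_port.put(("recv", i, b"data"))
completion_port.put(None)
t.join()

# The "interpreter" drains the stack in one batch.
results = [completed.popleft() for _ in range(len(completed))]
print(len(results))               # 5 completions serviced
```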
Let's just say the goal of the new async IO framework is to
be able to handle 65k simultaneous connections and/or saturate
multiple 10Gb Ethernet links (or 16Gb FC, or 300Gb IB) on a
system where a pure C/C++ solution using native libs (kqueue,
epoll, IOCP, GCD etc) *could* do that.
What async IO library of the future could come the closest?
That's sort of the thought process I had, which led to this
idea.
We should definitely have a systematic way of benchmarking
this sort of stuff though, otherwise it's all conjecture.
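As a starting point for the systematic side, the measurement shape could be as simple as: run a handler N times under load and report wall-clock throughput. A real harness would drive actual sockets; this sketch just fixes the shape, and every name in it is illustrative:

```python
# Minimal sketch of a benchmark harness for comparing IO strategies:
# time a callable over n iterations and report throughput.
import time

def measure(handler, n=10000):
    """Return (elapsed_seconds, requests_per_second) for n calls."""
    start = time.perf_counter()
    for i in range(n):
        handler(i)
    elapsed = time.perf_counter() - start
    return elapsed, n / elapsed

def noop_handler(i):
    pass  # a real harness would do a recv/send round-trip here

elapsed, rps = measure(noop_handler)
print(f"{rps:,.0f} requests/sec")
```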
On that note, I came across a very interesting presentation
a few weeks ago whilst doing research:
http://www.mailinator.com/tymaPaulMultithreaded.pdf
He makes some very interesting observations regarding contemporary
performance of non-blocking versus thousands-of-blocking-threads.
It highlights the importance of having a way to systematically test
assumptions like "IOCP will handle load better than WSAPoll".
Definitely worth the read. The TL;DR version is:
- Thousands of threads doing blocking IO isn't as bad as
everyone thinks. It used to suck, but these days, on
multicore machines and contemporary kernels, it ain't
so bad.
- Throughput is much better using blocking IO than non-blocking.
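The thousands-of-blocking-threads model he benchmarks can be sketched in a few lines: one OS thread per client, plain blocking recv/send, no event loop, and the kernel does the multiplexing. This is only an illustration of the shape (a handful of clients rather than thousands), not a benchmark:

```python
# Sketch of the thread-per-connection blocking-IO model: one thread
# per client running a blocking echo loop. Modern kernels schedule
# this reasonably well even at high thread counts.
import socket
import threading

def handle(conn):
    """Blocking echo loop for a single client."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)

def serve(server_sock, nclients):
    for _ in range(nclients):
        conn, _ = server_sock.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(128)
port = server.getsockname()[1]
threading.Thread(target=serve, args=(server, 8), daemon=True).start()

# Each client blocks on send/recv; the kernel schedules the threads.
replies = []
for i in range(8):
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(b"ping %d" % i)
        replies.append(c.recv(4096))
print(len(replies))  # 8 echoed replies
```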
From Python's next-gen AIO perspective, I think it would be
useful to define our goals. Is absolute balls-to-the-wall
as-close-to-metal-as-possible performance (like 65k clients
or 1GB/s saturation) the ultimate goal?
If not, then what? Half that, but with scalability? Quarter of
that, but with a beautifully elegant/simple API?
Trent.