[Python-ideas] An alternate approach to async IO

Tue Nov 27 23:19:53 CET 2012

On Tue, Nov 27, 2012 at 01:42:33PM -0800, Richard Oudkerk wrote:
> On 27/11/2012 8:19pm, Trent Nelson wrote:
> >      Got it.  So what about the "no processing that can be usefully done
> >      by a C level thread" bit?  I'm trying to discern whether or not you're
> >      highlighting a fundamental flaw in the theory/idea;-)
> >
> >      (That it's going to be more optimal to have background threads service
> >       IO without the need to acquire the GIL, basically.)
> 
> I mean that I don't understand what sort of "servicing" you expect the 
> background threads to do.
> 
> If you just mean consuming packets from GetQueuedCompletionStatus() and 
> pushing them on an interlocked stack then why bother?

    Theoretically: lower latency, higher throughput and better
    scalability (additional cores improve both) than alternate
    approaches when under load.
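    A rough Python sketch of the shape of that design follows. It
    assumes queue.SimpleQueue as a stand-in both for the completion
    port (what GetQueuedCompletionStatus() would dequeue from) and for
    the interlocked stack. It also assumes a hypothetical "servicing"
    step (decoding the payload and splitting it into lines) that the
    background threads perform before the consumer sees the data. In
    CPython these workers still need the GIL to run Python code, so
    this only illustrates the division of labor, not the performance
    claim; the proposal is that the equivalent workers would be C-level
    threads that never touch the GIL.

```python
import queue
import threading

# Sentinel that tells a worker thread to exit.
_STOP = object()


def completion_worker(completions, results):
    """Background thread body: dequeue raw completion packets, do some
    'servicing' on them, and hand the result to the consumer."""
    while True:
        packet = completions.get()  # stand-in for GetQueuedCompletionStatus()
        if packet is _STOP:
            return
        conn_id, payload = packet
        # Hypothetical servicing step: decode and split into lines so the
        # consumer never has to touch the raw bytes.
        lines = payload.decode("utf-8").splitlines()
        results.put((conn_id, lines))  # stand-in for the interlocked stack


def service_completions(packets, num_workers=4):
    """Run packets through a pool of workers; return results sorted by
    connection id (workers finish in no particular order)."""
    completions = queue.SimpleQueue()
    results = queue.SimpleQueue()
    workers = [
        threading.Thread(target=completion_worker, args=(completions, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for packet in packets:
        completions.put(packet)
    for _ in workers:
        completions.put(_STOP)  # one sentinel per worker
    for w in workers:
        w.join()
    # Consumer side (the thread that would hold the GIL): drain results.
    drained = []
    while not results.empty():
        drained.append(results.get())
    return sorted(drained)


if __name__ == "__main__":
    # prints [(1, ['GET /', 'Host: a']), (2, ['ping'])]
    print(service_completions([(2, b"ping\n"), (1, b"GET /\r\nHost: a\r\n")]))
```

    The sentinels are queued after every packet, and the queue is FIFO,
    so each worker drains real work before it sees its own stop marker.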

    Let's just say the goal of the new async IO framework is to
    be able to handle 65k simultaneous connections and/or saturate
    multiple 10Gb Ethernet links (or 16Gb FC, or 300Gb IB) on a
    system where a pure C/C++ solution using native libs (kqueue,
    epoll, IOCP, GCD etc) *could* do that.

    What async IO library of the future could come the closest?
    That's sort of the thought process I had, which led to this
    idea.

    We should definitely have a systematic way of benchmarking
    this sort of stuff though, otherwise it's all conjecture.

    On that note, I came across a very interesting presentation
    a few weeks ago whilst doing research:

        http://www.mailinator.com/tymaPaulMultithreaded.pdf

    He makes some very interesting observations regarding contemporary
    performance of non-blocking versus thousands-of-blocking-threads.
    It highlights the importance of having a way to systematically test
    assumptions like "IOCP will handle load better than WSAPoll".

    Definitely worth the read.  The TL;DR version is:

        - Thousands of threads doing blocking IO isn't as bad as
          everyone thinks.  It used to suck, but these days, on
          multicore machines and contemporary kernels, it ain't
          so bad.
        - Throughput is much better using blocking IO than non-blocking IO.
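    To make "systematically test assumptions" concrete, here is a
    minimal sketch of a harness. It pushes the same payload through N
    local connections twice: once with a thread per connection doing
    blocking recv(), once with a single thread multiplexing non-blocking
    sockets through the stdlib selectors module. It is an
    assumption-laden toy (local socket.socketpair() connections, a
    single feeder thread, CPython threads that still contend for the
    GIL), not the methodology from the presentation. The names
    blocking_model, nonblocking_model and bench are hypothetical, and
    whatever numbers it prints are a starting point, not a verdict.

```python
import selectors
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per recv() call (arbitrary choice)


def _feed(writers, payload):
    """Push the payload down every connection, then close it to signal EOF."""
    for w in writers:
        with w:
            w.sendall(payload)


def blocking_model(n_conns, payload):
    """One thread per connection, each doing plain blocking recv()."""
    pairs = [socket.socketpair() for _ in range(n_conns)]
    totals = [0] * n_conns

    def reader(i, sock):
        with sock:
            while data := sock.recv(CHUNK):  # b"" means the peer closed
                totals[i] += len(data)

    readers = [threading.Thread(target=reader, args=(i, r))
               for i, (r, _) in enumerate(pairs)]
    for t in readers:
        t.start()
    _feed([w for _, w in pairs], payload)  # main thread acts as the feeder
    for t in readers:
        t.join()
    return sum(totals)


def nonblocking_model(n_conns, payload):
    """A single thread multiplexing non-blocking sockets via selectors."""
    pairs = [socket.socketpair() for _ in range(n_conns)]
    sel = selectors.DefaultSelector()
    for r, _ in pairs:
        r.setblocking(False)
        sel.register(r, selectors.EVENT_READ)
    feeder = threading.Thread(target=_feed, args=([w for _, w in pairs], payload))
    feeder.start()
    total, still_open = 0, n_conns
    while still_open:
        for key, _ in sel.select():
            sock = key.fileobj
            try:
                data = sock.recv(CHUNK)
            except BlockingIOError:
                continue  # spurious wakeup; nothing to read yet
            if data:
                total += len(data)
            else:  # EOF: stop watching this connection
                sel.unregister(sock)
                sock.close()
                still_open -= 1
    feeder.join()
    sel.close()
    return total


def bench(model, n_conns=50, size=256 * 1024):
    """Return throughput in MB/s for one run of the given model."""
    payload = b"x" * size
    start = time.perf_counter()
    total = model(n_conns, payload)
    elapsed = time.perf_counter() - start
    assert total == n_conns * size, "lost bytes"
    return total / elapsed / 1e6


if __name__ == "__main__":
    for model in (blocking_model, nonblocking_model):
        print(f"{model.__name__}: {bench(model):.1f} MB/s")
```

    Both models use one feeding thread, so the variable under test is
    how the receiving side waits for data. A serious comparison would
    also use real network links and repeated runs.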

    From Python's next-gen AIO perspective, I think it would be
    useful to define our goals.  Is absolute balls-to-the-wall
    as-close-to-the-metal-as-possible performance (like 65k clients
    or 1GB/s saturation) the ultimate goal?

    If not, then what?  Half that, but with scalability?  A quarter of
    that, but with a beautifully elegant/simple API?

        Trent.