[Python-ideas] An alternate approach to async IO

Trent Nelson trent at snakebite.org
Wed Nov 28 03:11:40 CET 2012


On Tue, Nov 27, 2012 at 04:44:05PM -0800, Guido van Rossum wrote:
> On Tue, Nov 27, 2012 at 4:15 PM, Trent Nelson <trent at snakebite.org> wrote:
> >     The rationale for all of this is that this approach should scale
> >     better when heavily loaded (i.e. tens of thousands of connections
> >     and/or Gb/s traffic).  When you're dealing with that sort of load
> >     on a many-core machine (let's say 16+ cores), an interlocked list
> >     is going to reduce latency versus 16+ threads constantly vying for
> >     the GIL.
> >
> >     (That's the theory, at least.)
> 
> But why would you need 15 cores to shuffle the bytes around when you
> have only 1 to run the Python code that responds to those bytes?

    There are a few advantages.  For one, something like this:

        with aio.open('1GB-file-on-a-fast-SSD.raw', 'r') as f:
            data = f.read()

    Or even just:

        with aio.open('/dev/zero', 'rb') as f:
            data = f.read(1024 * 1024 * 1024)

    would basically complete as fast as it is physically possible to read
    the bytes off the device.  That's pretty cool.  Ditto for write.
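    To make that concrete, here's one way such a call could work under
    the hood -- a rough sketch, not the proposed implementation: the read
    is handed off to a background thread that fills a preallocated
    buffer, and the caller blocks only on a completion event (the `aio`
    names above are the hypothetical API; everything below is stdlib):

```python
import threading

def background_read(path, size):
    """Read `size` bytes of `path` on a background thread into a
    preallocated buffer; the caller waits on a completion event
    rather than holding the GIL for the duration of the read."""
    buf = bytearray(size)
    done = threading.Event()

    def worker():
        with open(path, 'rb') as f:
            # readinto() releases the GIL during the underlying
            # read(2), so the IO proceeds at device speed.
            f.readinto(buf)
        done.set()

    threading.Thread(target=worker).start()
    done.wait()
    return bytes(buf)
```

    With many such reads in flight, the background threads spend their
    time inside GIL-released system calls, so they genuinely overlap.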

    Sturla touched on some of the other advantages regarding cache
    locality, reduced context switching and absence of any lock
    contention.

    When using the `for event in aio.events()` approach, sure, you've
    only got one Python thread, but nothing blocks, and you'll be able
    to churn away on as many events per second as a single core allows.
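    The shape of that loop can be sketched with stdlib pieces, using a
    `queue.Queue` as a stand-in for the interlocked list that the
    background IO threads push completions onto (the event tuples and
    `conn_id` values are invented for illustration):

```python
import queue
import threading

events = queue.Queue()  # stand-in for the interlocked list

def io_thread(conn_id):
    # A background IO thread: does the blocking work, then pushes
    # a completion event for the single Python thread to consume.
    data = b'payload'  # pretend this came off the wire
    events.put(('read_complete', conn_id, data))

for i in range(4):
    threading.Thread(target=io_thread, args=(i,)).start()

# The single-core event loop: it never blocks on IO itself, just
# churns through completions as fast as one core allows.
handled = []
for _ in range(4):
    kind, conn_id, data = events.get()
    handled.append((kind, conn_id))
```

    The real win is that the consumer side has no lock to contend for
    beyond the queue's own, and the producers never touch the Python
    interpreter between completions.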

    On more powerful boxes, you'll eventually hit a limit where the
    single-core event loop can't keep up with the data being serviced by
    16+ threads.  That's where this chunk of my original e-mail becomes
    relevant:

        > So, let's assume that's all implemented and working in 3.4.  The
        > drawback of this approach is that even though we've allowed for
        > some actual threaded concurrency via background IO threads, the
        > main Python code that loops over aio.events() is still limited
        > to executing on a single core.  Albeit, in a very tight loop that
        > never blocks and would probably be able to process an insane number
        > of events per second when pegging a single core at 100%.

        > So, that's 3.4.  Perhaps in 3.5 we could add automatic support for
        > multiprocessing once the number of events per poll reaches a certain
        > threshold.  The event loop automatically spreads out the processing
        > of events via multiprocessing, facilitating multiple core usage both
        > via background threads *and* Python code.  (And we could probably do
        > some optimizations such that the background IO thread always queues
        > up events for the same multiprocessing instance -- which would yield
        > even more benefits if we had fancy "buffer inheritance" stuff that
        > removes the need to continually copy data from the background IO
        > buffers to the foreground CPython code.)

    So, down the track, we could explore options for future scaling via
    something like multiprocessing when the number of incoming events
    exceeds the ability of the single-core `for event in aio.events()`
    event loop.
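    That future direction might look roughly like this, with a
    `multiprocessing.Pool` taking over only once a single poll returns
    more events than one core can comfortably chew through (the
    threshold and the handler are invented for illustration):

```python
import multiprocessing

FANOUT_THRESHOLD = 8  # invented: events-per-poll before fanning out

def handle_event(event):
    # Placeholder for the real per-event processing.
    kind, payload = event
    return (kind, payload * 2)

def dispatch(batch, pool):
    """Process one poll's worth of events, spilling over to worker
    processes only when a single core can't keep up."""
    if len(batch) < FANOUT_THRESHOLD:
        # Light load: stay on the single core, avoid IPC overhead.
        return [handle_event(e) for e in batch]
    # Heavy load: spread the work across the pool's processes.
    return pool.map(handle_event, batch)
```

    Below the threshold, events stay on the single core with zero IPC
    cost; above it, `dispatch()` fans out transparently.  The "buffer
    inheritance" idea would then attack the remaining copy between the
    background IO buffers and the worker processes.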

        Trent.


