[Python-ideas] An alternate approach to async IO

Wed Nov 28 21:32:39 CET 2012

On Wed, Nov 28, 2012 at 12:15:22PM -0800, Guido van Rossum wrote:
> On Wed, Nov 28, 2012 at 12:05 PM, Trent Nelson <trent at snakebite.org> wrote:
> > On Wed, Nov 28, 2012 at 07:59:04AM -0800, Guido van Rossum wrote:
> >> OK, now I see. (I thought that was how everyone was using IOCP.
> >> Apparently not?) However, the "short busy wait" worries me. What if
> >> your app *doesn't* get a lot of requests?
> >
> >     From my response to Richard's concern re: busy waits:
> >
> >     Oooer, that's definitely not what I had in mind.  This is how I
> >     envisioned it working (think of events() as similar to poll()):
> >
> >         with aio.events() as events:
> >             for event in events:
> >                 # process event
> >                 ...
> >
> >     That aio.events() call would result in an InterlockedFlushSList,
> >     returning the entire list of available events.  It then does the
> >     conversion into a CPython event type, bundles everything into a
> >     list, then returns.
> >
> >     (In reality, there'd be a bit more glue to handle an empty list
> >      a bit more gracefully, and probably a timeout to aio.events().
> >      Nothing should involve a spinlock though.)
> >
> >> Isn't the alternative to have a "thread pool" with just one thread,
> >> which runs all the Python code and gets woken up by IOCP when it is
> >> idle and there is a new event? How is Trent's proposal an improvement?
> >
> >     I don't really understand this suggestion :/  It's sort of in line
> >     with how IOCP is used currently, i.e. "let me tell you when I'm
> >     ready to process events", which I'm advocating against with this
> >     idea.
> 
> Well, but since the proposal also seems to be to keep all Python code
> in one thread, that thread still has to say when it's ready to process
> events.

    Right.  My argument is that, because the background IO threads in my
    approach are as efficient as they can be, more IO events would be
    available per event loop iteration, and the latency between an event
    occurring and the event loop picking it up would be reduced.  The
    theory is that this results in higher throughput and lower latency
    in practice.
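    To make the "drain everything per iteration" idea concrete, here is a
    minimal sketch in plain Python, with ordinary threads standing in for
    IOCP completion threads.  Everything in it is hypothetical:
    `EventQueue`, `post()` and `events()` are illustrative names, not a
    real `aio` API, and a `threading.Condition` plays the role of the
    SList flush.  Background threads post completions; the single
    event-loop thread takes the whole backlog in one call, and blocks
    (rather than spins) with a timeout when nothing is pending.

```python
import threading
from collections import deque


class EventQueue:
    """Hypothetical stand-in for the proposed aio.events() machinery.

    Producer threads (playing the role of IOCP completion threads) call
    post(); the single event-loop thread calls events() to take every
    pending event at once.
    """

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def post(self, event):
        # Called from a background IO thread when an operation completes.
        with self._cond:
            self._items.append(event)
            self._cond.notify()

    def events(self, timeout=None):
        # Block (no spinning) until at least one event is available or
        # the timeout expires, then drain the whole backlog in one go.
        with self._cond:
            self._cond.wait_for(lambda: self._items, timeout=timeout)
            batch = list(self._items)
            self._items.clear()
        return batch


def demo():
    q = EventQueue()

    def io_worker(worker_id):
        # Simulate a background thread completing a few IO operations.
        for n in range(3):
            q.post((worker_id, n))

    workers = [threading.Thread(target=io_worker, args=(i,)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # One event-loop iteration picks up everything that has completed.
    batch = q.events(timeout=1.0)
    # Nothing pending: this returns an empty list once the timeout expires.
    empty = q.events(timeout=0.01)
    return batch, empty


if __name__ == "__main__":
    batch, empty = demo()
    print(len(batch), empty)
```

    The point of draining in bulk is that each wakeup of the Python
    thread is amortised over every completion that arrived in the
    meantime, which is the "more IO events per event loop iteration"
    argument above.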

    Also, from a previous e-mail, this:

        with aio.open('1GB-file-on-a-fast-SSD.raw', 'rb') as f:
            data = f.read()

    Or even just:

        with aio.open('/dev/zero', 'rb') as f:
            data = f.read(1024 * 1024 * 1024)

    Would basically complete as fast as it is physically possible to read
    the bytes off the device.  If you've got 16 or more cores, then all of
    them will be able to service IO interrupts in parallel.  So, the
    overall time to suck in a chunk of data will be vastly reduced.
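    As a rough, portable illustration of splitting one large read across
    many workers, the sketch below reads a file in fixed-size chunks
    concurrently and reassembles them in order.  It uses a thread pool,
    not IOCP or overlapped IO, and `parallel_read`, `demo` and the chunk
    size are hypothetical.  Any real gain depends on the device and the OS
    cache; CPython does release the GIL around blocking file reads, so the
    reads themselves can overlap.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def _read_chunk(path, offset, size):
    # Each worker opens its own handle so seeks don't interfere.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def parallel_read(path, chunk_size=64 * 1024 * 1024, workers=None):
    """Read a whole file by issuing chunked reads from a thread pool.

    Hypothetical helper: a portable stand-in for overlapped/IOCP reads,
    not the proposed aio.open() API.
    """
    total = os.path.getsize(path)
    offsets = range(0, total, chunk_size)
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        # map() yields results in submission order, so chunks reassemble
        # correctly even if they complete out of order.
        chunks = pool.map(lambda off: _read_chunk(path, off, chunk_size), offsets)
        return b"".join(chunks)


def demo(size=1_000_003, chunk_size=100_000):
    # Write random bytes to a temporary file, then read them back in parallel.
    payload = os.urandom(size)
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "sample.raw")
        with open(path, "wb") as f:
            f.write(payload)
        return payload, parallel_read(path, chunk_size=chunk_size)


if __name__ == "__main__":
    payload, data = demo()
    print(len(data), data == payload)
```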

    There's no way to get this sort of performance without taking my
    approach.

> So, again, what's the big deal? Maybe we just need benchmarks
> showing events processed per second for various configurations...

    Definitely agree with the need for benchmarks.  (I'm going to set up
    an 8-core Snakebite box with Windows Server 2012 specifically for this
    purpose, I think.)
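    For the "events processed per second" comparison, a toy harness along
    these lines could serve as a starting point.  It measures two
    consumption styles in pure Python: one `queue.Queue.get()` call per
    event versus draining every pending event per wakeup.  The names
    (`BatchQueue`, `bench`) and parameters are hypothetical, and because
    it only uses Python threads its numbers say nothing about IOCP itself.

```python
import queue
import threading
import time


class BatchQueue:
    """Hypothetical queue whose consumer takes every pending item per wakeup."""

    def __init__(self):
        self._items = []
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def drain(self, timeout=None):
        # Block until something is pending (or timeout), then swap out the list.
        with self._cond:
            self._cond.wait_for(lambda: self._items, timeout=timeout)
            batch, self._items = self._items, []
        return batch


def _start_producers(put, per_thread, n_producers):
    # Each producer thread stands in for a source of IO completions.
    def produce():
        for i in range(per_thread):
            put(i)

    threads = [threading.Thread(target=produce) for _ in range(n_producers)]
    for t in threads:
        t.start()
    return threads


def bench(mode, n_events=100_000, n_producers=4):
    """Return (events_processed, events_per_second) for 'single' or 'batch'."""
    per_thread = n_events // n_producers
    expected = per_thread * n_producers
    received = 0
    start = time.perf_counter()
    if mode == "single":
        q = queue.Queue()
        producers = _start_producers(q.put, per_thread, n_producers)
        while received < expected:
            q.get()  # one get() call (and lock round-trip) per event
            received += 1
    elif mode == "batch":
        bq = BatchQueue()
        producers = _start_producers(bq.put, per_thread, n_producers)
        while received < expected:
            received += len(bq.drain(timeout=1.0))  # may return many events
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    elapsed = time.perf_counter() - start
    for t in producers:
        t.join()
    return received, received / elapsed


if __name__ == "__main__":
    for mode in ("single", "batch"):
        count, rate = bench(mode)
        print(f"{mode:>6}: {count} events, {rate:,.0f} events/sec")
```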

        Trent.