[Python-ideas] An alternate approach to async IO

Wed Nov 28 22:18:48 CET 2012

On Wed, Nov 28, 2012 at 1:02 PM, Trent Nelson <trent at snakebite.org> wrote:

> On Wed, Nov 28, 2012 at 12:49:51PM -0800, Guido van Rossum wrote:
> > On Wed, Nov 28, 2012 at 12:32 PM, Trent Nelson <trent at snakebite.org> wrote:
> > >     Right, so, I'm arguing that with my approach, because the background
> > >     IO thread stuff is as optimal as it can be -- more IO events would
> > >     be available per event loop iteration, and the latency between the
> > >     event occurring and the event loop picking it up would be
> > >     reduced.  The theory being that that will result in higher
> > >     throughput and lower latency in practice.
> > >
> > >     Also, from a previous e-mail, this:
> > >
> > >         with aio.open('1GB-file-on-a-fast-SSD.raw', 'rb') as f:
> > >             data = f.read()
> > >
> > >     Or even just:
> > >
> > >         with aio.open('/dev/zero', 'rb') as f:
> > >             data = f.read(1024 * 1024 * 1024)
> > >
> > >     Would basically complete as fast as it is physically possible to
> > >     read the bytes off the device.  If you've got 16+ cores, then you'll
> > >     have 16 cores able to service IO interrupts in parallel.  So, the
> > >     overall time to suck in a chunk of data will be vastly reduced.
> > >
> > >     There's no other way to get this sort of performance without taking
> > >     my approach.
> >
> > So there's something I fundamentally don't understand. Why do those
> > calls, made synchronously in today's CPython, not already run as fast
> > as you can get the bytes off the device? I assume it's just a transfer
> > from kernel memory to user memory. So what is the advantage of using
> > aio over
> >
> >   with open(<file>, 'rb') as f:
> >       data = f.read()
>
>     Ah, right.  That's where the OVERLAPPED aspect comes into play.
>     (Other than Windows and AIX, I don't think any other OS provides
>      an overlapped IO facility?)
>
>     The difference being, instead of having one thread writing to a 1GB
>     buffer, 4KB at a time, you have 16 threads writing to an overlapped
>     1GB buffer, 4KB at a time.
>
>     (Assuming you have 16+ cores, and IO interrupts are coming in whilst
>      existing threads are still servicing previous completions.)
>
>         Trent.
>

Aha. So these are kernel threads? Is the bandwidth of the I/O channel
really higher than the rate at which one CPU can copy bytes across a
user/kernel boundary?
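
To make the comparison concrete, here is a rough sketch of the two shapes
being discussed: one thread filling a buffer, versus several threads each
filling their own region of one shared buffer. It is only an approximation
under stated assumptions. It uses ordinary blocking reads from a thread
pool, not Windows overlapped IO or the proposed aio module, and the names
read_serial and read_parallel are made up for illustration. It also needs
a regular file with a known size, so it will not work on /dev/zero.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_serial(path):
    """Baseline: one thread reads the whole file."""
    with open(path, "rb") as f:
        return f.read()


def read_parallel(path, workers=16):
    """Several threads each fill a disjoint region of one shared buffer.

    Hypothetical helper: a thread-pool stand-in for the idea of many
    completions landing in one buffer, not real overlapped IO.
    """
    size = os.path.getsize(path)
    buf = bytearray(size)
    view = memoryview(buf)
    # Ceiling division so that `workers` regions cover the whole file.
    region = max(1, -(-size // workers))

    def fill(start):
        end = min(start + region, size)
        # Unbuffered handle per worker, so seeks do not interfere.
        with open(path, "rb", buffering=0) as f:
            f.seek(start)
            pos = start
            while pos < end:
                n = f.readinto(view[pos:end])
                if not n:  # unexpected EOF
                    break
                pos += n
        return pos - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        copied = sum(pool.map(fill, range(0, size, region)))
    if copied != size:
        raise IOError("short read: %d of %d bytes" % (copied, size))
    return buf


if __name__ == "__main__":
    payload = os.urandom(1 << 20)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        assert read_serial(tmp.name) == payload
        assert bytes(read_parallel(tmp.name, workers=4)) == payload
    finally:
        os.remove(tmp.name)
```

Whether the second form is actually faster is exactly the open question
here; timing both on the hardware of interest would be the way to find out.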

-- 
--Guido van Rossum (python.org/~guido)