On Wed, Nov 28, 2012 at 1:02 PM, Trent Nelson <trent@snakebite.org> wrote:

On Wed, Nov 28, 2012 at 12:49:51PM -0800, Guido van Rossum wrote:
> On Wed, Nov 28, 2012 at 12:32 PM, Trent Nelson <trent@snakebite.org> wrote:
> > Right, so, I'm arguing that with my approach, because the background
> > IO thread stuff is as optimal as it can be -- more IO events would
> > be available per event loop iteration, and the latency between the
> > event occurring versus when the event loop picks it up would be
> > reduced. The theory being that that will result in higher through-
> > put and lower latency in practice.
> >
> > Also, from a previous e-mail, this:
> >
> > with aio.open('1GB-file-on-a-fast-SSD.raw', 'rb') as f:
> > data = f.read()
> >
> > Or even just:
> >
> > with aio.open('/dev/zero', 'rb') as f:
> > data = f.read(1024 * 1024 * 1024)
> >
> > Would basically complete as fast as it physically possible to read
> > the bytes off the device. If you've got 16+ cores, then you'll have
> > 16 cores able to service IO interrupts in parallel. So, the overall
> > time to suck in a chunk of data will be vastly reduced.
> >
> > There's no other way to get this sort of performance without taking
> > my approach.
>
> So there's something I fundamentally don't understand. Why do those
> calls, made synchronously in today's CPython, not already run as fast
> as you can get the bytes off the device? I assume it's just a transfer
> from kernel memory to user memory. So what is the advantage of using
> aio over
>
> with open(<file>, 'rb') as f:
> data = f.read()

Ah, right. That's where the OVERLAPPED aspect comes into play.
(Other than Windows and AIX, I don't think any other OS provides
an overlapped IO facility?)

The difference being, instead of having one thread writing to a 1GB
buffer, 4KB at a time, you have 16 threads writing to an overlapped
1GB buffer, 4KB at a time.

(Assuming you have 16+ cores, and IO interrupts are coming in whilst
existing threads are still servicing previous completions.)

Trent.