<br><br><div class="gmail_quote">On Wed, Nov 28, 2012 at 1:02 PM, Trent Nelson <span dir="ltr"><<a href="mailto:trent@snakebite.org" target="_blank">trent@snakebite.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="HOEnZb"><div class="h5">On Wed, Nov 28, 2012 at 12:49:51PM -0800, Guido van Rossum wrote:<br>

> On Wed, Nov 28, 2012 at 12:32 PM, Trent Nelson <<a href="mailto:trent@snakebite.org">trent@snakebite.org</a>> wrote:<br>

> >     Right, so, I'm arguing that with my approach, because the background<br>

> >     IO thread work is as optimal as it can be, more IO events would<br>

> >     be available per event loop iteration, and the latency between an<br>

> >     event occurring and the event loop picking it up would be<br>

> >     reduced.  The theory is that this will result in higher<br>

> >     throughput and lower latency in practice.<br>

> ><br>

> >     Also, from a previous e-mail, this:<br>

> ><br>

> >         with aio.open('1GB-file-on-a-fast-SSD.raw', 'rb') as f:<br>

> >             data = f.read()<br>

> ><br>

> >     Or even just:<br>

> ><br>

> >         with aio.open('/dev/zero', 'rb') as f:<br>

> >             data = f.read(1024 * 1024 * 1024)<br>

> ><br>

> >     Would basically complete as fast as it is physically possible to read<br>

> >     the bytes off the device.  If you've got 16+ cores, then you'll have<br>

> >     16 cores able to service IO interrupts in parallel.  So, the overall<br>

> >     time to suck in a chunk of data will be vastly reduced.<br>

> ><br>

> >     There's no other way to get this sort of performance without taking<br>

> >     my approach.<br>

><br>

> So there's something I fundamentally don't understand. Why do those<br>

> calls, made synchronously in today's CPython, not already run as fast<br>

> as you can get the bytes off the device? I assume it's just a transfer<br>

> from kernel memory to user memory. So what is the advantage of using<br>

> aio over<br>

><br>

>   with open(<file>, 'rb') as f:<br>

>       data = f.read()<br>

<br>

</div></div>    Ah, right.  That's where the OVERLAPPED aspect comes into play.<br>

    (Other than Windows and AIX, I don't think any other OS provides<br>

     an overlapped IO facility.)<br>

<br>

    The difference is that instead of having one thread writing to a 1GB<br>

    buffer, 4KB at a time, you have 16 threads writing to an overlapped<br>

    1GB buffer, 4KB at a time.<br>

<br>

    (Assuming you have 16+ cores, and IO interrupts are coming in whilst<br>

     existing threads are still servicing previous completions.)<br>
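As a rough user-space analogue of "16 threads filling one 1GB buffer, 4KB at a time", here is a minimal Python sketch. It is not Windows OVERLAPPED I/O and not the proposed `aio` module. It assumes ordinary `concurrent.futures` threads, each with its own unbuffered file handle, reading a separate stripe of the file into a shared preallocated `bytearray`, and it relies on CPython releasing the GIL around the underlying read() system call so the copies can overlap. `parallel_read`, `_read_stripe`, and the striping scheme are hypothetical choices for illustration only.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def _read_stripe(path, buf, start, stop, chunk):
    """Fill buf[start:stop] with the same byte range of the file, `chunk` bytes per read()."""
    # Each worker opens a private unbuffered handle, so threads never share a file position.
    with open(path, "rb", buffering=0) as f, memoryview(buf) as view:
        f.seek(start)
        pos = start
        while pos < stop:
            # Read directly into this worker's slice of the shared buffer (no extra copy).
            n = f.readinto(view[pos:min(pos + chunk, stop)])
            if not n:  # unexpected EOF, e.g. the file shrank while we were reading
                break
            pos += n
    return pos - start

def parallel_read(path, workers=16, chunk=4096):
    """Read a whole file with `workers` threads into one preallocated buffer.

    Hypothetical helper; the defaults mirror the 16 threads / 4KB figures above.
    """
    size = os.path.getsize(path)
    buf = bytearray(size)                    # single shared destination buffer
    stripe = max(1, -(-size // workers))     # ceiling division: one stripe per worker
    ranges = [(lo, min(lo + stripe, size)) for lo in range(0, size, stripe)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stripes are disjoint, so concurrent writes into buf never overlap.
        total = sum(pool.map(lambda r: _read_stripe(path, buf, r[0], r[1], chunk), ranges))
    if total != size:
        raise OSError(f"short read: got {total} of {size} bytes")
    return buf
```

Whether this beats a plain single-threaded `f.read()` depends on the device, the OS, and whether the data is already in the page cache; it is offered only as a way to experiment with the idea.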

<span class="HOEnZb"><font color="#888888"><br>

        Trent.<br>

</font></span></blockquote></div><br>Aha. So these are kernel threads? Is the bandwidth of the I/O channel really higher than the rate at which one CPU can copy bytes across a user/kernel boundary?<br clear="all"><br>-- <br>--Guido van Rossum (<a href="http://python.org/~guido">python.org/~guido</a>)<br>
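One way to put a number on that question is to time a single thread pulling bytes out of the kernel with no physical device involved. The sketch below is a hypothetical helper, `copy_bandwidth`, assuming a Unix-like system where `/dev/zero` exists. It calls `readinto()` on an unbuffered file (one read() system call per call in CPython) into a single reused buffer, so the measured rate approximates one core's kernel-to-user copy throughput. Pointing `path` at a file already in the page cache measures roughly the same thing for cached data, up to the file's size.

```python
import time

def copy_bandwidth(path="/dev/zero", total=1 << 30, chunk=1 << 20):
    """Time one thread copying up to `total` bytes from the kernel via read().

    Hypothetical helper for illustration. Returns (bytes_copied, megabytes_per_second).
    """
    buf = bytearray(chunk)      # one reused user-space destination buffer
    view = memoryview(buf)
    done = 0
    # Unbuffered: each readinto() is a single read() syscall straight into buf.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        while done < total:
            n = f.readinto(view[:min(chunk, total - done)])
            if not n:           # EOF: only regular files end; /dev/zero never does
                break
            done += n
        elapsed = time.perf_counter() - start
    return done, done / max(elapsed, 1e-9) / 1e6
```

Comparing the rate from `copy_bandwidth()` with the device's rated bandwidth gives a first-order answer: if one core copies faster than the device delivers, extra threads cannot help the copy step itself.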