[Python-Dev] generic async io (was: microthreading vs. async io)

Thu Feb 15 20:46:59 CET 2007

dustin at v.igoro.us wrote:
>
> I think this discussion would be facilitated by teasing the first
> bullet-point from the latter two: the first deals with async IO, while
> the latter two deal with cooperative multitasking.
> 
> It's easy to write a single package that does both, but it's much harder
> to write *two* fairly generic packages with a clean API between them,
> given the varied platform support for async IO and the varied syntax and
> structures (continuations vs. microthreads, in my terminology) for
> multitasking.  Yet I think that division is exactly what's needed.

Hmm.  Now, please, people, don't take offence, but I don't know how
to phrase this tactfully :-(

The 'threading' approach to asynchronous I/O was found to be a BAD
IDEA back in the 1970s, was abandoned in favour of separating
asynchronous I/O from threading, and God alone knows why it was
reinvented - except that most of the people with prior experience
had died or retired :-(

Let's go back to the days when asynchronous I/O was the norm, and
I/O-performance-critical applications drove the devices directly.
In those days, yes, that approach did make sense.  But it rapidly
ceased to do so with the advent of 'semi-intelligent' devices and
the virtualisation of I/O by the operating system.  That was in
the mid-1970s.  Nowadays, ALL devices are semi-intelligent and no
system since Unix has allowed applications direct access to devices,
except for specialised HPC and graphics.

We used to get 90% of theoretical peak performance on mainframes
using asynchronous I/O from clean, portable applications, but it
was NOT done by treating the I/O as threads and controlling their
synchronisation by hand.  In fact, quite the converse!  It was done
by realising that asynchronous I/O and explicit threading are best
separated ENTIRELY.  There were two main models:

Streaming, as in most languages (Fortran, C, Python, but NOT in
POSIX).  The key properties here are that the transfer boundaries
have no significance; that only heavyweight synchronisation primitives
(open, close etc.) constrain when data are actually transferred; and
that (for very high performance) a buffer is unavailable to the
application from when a transfer is started to when it is checked.
If copying is acceptable, the last constraint can be dropped.

In the simple case, this allows the library/system to reblock and
perform transfers asynchronously.  In the more advanced case, the
application has to use multiple buffering (at least double), but
can get full performance without any form of threading.  IBM MVT
applications used to get up to 90% of theoretical peak without
hassle, in parallel with computation and using only a single thread
(well, there was only a single CPU, anyway).
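
To make the double-buffering case concrete, here is a rough sketch in
Python.  It is not any real system's interface: AsyncReader, start()
and check() are names invented for illustration, and the one-worker
executor is only a stand-in for the operating system or device doing
the transfer.  The application code itself creates no threads and
takes no locks; it just starts a transfer, computes on the other
buffer, and checks the transfer later.

```python
import io
from concurrent.futures import ThreadPoolExecutor


class AsyncReader:
    """Hypothetical start/check interface (not a real library API).

    The executor is an assumption standing in for the OS or device.
    """

    def __init__(self, stream):
        self._stream = stream
        # A single worker keeps the transfers in stream order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def start(self, buf):
        # Begin filling buf; the caller must leave buf alone until check().
        return self._pool.submit(self._stream.readinto, buf)

    @staticmethod
    def check(handle):
        # Wait for the transfer; return the number of bytes now in the buffer.
        return handle.result()

    def close(self):
        self._pool.shutdown()


def checksum_stream(stream, block_size=4):
    """Sum all bytes of stream, overlapping each read with computation."""
    reader = AsyncReader(stream)
    buffers = [bytearray(block_size), bytearray(block_size)]
    current = 0
    total = 0
    pending = reader.start(buffers[current])
    while True:
        n = reader.check(pending)
        if n == 0:          # end of stream
            break
        # Start the next transfer into the other buffer first ...
        other = 1 - current
        pending = reader.start(buffers[other])
        # ... then compute on the buffer that has just been checked.
        total += sum(buffers[current][:n])
        current = other
    reader.close()
    return total


if __name__ == "__main__":
    print(checksum_stream(io.BytesIO(bytes(range(10)))))
```

Because the transfer boundaries have no significance, the result does
not depend on block_size; only the amount of overlap does.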

The other model is transactions.  This has the property that there
is a global commit primitive, and the order of transfers is undefined
between commits.  Inter alia, it means that overlapping transfers
are undefined behaviour, whether in a single thread or in multiple
threads.  BSP uses this model.
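
A toy illustration of that model, again in Python and again with
invented names (TransactionalFile, write, commit); it is not how BSP
or any other system implements it.  Transfers are merely queued until
the commit, and since the model leaves overlapping transfers
undefined, this sketch chooses to reject them so that the error is
visible.

```python
class TransactionalFile:
    """Hypothetical in-memory 'file' with a global commit primitive."""

    def __init__(self, size):
        self.data = bytearray(size)   # committed state
        self._pending = []            # (offset, payload) since the last commit

    def write(self, offset, payload):
        end = offset + len(payload)
        if offset < 0 or end > len(self.data):
            raise ValueError("transfer outside the file")
        for start, queued in self._pending:
            if offset < start + len(queued) and start < end:
                # Undefined in the model; this sketch turns it into an error.
                raise ValueError("overlapping transfers between commits")
        self._pending.append((offset, bytes(payload)))

    def commit(self):
        # The order between commits is undefined, so any order must give
        # the same result; sorting by offset is an arbitrary choice.
        for start, queued in sorted(self._pending):
            self.data[start:start + len(queued)] = queued
        self._pending.clear()


if __name__ == "__main__":
    f = TransactionalFile(8)
    f.write(4, b"wxyz")
    f.write(0, b"abcd")
    f.commit()
    print(bytes(f.data))
```

Nothing is visible in f.data until commit() is called, and the two
writes could have been issued in either order.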

The MPI-2 design team included a lot of ex-mainframe people and
specifies both models.  While it is designed for parallel applications,
the I/O per se is not controlled like threads.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679