[Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted

Mon Dec 24 23:58:17 CET 2012

On Dec 21, 2012, at 1:10 PM, Guido van Rossum <guido at python.org> wrote:

> > > The transport is free to buffer the bytes, but it must eventually
> > > cause the bytes to be transferred to the entity at the other end, and
> > > it must maintain stream behavior. That is, t.write(b'abc');
> > > t.write(b'def') is equivalent to t.write(b'abcdef')
> >
> > I think this is a bad idea. The kernel's network stack should do the
> > buffering (and choose appropriate algorithms for that), not the
> > user-level framework. The transport should write the bytes as soon as
> > the fd is ready for writing, and it should write the same chunks as
> > given by the user, not a concatenation of them.
> 
> I asked Glyph about this. It depends on the OS... Mac syscalls are so slow that it is better to join in user space. This should really be up to the transport, although for stream transports the given equivalency should definitely hold.
> 
It's not so much that "mac syscalls are slow" as that "syscalls are not free, and the cost varies". Older versions of MacOS were particularly bad.  Some versions of Linux had bizarre regressions in the performance of send() or recv() or pipe().  The things that pass for syscalls on Windows can be particularly catastrophically slow (although in practice this is a consideration for filesystem APIs, not socket APIs; who knows what the future will hold).

There are a number of other reasons why this should be this way as well.

User-space has the ability to buffer indefinitely, and the kernel does not.  Sometimes, send() returns a truncated value, and you have to deal with this.  Since you've allocated the memory for the value you're calling write() with anyway, you might as well stash it away in the framework.  The alternative is to let every application implement - and by implement, I mean "screw up" - a low-performance buffering implementation.

User-space has more information about the kind of data being sent.  If the user does write() write() write() within one loop iteration, the framework can hypothetically optimize that into a single syscall using scatter-gather I/O.  (Fun fact: we tried this, and it turns out that some implementations of scatter-gather I/O are actually *slower* than naive repeated calls; information like this should, again, be preserved within the framework.)

In order to preserve compatibility with other systems (Twisted, Tornado, et al.), the framework must be within its rights to do the buffering itself, even if it actually does exactly what you're suggesting because that happens to be better for performance in some circumstances.  Choosing different buffering strategies for different applications is an important tuning option.

Applications which appear to work in some contexts only if the boundaries of data passed to send() are exactly the same as the boundaries of the data passed to write() should not be coddled; this just makes them harder to debug later.  They should be broken as soon as possible.  This is a subtle, pernicious and nearly constant error that people new to networking make, and the sooner it surfaces, the better.  The segments passed to data_received() should be as different as possible from the segments passed to write().
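
To make the buffering points above concrete, here is a minimal sketch of a user-space write buffer that copes with a truncated send().  It is an illustration only, not Twisted's or Tulip's actual code: the BufferedWriter class, its flush_ready() method, and the injected send callable (which stands in for socket.send() and returns the number of bytes accepted) are all hypothetical names.

```python
import collections


class BufferedWriter:
    """Hypothetical user-space write buffer (illustration only)."""

    def __init__(self, send):
        # `send` is assumed to behave like socket.send(): it takes bytes
        # and returns how many of them were actually accepted.
        self._send = send
        self._pending = collections.deque()

    def write(self, data):
        # Never blocks and never drops data: just remember it.
        if data:
            self._pending.append(bytes(data))

    def flush_ready(self):
        """Call when the fd is reported writable; True once drained."""
        if not self._pending:
            return True
        # Join everything queued so far: one syscall instead of many.
        data = b"".join(self._pending)
        self._pending.clear()
        sent = self._send(data)
        if sent < len(data):
            # Truncated send: keep the unsent tail for the next round.
            self._pending.append(data[sent:])
        return not self._pending
```

Joining in user space is only one possible strategy; the point is that the transport, not the application, gets to pick it, and the bytes on the wire need not line up with the original write() calls.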
> > Besides, it would be better if transports weren't automatically
> > *streaming* transports. There are connected datagram protocols, such as
> > named pipes under Windows (multiprocessing already uses non-blocking
> > Windows named pipes).
> 
> I think we need to support datagrams, but the default ought to be stream.
> 
In my humble (but entirely, verifiably correct) opinion, thinking of this as a "default" is propagating a design error in the BSD sockets API.  Datagram and stream sockets have radically different semantics.  In Twisted, "dataReceived" and "datagramReceived" are different methods for a good reason.  Again, it's very, very easy to fall into the trap of thinking that a TCP segment is a datagram and writing all your application code as if it were.  After all, it probably works over localhost most of the time!  This difference in semantics, mirrored by a difference in method naming, has helped quite a few people grok the distinction between streaming and datagrams over the years; I think it would be a good idea if Tulip followed suit.
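
A rough sketch of the distinction, under stated assumptions: the class names, the frame() helper, and the 4-byte length prefix are invented for illustration, and data_received() / datagram_received() are just PEP 8 spellings of the Twisted method names, not a settled Tulip API.  The stream receiver has to reassemble messages itself, because chunk boundaries mean nothing; the datagram receiver gets exactly one message per call.

```python
import struct


def frame(payload):
    # Hypothetical wire format: 4-byte big-endian length, then payload.
    return struct.pack("!I", len(payload)) + payload


class FramedStreamReceiver:
    """Stream semantics: data arrives in arbitrary chunks."""

    def __init__(self):
        self._buf = b""
        self.messages = []

    def data_received(self, data):
        self._buf += data
        # Extract every complete message currently in the buffer.
        while len(self._buf) >= 4:
            (length,) = struct.unpack("!I", self._buf[:4])
            if len(self._buf) < 4 + length:
                break  # Payload incomplete; wait for more bytes.
            self.messages.append(self._buf[4:4 + length])
            self._buf = self._buf[4 + length:]


class DatagramReceiver:
    """Datagram semantics: each call is one whole message."""

    def __init__(self):
        self.messages = []

    def datagram_received(self, data, addr):
        self.messages.append((data, addr))
```

Feeding the stream receiver one byte at a time, or everything in a single call, should yield the same list of messages; code that only works in the second case is making exactly the mistake described above.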

-glyph
