Re: [Python-ideas] Async API: some code to review

7 Nov 2012

It's been a week, and nobody has responded to Glyph's email. I don't
think I know enough to agree or disagree with what he said, but it was
well-written and it looked important. Also, Glyph has a lot of
experience with this sort of thing, and it would be a shame if he was
discouraged by the lack of response. We can't really expect people to
contribute if their opinions are ignored.

Can relevant people please take another look at his post?

-- Devin

On Wed, Oct 31, 2012 at 6:10 AM, Glyph wrote:
...
Finally getting around to this one...
I am sorry if I'm repeating any criticism that has already been rehashed in
this thread.  There is really a deluge of mail here and I can't keep up with
it.  I've skimmed some of it and avoided or noted things that I did see
mentioned, but I figured I should write up something before next week.
To make a long story short, my main points here are:
- I think tulip unfortunately has a lot of the problems I tried to describe in
  earlier messages,
- it would be really great if we could have a core I/O interface that we could
  use for interoperability with Twisted before bolting a requirement for
  coroutine trampolines on to everything,
- Twisted-style protocol/transport separation is really important and this
  should not neglect it.  As I've tried to illustrate in previous messages, an
  API where applications have to call send() or recv() is just not going to
  behave intuitively in edge cases or perform well,
- I know it's a prototype, but this isn't such an unexplored area that it
  should be developed without TDD: all this code should both have tests and
  provide testing support to show how applications that use it can be tested,
- the scheduler module needs some example implementation of something like
  Twisted's gatherResults for me to critique its expressiveness; it looks like
  it might be missing something in the area of one task coordinating multiple
  others but I can't tell.

On Oct 28, 2012, at 4:52 PM, Guido van Rossum <guido at python.org> wrote:
> The pollster has a very simple API: add_reader(fd, callback, *args),
> add_writer(<ditto>), remove_reader(fd), remove_writer(fd), and
> poll(timeout) -> list of events. (fd means file descriptor.) There's
> also pollable() which just checks if there are any fds registered. My
> implementation requires fd to be an int, but that could easily be
> extended to support other types of event sources.

I don't see how that is.  All of the mechanisms I would leverage within
Twisted to support other event sources are missing (e.g.: abstract
interfaces for those event sources).  Are you saying that a totally
different pollster could just accept a different type to add_reader, and not
an integer?  If so, how would application code know how to construct
something else?

> I'm not super happy that I have parallel reader/writer APIs, but passing a
> separate read/write flag didn't come out any more elegant, and I don't
> foresee other operation types (though I may be wrong).

add_reader and add_writer are an important internal layer of the API for
UNIX-like operating systems, but the design here is fundamentally flawed in
that application code (e.g. echosvr.py) needs to import concrete
socket-handling classes like SocketTransport and BufferedReader in order to
synthesize a transport.  These classes might need to vary their behavior
significantly between platforms, and application code should not be
manipulating them unless there is a serious low-level need to.
It looks like you've already addressed the fact that some transports need to
be platform-specific.  That's not quite accurate, unless you take a very
broad definition of "platform".  In Twisted, the basic socket-based TCP
transport is actually supported across all platforms; but some other *APIs*
(well, let's be honest, right now, just IOCP, but there have been others,
such as Java's native I/O APIs under Jython, in the past) need their own
transport implementations.
You have to ask the "pollster" (by which I mean: reactor) for transport
objects, because different multiplexing mechanisms can require different I/O
APIs, even for basic socket I/O.  This is why I keep talking about IOCP.
It's not that Windows is particularly great, but that the IOCP API, if used
correctly, is fairly alien, and is a good proxy for other use-cases which
are less direct to explain, like interacting with GUI libraries where you
need to interact with the GUI's notion of a socket to get notifications,
rather than a raw FD.  (GUI libraries often do this because they have to
support Windows and therefore IOCP.)  Others in this thread have already
mentioned the fact that ZeroMQ requires the same sort of affordance.  This
is really a design error on 0MQ's part, but, you have to deal with it anyway
;-).
More importantly, concretely tying everything to sockets is just bad design.
You want to be able to operate on pipes and PTYs (which need to call read(),
or, a bunch of gross ioctl()s and then read(), not recv()).  You want to be
able to operate on these things in unit tests without involving any
actual file descriptors or syscalls.  The higher level of abstraction makes
regular application code a lot shorter, too: I was able to compress
echosvr.py down to 22 lines by removing all the comments and logging and
such, but that is still more than twice as long as the (9 line) echo server
example on the front page of http://twistedmatrix.com/trac/.  It's closer
in length to the (19 line) full line-based publish/subscribe protocol over
on the third tab.
Also, what about testing? You want to be able to simulate the order of
responses of multiple syscalls to coerce your event-driven program to
receive its events in different orders.  One of the big advantages of event
driven programming is that everything's just a method call, so your unit
tests can just call the methods to deliver data to your program and see what
it does, without needing to have a large, elaborate simulation edifice to
pretend to be a socket.  But, once you mix in the magic of the generator
trampoline, it's somewhat hard to assemble your own working environment
without some kind of test event source; at least, it's not clear to me how
to assemble a Task without having a pollster anywhere, or how to make my own
basic pollster for testing.
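
As a rough illustration of the "everything's just a method call" point, the
sketch below tests an echo protocol with no sockets, file descriptors, or
pollster at all.  The names FakeTransport, Echo, connection_made, and write
are hypothetical and chosen for this example; they are not tulip's or
Twisted's actual classes.

```python
class FakeTransport:
    """Hypothetical in-memory transport: records writes, touches no fds."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class Echo:
    """Protocol in the callback style: the event source calls its methods."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Echo whatever arrives straight back to the peer.
        self.transport.write(data)


# The "test" plays the role of the event loop and delivers data directly,
# in whatever order and chunking it wants to exercise.
transport = FakeTransport()
protocol = Echo()
protocol.connection_made(transport)
protocol.data_received(b"hello ")
protocol.data_received(b"world")
```
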

> The event loop has two basic ways to register callbacks:
> call_soon(callback, *args) causes callback(*args) to be called the
> next time the event loop runs; call_later(delay, callback, *args)
> schedules a callback at some time (relative or absolute) in the
> future.

"relative or absolute" is hiding the whole monotonic-clocks discussion
behind a simple phrase, but that probably does not need to be resolved
here... I'll let you know if we ever figure it out :).
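
For reference, a minimal sketch of the two registration calls, assuming
that call_later takes a relative delay measured against a monotonic clock.
That choice is an assumption made for this example and not a resolution of
the question above; ToyEventLoop, run_once, and the injectable clock
parameter are likewise hypothetical.

```python
import heapq
import itertools
import time


class ToyEventLoop:
    """Sketch of call_soon/call_later; the clock is injectable for tests."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.ready = []        # callbacks to run on the next iteration
        self.scheduled = []    # heap of (when, tiebreak, callback, args)
        self.counter = itertools.count()

    def call_soon(self, callback, *args):
        self.ready.append((callback, args))

    def call_later(self, delay, callback, *args):
        # 'delay' is relative; the deadline is stored as clock time.
        when = self.clock() + delay
        heapq.heappush(
            self.scheduled, (when, next(self.counter), callback, args))

    def run_once(self):
        # Move every timer whose deadline has passed onto the ready list.
        now = self.clock()
        while self.scheduled and self.scheduled[0][0] <= now:
            _, _, callback, args = heapq.heappop(self.scheduled)
            self.ready.append((callback, args))
        ready, self.ready = self.ready, []
        for callback, args in ready:
            callback(*args)


# Usage with a fake clock, so no real time has to pass.
now = [0.0]
loop = ToyEventLoop(clock=lambda: now[0])
log = []
loop.call_later(5, log.append, "later")
loop.call_soon(log.append, "soon")
loop.run_once()      # runs only the call_soon callback
now[0] = 10.0
loop.run_once()      # the 5-second timer has now expired
```
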

> sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
>
> This implements some internet primitives using the APIs in
> scheduling.py (including block_r() and block_w()). I call them
> transports but they are different from Twisted's transports; they are
> closer to idealized sockets. SocketTransport wraps a plain socket,
> offering recv() and send() methods that must be invoked using yield
> from.

I feel I should note that these methods behave inconsistently; send()
behaves as sendall(), re-trying its writes until the whole buffer has been
written, but recv() may yield a short read.
(But most importantly, block_r and block_w are insufficient as primitives;
you need a separate pollster that uses write_then_block(data) and
read_then_block() too, which may need to dispatch to WSASend/WSARecv or
WriteFile/ReadFile.)

> SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
> stdlib ssl sockets have good async support!).

stdlib ssl sockets have async support that makes a number of UNIX-y
assumptions.  The wrap_socket trick doesn't work with IOCP, because the I/O
operations are initiated within the SSL layer, and therefore can't be
associated with a completion port, so they won't cause a queued completion
status trigger and therefore won't wake up the loop.  This plagued us for
many years within Twisted and has only relatively recently been fixed:
http://tm.tl/593.
Since probably 99% of the people on this list don't actually give a crap
about Windows, let me give a more practical example: you can't do SSL over a
UNIX pipe.  Off the top of my head, this means you can't write a
command-line tool to encrypt a connection via a shell pipeline, but there
are many other cases where you'd expect to be able to get arbitrary I/O over
stdout.
It's reasonable, of course, for lots of Python applications to not care
about high-performance, high-concurrency SSL on Windows; select() works
okay for many applications on Windows.  And most SSL happens on sockets, not
pipes, hence the existence of the OpenSSL API that the stdlib ssl module
exposes for wrapping sockets.  But, as I'll explain in a moment, this is one
reason that it's important to be able to give your code a turbo boost with
Twisted (or other third-party extensions) once you start encountering
problems like this.

> I don't particularly care about the exact abstractions in this module;
> they are convenient and I was surprised how easy it was to add SSL,
> but still these mostly serve as somewhat realistic examples of how to
> use scheduling.py.

This is where I think we really differ.
I think that the whole attempt to build a coroutine scheduler at the low
level is somewhat misguided and will encourage people to write misleading,
sloppy, incorrect programs that will be tricky to debug (although, to be
fair, not quite as tricky as even more misleading/sloppy/incorrect
multi-threaded ones).  However, I'm more than happy to agree to disagree on
this point: clearly you think that forests of yielding coroutines are a big
part of the future of Python.  Maybe you're even right to do so, since I
have no interest in adding language features, whereas if you hit a rough
edge in 'yield' syntax you can sand it off rather than living with it.  I
will readily concede that 'yield from' and 'return' are nicer than the
somewhat ad-hoc idioms we ended up having to contend with in the current
iteration of @inlineCallbacks.  (Except for the exit-at-a-distance problem,
which it doesn't seem that return->StopIteration addresses - does this
happen, with PEP-380 generators?
http://twistedmatrix.com/trac/ticket/4157)
What I'm not happy to disagree about is the importance of a good I/O
abstraction and interoperation layer.
Twisted is not going away; there are oodles of good reasons that it's built
the way it is, as I've tried to describe in this and other messages, and
none of our plans for its future involve putting coroutine trampolines at
the core of the event loop; those are just fine over on the side with
inlineCallbacks.  However, lots of Python programmers are going to use what
you come up with.  They'd use it even if it didn't really work, just because
it's bundled in and it's convenient.  But I think it'll probably work fine
for many tasks, and it will appeal to lots of people new to event-driven I/O
because of the seductive deception of synchronous control flow and the
superiority to scheduling I/O operations with threads.
What I think is really very important in the design of this new system is to
present an API whereby:
- if someone wants to write a basic protocol or data-format parser for the
  stdlib, it should be easy to write it as a feed parser without needing
  generator coroutines (for example, if they're pushing data into a C library,
  they shouldn't have to write a while loop that calls recv; they should be
  able to just transform some data callback in Python into some data
  callback in C); it should be able to leverage tulip without much more work,
- if users of tulip (read: the stdlib) need access to some functionality
  implemented within Twisted, like an event-driven DNS client that is more
  scalable than getaddrinfo, they can call into Twisted without re-writing
  their entire program,
- if users of Twisted need to invoke some functionality implemented on top of
  tulip, they can construct a task and weave in a scheduler, similarly without
  re-writing much,
- if users of tulip want to just use Twisted to get better performance or
  reliability than the built-in stdlib multiplexor, they ideally shouldn't
  have to change anything, just run it with a different import line or
  something, and
- if (when) users of tulip realize that their generators have devolved into a
  mess of spaghetti ;-) and they need to migrate to Twisted-style event-driven
  callbacks and maybe some formal state machines or generated parsers to deal
  with their inputs, that process can be done incrementally and not in one
  giant shoot-the-moon effort which will make them hate Twisted.

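
To illustrate the first item in the list above, here is a minimal sketch of
a feed parser: the caller pushes in whatever bytes arrive and the parser
invokes a callback per complete line, with no recv() loop and no generator.
The class name LineParser, its feed() method, and the CRLF delimiter are
assumptions made for the example.

```python
class LineParser:
    """Feed parser: push bytes in, get complete lines out via a callback."""

    def __init__(self, line_received):
        self.line_received = line_received
        self.buffer = b""

    def feed(self, data):
        # Keep any trailing partial line in the buffer for the next call.
        self.buffer += data
        *lines, self.buffer = self.buffer.split(b"\r\n")
        for line in lines:
            self.line_received(line)


# Usage: chunk boundaries do not need to line up with line boundaries.
lines = []
parser = LineParser(lines.append)
parser.feed(b"GET / HT")
parser.feed(b"TP/1.0\r\nHost: exam")
parser.feed(b"ple.org\r\n\r\n")
```

Because feed() is an ordinary method, it can be called from a
data_received-style callback, from a coroutine, or from a unit test.
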
As an added bonus, such an API would provide a great basis for Tornado and
Twisted to interoperate.
It would also be nice to have a more discrete I/O layer to insulate
application code from common foibles like the fact that, for example, if you
call send() in tulip multiple times but forget to 'yield from ...send()',
you may end up writing interleaved garbage on the connection, then raising
an assertion error, but only if there's a sufficient quantity of data and it
needs to block; it will otherwise appear to work, leading to bugs that only
start happening when you are pushing large volumes of data through a system
at rates exceeding wire speed.  In other words, "only in production, only
during the holiday season, only during traffic spikes, only when it's really
really important for the system to keep working".
This is why I think that step 1 here needs to be a common low-level API for
event-triggered operations that does not have anything to do with
generators.  I don't want to stop you from doing interesting things with
generators, but I do really want to decouple the tasks so that their
responsibilities are not unnecessarily conflated.
task.unblock() is a method; protocol.data_received is a method.  Both can be
invoked at the same level by an event loop.  Once that low-level event loop
is delivering data to that callback's satisfaction, the callbacks can
happily drive a coroutine scheduler, and the coroutine scheduler can have
much less of a deep integration with the I/O itself; it just needs some kind
of sentinel object (a Future, a Deferred) to keep track of what exactly it's
waiting for.
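
A minimal sketch of that layering, under the assumption that the sentinel
is a bare-bones Future with callbacks: the event source only ever calls
data_received, and a small trampoline resumes the generator when the
pending Future fires.  Future, run, and Reader are hypothetical names
written for this example, not tulip or Twisted APIs, and error handling is
omitted.

```python
class Future:
    """Bare-bones sentinel: fires its callbacks once a result is set."""

    def __init__(self):
        self.callbacks = []

    def add_callback(self, fn):
        self.callbacks.append(fn)

    def set_result(self, value):
        for fn in self.callbacks:
            fn(value)


def run(gen):
    """Drive a generator that yields Futures, resuming it as each fires."""
    def step(value=None):
        try:
            future = gen.send(value)
        except StopIteration:
            return
        future.add_callback(step)
    step()


class Reader:
    """Protocol whose data_received fires whichever Future is pending."""

    def __init__(self):
        self.pending = None

    def read(self):
        self.pending = Future()
        return self.pending

    def data_received(self, data):
        # Data that arrives with no pending read is dropped in this sketch.
        future, self.pending = self.pending, None
        if future is not None:
            future.set_result(data)


log = []


def task(reader):
    first = yield reader.read()
    second = yield reader.read()
    log.append(first + second)


reader = Reader()
run(task(reader))
reader.data_received(b"ab")   # called by the event loop in real use
reader.data_received(b"cd")
```
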

> I'm most interested in feedback on the design of polling.py and
> scheduling.py, and to a lesser extent on the design of sockets.py;
> main.py is just an example of how this style works out in practice.

It looks to me like there's a design error in scheduling.py with respect to
coordinating concurrent operations.  If you try to block on two operations
at once, you'll get an assertion error ('assert not self.blocked', in
block), so you can't coordinate two interesting I/O requests without
spawning a bunch of new Tasks and then having them unblock their parent Task
when they're done.  I may just be failing to imagine how one would implement
something like Twisted's gatherResults, but this looks like it would be
frustrating, tedious, and involve creating lots of extra objects and making
the scheduler do a bunch more work.
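
For comparison, this is roughly what a gatherResults-style helper looks
like when the unit of coordination is a callback-bearing result object
rather than a Task.  It is a simplified sketch: Future and gather_results
are hypothetical names, and unlike Twisted's gatherResults it has no error
handling at all.

```python
class Future:
    """Minimal result holder with callbacks (hypothetical, no errors)."""

    def __init__(self):
        self.callbacks = []
        self.done = False
        self.result = None

    def add_callback(self, fn):
        if self.done:
            fn(self.result)
        else:
            self.callbacks.append(fn)

    def set_result(self, value):
        self.done = True
        self.result = value
        for fn in self.callbacks:
            fn(value)


def gather_results(futures):
    """Return a Future that fires with all results, in input order."""
    combined = Future()
    results = [None] * len(futures)
    remaining = [len(futures)]
    if not futures:
        combined.set_result([])
        return combined

    def make_callback(index):
        def on_result(value):
            results[index] = value
            remaining[0] -= 1
            if remaining[0] == 0:
                combined.set_result(results)
        return on_result

    for index, future in enumerate(futures):
        future.add_callback(make_callback(index))
    return combined


# Usage: completion order does not affect the order of the results.
a, b = Future(), Future()
out = []
gather_results([a, b]).add_callback(out.append)
b.set_result("B")
a.set_result("A")
```
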
Also, shouldn't there be a lot more real exceptions and a lot fewer
assertions in this code?
Relatedly, add_reader/writer will silently stomp on a previous FD
registration, so if two tasks end up calling recv() on the same socket, it
doesn't look like there's any way to find out that they both did that.  It
looks like the first task to call it will just hang forever, and the second
one will "win"?  What are the intended semantics?
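
One possible answer, sketched here purely as an assumption about what the
semantics could be, is for a second registration on the same fd to raise a
real exception instead of silently replacing the first.  StrictReaders is a
hypothetical class for illustration and is not how tulip behaves.

```python
class StrictReaders:
    """Reader registry that refuses to overwrite an existing registration."""

    def __init__(self):
        self.readers = {}  # fd -> (callback, args)

    def add_reader(self, fd, callback, *args):
        if fd in self.readers:
            raise RuntimeError(
                "fd %r already has a reader registered" % (fd,))
        self.readers[fd] = (callback, args)

    def remove_reader(self, fd):
        del self.readers[fd]


# Usage: the second task to register finds out immediately.
registry = StrictReaders()
registry.add_reader(7, lambda: None)
try:
    registry.add_reader(7, lambda: None)
    conflict = None
except RuntimeError as exc:
    conflict = str(exc)
```
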
Speaking from the perspective of I/O scheduling, it will also be thrashing
any stateful multiplexor with a ton of unnecessary syscalls.  A Twisted
protocol in normal operation just receiving data from a single connection,
using, let's say, a kqueue-based multiplexor will call kevent() once to
register interest, then kevent() again to block, and then just keep getting
data-available notifications and processing them unless some downstream
buffer fills up and the transport is told to pause producing data, at which
point another kevent() gets issued.  tulip, by contrast, will call kevent()
over and over again, removing and then re-adding its reader repeatedly for
every packet, since it can never know if someone is about to call recv()
again any time soon.  Once again, request/response is not the best model for
retrieving data from a transport; active connections need to be prepared to
receive more data at any time and not in response to any particular request.
Finally, apologies for spelling / grammar errors; I didn't have a lot of
time to copy-edit.
-glyph

Devin Jeanpierre