Finally getting around to this one...
I am sorry if I'm repeating any criticism that has already been rehashed in this thread. There is really a deluge of mail here and I can't keep up with it. I've skimmed some of it and avoided or noted things that I did see mentioned, but I figured I should write up something before next week.
On Oct 28, 2012, at 4:52 PM, Guido van Rossum <guido at python.org> wrote:
The pollster has a very simple API: add_reader(fd, callback, *args),
add_writer(<ditto>), remove_reader(fd), remove_writer(fd), and
poll(timeout) -> list of events. (fd means file descriptor.) There's
also pollable() which just checks if there are any fds registered. My
implementation requires fd to be an int, but that could easily be
extended to support other types of event sources.
I don't see how that is. All of the mechanisms I would leverage within Twisted to support other event sources are missing (e.g. abstract interfaces for those event sources). Are you saying that a totally different pollster could just accept some other type of object in add_reader, rather than an integer? If so, how would application code know how to construct something else?
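To make "abstract interfaces for those event sources" concrete, here is a rough sketch (not tulip code, and the names are mine) of the kind of thing I mean; it mirrors Twisted's IReadDescriptor/IWriteDescriptor, where the reactor asks the registered object what to wait on and tells it when to act, rather than being handed a bare integer and a callback:

    class EventSource:
        def fileno(self):
            """Return whatever the pollster needs to wait on: an int fd on
            UNIX, something else entirely for IOCP or a GUI toolkit's
            socket notifier."""
            raise NotImplementedError

        def do_read(self):
            """Called by the event loop when this source becomes readable."""
            raise NotImplementedError

With only add_reader(fd, callback, *args) in the contract, application code has no way to discover what "fd" is allowed to be on whichever pollster it happens to be running against.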
I'm not super happy that I have parallel reader/writer APIs, but passing a separate read/write flag didn't come out any more elegant, and I don't foresee other operation types (though I may be wrong).
add_reader and add_writer is an important internal layer of the API for UNIX-like operating systems, but the design here is fundamentally flawed in that application code (e.g. echosvr.py) needs to import concrete socket-handling classes like SocketTransport and BufferedReader in order to synthesize a transport. These classes might need to vary their behavior significantly between platforms, and application code should not be manipulating them unless there is a serious low-level need to.
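This isn't the literal echosvr.py, just a sketch of the shape of the problem (from memory, assuming tulip's sockets.py is importable and that BufferedReader exposes a readline() coroutine): the application handler constructs its own concrete transport out of a raw socket, so it is welded to these particular classes.

    from sockets import SocketTransport, BufferedReader   # concrete classes

    def handler(conn, addr):
        trans = SocketTransport(conn)    # application builds the transport itself
        rdr = BufferedReader(trans)      # ...and its buffering layer
        while True:
            line = yield from rdr.readline()   # assumed API, see above
            if not line:
                break
            yield from trans.send(line.upper())
        conn.close()                     # and cleans up the raw socket, too

The event loop (or the listening "port" object it hands you) should be the thing constructing the transport, handing your code an abstract object to read from and write to, so that an IOCP loop, a TLS wrapper, or a test harness can substitute a different concrete class without the application noticing.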
It looks like you've already addressed the fact that some transports need to be platform-specific. That's not quite accurate, unless you take a very broad definition of "platform". In Twisted, the basic socket-based TCP transport is actually supported across all platforms; it's other I/O *mechanisms* (well, let's be honest, right now, just IOCP, but there have been others, such as Java's native I/O APIs under Jython, in the past) that require their own transport implementations.
You have to ask the "pollster" (by which I mean: reactor) for transport objects, because different multiplexing mechanisms can require different I/O APIs, even for basic socket I/O. This is why I keep talking about IOCP. It's not that Windows is particularly great, but that the IOCP API, if used correctly, is fairly alien, and is a good proxy for other use-cases which are less direct to explain, like interacting with GUI libraries where you need to interact with the GUI's notion of a socket to get notifications, rather than a raw FD. (GUI libraries often do this because they have to support Windows and therefore IOCP.) Others in this thread have already mentioned the fact that ZeroMQ requires the same sort of affordance. This is really a design error on 0MQ's part, but you have to deal with it anyway ;-).
More importantly, concretely tying everything to sockets is just bad design. You want to be able to operate on pipes and PTYs (which need to call read(), or a bunch of gross ioctl()s and then read(), not recv()). You want to be able to operate on these things in unit tests without involving any actual file descriptors or syscalls. The higher level of abstraction makes regular application code a lot shorter, too: I was able to compress echosvr.py down to 22 lines by removing all the comments and logging and such, but that is still more than twice as long as the (9-line) echo server example on the front page of <http://twistedmatrix.com/trac/>. It's closer in length to the (19-line) full line-based publish/subscribe protocol over on the third tab.
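For reference, the front-page example is roughly this (give or take the port number and other cosmetic details):

    from twisted.internet import protocol, reactor

    class Echo(protocol.Protocol):
        def dataReceived(self, data):
            # Echo whatever arrives straight back to the peer.
            self.transport.write(data)

    class EchoFactory(protocol.Factory):
        def buildProtocol(self, addr):
            return Echo()

    reactor.listenTCP(8000, EchoFactory())
    reactor.run()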
Also, what about testing? You want to be able to simulate the order of responses of multiple syscalls to coerce your event-driven program to receive its events in different orders. One of the big advantages of event driven programming is that everything's just a method call, so your unit tests can just call the methods to deliver data to your program and see what it does, without needing to have a large, elaborate simulation edifice to pretend to be a socket. But, once you mix in the magic of the generator trampoline, it's somewhat hard to assemble your own working environment without some kind of test event source; at least, it's not clear to me how to assemble a Task without having a pollster anywhere, or how to make my own basic pollster for testing.
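Here is a sketch of what I mean by "just call the methods", using the Echo protocol from the example above and Twisted's in-memory StringTransport; no sockets, no reactor, no scheduler, and the test controls exactly when and in what chunks the data arrives:

    from twisted.test import proto_helpers

    def test_echo():
        proto = Echo()                            # protocol from the example above
        transport = proto_helpers.StringTransport()
        proto.makeConnection(transport)           # simulate the connection
        proto.dataReceived(b"hello\n")            # deliver bytes when the test decides
        assert transport.value() == b"hello\n"    # observe what the protocol wrote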
The event loop has two basic ways to register callbacks:
call_soon(callback, *args) causes callback(*args) to be called the
next time the event loop runs; call_later(delay, callback, *args)
schedules a callback at some time (relative or absolute) in the
future.
"relative or absolute" is hiding the whole monotonic-clocks discussion behind a simple phrase, but that probably does not need to be resolved here... I'll let you know if we ever figure it out :).
sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
This implements some internet primitives using the APIs in
scheduling.py (including block_r() and block_w()). I call them
transports but they are different from transports in Twisted; they are
closer to idealized sockets. SocketTransport wraps a plain socket,
offering recv() and send() methods that must be invoked using yield
from.
I feel I should note that these methods behave inconsistently: send() behaves like sendall(), retrying its writes until the whole buffer has been transmitted, but recv() may return a short read.
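Concretely, with the recv() described above a caller has to do its own loop if it needs an exact number of bytes, even though send() already loops internally; a hedged sketch, assuming "transport" is a SocketTransport and this runs inside a tulip coroutine:

    def read_exactly(transport, n):
        data = b""
        while len(data) < n:
            chunk = yield from transport.recv(n - len(data))
            if not chunk:    # connection closed before n bytes arrived
                raise EOFError("short read: %d of %d bytes" % (len(data), n))
            data += chunk
        return data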
(But most importantly, block_r and block_w are insufficient as primitives; you need a separate pollster that uses write_then_block(data) and read_then_block() too, which may need to dispatch to WSASend/WSARecv or WriteFile/ReadFile.)
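Here is a sketch of the split I mean; the *_then_block names come from the previous paragraph, submit_read and wait_for_completion are hypothetical stand-ins, and none of this is existing tulip code. The point is that "wait until readable, then do the syscall yourself" cannot be the bottom layer, because a completion-based pollster performs the I/O for you:

    import os

    class ReadinessIO:
        """UNIX-ish: wait for readiness, then issue the read ourselves."""
        def read_then_block(self, fd, nbytes):
            yield from block_r(fd)      # block_r from scheduling.py, roughly
            return os.read(fd, nbytes)

    class CompletionIO:
        """IOCP-ish: submit the read, then wait for the kernel to finish it."""
        def read_then_block(self, handle, nbytes):
            # submit_read / wait_for_completion stand in for whatever wraps
            # WSARecv/ReadFile plus the completion-port wait.
            operation = submit_read(handle, nbytes)
            return (yield from wait_for_completion(operation))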
SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
stdlib ssl sockets have good async support!).
stdlib ssl sockets have async support that makes a number of UNIX-y assumptions. The wrap_socket trick doesn't work with IOCP, because the I/O operations are initiated within the SSL layer, and therefore can't be associated with a completion port, so they won't cause a queued completion status trigger and therefore won't wake up the loop. This plagued us for many years within Twisted and has only relatively recently been fixed: <http://tm.tl/593>.
Since probably 99% of the people on this list don't actually give a crap about Windows, let me give a more practical example: you can't do SSL over a UNIX pipe. Off the top of my head, this means you can't write a command-line tool to encrypt a connection via a shell pipeline, but there are many other cases where you'd expect to be able to get arbitrary I/O over stdout.
It's reasonable, of course, for lots of Python applications not to care about high-performance, high-concurrency SSL on Windows; select() works okay for many applications on Windows. And most SSL happens on sockets, not pipes, hence the existence of the OpenSSL API that the stdlib ssl module exposes for wrapping sockets. But, as I'll explain in a moment, this is one reason it's important to be able to give your code a turbo boost with Twisted (or other third-party extensions) once you start encountering problems like this.
I don't particularly care about the exact abstractions in this module;
they are convenient and I was surprised how easy it was to add SSL,
but still these mostly serve as somewhat realistic examples of how to
use scheduling.py.
This is where I think we really differ.
I think that the whole attempt to build a coroutine scheduler at the low level is somewhat misguided and will encourage people to write misleading, sloppy, incorrect programs that will be tricky to debug (although, to be fair, not quite as tricky as even more misleading/sloppy/incorrect multi-threaded ones). However, I'm more than happy to agree to disagree on this point: clearly you think that forests of yielding coroutines are a big part of the future of Python. Maybe you're even right to do so, since I have no interest in adding language features, whereas if you hit a rough edge in 'yield' syntax you can sand it off rather than living with it. I will readily concede that 'yield from' and 'return' are nicer than the somewhat ad-hoc idioms we ended up having to contend with in the current iteration of @inlineCallbacks. (Except for the exit-at-a-distance problem, which it doesn't seem that return->StopIteration addresses - does this still happen with PEP-380 generators? <http://twistedmatrix.com/trac/ticket/4157>)
What I'm not happy to disagree about is the importance of a good I/O abstraction and interoperation layer.
Twisted is not going away; there are oodles of good reasons that it's built the way it is, as I've tried to describe in this and other messages, and none of our plans for its future involve putting coroutine trampolines at the core of the event loop; those are just fine over on the side with inlineCallbacks. However, lots of Python programmers are going to use what you come up with. They'd use it even if it didn't really work, just because it's bundled in and it's convenient. But I think it'll probably work fine for many tasks, and it will appeal to lots of people new to event-driven I/O because of the seductive deception of synchronous control flow and its superiority to scheduling I/O operations with threads.
What I think is really very important in the design of this new system is to present an API whereby:
- if someone wants to write a basic protocol or data-format parser for the stdlib, it should be easy to write it as a feed parser without needing generator coroutines (for example, if they're pushing data into a C library, they shouldn't have to write a while loop that calls recv(); they should be able to just turn a data callback in Python into a data callback in C), and that parser should be able to leverage tulip without much more work - see the sketch after this list,
- if users of tulip (read: the stdlib) need access to some functionality implemented within Twisted, like an event-driven DNS client that is more scalable than getaddrinfo, they can call into Twisted without re-writing their entire program,
- if users of Twisted need to invoke some functionality implemented on top of tulip, they can construct a task and weave in a scheduler, similarly without re-writing much,
- if users of tulip want to just use Twisted to get better performance or reliability than the built-in stdlib multiplexor, they ideally shouldn't have to change anything, just run it with a different import line or something, and
- if (when) users of tulip realize that their generators have devolved into a mess of spaghetti ;-) and they need to migrate to Twisted-style event-driven callbacks and maybe some formal state machines or generated parsers to deal with their inputs, that process can be done incrementally and not in one giant shoot-the-moon effort which will make them hate Twisted.
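To make the first bullet concrete, here's a hedged sketch of a stdlib-friendly protocol written as a feed parser; "push_parser" stands in for any push-mode parser (an expat-style XML parser, a C extension, a hand-written state machine), and no generators or schedulers are needed to write or test it:

    class FeedProtocol:
        def __init__(self, push_parser):
            self.parser = push_parser

        def connection_made(self, transport):
            self.transport = transport

        def data_received(self, data):
            # The event loop calls this with whatever bytes arrived; we just
            # hand them onward.  A coroutine scheduler can be layered on top
            # of objects like this, but isn't needed down here.
            self.parser.feed(data)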
As an added bonus, such an API would provide a great basis for Tornado and Twisted to interoperate.
It would also be nice to have a more discrete I/O layer to insulate application code from common foibles like the fact that, for example, if you call send() in tulip multiple times but forget to 'yield from ...send()', you may end up writing interleaved garbage on the connection, then raising an assertion error, but only if there's a sufficient quantity of data and it needs to block; it will otherwise appear to work, leading to bugs that only start happening when you are pushing large volumes of data through a system at rates exceeding wire speed. In other words, "only in production, only during the holiday season, only during traffic spikes, only when it's really really important for the system to keep working".
This is why I think that step 1 here needs to be a common low-level API for event-triggered operations that does not have anything to do with generators. I don't want to stop you from doing interesting things with generators, but I do really want to decouple the tasks so that their responsibilities are not unnecessarily conflated.
task.unblock() is a method; protocol.data_received is a method. Both can be invoked at the same level by an event loop. Once that low-level event loop is delivering data to that callback's satisfaction, the callbacks can happily drive a coroutine scheduler, and the coroutine scheduler can have much less of a deep integration with the I/O itself; it just needs some kind of sentinel object (a Future, a Deferred) to keep track of what exactly it's waiting for.
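A hedged sketch of that layering, using a concurrent.futures.Future as the sentinel (how the scheduler resumes the waiting task when the Future fires is elided, and the names here are mine, not tulip's):

    from concurrent.futures import Future

    class LineProtocol:
        def __init__(self):
            self._buffer = b""
            self._waiter = None           # sentinel the coroutine layer waits on

        def data_received(self, data):    # invoked by the low-level event loop
            self._buffer += data
            if self._waiter is not None and b"\n" in self._buffer:
                line, _, self._buffer = self._buffer.partition(b"\n")
                waiter, self._waiter = self._waiter, None
                waiter.set_result(line)   # wakes whatever is waiting on it

        def read_line(self):              # used by the coroutine-scheduler layer
            self._waiter = Future()
            return self._waiter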
I'm most interested in feedback on the design of polling.py and
scheduling.py, and to a lesser extent on the design of sockets.py;
main.py is just an example of how this style works out in practice.
It looks to me like there's a design error in scheduling.py with respect to coordinating concurrent operations. If you try to block on two operations at once, you'll get an assertion error ('assert not self.blocked', in block), so you can't coordinate two interesting I/O requests without spawning a bunch of new Tasks and then having them unblock their parent Task when they're done. I may just be failing to imagine how one would implement something like Twisted's gatherResults, but this looks like it would be frustrating, tedious, and involve creating lots of extra objects and making the scheduler do a bunch more work.
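In Twisted terms, what I want to be able to express is something like the following (gatherResults is existing Twisted API; "client" is a hypothetical client whose get() returns Deferreds): the waiting code blocks on one aggregate sentinel rather than spawning a child Task per operation.

    from twisted.internet import defer

    def fetch_both(client):
        d1 = client.get("/users")        # two operations started concurrently
        d2 = client.get("/orders")
        d = defer.gatherResults([d1, d2])
        d.addCallback(lambda results: dict(zip(("users", "orders"), results)))
        return d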
Also, shouldn't there be a lot more real exceptions and a lot fewer assertions in this code?
Relatedly, add_reader/writer will silently stomp on a previous FD registration, so if two tasks end up calling recv() on the same socket, it doesn't look like there's any way to find out that they both did that. It looks like the first task to call it will just hang forever, and the second one will "win"? What are the intended semantics?
Speaking from the perspective of I/O scheduling, it will also be thrashing any stateful multiplexor with a ton of unnecessary syscalls. A Twisted protocol in normal operation just receiving data from a single connection, using, let's say, a kqueue-based multiplexor, will call kevent() once to register interest and then again to wait, and then just keep getting data-available notifications and processing them unless some downstream buffer fills up and the transport is told to pause producing data, at which point another kevent() gets issued. tulip, by contrast, will call kevent() over and over again, removing and then re-adding its reader repeatedly for every packet, since it can never know if someone is about to call recv() again any time soon. Once again, request/response is not the best model for retrieving data from a transport; active connections need to be prepared to receive more data at any time and not in response to any particular request.
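To illustrate the syscall pattern on a kqueue platform (BSD/OS X); "kq" is a select.kqueue(), and "sock" and "handle_read" are stand-ins:

    import select

    def persistent_reader(kq, sock, handle_read):
        # Register interest once...
        kq.control([select.kevent(sock.fileno(),
                                  select.KQ_FILTER_READ, select.KQ_EV_ADD)], 0)
        while True:
            kq.control(None, 1, None)    # ...then just keep blocking for events
            handle_read(sock)

    def per_recv_reader(kq, sock, handle_read):
        # The pattern implied by wrapping add_reader()/remove_reader() around
        # each recv(): two extra registration syscalls for every read.
        while True:
            kq.control([select.kevent(sock.fileno(),
                                      select.KQ_FILTER_READ, select.KQ_EV_ADD)], 0)
            kq.control(None, 1, None)
            kq.control([select.kevent(sock.fileno(),
                                      select.KQ_FILTER_READ, select.KQ_EV_DELETE)], 0)
            handle_read(sock)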
Finally, apologies for spelling / grammar errors; I didn't have a lot of time to copy-edit.