On 6 Jun, 07:59 pm, lvh@laurensvh.be wrote:
Hey,
For the Twisted folks: this thing has been reviewed by the ZeroMQ folks first because I wanted to be sure I got the technical details right on their side of things.
I'd like to open up a discussion from a while back regarding the integration of ZeroMQ (a messaging system: similar to AMQP but with the intent to be simpler) into Twisted.
The interested ZeroMQ people and the interested Twisted people (names withheld to protect the guilty) disagreed on what it should look like. I think that's mostly because neither party really understood what the other's software wanted to do. So, I'll try to give everyone a basic explanation without going too deep into either Twisted or ZeroMQ: my apologies if I spell out the basics of your thing too much and it gets boring :)
ZeroMQ aims to be a thin layer above TCP, behaving like TCP but 'better'. That sounds like a vague marketing statement, but it helps to understand some of the terminology if you keep that in the back of your head. (What exactly 'better' means is way beyond the current scope: basically, ZeroMQ wants to help socket programmers to stop reinventing the wheel by implementing common behavior such as pub/sub, request/reply...). It's essentially AMQP but much simpler, and brokerless in most cases. This email is already going to go way over the sane character count; thankfully the ZeroMQ webpage does a great job of explaining stuff :-)
I think this highlights the main problem people had. There is a partial overlap between Twisted and ZeroMQ. The ZeroMQ implementation does things Twisted does too: it implements a bunch of low level networking stuff using e.g. epoll. It deals with real sockets, and Twisted wants to do that as well.
ZeroMQ uses things called Sockets. They're similar but not the same thing as TCP sockets (instead delegating work to TCP eventually), so you can't use traditional methods like select or epoll with them, because, for example, they don't have file descriptors. Some underlying thing probably does have fds; but ZeroMQ worries about that for you under the hood, just like Twisted does for other TCP traffic.
There are a couple of options for making ZeroMQ work with Twisted:
1) implement everything in Python, using Twisted's TCP stuff. I think this is mostly a bad idea and the ZeroMQ people seem to agree: _lots_ of work, ZeroMQ libs are stupidly fast already, Python not being the best tool for binary protocols...
2) write a thin wrapper around the C(++) libs: great, as long as it never has to go into the Twisted trunk
3) use pyzmq's thin wrapper around the C(++) libs: sounds like the best idea to me, again with reservations wrt the Twisted trunk
Originally there was a fourth idea, which considered libzmq as a new mechanism: like epoll, so you'd have a ZMQ-specific reactor. A bunch of people didn't like this, and I can somewhat see the point: hard to integrate with other event loops like GUIs, for example.
pyzmq offers something called select, which works just like select.select except it works on both file descriptors and ZeroMQ Sockets. It just delegates all of the work to libzmq. We could use ThreadedSelectReactor and have it use ZMQ's select. I'm not sure if it should use "normal" select everywhere else, because zmq's select is in fact much better than select.select (it just behaves like select.select in the sense that you give it three sets of fds and an optional timeout; under the hood it's actually epoll or kqueue or whatever) and it can handle plain old file descriptors just fine. So, you'd have a ThreadedSelectReactor with either 1 zmq.select running on everything or 1 zmq.select running over Sockets and 1 select.select running over your classic fds. Personally I kind of like the idea of zmq's select taking over, but I don't know how well that works in practice.
A shortcoming of this approach is that much of the inefficiency of select(2) comes from its API. If you have a select(2)-compatible API that's implemented in terms of epoll, you're still wasting a ton of effort that you could be skipping if you were using an epoll-compatible API instead. But this is only an argument about performance, and likely no one is going to care about the poor performance of zmq.select anyway.
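To make the API-shape point concrete, here is a minimal stdlib-only sketch. It does not use zmq.select at all; `ready_via_select` and `RegisteredWaiter` are names made up for illustration. The select-style helper has to be handed the complete interest set on every call, while the registration-style one declares interest once and afterwards only asks what is ready.

```python
import select
import selectors
import socket


def ready_via_select(socks, timeout=1.0):
    """select-style: the whole interest set is passed in on every call."""
    readable, _, _ = select.select(socks, [], [], timeout)
    return sorted(s.fileno() for s in readable)


class RegisteredWaiter:
    """Registration-style: interest is declared once, up front."""

    def __init__(self, socks):
        # DefaultSelector picks epoll, kqueue, etc. where the platform has them.
        self._selector = selectors.DefaultSelector()
        for s in socks:
            self._selector.register(s, selectors.EVENT_READ)

    def ready(self, timeout=1.0):
        # No interest set is passed here; only readiness comes back.
        return sorted(key.fd for key, _events in self._selector.select(timeout))

    def close(self):
        self._selector.close()
```

Both helpers report the same readable descriptors; the difference is only in how much has to be re-stated per wait, which is where the wasted effort with a select(2)-shaped API comes from.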
A potential option for Twisted, which some people don't quite like, would be to have a listenZMQ and connectZMQ, analogous to listenTCP/listenUDP/listenSSL and the respective connect*s. I think this makes more sense to the ZeroMQ people (who think of ZeroMQ as a layer "next to" TCP which happens to be implemented on top of TCP, on top of which you build your stuff) than the Twisted people (who think of ZeroMQ's protocol as yet another TCP-using protocol just like HTTP for example). Having worked with both pieces of software, the more I play with ZeroMQ the more I think listenZMQ/connectZMQ make sense. ZeroMQ really tries to be one of those things and it shows. What ZeroMQ wants to do is semantically much closer to the existing connects and listens. I'm not just making this up: the ZeroMQ people have reviewed this and this is really what ZeroMQ wants to be.
A shortcoming of this approach is that as a reactor method, you have to implement it for each reactor you want to support. You covered this a bit earlier in your email, where you talked about GUI integration. Do you want to maintain an implementation of {listen,connect}ZMQ for select(/whatever), Glib2, Gtk2, wxWidgets, Qt, and Windows? That's a lot more work than just maintaining one implementation.
Another argument for making ZMQ special is that TCP is just one of the things ZeroMQ works with. It also works over UNIX domain sockets, PGM reliable multicast, UDP-encapsulated PGM, and even inter-thread communication.
You got this one backwards. This is an argument for not implementing ZMQ at the same level as TCP and UNIX sockets. This is an argument for implementing it *on top of* those things. Of course, the main benefit of implementing it on top of them is that you don't have to write a bunch of code to support each transport. And the ZMQ people did that already.

Here's how it should work (modulo stupid factoring issues that aren't really related to ZMQ issues), given that there's a big C library that already implements a bunch of stuff that you don't want to re-implement:

    from twisted.internet.interfaces import IReactorFDSet

    class ZMQTransport(object):
        def __init__(self, reactor, zmqSocket, protocol):
            self._zmqSocket = zmqSocket
            self._protocol = protocol
            self._transportPieces = []
            # On the next line, I use a method which I made up.  Maybe it
            # corresponds to some actual API ZMQ provides, maybe not, I
            # dunno.
            for fd in zmqSocket.allFileDescriptors():
                desc = _ZMQFileDescriptor(reactor, fd, zmqSocket, protocol)
                self._transportPieces.append(desc)
            self._protocol.makeConnection(self)

    class _ZMQFileDescriptor(object):
        def __init__(self, reactor, fd, zmqSocket, protocol):
            if not IReactorFDSet.providedBy(reactor):
                raise RuntimeError(
                    "This is the IReactorFDSet implementation; "
                    "use another reactor or another zmq transport.")
            self._reactor = reactor
            self._fd = fd
            self._zmqSocket = zmqSocket
            # doRead delivers events to the protocol, so keep a reference.
            self._protocol = protocol
            # Register last, once fileno() can answer.
            self._reactor.addReader(self)

        def doRead(self):
            # Another made up method
            zmqEvents = self._zmqSocket.nonBlockingReadFrom(self._fd)
            if zmqEvents:
                self._protocol.zmqEventsReceived(zmqEvents)

        def doWrite(self):
            # One more, for luck.
            finished = self._zmqSocket.nonBlockingWriteTo(self._fd)
            if finished:
                self._reactor.removeWriter(self)

        def fileno(self):
            return self._fd

        def sendZMQEvents(self, events):
            # Whatever the API is.
            self._zmqSocket.sendZMQEvents(events)
            self._reactor.addWriter(self)

    class ZMQProtocol(object):
        def makeConnection(self, zmqTransport):
            self.zmqTransport = zmqTransport

        def zmqEventsReceived(self, zmqEvents):
            pass

    def connectZMQ(reactor, addrinfo, factory):
        # Blah blah blah - somehow get to the point where you have a
        # ZMQ Socket.
        d = ...
        def cbConnectionSetup(socket):
            ZMQTransport(
                reactor, socket, factory.buildProtocol(addrinfo))
        d.addCallback(cbConnectionSetup)

    def main():
        from twisted.internet import reactor
        from twisted.internet.protocol import ClientFactory
        f = ClientFactory()
        f.protocol = ZMQProtocol
        connectZMQ(reactor, ('example.com', 1234), f)
        reactor.run()

Okay, so that came out a little longer than I planned, but turnabout is fair play. Anyway, this is a bog standard transport implementation. The only thing even remotely interesting is that it maps multiple file descriptors onto a single transport. And that seems to be the

So, if the ZMQ library offers APIs like the ones used in this example, then you're all set. With just a little more code, you can have an overlapped I/O version of this transport (for the one Twisted reactor that doesn't support IReactorFDSet). And then you've got proper Twisted ZMQ support.

If it *doesn't* offer APIs like these, then I'd say it's missing some pretty critical APIs. After all, if you can't drive it this way, your chances of being able to write reasonable unit tests for ZMQ-based code are somewhat diminished (not out the window, but it'll be annoying). And I don't understand how you would implement something like ZMQ in a way that *didn't* make it easy to do this. *Particularly* since they have support for several different event notification APIs. So hopefully the worst case is that there are no APIs like these, but it's a minor oversight because the authors thought no one would want them, but they can be added trivially because they map directly onto how the underlying implementation works.
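For anyone who wants to poke at the "several file descriptors, one transport" idea without Twisted or libzmq installed, here is a rough stdlib-only sketch of the read side. Everything in it is a stand-in invented for illustration: `FakeZMQSocket` mimics the made-up `allFileDescriptors` and `nonBlockingReadFrom` methods from the example above using plain sockets, and `MiniReactor` imitates only the addReader/removeReader part of a reactor. It says nothing about what libzmq or pyzmq actually provide.

```python
import select


class FakeZMQSocket:
    """Hypothetical stand-in for a ZMQ Socket backed by several real fds."""

    def __init__(self, connections):
        # Keep the socket objects alive and index them by descriptor.
        self._byFd = {c.fileno(): c for c in connections}

    def allFileDescriptors(self):
        return list(self._byFd)

    def nonBlockingReadFrom(self, fd):
        try:
            data = self._byFd[fd].recv(4096)
        except BlockingIOError:
            return []
        return [data] if data else []


class MiniReactor:
    """Tiny imitation of a reactor's addReader/removeReader machinery."""

    def __init__(self):
        self._readers = set()

    def addReader(self, reader):
        self._readers.add(reader)

    def removeReader(self, reader):
        self._readers.discard(reader)

    def runOnce(self, timeout=1.0):
        # select.select accepts any object that has a fileno() method.
        ready, _, _ = select.select(list(self._readers), [], [], timeout)
        for reader in ready:
            reader.doRead()


class Descriptor:
    """One piece of the transport: a single fd registered with the reactor."""

    def __init__(self, reactor, fd, zmqSocket, protocol):
        self._fd = fd
        self._zmqSocket = zmqSocket
        self._protocol = protocol
        reactor.addReader(self)

    def fileno(self):
        return self._fd

    def doRead(self):
        events = self._zmqSocket.nonBlockingReadFrom(self._fd)
        if events:
            self._protocol.zmqEventsReceived(events)


class Transport:
    """Maps every fd of one (fake) ZMQ Socket onto a single protocol."""

    def __init__(self, reactor, zmqSocket, protocol):
        self._pieces = [
            Descriptor(reactor, fd, zmqSocket, protocol)
            for fd in zmqSocket.allFileDescriptors()
        ]
        protocol.makeConnection(self)


class RecordingProtocol:
    """Protocol that just remembers every event it is handed."""

    def __init__(self):
        self.received = []

    def makeConnection(self, transport):
        self.transport = transport

    def zmqEventsReceived(self, events):
        self.received.extend(events)
```

To try it, create two `socket.socketpair()` pairs, set the local ends non-blocking, wrap them in a `FakeZMQSocket`, build a `Transport` with a `RecordingProtocol`, send a few bytes into each remote end, and call `runOnce()`. The protocol should see data from both descriptors without knowing there is more than one. This is also roughly the shape a unit test for such a transport could take.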
I know some Twisted people way smarter than me basically thought the connectZMQ/listenZMQ thing was a mistake, but I'm not sure to what extent that is because they were right and to what extent that was because they didn't really know very much about ZeroMQ and just went "it works on top of TCP so that's not where it goes". To Twisted folks that disagree: would you change your opinion if ZMQ were *really* something that's side-by-side with TCP instead of being implemented on top of it? Like, say, SCTP is? Does the fact that it can work on top of a bunch of stuff that isn't TCP change that?
If ZMQ were supported in the kernel with new syscalls to interface with it, then it would be nonsensical to talk about implementing it on top of Twisted's existing TCP support. You simply couldn't, because all of the code would have been pushed into the kernel where it can't be used any other way. This doesn't mean it would be a good idea overall to have ZMQ supported at the same level as TCP, though: it just means there would be no other alternative (aside from not supporting it, which is what Twisted does for SCTP). Whether or not it makes any sense to implement ZMQ in the kernel is something I have no opinion on, since I don't know nearly enough about the particular details of ZMQ.
Talking with the ZeroMQ people has been a positive experience: they were very accessible and cooperative, and really just want a bigger market for their software (who doesn't?) so I hope something useful comes out of this :-)
Great! Convince them to add the necessary APIs (if they don't exist already) from above and everything should be set. :)

Jean-Paul