[Twisted-Python] Integrating Twisted with ZeroMQ
Hey,

For the Twisted folks: this thing has been reviewed by the ZeroMQ folks first because I wanted to be sure I got the technical details right on their side of things.

I'd like to open up a discussion from a while back regarding the integration of ZeroMQ (a messaging system: similar to AMQP but with the intent to be simpler) into Twisted.

The interested ZeroMQ people and the interested Twisted people (names withheld to protect the guilty) disagreed on what it should look like. I think that's mostly because neither party really understood what the other's software wanted to do. So, I'll try to give everyone a basic explanation without going too deep into either Twisted or ZeroMQ: my apologies if I spell out the basics of your thing too much and it gets boring :)

ZeroMQ aims to be a thin layer above TCP, behaving like TCP but 'better'. That sounds like a vague marketing statement, but it helps to understand some of the terminology if you keep that in the back of your head. (What exactly 'better' means is way beyond the current scope: basically, ZeroMQ wants to help socket programmers stop reinventing the wheel by implementing common behavior such as pub/sub, request/reply...) Essentially AMQP but much simpler, and brokerless in most cases. This email is already going to go way over the sane character count; thankfully, the ZeroMQ webpage does a great job of explaining stuff :-)

I think this highlights the main problem people had. There is a partial overlap between Twisted and ZeroMQ. The ZeroMQ implementation does things Twisted does too: it implements a bunch of low-level networking stuff using e.g. epoll. It deals with real sockets, and Twisted wants to do that as well.

ZeroMQ uses things called Sockets. They're similar to, but not the same thing as, TCP sockets (they delegate the work to TCP eventually), so you can't use traditional methods like select or epoll with them, because, for example, they don't have file descriptors. Some underlying thing probably does have fds, but ZeroMQ worries about that for you under the hood, just like Twisted does for other TCP traffic.

There are a couple of options for making ZeroMQ work with Twisted:

1) implement everything in Python, using Twisted's TCP stuff. I think this is mostly a bad idea and the ZeroMQ people seem to agree: _lots_ of work, ZeroMQ libs are stupidly fast already, Python not being the best tool for binary protocols...
2) write a thin wrapper around the C(++) libs: great, as long as it never has to go into the Twisted trunk
3) use pyzmq's thin wrapper around the C(++) libs: sounds like the best idea to me, again with reservations wrt the Twisted trunk

Originally there was a fourth idea, which considered libzmq as a new mechanism like epoll, so you'd have a ZMQ-specific reactor. A bunch of people didn't like this, and I can somewhat see the point: it's hard to integrate with other event loops like GUIs, for example.

pyzmq offers something called select, which works just like select except it works on both file descriptors and ZeroMQ Sockets. It just delegates all of the work to libzmq. We could use ThreadedSelectReactor and have it use ZMQ's select. I'm not sure whether it should use "normal" select everywhere else, because zmq's select is in fact much better than select.select (it just behaves like select.select in the sense that you give it three sets of fds and an optional timeout; under the hood it's actually epoll or kqueue or whatever) and it can handle plain old file descriptors just fine. So, you'd have a TSR with either one zmq.select running on everything, or one zmq.select running over Sockets and one select.select running over your classic fds. Personally I kind of like the idea of zmq's select taking over, but I don't know how well that works in practice.
A potential option for Twisted, which some people don't quite like, would be to have a listenZMQ and connectZMQ, analogous to listenTCP/listenUDP/listenSSL and the respective connect*s. I think this makes more sense to the ZeroMQ people (who think of ZeroMQ as a layer "next to" TCP which happens to be implemented on top of TCP, on top of which you build your stuff) than the Twisted people (who think of ZeroMQ's protocol as yet another TCP-using protocol, just like HTTP for example). Having worked with both pieces of software, the more I play with ZeroMQ the more I think listenZMQ/connectZMQ make sense. ZeroMQ really tries to be one of those things and it shows. What ZeroMQ wants to do is semantically much closer to the existing connects and listens. I'm not just making this up: the ZeroMQ people have reviewed this and this is really what ZeroMQ wants to be.

Another argument for making ZMQ special is that TCP is just one of the things ZeroMQ works with. UNIX domain pipes, PGM reliable multicast, UDP PGM encapsulation, and even inter-thread communication.

I know some Twisted people way smarter than me basically thought the connectZMQ/listenZMQ thing was a mistake, but I'm not sure to what extent that is because they were right and to what extent that was because they didn't really know very much about ZeroMQ and just went "it works on top of TCP so that's not where it goes". To Twisted folks that disagree: would you change your opinion if ZMQ was *really* something that's side-by-side with TCP instead of being implemented on top of it? Like, say, SCTP is? Does the fact that it can work on top of a bunch of stuff that isn't TCP change that?

Talking with the ZeroMQ people has been a positive experience: they were very accessible and cooperative, and really just want a bigger market for their software (who doesn't?) so I hope something useful comes out of this :-)

tia
lvh
On Sun, Jun 6, 2010 at 2:59 PM, Laurens Van Houtven <lvh@laurensvh.be> wrote:
A potential option for Twisted, which some people don't quite like, would be to have a listenZMQ and connectZMQ, analogous to listenTCP/listenUDP/listenSSL and the respective connect*s. I think this makes more sense to the ZeroMQ people (who think of ZeroMQ as a layer "next to" TCP which happens to be implemented on top of TCP, on top of which you build your stuff) than the Twisted people (who think of ZeroMQ's protocol as yet another TCP-using protocol just like HTTP for example). Having worked with both pieces of software, the more I play with ZeroMQ the more I think listenZMQ/connectZMQ make sense. ZeroMQ really tries to be one of those things and it shows. What ZeroMQ wants to do is semantically much closer to the existing connects and listens. I'm not just making this up: the ZeroMQ people have reviewed this and this is really what ZeroMQ wants to be.
At the moment I only feel compelled to respond to this point in particular. We don't want to have any more transport-specific methods on the reactor, and this has nothing to do with ZMQ. We also don't want to have connectSOCKS, or listenSerialPort. Fortunately, the endpoints API[1] was very recently merged to Twisted trunk, so any new transport-specific connectors/listeners can be implemented in terms of its interfaces. listenTCP and so forth should eventually be deprecated in preference to the endpoints APIs.

1: Source: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/endpoints.py
   ticket: http://twistedmatrix.com/trac/ticket/1442

--
Christopher Armstrong
http://radix.twistedmatrix.com/
http://planet-if.com/
On Sun, Jun 6, 2010 at 10:18 PM, Christopher Armstrong <radix@twistedmatrix.com> wrote:
On Sun, Jun 6, 2010 at 2:59 PM, Laurens Van Houtven <lvh@laurensvh.be> wrote:

We don't want to have any more transport-specific methods on the reactor, and this has nothing to do with ZMQ. We also don't want to have connectSOCKS, or listenSerialPort. Fortunately, the endpoints API[1] was very recently merged to Twisted trunk, so any new transport-specific connectors/listeners can be implemented in terms of its interfaces. listenTCP and so forth should eventually be deprecated in preference to the endpoints APIs.

1: Source: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/endpoints.py
   ticket: http://twistedmatrix.com/trac/ticket/1442

--
Christopher Armstrong
http://radix.twistedmatrix.com/
http://planet-if.com/
Yep, that's the thing I was remembering. It just wasn't quite so close to production use back then. So, I take it that just means ZMQ4ServerEndpoint and ZMQ4ClientEndpoint instead of listenZMQ/connectZMQ? Most of the argument (specifically that, to Twisted, ZeroMQ should not be a thing on top of TCP but instead be its own thing). lvh
Whoops, something ate half my sentence. My point was that most of the argument still stands, I think: just listenZMQ and connectZMQ get replaced by ZMQ Endpoints :) lvh
On Mon, 2010-06-07 at 01:39 +0200, Laurens Van Houtven wrote:
Whoops, something ate half my sentence.
My point was that most of the argument still stands, I think: just listenZMQ and connectZMQ get replaced by ZMQ Endpoints :)
1. SSL runs on top of TCP, yet Twisted has connectSSL/listenSSL and endpoints for it. So the issue here is not the fact it runs over TCP.

2. I assume ZeroMQ "Sockets" have capabilities TCP doesn't, otherwise what's the point? :) E.g. you mentioned pub/sub. So, assuming a ZeroMQ endpoint and/or listen+connect methods, the *protocol* you would pass in would be different than the standard protocols you'd use with TCP or SSL, yes?

So, sounds like you want to define:

A) A way to hook up ZeroMQ event loop with Twisted event loop so that both ZeroMQ and Twisted code can co-exist in same thread.

Then, expose ZeroMQ APIs to Python in a way that gives you:

B) A protocol class or interface.
C) An API for creating and hooking up these protocols to underlying transports, i.e. ZeroMQ Sockets.

You could then release this as txZeroMQ; I'm not sure there's much benefit in including this in Twisted, as opposed to a standalone project.
On Sun, 2010-06-06 at 23:07 -0400, Itamar Turner-Trauring wrote:
So, sounds like you want to define: A) A way to hook up ZeroMQ event loop with Twisted event loop so that both ZeroMQ and Twisted code can co-exist in same thread.
JP's proposal is superior to this... but may require changes to ZeroMQ.
Indeed, from a technical perspective I think this is the only sane way forward.

--
Konrads Smelkovs
Applied IT sorcery.

On Mon, Jun 7, 2010 at 3:27 PM, Itamar Turner-Trauring <itamar@itamarst.org> wrote:
On Sun, 2010-06-06 at 23:07 -0400, Itamar Turner-Trauring wrote:
So, sounds like you want to define: A) A way to hook up ZeroMQ event loop with Twisted event loop so that both ZeroMQ and Twisted code can co-exist in same thread.
JP's proposal is superior to this... but may require changes to ZeroMQ.
On Jun 6, 2010, at 4:18 PM, Christopher Armstrong wrote:
We don't want to have any more transport-specific methods on the reactor, and this has nothing to do with ZMQ. We also don't want to have connectSOCKS, or listenSerialPort. Fortunately, the endpoints API[1] was very recently merged to Twisted trunk, so any new transport-specific connectors/listeners can be implemented in terms of its interfaces. listenTCP and so forth should eventually be deprecated in preference to the endpoints APIs.
While the endpoints APIs are great and everyone should use them, I think it's putting it a bit strongly to say that listenTCP and friends will be deprecated. Reactors are still pluggable, and we'll need a mechanism for endpoints and reactors to communicate. There may be some evolution of those APIs in the long term (in particular, the way that connectTCP interacts with client factories is slightly weird) but I think that there will always be something lower-level than endpoints that is still a public, supported API. You just won't be encouraged to use that API at the *application* layer, because endpoints are more flexible.
With regard to the shortcomings of select: there's also a poll-like API, but I have no idea whether that improves matters :) Anyway, I'll tell the ZMQ folks that people are blocking on their fancy new API that helps with integrating with other event loops. Thankfully, they were already aware of this technical issue and the blueprints are there :)

lvh
On Mon, Jun 7, 2010 at 12:38 PM, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
While the endpoints APIs are great and everyone should use them, I think it's putting it a bit strongly to say that listenTCP and friends will be deprecated. Reactors are still pluggable, and we'll need a mechanism for endpoints and reactors to communicate. There may be some evolution of those APIs in the long term (in particular, the way that connectTCP interacts with client factories is slightly weird) but I think that there will always be something lower-level than endpoints that is still a public, supported API. You just won't be encouraged to use that API at the *application* layer, because endpoints are more flexible.
But will that "lower-level than endpoints" API need to be TCP-specific? We don't have a listenSerialPort, so why do we need listenTCP (by the way, we should totally have a SerialPortEndpoint)?

I understand your point that some reactors provide addReader while others provide addOverlappedIOObject (or whatever the heck it's called), but an Endpoints implementation can DTRT based on the interfaces that the supplied reactor provides, so I don't see why TCPServerEndpoint can't just instantiate a port and call addReader/addWriter or the IOCPreactor equivalent. And maybe we need a better way to deal with the differences between those reactors, but I don't see why that requires us to have public transport-specific methods on the reactor.

However, of course I'm not advocating we jump the gun on deprecating listenTCP and so forth. We shouldn't deprecate them now, and when we do, we should make it an *extremely* long period of PendingDeprecation followed by Deprecation.

--
Christopher Armstrong
http://radix.twistedmatrix.com/
http://planet-if.com/
On Jun 7, 2010, at 6:07 PM, Christopher Armstrong wrote:
But will that "lower-level than endpoints" API need to be TCP-specific?
We don't have a listenSerialPort, so why do we need listenTCP (by the way, we should totally have a SerialPortEndpoint)?
We totally *should* have a SerialPortEndpoint, but I definitely prefer the way that calling 'reactor.listenTCP(...)' implicitly selects a back-end to the platform-detection junk at the bottom of twisted.internet.serialport. The wonky platform-detection has some practical implications, as shown here: <http://twistedmatrix.com/trac/ticket/3802>. I would actually prefer a nice 'AttributeError: SelectReactor has no attribute "listenSerialPort"' to the mess that it currently produces.
I understand your point that some reactors provide addReader while others provide addOverlappedIOObject (or whatever the heck it's called), but an Endpoints implementation can DTRT based on the interfaces that the supplied reactor provides, so I don't see why TCPServerEndpoint can't just instantiate a port and call addReader/addWriter or the IOCPreactor equivalent.
And maybe we need a better way to deal with the differences between those reactors, but I don't see why that requires us to have public transport-specific methods on the reactor.
Well, we need to deal with the differences *somehow*. Currently, we just have a listenTCP on the reactor. This makes the implementation of endpoints dead simple, and it doesn't really vary at all per-platform. It seems to work okay; the main drawback, as far as I can tell, is that it deepens the inheritance hierarchy of the reactors a bit, because we use inheritance to grab the many similar method implementations that many similar reactors share.

In the current model, we can add new types of socket-ish things (the ones that I can think of which may one day actually matter are Java SelectableChannel objects and maybe System.Net.Sockets.Socket.Select, if we ever support Jython and IronPython, respectively) by adding new reactors, and these things need new reactors anyway in order to invoke the multiplexors that understand these objects. So that sorta makes sense to me. You call listenTCP on something somewhat opaque, and the opaque thing knows how to give you back an appropriate IListeningPort, ITransport, etc. With the reactor plugin API, we can even add these new types of things outside of Twisted, pretty straightforwardly.

This dispatch point *could* be handled internally in each endpoint implementation, and that does seem hypothetically more elegant to me, since it's the endpoint that should care about the TCP-ness and the reactor that should care about the file-descriptor-ness, abstractly. But at some point, the rubber must meet the road, and IReactorFDSet needs to be mapped to twisted.internet.tcp.Port, and twisted.internet.tcp.Connector, or similar, and IReactorIOCPMumbleMumble (btw, this interface should exist) to twisted.internet.iocpreactor.tcp.Port, etc. If I want to do that mapping under the current system, I just implement my own reactor plugin and implement listenTCP to do the right thing.
Under a new system where the endpoint was responsible for handling that mapping, can I add a reactor plugin that has a meaningfully different idea of what a multiplexable I/O resource is? I have some vague ideas... something sort of shaped like an adapter registry, maybe? A different plugin API, for things to talk specifically to the TCP endpoint code? I can brainstorm up some vague ideas but none of them really sit right, and certainly none of them are as straightforward as just implementing a bunch of methods specified by interfaces. Now, I'm pretty sure you know how I feel about this, Chris, but just so nobody else takes away the wrong conclusion from this: I'm not saying that the current internal reactor factoring is perfect, or even particularly good. That code needs a lot of cleanup, a lot of documentation, and in many places a general sorting-out of what really constitutes the intended public API. It's far too hard to implement an external reactor, because you can't sensibly inherit any of that code which all reactors practically need to inherit. It's mostly undocumented and it has even changed incompatibly a few times. For example, <http://twistedmatrix.com/trac/changeset/24132> incompatibly changed the signature of 'tcp.Port.__init__'. However, it seems like a more straightforward job to me to figure out a reasonable signature for tcp.Port.__init__, and to do some general de-duplication of code between the IOCP modules and the UNIX-file-descriptor modules, than to come up with a whole new way to correlate endpoint implementations to their respective concrete transport implementations. This quality problem in the core multiplexing code is actually one of the reasons I want to vocally defend the current architecture. Until we have a defined path forward, with a clear design that's better in a way that has some positive, practical ramifications, I don't want to put anyone off doing this necessary maintenance work in twisted.internet. 
There's not a huge amount of maintenance going on in that area already, but if we adopt the attitude of "Oh, let's forget about that, it's going to get deprecated anyway", I have a feeling the amount of work will drop to zero.
However, of course I'm not advocating we jump the gun on deprecating listenTCP and so forth. We shouldn't deprecate them now, and when we do, we should make it an *extremely* long period of PendingDeprecation followed by Deprecation.
Well, Python 2.7 isn't going to display DeprecationWarning by default anyway, so I think PendingDeprecation may be pointless. But I definitely echo the sentiment, regardless.
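For reference, the two-stage retirement Chris describes can be sketched with the standard `warnings` module. By default, both PendingDeprecationWarning and (from Python 2.7 on) DeprecationWarning are filtered out. In practice, both stages mostly surface under explicit filters such as `python -W always` or in a harness that records warnings. `listenTCPSketch` and `collectWarnings` are hypothetical names used only for illustration.

```python
import warnings


def listenTCPSketch(port, stage="pending"):
    """Hypothetical stand-in for an API being retired in two stages.

    stage="pending": PendingDeprecationWarning (the long grace period).
    stage="deprecated": DeprecationWarning (removal is scheduled).
    """
    category = {
        "pending": PendingDeprecationWarning,
        "deprecated": DeprecationWarning,
    }[stage]
    # stacklevel=2 attributes the warning to the caller, not to this function.
    warnings.warn(
        "listenTCPSketch is deprecated; use an endpoint instead",
        category,
        stacklevel=2,
    )
    return port


def collectWarnings(func, *args, **kwargs):
    """Run func with every warning recorded, regardless of default filters."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return [w.category for w in caught]
```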
On Jun 6, 2010, at 4:18 PM, Christopher Armstrong wrote:
... We also don't want to have connectSOCKS ...
Since I basically said that we *did* possibly want listenSerialPort as an alternative to the half-working platform-detection stuff that twisted.internet.serialport does, I do want to draw a distinction here. Although we might want to have a SOCKS endpoint, 'connectSOCKS' is something that there is _no_ reason I can see for any reactor to have in the future. SOCKS is a protocol that is implemented on top of TCP. If you want a SOCKS endpoint, it doesn't need extra support from the reactor; it can be formulated entirely in terms of TCP endpoints. Even if you're doing something weird like wrapping a C SOCKS library, that library would almost certainly have platform limitations, and provide you with an object that exposed a file descriptor anyway, so you could use IReactorFDSet. (But, practically speaking, you probably just want to use something like 'socksify' anyway, for outgoing connections.)
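To illustrate that SOCKS needs nothing from the reactor beyond an ordinary TCP connection, here is a minimal blocking sketch of the SOCKS4 CONNECT handshake. It runs over an already-connected stream socket using the SOCKS4 wire format: version 4, command 1, big-endian port, IPv4 address, user id, NUL. The reply is 8 bytes, and status 90 means "granted". A Twisted version would presumably do the same exchange inside a Protocol attached to a TCP endpoint's transport. The helper names are hypothetical.

```python
import socket
import struct

SOCKS4_VERSION = 4
SOCKS4_CMD_CONNECT = 1
SOCKS4_GRANTED = 90


def buildSocks4Connect(ip, port, userid=b""):
    """Encode a SOCKS4 CONNECT request for an IPv4 address and port."""
    return (struct.pack("!BBH", SOCKS4_VERSION, SOCKS4_CMD_CONNECT, port)
            + socket.inet_aton(ip) + userid + b"\x00")


def recvExactly(sock, n):
    """Read exactly n bytes from a blocking stream socket."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        data += chunk
    return data


def socks4Connect(sock, ip, port, userid=b""):
    """Run the SOCKS4 handshake over an already-connected TCP socket.

    Afterwards `sock` carries the proxied byte stream; the OS only ever
    saw a plain TCP connection to the proxy.
    """
    sock.sendall(buildSocks4Connect(ip, port, userid))
    reply = recvExactly(sock, 8)
    _version, status = struct.unpack("!BB", reply[:2])
    if status != SOCKS4_GRANTED:
        raise ConnectionError("SOCKS4 request rejected with code %d" % status)
```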
On Sun, 2010-06-06 at 21:59 +0200, Laurens Van Houtven wrote:
pyzmq offers something called select, which works just like select except it works on both file descriptors and ZeroMQ Sockets. It just delegates all of the work to libzmq. We could use ThreadedSelectReactor and have it use ZMQ's select. I'm not sure whether it should use "normal" select everywhere else, because zmq's select is in fact much better than select.select (it just behaves like select.select in the sense that you give it three sets of fds and an optional timeout; under the hood it's actually epoll or kqueue or whatever) and it can handle plain old file descriptors just fine. So, you'd have a TSR with either one zmq.select running on everything, or one zmq.select running over Sockets and one select.select running over your classic fds. Personally I kind of like the idea of zmq's select taking over, but I don't know how well that works in practice.
You should probably just make a ZeroMQ reactor, instead of using TSR, which (in its current form) is an ugly hack. TSR buys you nothing because, unfortunately, it doesn't let you hook up to arbitrary reactors. If the API really is select() compatible this presumably would be a fairly trivial subclass of the reactor in twisted.internet.selectreactor.
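A minimal sketch of why a select()-compatible multiplexer could make for a "fairly trivial subclass": if the loop reaches its multiplexer through a single overridable attribute, a new loop only has to swap that one callable. This is a toy loop, not Twisted's SelectReactor, and its attribute names are assumptions. `countingSelect` stands in for any other function with `select.select`'s `(rlist, wlist, xlist, timeout)` signature. The thread assumes zmq.select has that signature, and it would be plugged in the same way.

```python
import select


class SelectLoopSketch:
    """Toy loop shaped loosely like a select()-based reactor (not Twisted's)."""

    # Any callable with select.select's (rlist, wlist, xlist, timeout) signature.
    _select = staticmethod(select.select)

    def __init__(self):
        self._readers = {}

    def addReader(self, fileobj, callback):
        self._readers[fileobj] = callback

    def iterate(self, timeout=0.0):
        readable, _writable, _errored = self._select(
            list(self._readers), [], [], timeout)
        for fileobj in readable:
            self._readers[fileobj](fileobj)


def countingSelect(rlist, wlist, xlist, timeout=None):
    """Hypothetical stand-in for another select-compatible multiplexer."""
    countingSelect.calls += 1
    return select.select(rlist, wlist, xlist, timeout)


countingSelect.calls = 0


class CountingLoop(SelectLoopSketch):
    # The whole "new reactor" is this one line.
    _select = staticmethod(countingSelect)
```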
On 6 Jun, 07:59 pm, lvh@laurensvh.be wrote:
Hey,
For the Twisted folks: this thing has been reviewed by the ZeroMQ folks first because I wanted to be sure I got the technical details right on their side of things.
I'd like to open up a discussion from a while back regarding the integration of ZeroMQ (a messaging system: similar to AMQP but with the intent to be simpler) into Twisted.
The interested ZeroMQ people and the interested Twisted people (names withheld to protect the guilty) disagreed on what it should look like. I think that's mostly because neither party really understood what the other's software wanted to do. So, I'll try to give everyone a basic explanation without going too deep into either Twisted or ZeroMQ: my apologies if I spell out the basics of your thing too much and it gets boring :)
ZeroMQ aims to be a thin layer above TCP, behaving like TCP but 'better'. That sounds like a vague marketing statement, but it helps to understand some of the terminology if you keep that in the back of your head. (What exactly 'better' means is way beyond the current scope: basically, ZeroMQ wants to help socket programmers stop reinventing the wheel by implementing common behavior such as pub/sub, request/reply...) Essentially AMQP but much simpler, and brokerless in most cases. This email is already going to go way over the sane character count; thankfully, the ZeroMQ webpage does a great job of explaining stuff :-)
I think this highlights the main problem people had. There is a partial overlap between Twisted and ZeroMQ. The ZeroMQ implementation does things Twisted does too: it implements a bunch of low-level networking stuff using e.g. epoll. It deals with real sockets, and Twisted wants to do that as well.
ZeroMQ uses things called Sockets. They're similar to, but not the same thing as, TCP sockets (they delegate the work to TCP eventually), so you can't use traditional methods like select or epoll with them, because, for example, they don't have file descriptors. Some underlying thing probably does have fds, but ZeroMQ worries about that for you under the hood, just like Twisted does for other TCP traffic.
There are a couple of options for making ZeroMQ work with Twisted:
1) implement everything in Python, using Twisted's TCP stuff. I think this is mostly a bad idea and the ZeroMQ people seem to agree: _lots_ of work, ZeroMQ libs are stupidly fast already, Python not being the best tool for binary protocols...
2) write a thin wrapper around the C(++) libs: great, as long as it never has to go into the Twisted trunk
3) use pyzmq's thin wrapper around the C(++) libs: sounds like the best idea to me, again with reservations wrt the Twisted trunk
Originally there was a fourth idea, which considered libzmq as a new mechanism: like epoll, so you'd have a ZMQ-specific reactor. A bunch of people didn't like this, and I can somewhat see the point: hard to integrate with other event loops like GUIs, for example.
pyzmq offers something called select, which works just like select except it works on both file descriptors and ZeroMQ Sockets. It just delegates all of the work to libzmq. We could use ThreadedSelectReactor and have it use ZMQ's select. I'm not sure whether it should use "normal" select everywhere else, because zmq's select is in fact much better than select.select (it just behaves like select.select in the sense that you give it three sets of fds and an optional timeout; under the hood it's actually epoll or kqueue or whatever) and it can handle plain old file descriptors just fine. So, you'd have a TSR with either one zmq.select running on everything, or one zmq.select running over Sockets and one select.select running over your classic fds. Personally I kind of like the idea of zmq's select taking over, but I don't know how well that works in practice.
A shortcoming of this approach is that much of the inefficiency of select(2) comes from its API. If you have a select(2)-compatible API that's implemented in terms of epoll, you're still wasting a ton of effort that you could be skipping if you were using an epoll-compatible API instead. But this is only an argument about performance, and likely no one is going to care about the poor performance of zmq.select anyway.
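The API-shape point above can be illustrated with the standard library. A select(2)-shaped call has to be handed the whole interest set every time. An epoll-shaped API (here `selectors.DefaultSelector`, which uses epoll or kqueue where available) takes registrations once, and each wait only reports readiness. This sketch counts descriptors handed over by the caller. It is not a benchmark, and the function names are made up for illustration.

```python
import select
import selectors


def selectStyleCost(socks, iterations):
    """Descriptors the caller hands over with a select(2)-shaped API."""
    handedOver = 0
    for _ in range(iterations):
        # select() takes the whole interest set again on every call.
        select.select(list(socks), [], [], 0)
        handedOver += len(socks)
    return handedOver


def registrationStyleCost(socks, iterations):
    """Descriptors handed over with an epoll-shaped (register-once) API."""
    handedOver = 0
    sel = selectors.DefaultSelector()  # epoll/kqueue where available
    try:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
            handedOver += 1
        for _ in range(iterations):
            # From the caller's side, nothing is re-sent per wait.
            sel.select(timeout=0)
    finally:
        sel.close()
    return handedOver
```

With 10 watched sockets and 100 waits, the select-shaped loop hands over 1000 descriptors, while the registration-shaped one hands over 10.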
A potential option for Twisted, which some people don't quite like, would be to have a listenZMQ and connectZMQ, analogous to listenTCP/listenUDP/listenSSL and the respective connect*s. I think this makes more sense to the ZeroMQ people (who think of ZeroMQ as a layer "next to" TCP which happens to be implemented on top of TCP, on top of which you build your stuff) than the Twisted people (who think of ZeroMQ's protocol as yet another TCP-using protocol just like HTTP for example). Having worked with both pieces of software, the more I play with ZeroMQ the more I think listenZMQ/connectZMQ make sense. ZeroMQ really tries to be one of those things and it shows. What ZeroMQ wants to do is semantically much closer to the existing connects and listens. I'm not just making this up: the ZeroMQ people have reviewed this and this is really what ZeroMQ wants to be.
A shortcoming of this approach is that as a reactor method, you have to implement it for each reactor you want to support. You covered this a bit earlier in your email, where you talked about GUI integration. Do you want to maintain an implementation of {listen,connect}ZMQ for select(/whatever), Glib2, Gtk2, wxWidgets, Qt, and Windows? That's a lot more work than just maintaining one implementation.
Another argument for making ZMQ special is that TCP is just one of the things ZeroMQ works with. UNIX domain pipes, PGM reliable multicast, UDP PGM encapsulation, and even inter-thread communication.
You got this one backwards. This is an argument for not implementing ZMQ at the same level as TCP and UNIX sockets. This is an argument for implementing it *on top of* those things. Of course, the main benefit of implementing it on top of them is that you don't have to write a bunch of code to support each transport. And the ZMQ people did that already.

Here's how it should work (modulo stupid factoring issues that aren't really related to ZMQ issues), given that there's a big C library that already implements a bunch of stuff that you don't want to re-implement:

```python
from twisted.internet.interfaces import IReactorFDSet


class ZMQTransport(object):
    def __init__(self, reactor, zmqSocket, protocol):
        self._zmqSocket = zmqSocket
        self._protocol = protocol
        self._transportPieces = []
        # On the next line, I use a method which I made up. Maybe it
        # corresponds to some actual API ZMQ provides, maybe not, I
        # dunno.
        for fd in zmqSocket.allFileDescriptors():
            # Each descriptor needs the protocol so doRead can deliver to it.
            desc = _ZMQFileDescriptor(reactor, fd, zmqSocket, protocol)
            self._transportPieces.append(desc)
        self._protocol.makeConnection(self)


class _ZMQFileDescriptor(object):
    def __init__(self, reactor, fd, zmqSocket, protocol):
        if not IReactorFDSet.providedBy(reactor):
            raise RuntimeError(
                "This is the IReactorFDSet implementation; "
                "use another reactor or another zmq transport.")
        self._reactor = reactor
        self._fd = fd
        self._zmqSocket = zmqSocket
        self._protocol = protocol
        # Register last: some reactors call fileno() inside addReader.
        self._reactor.addReader(self)

    def doRead(self):
        # Another made up method
        zmqEvents = self._zmqSocket.nonBlockingReadFrom(self._fd)
        if zmqEvents:
            self._protocol.zmqEventsReceived(zmqEvents)

    def doWrite(self):
        # One more, for luck.
        finished = self._zmqSocket.nonBlockingWriteTo(self._fd)
        if finished:
            self._reactor.removeWriter(self)

    def fileno(self):
        return self._fd

    def sendZMQEvents(self, events):
        # Whatever the API is.
        self._zmqSocket.sendZMQEvents(events)
        self._reactor.addWriter(self)


class ZMQProtocol(object):
    def makeConnection(self, zmqTransport):
        self.zmqTransport = zmqTransport

    def zmqEventsReceived(self, zmqEvents):
        pass


def connectZMQ(reactor, addrinfo, factory):
    # Blah blah blah - somehow get to the point where you have a
    # ZMQ Socket.
    d = ...
    def cbConnectionSetup(socket):
        ZMQTransport(
            reactor, socket, factory.buildProtocol(addrinfo))
    d.addCallback(cbConnectionSetup)
    return d


def main():
    from twisted.internet import reactor
    from twisted.internet.protocol import ClientFactory
    f = ClientFactory()
    f.protocol = ZMQProtocol
    connectZMQ(reactor, ('example.com', 1234), f)
    reactor.run()
```

Okay, so that came out a little longer than I planned, but turnabout is fair play. Anyway, this is a bog-standard transport implementation. The only thing even remotely interesting is that it maps multiple file descriptors onto a single transport. And that seems to be the

So, if the ZMQ library offers APIs like the ones used in this example, then you're all set. With just a little more code, you can have an overlapped I/O version of this transport (for the one Twisted reactor that doesn't support IReactorFDSet). And then you've got proper Twisted ZMQ support. If it *doesn't* offer APIs like these, then I'd say it's missing some pretty critical APIs. After all, if you can't drive it this way, your chances of being able to write reasonable unit tests for ZMQ-based code are somewhat diminished (not out the window, but it'll be annoying). And I don't understand how you would implement something like ZMQ in a way that *didn't* make it easy to do this. *Particularly* since they have support for several different event notification APIs. So hopefully the worst case is that there are no APIs like these, but it's a minor oversight because the authors thought no one would want them, but they can be added trivially because they map directly onto how the underlying implementation works.
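The one "remotely interesting" part of the transport above, several OS-level descriptors feeding one transport and one protocol, can be tried out with plain sockets from the standard library. This sketch uses `socket.socketpair()` ends as stand-ins for whatever descriptors a ZMQ socket might expose. It does not use ZeroMQ, and `FanInTransport`, `CollectingProtocol` and `runOnce` are hypothetical names.

```python
import selectors


class FanInTransport:
    """One logical transport backed by several OS-level sockets.

    Mirrors the ZMQTransport/_ZMQFileDescriptor shape: each descriptor is
    registered separately, but every read reaches the same protocol.
    """

    def __init__(self, selector, sockets, protocol):
        self._selector = selector
        self._sockets = list(sockets)
        self.protocol = protocol
        for sock in self._sockets:
            sock.setblocking(False)
            # The registered data is the callback runOnce will invoke.
            selector.register(sock, selectors.EVENT_READ, self._doRead)
        protocol.makeConnection(self)

    def _doRead(self, sock):
        data = sock.recv(4096)
        if data:
            self.protocol.dataReceived(data)
        else:
            # This piece was closed by its peer; stop watching it.
            self._selector.unregister(sock)
            sock.close()

    def loseConnection(self):
        for sock in self._sockets:
            if sock.fileno() != -1:  # -1 means already closed above
                self._selector.unregister(sock)
                sock.close()


class CollectingProtocol:
    def makeConnection(self, transport):
        self.transport = transport
        self.chunks = []

    def dataReceived(self, data):
        self.chunks.append(data)


def runOnce(selector, timeout=1.0):
    """One loop iteration: dispatch every ready descriptor to its callback."""
    for key, _events in selector.select(timeout):
        key.data(key.fileobj)
```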
I know some Twisted people way smarter than me basically thought the connectZMQ/listenZMQ thing was a mistake, but I'm not sure to what extent that is because they were right and to what extent that was because they didn't really know very much about ZeroMQ and just went "it works on top of TCP so that's not where it goes". To Twisted folks that disagree: would you change your opinion if ZMQ was *really* something that's side-by-side with TCP instead of being implemented on top of it? Like, say, SCTP is? Does the fact that it can work on top of a bunch of stuff that isn't TCP change that?
If ZMQ were supported in the kernel with new syscalls to interface with it, then it would be nonsensical to talk about implementing it on top of Twisted's existing TCP support. You simply couldn't, because all of the code would have been pushed into the kernel where it can't be used any other way. This doesn't mean it would be a good idea overall to have ZMQ supported at the same level as TCP, though: it just means there would be no other alternative (aside from not supporting it, which is what Twisted does for SCTP). Whether or not it makes any sense to implement ZMQ in the kernel is something I have no opinion on, since I don't know nearly enough about the particular details of ZMQ.
Talking with the ZeroMQ people has been a positive experience: they were very accessible and cooperative, and really just want a bigger market for their software (who doesn't?) so I hope something useful comes out of this :-)
Great! Convince them to add the necessary APIs from above (if they don't exist already) and everything should be set. :)

Jean-Paul
On 04:54 am, exarkun@twistedmatrix.com wrote:
[snip]
from twisted.internet.interfaces import IReactorFDSet
class ZMQTransport(object):
    [snip]

class _ZMQFileDescriptor(object):
    [snip]
    def sendZMQEvents(self, events):
        # Whatever the API is.
        self._zmqSocket.sendZMQEvents(events)
        self._reactor.addWriter(self)
The sendZMQEvents method clearly belongs on ZMQTransport. This complicates the `addWriter` call slightly - but only to the extent of either selecting the correct _ZMQFileDescriptor to pass, or more likely just looping over all of them and calling addWriter for each. Hopefully the idea was clear despite this mistake.
[snip]
def connectZMQ(reactor, addrinfo, factory):
    [snip]
Also, I meant to mention here that it doesn't really matter whether this is a connectXYZ-style function or something using endpoints. That's just a minor API question, and all the other ZMQ-related stuff is the same whichever decision you make.

Jean-Paul
So, I agree pretty much completely with everything exarkun said, but I do feel like I should add a bit more about the high-level questions raised here.

On Jun 6, 2010, at 3:59 PM, Laurens Van Houtven wrote:
A potential option for Twisted, which some people don't quite like, would be to have a listenZMQ and connectZMQ, analogous to listenTCP/listenUDP/listenSSL and the respective connect*s.
So, listenTCP/listenUDP are very different from listenSSL. JP already made an oblique reference to this when talking about ZMQ possibly being implemented in the kernel. listenTCP and listenUDP are different kernel-level things. Not only are they implemented differently, they have different semantics and interact with different interfaces: UDP is datagram-oriented, TCP is stream-oriented.

listenSSL, on the other hand, is a stream transport implemented in userspace by a C library. It can be (and actually is, in twisted.protocols.tls) implemented as a regular TCP IProtocol that also provides its own stream-oriented ITransport.

There are a couple of reasons that listenSSL and startTLS are implemented as reactor and transport methods, and none of them has to do with the intrinsic specialness of TLS itself. At the time we wrote them, the APIs needed to implement twisted.protocols.tls simply weren't available, so we used the mechanisms we had to interface with the library as it was then, and that meant having a reactor method. The code remains now that we have a protocol implementation because the C code in OpenSSL is faster at getting bytes out of a socket than Twisted: it can do less memory copying while parsing the protocol, and efficiency really matters in TLS; you can visibly notice when a little extra memory copying starts happening at that layer.

Nevertheless, when we encounter a situation that the library doesn't support, such as in the IOCP reactor, we need an implementation that can work with Twisted's native I/O APIs. This becomes a tradeoff between a scalable multiplexor and a slightly faster recv() code path. As far as I'm aware, nobody has done any particular benchmarks on that one, but I would guess that you win a little and you lose a little and it tends to balance out. Still, when it's possible to gain a little efficiency by doing so, it does make some sense for it to be its own transport API.
This may also apply to ZMQ, since they appear to be obsessed with performance. (Although that does raise the question of why they seem to recommend a 'select'-style API when, as JP notes, that form of API is not great for performance.)
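The earlier point that TLS can run as an ordinary protocol on top of any byte transport can be illustrated with Python's own `ssl` module (not the pyOpenSSL API that Twisted uses): an `SSLObject` created with `wrap_bio` reads and writes only in-memory buffers, so its bytes could be carried by whatever transport you like. This sketch only starts a client handshake and returns the first record it wants sent; `client_hello_bytes` is a hypothetical helper name.

```python
import ssl


def client_hello_bytes(hostname="example.com"):
    """Start a TLS handshake without any socket and return the bytes to send."""
    context = ssl.create_default_context()
    incoming = ssl.MemoryBIO()  # bytes received from the peer would go here
    outgoing = ssl.MemoryBIO()  # bytes we must send to the peer come out here
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the handshake cannot continue until the peer answers.
        pass
    return outgoing.read()
```

The first byte of the result is the TLS record type for handshake messages (22, or `0x16`); nothing in the code knows or cares which transport will deliver it.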
I think this makes more sense to the ZeroMQ people (who think of ZeroMQ as a layer "next to" TCP which happens to be implemented on top of TCP, on top of which you build your stuff)
I still hold that the ZMQ people are somewhat confused, and I believe that this very basic breakdown in their spatial reasoning is a good indication of how ;-). If you inhabit the same physical reality that I do, you may have noticed that one object cannot, in fact, be both "next to" and "on top of" something else. These are distinct coordinates.
than the Twisted people (who think of ZeroMQ's protocol as yet another TCP-using protocol just like HTTP for example). Having worked with both pieces of software, the more I play with ZeroMQ the more I think listenZMQ/connectZMQ make sense. ZeroMQ really tries to be one of those things and it shows. What ZeroMQ wants to do is semantically much closer to the existing connects and listens. I'm not just making this up: the ZeroMQ people have reviewed this and this is really what ZeroMQ wants to be.
More seriously, I don't think you should care what ZeroMQ "wants to be". The question isn't one of existential confusion; it's a practical question of what exactly the library *does*, and what a sensible way to integrate that with Twisted is. To avoid confusion about endpoints vs. reactor methods, I think it's safe to say that you have three implementation options: let's call them "ZMQProtocol", "ZMQTransport", and "ZMQReactor".

The option you appear to be talking down over and over again, ZMQProtocol (implementing ZeroMQ as a 'regular TCP' IProtocol provider), does not sound viable. Its advantage is that it would allow you to transport ZMQ messages over completely arbitrary Twisted ITransport providers and IReactorTCP providers. However, you've never talked about wanting to do that. The disadvantages are that it doesn't sound like it makes sense to you, none of the necessary APIs are exposed, and it generally goes against the grain of the library. So let's forget about that. (Again, it doesn't matter if ZMQ "really is" a layer "next to" or "on top of" TCP or whatever: if the library makes this difficult or impossible, then it doesn't matter where its true soul lies.)

JP's option, ZMQTransport, suggests that you should implement it as an IReadDescriptor/IWriteDescriptor. That works if the ZeroMQ library exposes the file descriptors it's using to you. The advantage of this option is that it will work with an arbitrary IReactorFDSet implementation, which basically all of the reactors that can run on a UNIX-like OS are. Also, as JP has described, it's probably not too much code. You can use it with GUI integration, even GUI integration on Windows, and it should work fine.
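One practical wrinkle with the ZMQTransport option is that the library's set of descriptors need not be fixed. Assuming the library could report which descriptors it currently uses (an assumption; it is not clear it exposes them at all), a small reconciliation step could keep the reactor's registrations in sync. `DescriptorSync` is a hypothetical name, and the two callables stand in for `addReader`/`removeReader`, which in Twisted take descriptor objects rather than raw integers.

```python
class DescriptorSync:
    """Keep reactor registrations in step with a library's changing fd set.

    Hypothetical: assumes the ZMQ library can report its current descriptors.
    """

    def __init__(self, add_reader, remove_reader):
        self._add = add_reader
        self._remove = remove_reader
        self._registered = set()

    def sync(self, current_fds):
        current = set(current_fds)
        for fd in sorted(current - self._registered):
            self._add(fd)     # the library opened a new socket
        for fd in sorted(self._registered - current):
            self._remove(fd)  # the library closed one behind our back
        self._registered = current
```

Calling `sync` after every operation that might open or close sockets is crude but keeps the bookkeeping in one place.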
The disadvantages of this option are that apparently ZMQ is going to need to change, because it doesn't want to expose its file descriptors to Python, and it may be complicated to juggle them, depending on when the library opens and closes sockets in response to its inner workings. For example, can a single "send a ZMQ event" open 3 UDP sockets and a TCP socket, do a bunch of stuff with them, and shut some of them down? Do multiple logical transports, ahem, I mean, "Sockets" (good job naming that, ZMQ guys) ever share their underlying TCP sockets, and thereby require independent management? I don't know, but I can imagine that they might, and that could be a pain to expose sensibly.

The third option, which you've discussed, is implementing a reactor in terms of pyzmq's existing multiplexing mechanisms. One advantage of this approach is that it will support ZMQ the most naturally; you can just call the relevant APIs. One advantage which it *may* have (I'm not quite sure) is performance. It may be possible for the ZMQ library to do a bunch of work inside zmq.select() without talking to Twisted's abstractions at all. And while Twisted can be pretty fast, especially for Python, I have never even *heard* of anyone trying to run it over InfiniBand, and if they did, I would not expect 8 million messages per second on any hardware I can think of; the mainloop has too much overhead. Based on some back-of-the-envelope (and probably highly inaccurate) math, Python *bytecode execution* is too much overhead to get that level of performance. I'm kind of skeptical that they even get it in C without benchmark hax of some kind, but nevertheless, they advertise this performance on their home page and they obviously care about it quite a lot. It's not going to speed up your Twisted code at all, of course, and I have no idea whether ZMQ messaging dominates your workload, so it may be a negligible gain.
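To put rough numbers on that back-of-the-envelope argument: 8 million messages per second leaves 125 nanoseconds per message, and a mainloop iteration involves many Python-level calls. The sketch below only computes that budget and times an empty Python function call with `timeit`; the measured figure varies by machine and interpreter, so treat it as an order-of-magnitude check, not a benchmark. Both function names are illustrative.

```python
import timeit


def per_message_budget_ns(messages_per_second):
    """Nanoseconds available per message at a given message rate."""
    return 1e9 / messages_per_second


def empty_call_cost_ns(number=1_000_000):
    """Rough cost of one empty Python function call (includes loop overhead)."""
    seconds = timeit.timeit("f()", setup="def f(): pass", number=number)
    return seconds / number * 1e9


if __name__ == "__main__":
    print(f"budget at 8M msg/s: {per_message_budget_ns(8_000_000):.0f} ns")
    print(f"one empty Python call: {empty_call_cost_ns():.0f} ns")
```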
The disadvantages of this approach, as several people have already pointed out, are that it won't work with GUI integration, or any custom third-party reactors, or... well, pretty much any features except the ones you explicitly build in yourself. Also, if you want to properly stick to public APIs and build this as an extension to Twisted, you may find yourself rewriting some of the code in twisted.internet, or inheriting from some public-but-ugh-we-wish-it-weren't classes. This option may be somewhat labor-intensive on the Twisted side of things, although as you note, it will probably be pretty easy on the ZMQ side. It shouldn't be *too* hard, though, and if you're willing to resort to heinous unsupported hacks, you could do something like subclass PollReactor and just replace '_poller' with a poller from zmq.poll, which is at least advertised to be compatible (although I suspect that the reality may fall short slightly, as it often does).

Based on this analysis, which is far more thorough than I really wanted to do :(, it sounds to me like ZMQTransport and ZMQReactor are both somewhat feasible, and have overlapping advantages and disadvantages which may make each of them an attractive option in different circumstances. There are probably situations where even ZMQProtocol would make sense.

However! In BOTH of these options, you're going to need to define, implicitly or explicitly, IZMQTransport and IZMQProtocol interfaces, stipulating the interaction between the transport layer of your ZMQ API and the protocol layer which applications implement. Maybe pyzmq already outlines this for you, maybe not; but the point is, you should really be focusing on defining *that* interface in a way that makes sense. The rest of this stuff is all implementation details. If you define those interfaces well, then whichever integration option you start with, you should be able to change the internal implementation, or perhaps even use multiple implementations.
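As a sketch of what defining those interfaces first might look like: Twisted itself would use `zope.interface`, but `typing.Protocol` is used here only to keep the example dependency-free. The method names reuse the thread's placeholder `sendZMQEvents`/`zmqEventsReceived` vocabulary, and `LoopbackTransport` and `RecordingProtocol` are hypothetical. Any transport satisfying `IZMQTransport`, whether descriptor-based or reactor-based, could replace the loopback one without touching the protocol.

```python
from typing import List, Optional, Protocol


class IZMQTransport(Protocol):
    """What every integration strategy would provide (hypothetical)."""

    def sendZMQEvents(self, events: List[bytes]) -> None: ...

    def loseConnection(self) -> None: ...


class IZMQProtocol(Protocol):
    """What application code would implement (hypothetical)."""

    def makeConnection(self, transport: IZMQTransport) -> None: ...

    def zmqEventsReceived(self, events: List[bytes]) -> None: ...


class LoopbackTransport:
    """Toy IZMQTransport that hands sent events straight back to its protocol."""

    def __init__(self, protocol: IZMQProtocol) -> None:
        self._protocol = protocol
        self.connected = True
        protocol.makeConnection(self)

    def sendZMQEvents(self, events: List[bytes]) -> None:
        if self.connected:
            self._protocol.zmqEventsReceived(events)

    def loseConnection(self) -> None:
        self.connected = False


class RecordingProtocol:
    """Application protocol written only against IZMQTransport."""

    def __init__(self) -> None:
        self.transport: Optional[IZMQTransport] = None
        self.received: List[List[bytes]] = []

    def makeConnection(self, transport: IZMQTransport) -> None:
        self.transport = transport

    def zmqEventsReceived(self, events: List[bytes]) -> None:
        self.received.append(events)
```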
For example, you may discover that the performance thing is actually significant, and want to use ZMQReactor on your back-end servers, but eventually write some client-side GUI tools which also want to use ZMQ but aren't quite as performance-sensitive. I personally have little interest in ZMQ itself, but I think this general pattern holds for any large, existing C protocol library that someone might want to integrate with Twisted. In most cases the 'reactor' option probably isn't there, but 'is it a protocol or is it a transport' would be a FAQ, if there were more in the way of large, useful C libraries that did async networking stuff :).
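On the "protocol" side of that question, a protocol in the Twisted sense is just a parser that accepts bytes in arbitrary chunks. The framing below (a 4-byte big-endian length prefix) is hypothetical and is *not* ZeroMQ's wire format; it only shows why a ZMQProtocol-style implementation could run over any byte-stream transport. `LengthPrefixedMessageParser` and `frame` are illustrative names.

```python
import struct


class LengthPrefixedMessageParser:
    """Turn an arbitrarily chunked byte stream into discrete messages.

    Hypothetical framing: a 4-byte big-endian length, then the body.
    """

    _HEADER = struct.Struct("!I")

    def __init__(self, on_message):
        self._buffer = b""
        self._on_message = on_message

    def dataReceived(self, data):
        # Bytes may arrive in any chunking; buffer until a full frame exists.
        self._buffer += data
        while len(self._buffer) >= self._HEADER.size:
            (length,) = self._HEADER.unpack_from(self._buffer)
            end = self._HEADER.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this frame
            message = self._buffer[self._HEADER.size:end]
            self._buffer = self._buffer[end:]
            self._on_message(message)


def frame(message):
    """Encode one message with the hypothetical length prefix."""
    return LengthPrefixedMessageParser._HEADER.pack(len(message)) + message
```

Feeding such a parser one byte at a time still yields whole messages, which is the property that lets a protocol sit on TCP, TLS, or anything else that delivers bytes.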
participants (6)
- Christopher Armstrong
- exarkun@twistedmatrix.com
- Glyph Lefkowitz
- Itamar Turner-Trauring
- Konrads Smelkovs
- Laurens Van Houtven