Re: [Twisted-Python] Integrating Twisted with ZeroMQ

6 Jun 2010

      On 6 Jun, 07:59 pm, lvh@laurensvh.be wrote:
...
Hey,
For the Twisted folks: this thing has been reviewed by the ZeroMQ
folks first because I wanted to be sure I got the technical details
right on their side of things.
I'd like to open up a discussion from a while back regarding the
integration of ZeroMQ (a messaging system: similar to AMQP but with
the intent to be simpler) into Twisted.
The interested ZeroMQ people and the interested Twisted people (names
withheld to protect the guilty) disagreed on what it should look like.
I think that's mostly because neither party really understood what the
other's software wanted to do. So, I'll try to give everyone a basic
explanation without going too deep into either Twisted or ZeroMQ: my
apologies if I spell out the basics of your thing too much and it gets
boring :)
ZeroMQ aims to be a thin layer above TCP, behaving like TCP but
'better'. That sounds like a vague marketing statement, but it helps
to understand some of the terminology if you keep that in the back of
your head. (What exactly 'better' means is way beyond the current
scope: basically, ZeroMQ wants to help socket programmers to stop
reinventing the wheel by implementing common behavior such as pub/sub,
request/reply...). Essentially AMQP but much simpler, and brokerless
in most cases. This email is already going to go way over the sane
character count; thankfully the ZeroMQ webpage does a great job of
explaining stuff :-)
I think this highlights the main problem people had. There is a partial
overlap between Twisted and ZeroMQ. The ZeroMQ implementation does
things Twisted does too: it implements a bunch of low level networking
stuff using eg epoll. It deals with real sockets, and Twisted wants to
do that as well.
ZeroMQ uses things called Sockets. They're similar but not the same
thing as TCP sockets (instead delegating work to TCP eventually), so
you can't use traditional methods like select or epoll with them,
because, for example, they don't have file descriptors. Some
underlying thing probably does have fds; but ZeroMQ worries about that
for you under the hood, just like Twisted does for other TCP traffic.
There are a couple of options for making ZeroMQ work with Twisted:
1) implement everything in Python, using Twisted's TCP stuff. I think
this is mostly a bad idea and the ZeroMQ people seem to agree: _lots_
of work, ZeroMQ libs are stupidly fast already, Python not being the
best tool for binary protocols...
2) write a thin wrapper around the C(++) libs: great, as long as it
never has to go into the Twisted trunk
3) use pyzmq's thin wrapper around the C(++) libs: sounds like the
best idea to me, again with reservations wrt the Twisted trunk
Originally there was a fourth idea, which considered libzmq as a new
mechanism: like epoll, so you'd have a ZMQ-specific reactor. A bunch
of people didn't like this, and I can somewhat see the point: hard to
integrate with other event loops like GUIs, for example.
pyzmq offers something called select, which works just like select
except it works on both file descriptors and ZeroMQ Sockets. It just
delegates all of the work to libzmq. We could use
ThreadedSelectReactor and have it use ZMQ's select. I'm not sure if it
should use "normal" select everywhere else: because zmq's select is in
fact much better than select.select (it just behaves like
select.select in the sense that you give it three sets of fds and an
optional timeout; under the hood it's actually epoll or kqueue or
whatever) and it can handle plain old file descriptors just fine. So,
you'd have a TSR with either 1 zmq.select running on everything or 1
zmq.select running over Sockets and 1 select.select running over your
classic fds. Personally I kind of like the idea of zmq's select taking
over, but I don't know how well that works in practice.
A shortcoming of this approach is that much of the inefficiency of 
select(2) comes from its API.  If you have a select(2)-compatible API 
that's implemented in terms of epoll, you're still wasting a ton of 
effort that you could be skipping if you were using an epoll-compatible 
API instead.

But this is only an argument about performance, and likely no one is 
going to care about the poor performance of zmq.select anyway.
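
A rough illustration of that API difference, using only Python's
standard library rather than zmq.select (the socket pairs and helper
names here are made up for the demo): a select-style call is handed the
whole interest set every time, while an epoll-style API registers each
descriptor once and later waits report only the ready ones.

```python
import select
import selectors
import socket


def ready_with_select(socks, timeout=0.1):
    # select-style: the complete interest set is passed in, and has to be
    # scanned, on every single call.
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable


def make_selector(socks):
    # epoll-style: each socket is registered once, up front ...
    sel = selectors.DefaultSelector()
    for s in socks:
        sel.register(s, selectors.EVENT_READ)
    return sel


def ready_with_selector(sel, timeout=0.1):
    # ... and each wait only reports the sockets that are ready.
    return [key.fileobj for key, _ in sel.select(timeout)]


# Hypothetical demo data: three socket pairs, only one with pending input.
pairs = [socket.socketpair() for _ in range(3)]
readers = [r for r, _ in pairs]
pairs[1][1].send(b"ping")

via_select = ready_with_select(readers)
sel = make_selector(readers)
via_selector = ready_with_selector(sel)

sel.close()
for r, w in pairs:
    r.close()
    w.close()
```

Both calls report the same ready socket; the difference is how much
work is repeated on every wait, which only starts to matter with many
descriptors.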
...
A potential option for Twisted, which some people don't quite like,
would be to have a listenZMQ and connectZMQ, analogous to
listenTCP/listenUDP/listenSSL and the respective connect*s. I think
this makes more sense to the ZeroMQ people (who think of ZeroMQ as a
layer "next to" TCP which happens to be implemented on top of TCP, on
top of which you build your stuff) than the Twisted people (who think
of ZeroMQ's protocol as yet another TCP-using protocol just like HTTP
for example). Having worked with both pieces of software, the more I
play with ZeroMQ the more I think listenZMQ/connectZMQ make sense.
ZeroMQ really tries to be one of those things and it shows. What
ZeroMQ wants to do is semantically much closer to the existing
connects and listens. I'm not just making this up: the ZeroMQ people
have reviewed this and this is really what ZeroMQ wants to be.
A shortcoming of this approach is that as a reactor method, you have to 
implement it for each reactor you want to support.  You covered this a 
bit earlier in your email, where you talked about GUI integration.  Do 
you want to maintain an implementation of {listen,connect}ZMQ for 
select(/whatever), Glib2, Gtk2, wxWidgets, Qt, and Windows?  That's a 
lot more work than just maintaining one implementation.
...
Another argument for making ZMQ special is that TCP is just one of the
things ZeroMQ works with: there are also UNIX domain pipes, PGM reliable
multicast, UDP PGM encapsulation, and even inter-thread communication.
You got this one backwards.  This is an argument for not implementing 
ZMQ at the same level as TCP and UNIX sockets.  This is an argument for 
implementing it *on top of* those things.  Of course, the main benefit 
of implementing it on top of them is that you don't have to write a 
bunch of code to support each transport.  And the ZMQ people did that 
already.

Here's how it should work (modulo stupid factoring issues that aren't 
really related to ZMQ issues), given that there's a big C library that 
already implements a bunch of stuff that you don't want to re-implement:

    from twisted.internet.interfaces import IReactorFDSet

    class ZMQTransport(object):
        def __init__(self, reactor, zmqSocket, protocol):
            self._zmqSocket = zmqSocket
            self._transportPieces = []
            # On the next line, I use a method which I made up.  Maybe it
            # corresponds to some actual API ZMQ provides, maybe not, I
            # dunno.
            for fd in zmqSocket.allFileDescriptors():
                desc = _ZMQFileDescriptor(reactor, fd, zmqSocket, protocol)
                self._transportPieces.append(desc)

            self._protocol = protocol
            self._protocol.makeConnection(self)

    class _ZMQFileDescriptor(object):
        def __init__(self, reactor, fd, zmqSocket, protocol):
            if not IReactorFDSet.providedBy(reactor):
                raise RuntimeError(
                    "This is the IReactorFDSet implementation; "
                    "use another reactor or another zmq transport.")

            # Set every attribute before registering with the reactor:
            # addReader may call fileno() right away.
            self._reactor = reactor
            self._fd = fd
            self._zmqSocket = zmqSocket
            # doRead delivers events to the protocol, so keep a reference.
            self._protocol = protocol
            self._reactor.addReader(self)

        def doRead(self):
            # Another made up method
            zmqEvents = self._zmqSocket.nonBlockingReadFrom(self._fd)
            if zmqEvents:
                self._protocol.zmqEventsReceived(zmqEvents)

        def doWrite(self):
            # One more, for luck.
            finished = self._zmqSocket.nonBlockingWriteTo(self._fd)
            if finished:
                self._reactor.removeWriter(self)

        def fileno(self):
            return self._fd

        def sendZMQEvents(self, events):
            # Whatever the API is.
            self._zmqSocket.sendZMQEvents(events)
            self._reactor.addWriter(self)

    class ZMQProtocol(object):
        def makeConnection(self, zmqTransport):
            self.zmqTransport = zmqTransport

        def zmqEventsReceived(self, zmqEvents):
            pass

    def connectZMQ(reactor, addrinfo, factory):
        # Blah blah blah - somehow get to the point where you have a
        # ZMQ Socket.
        d = ...
        def cbConnectionSetup(socket):
            ZMQTransport(
                reactor, socket, factory.buildProtocol(addrinfo))
        d.addCallback(cbConnectionSetup)

    def main():
        from twisted.internet import reactor
        from twisted.internet.protocol import ClientFactory
        f = ClientFactory()
        f.protocol = ZMQProtocol
        connectZMQ(reactor, ('example.com', 1234), f)
        reactor.run()

Okay, so that came out a little longer than I planned, but turnabout is 
fair play.  Anyway, this is a bog standard transport implementation. 
The only thing even remotely interesting is that it maps multiple file 
descriptors onto a single transport.  And that seems to be the
So, if the ZMQ library offers APIs like the ones used in this example, 
then you're all set.  With just a little more code, you can have an 
overlapped I/O version of this transport (for the one Twisted reactor 
that doesn't support IReactorFDSet).  And then you've got proper Twisted 
ZMQ support.

If it *doesn't* offer APIs like these, then I'd say it's missing some 
pretty critical APIs.  After all, if you can't drive it this way, your 
chances of being able to write reasonable unit tests for ZMQ-based code 
are somewhat diminished (not out the window, but it'll be annoying).
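
A minimal sketch of what such a unit test could look like, with no
real I/O and no Twisted import. Everything here is a stand-in:
FakeReactor, FakeZMQSocket and RecordingProtocol are hypothetical test
doubles, the socket methods are the same made-up ones as in the example
above, and Descriptor is a trimmed-down descriptor in the spirit of
_ZMQFileDescriptor that is handed the protocol directly.

```python
class FakeReactor:
    """Records reader/writer registrations instead of doing real I/O."""

    def __init__(self):
        self.readers = set()
        self.writers = set()

    def addReader(self, desc):
        self.readers.add(desc)

    def addWriter(self, desc):
        self.writers.add(desc)

    def removeWriter(self, desc):
        self.writers.discard(desc)


class FakeZMQSocket:
    """Stand-in for a ZMQ socket exposing the made-up methods."""

    def __init__(self, incoming):
        self._incoming = list(incoming)
        self._pending = []
        self.sent = []

    def nonBlockingReadFrom(self, fd):
        events, self._incoming = self._incoming, []
        return events

    def sendZMQEvents(self, events):
        self._pending.extend(events)

    def nonBlockingWriteTo(self, fd):
        self.sent.extend(self._pending)
        self._pending = []
        return True  # everything was flushed


class RecordingProtocol:
    def __init__(self):
        self.received = []

    def zmqEventsReceived(self, events):
        self.received.extend(events)


class Descriptor:
    """Trimmed-down descriptor driven by reactor-style callbacks."""

    def __init__(self, reactor, fd, zmqSocket, protocol):
        self._reactor = reactor
        self._fd = fd
        self._zmqSocket = zmqSocket
        self._protocol = protocol
        self._reactor.addReader(self)

    def fileno(self):
        return self._fd

    def doRead(self):
        events = self._zmqSocket.nonBlockingReadFrom(self._fd)
        if events:
            self._protocol.zmqEventsReceived(events)

    def doWrite(self):
        if self._zmqSocket.nonBlockingWriteTo(self._fd):
            self._reactor.removeWriter(self)

    def sendZMQEvents(self, events):
        self._zmqSocket.sendZMQEvents(events)
        self._reactor.addWriter(self)


# Drive the descriptor by hand, the way a test would.
reactor = FakeReactor()
sock = FakeZMQSocket(incoming=["hello"])
proto = RecordingProtocol()
desc = Descriptor(reactor, 7, sock, proto)

desc.doRead()                      # pretend the reactor saw fd 7 readable
desc.sendZMQEvents(["world"])      # queue output, ask for write events
queued_for_write = desc in reactor.writers
desc.doWrite()                     # pretend the reactor saw fd 7 writable
```

The point is only that a library which can be driven one descriptor
event at a time can also be driven by a test, without sockets or a
running event loop.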

And I don't understand how you would implement something like ZMQ in a 
way that *didn't* make it easy to do this.  *Particularly* since they 
have support for several different event notification APIs.  So 
hopefully the worst case is that there are no APIs like these, but it's 
a minor oversight because the authors thought no one would want them, 
but they can be added trivially because they map directly onto how the 
underlying implementation works.
...
I know some Twisted people way smarter than me basically thought the
connectZMQ/listenZMQ thing was a mistake, but I'm not sure to what
extent that is because they were right and to what extent that was
because they didn't really know very much about ZeroMQ and just went
"it works on top of TCP so that's not where it goes". To Twisted folks
that disagree: would you change your opinion if ZMQ were *really*
something that's side-by-side with TCP instead of being implemented on
top of it? Like, say, SCTP is? Does the fact that it can work on top
of a bunch of stuff that isn't TCP change that?
If ZMQ were supported in the kernel with new syscalls to interface with 
it, then it would be nonsensical to talk about implementing it on top of 
Twisted's existing TCP support.  You simply couldn't, because all of the 
code would have been pushed into the kernel where it can't be used any 
other way.  This doesn't mean it would be a good idea overall to have 
ZMQ supported at the same level as TCP, though: it just means there 
would be no other alternative (aside from not supporting it - like what 
Twisted does for SCTP).

Whether or not it makes any sense to implement ZMQ in the kernel is 
something I have no opinion on, since I don't know nearly enough about 
the particular details of ZMQ.
...
Talking with the ZeroMQ people has been a positive experience: they
were very accessible and cooperative, and really just want a bigger
market for their software (who doesn't?) so I hope something useful
comes out of this :-)
Great!  Convince them to add the necessary APIs (if they don't exist 
already) from above and everything should be set. :)

Jean-Paul

exarkun＠twistedmatrix.com