[IPython-dev] Qt/Curses interfaces future: results of the weekend mini-sprint (or having fun with 0mq)
ellisonbg at gmail.com
Thu Mar 25 15:16:12 EDT 2010
>> * The GIL kills. Because IPython is designed to execute arbitrary
>> user code, and our users often run wrapped C/C++ libraries, it is not
>> uncommon for non-GIL releasing code to be run in IPython. When this
>> happens, any Python thread *completely stops*. When you are building
>> a robust distributed system, you simply can't have this. As far as I
>> know all Python based networking and RPC libraries suffer from this
>> same exact issue. Note: it is not enough that the underlying socket
>> send/recv happen with the GIL released.
> That sounds intriguing! How 0MQ is different in this regard, does it
> maintain its own threads inside independent of GIL?
0MQ is written in C++ and it maintains its own native threads for
network IO and message queueing. The Python bindings are careful to
release the GIL when calling into 0MQ as well. The result is that 0MQ
sockets can continue to do network IO and message queueing while
Python holds the GIL.
>> * Performance. We need network protocols that have near ping latencies
>> but can also easily handle many MB - GB sized messages at the same
>> time. Prior to 0MQ I have not seen a network protocol that can do
>> both. Our experiments with 0MQ have been shocking. We see near ping
>> latencies for small messages and can send massive messages without
>> even thinking about it. All of this is while CPU and memory usage is
> It sounds you've found a silver bullet :)
At least the bullet that we needed.
> BTW I use twisted for client/server communication in my projects these days
> and while I never had a need to transfer GB sized messages back and forth,
> I've never had any issues with latencies either, except for the delays
> to some particular network.
Yes, I still like Twisted very much, and the GIL is a constraint
that Twisted has to live with. I think you can get Twisted to handle
large messages though - it is just more work.
>> One of the difficulties that networking libraries in Python
>> face (at least currently) is that they all use strings for network
>> buffers. The problem with this is that you end up copying them all
>> over the place. With Twisted, we have to go to incredible lengths to
>> avoid this. Is the situation different with RPyC?
> Yes string type is an old workhorse in python. I don't know internals of
> but I suspect it uses strings extensively as well. What pyzmq uses instead
For the Python rep of messages we do use strings. But once they are
passed down to the C++ 0MQ code they probably use some STL container
and are careful to not copy. Also, it is possible to have 0MQ use the
buffer of the Python string without copying. But there are some
issues with this that we are still sorting out.
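To make the copying concern concrete, here is a minimal sketch in current
Python 3 using the standard memoryview type. It is not pyzmq or 0MQ code and
says nothing about how either one handles buffers internally; the helper name
chunk_views and the sample message are hypothetical, chosen only to show that
slicing through a view shares the original buffer instead of copying it.

```python
def chunk_views(buffer, chunk_size):
    """Yield zero-copy views over consecutive chunks of a bytes-like buffer."""
    view = memoryview(buffer)
    for start in range(0, len(view), chunk_size):
        # Slicing a memoryview creates a new view, not a new copy of the data.
        yield view[start:start + chunk_size]


# Hypothetical 16-byte "message"; a real one could be many MB.
message = bytearray(b"a" * 8 + b"b" * 8)
chunks = list(chunk_views(message, 8))

# Mutating the original buffer is visible through the view,
# which shows that the chunk shares memory with the message.
message[0] = ord("Z")
print(bytes(chunks[0]))          # b'Zaaaaaaa'
print(chunks[1].obj is message)  # True
```

By contrast, slicing a bytes or bytearray object directly (message[:8])
allocates a new object each time, which is the kind of copying described above.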
>> * Messaging not RPC. As we have developed a distributed architecture
>> that is more and more complex, we have realized something quite
>> significant: we are not really doing RPC, we are sending messages in
>> various patterns and 0MQ encodes these patterns extremely well.
>> Examples are request/reply and pub/sub, but other more complex
>> messaging patterns are possible as well - and we need those. In my
>> mind, the key difference between RPC and messaging is the presence of message queues
>> in an architecture. Multiprocessing has some of this actually, but I
>> haven't looked at what they are doing underneath the hood. I
>> encourage you to look at the example Fernando described. It really
>> shows in significant ways that we are not doing RPC.
> Frankly I think the difference between messaging and RPC is mostly a
> terminological one. A message queue's presence really just means that the
> system provides asynchronous services and many RPC frameworks
> provide that. (For some digression: In OO design world they even say
> "send a message to the object" instead of "call an object's method"
> sometimes. Weird geeks :))
Yes, the terminology is slippery. I guess the other thing that I
think of with messaging is the various messaging patterns and routing:
* Publish/subscribe with topic based filtering.
* Request/reply, including load balancing/fair queueing amongst
multiple consumers and producers.
* Peer-to-peer messaging.
* Simple message forwarding.
* General message routing based on endpoint identity.
I think you can implement all of these things with a good two-way,
asynchronous RPC system (like Twisted's perspective broker), but it
can be pretty painful.
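As a toy illustration of the first pattern in the list (publish/subscribe
with topic based filtering), here is a minimal in-process sketch built only on
the Python standard library. It does not use 0MQ or pyzmq and is not how
either is implemented; the class name TopicBroker, the helper drain, and the
topic strings are all hypothetical. The only idea it borrows is that a
subscriber receives a message when its subscription is a prefix of the topic.

```python
import queue


class TopicBroker:
    """Toy in-process pub/sub hub with prefix-based topic filtering."""

    def __init__(self):
        self._subscriptions = []  # list of (prefix, inbox) pairs

    def subscribe(self, prefix):
        """Register interest in every topic starting with prefix."""
        inbox = queue.Queue()
        self._subscriptions.append((prefix, inbox))
        return inbox

    def publish(self, topic, body):
        """Queue the message for each matching subscriber; return the count."""
        delivered = 0
        for prefix, inbox in self._subscriptions:
            if topic.startswith(prefix):
                inbox.put((topic, body))
                delivered += 1
        return delivered


def drain(inbox):
    """Return everything currently queued in inbox without blocking."""
    items = []
    while True:
        try:
            items.append(inbox.get_nowait())
        except queue.Empty:
            return items


broker = TopicBroker()
stdout_only = broker.subscribe("kernel.stdout")
everything = broker.subscribe("kernel.")

broker.publish("kernel.stdout", "hello")  # reaches both subscribers
broker.publish("kernel.status", "idle")   # reaches only the second one
```

The publisher never names its receivers; each message sits in a queue until
the subscriber gets around to reading it, which is the property that makes
this feel different from a blocking remote call.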
>> > The reason is that IPython already has a lot of useful and exciting
>> > functionality and yet another RPC framework is somewhat too much. Plus,
>> > you don't have to think about these too low level details like
>> > communication
>> > protocols, serialization etc.
>> 0MQ is definitely not another RPC framework. If you know that RPyC
>> addresses some or all of these issues I have brought up above, I would
>> seriously love to know. One of these days, I will probably try to do
>> some benchmarks that compare twisted, multiprocessing, RPyC and 0MQ
>> for things like latency and throughput. That would be quite
> Yes, 0MQ is not an RPC framework - it is just a low level protocol (albeit
> probably a good one) that you will use to build your own RPC/RMI/messaging
> system. Frankly I do not see 0MQ to be immune to all the issues you've
> brought up above unless you'll drop python and code everything in C/C++. In my
> experience latencies and performance bottlenecks usually came from the
> code that serves messages (i.e. server part) not the transport layer, unless
> you develop some high load server with thousands of messages per second which is
> not the case for IPython I believe.
Yes, you are right. There are two places we have had performance problems:
* Network protocol and message queueing. Low latency, large messages,
basic messaging patterns. 0MQ is solving these issues.
* Application logic. Our "servers" and "clients" will still need to
implement non-trivial logic and that may still be a bottleneck for us.
> Please do not think that I'm trying to bash the pyzmq idea, not at all! I
> think it is a
> great idea for IPython and it will be real fun to implement. I'm just
> trying to
> understand what is so different in IPython that any other RPC/RMI/messaging
> framework can't fit? RPyC alongside Pyro was just the first one that
> came to mind when I read Fernando's post but there are a lot of them, see for
> example python's wiki for a list: http://wiki.python.org/moin/ParallelProcessing.
> I personally have successfully used another toolkit not mentioned on the
> page - http://www.spread.org - it is a group communication toolkit that
> provides guaranteed message delivery and so-called virtual synchrony.
Yes, I have looked at spread before, but probably should spend more
time with it. It is similar to 0MQ, but has a different flavor. But
still, quite impressive. Do you know how the python bindings to
spread handle the GIL stuff?
> I think that when the first excitement ends and you will start to develop
> this new
> interface, you will end up implementing all this functionality that other
> frameworks have or the most of it, so it would be useful to at least check
> before implementation.
I am sure you are right at some level that we will end up implementing
aspects that other frameworks have.
>> Another important part of 0MQ is that it runs over protocols other
>> than tcp and interconnects like infiniband. The performance on
>> infiniband is quite impressive.
> Cool! Any idea how to utilize it in python/IPython?
IPython has a parallel computing infrastructure that runs on
cluster/supercomputers. We would *love* to be able to use infiniband
for messaging in that context - currently we use twisted over tcp.
Cheers and thanks!
> Mikhail Terekhov
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com