[IPython-dev] pyzmq problems in sending shell messages to a kernel
Jason Grout
jason-sage at creativetrax.com
Wed Feb 12 13:45:32 EST 2014
Hi everyone,
I'm trying to track down a problem we're seeing in the Sage cell server
with sending computation messages to an IPython kernel. This may end up
being a problem with using pyzmq or zmq, so apologies in advance if it
turns out to be OT for this list.
The tl;dr version is: it appears that in some very sporadic cases, pyzmq
reports sending a message (an execute_request message) to a kernel's
shell channel TCP port on localhost, but Wireshark never registers that
message being sent, and the kernel that is supposed to receive the
message never acts on it. My question is: does anyone have suggestions
on debugging this or narrowing down the problem?
The (abbreviated, simplified) long version: in the Sage cell server, we
start up a number of IPython kernels that we keep waiting around for
computations. When a computation is requested, we hook up the kernel's
shell/iopub/heartbeat channels (i.e., create pyzmq ZMQStream objects
connecting to the TCP ports corresponding to the kernel's
shell/iopub/heartbeat channels), send an execute_request, and assemble an
answer for the user from output coming back on the iopub channel. When
the system is under moderate load, every now and then (maybe once every
300 computations), we send an execute_request message to one of these
waiting kernels, and the zmq socket code claims that it sent the
message, but Wireshark shows no such message in the raw TCP traffic, and
the kernel acts like it never received the message. We didn't change
the high water mark for zmq, and I'm running zmq 3.2.2 and pyzmq 14.0.1.
I've spent a
long time narrowing the issue down to a zmq message not being sent, even
though pyzmq seems to have thought it sent it. Does anyone have any
suggestions for narrowing this down more, or possible causes?
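For concreteness, here is a minimal sketch of the frames that make up an
execute_request on the shell channel, assuming the standard IPython wire
format (delimiter, HMAC signature, then header, parent header, metadata
and content as JSON). It is not the cell server's actual code: the
helper name make_execute_request, the username, the session id and the
key are all hypothetical placeholders, and the exact content fields may
differ between protocol versions.

```python
import hashlib
import hmac
import json
import uuid

# Separates zmq routing identities from the signed message frames.
DELIM = b"<IDS|MSG>"


def make_execute_request(code, session_id, key):
    """Build the frames of an execute_request (hypothetical helper)."""
    header = {
        "msg_id": str(uuid.uuid4()),
        "username": "cellserver",  # placeholder value
        "session": session_id,
        "msg_type": "execute_request",
    }
    content = {
        "code": code,
        "silent": False,
        "user_expressions": {},
        "allow_stdin": False,
    }
    # Serialize header, parent_header, metadata, content, in that order.
    parts = [json.dumps(d).encode("utf-8") for d in (header, {}, {}, content)]
    # The signature is the HMAC hex digest over the four serialized parts.
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    signature = mac.hexdigest().encode("ascii")
    return [DELIM, signature] + parts


# Assumed values for illustration only; a real key comes from the
# kernel's connection file.
frames = make_execute_request("1 + 1", session_id="demo-session",
                              key=b"not-a-real-key")
```

Frames like these would then go to the socket via something like
send_multipart(frames). Note that a successful return from a zmq send
only means the message was handed to zmq's queue, not that it has been
written to the TCP connection.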
I realize that my setup is a bit complicated, and I've tried to simplify
the issues (but hopefully not too much). Any suggestions or help would
be appreciated. The next thing I'm going to do is (a) upgrade zmq to
4.x, and (b) insert some debugging statements in the zmq library itself
to see if the C zmq library thinks it sent the message.
Thanks,
Jason