interrupted system call w/ Queue.get

Roy Smith roy at panix.com
Fri Feb 18 15:21:51 CET 2011


In article 
<b3ebf4ec-e0bc-4bfc-8c29-368fee488b15 at l18g2000yqm.googlegroups.com>,
 Philip Winston <pwinston at gmail.com> wrote:

> We have a multiprocess Python program that uses Queue to communicate
> between processes.  Recently we've seen some errors while blocked
> waiting on Queue.get:
> 
> IOError: [Errno 4] Interrupted system call
> 
> What causes the exception?

Unix divides system calls up into "slow" and "fast".  The difference is 
how they react to signals.

Fast calls are things which are expected to return quickly.  A canonical 
example would be getuid(), which just returns a number it looks up in a 
kernel data structure.  Fast syscalls cannot be interrupted by signals.  
If a signal arrives while a fast syscall is running, delivery of the 
signal is delayed until after the call returns.

Slow calls are things which may take an indeterminate amount of time to 
return.  An example would be a read on a network socket; it will block 
until a message arrives, which may be forever.  Slow syscalls get 
interrupted by signals.  If a signal with a handler installed arrives 
while a slow syscall is blocking, the call fails with errno set to 
EINTR.  This gives your code a chance to 
do whatever is appropriate, which might be clean up in preparation for 
process shutdown, or maybe just ignore the interrupt and re-issue the 
system call.
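
To make the "re-issue the system call" option concrete, here is a minimal 
sketch of a retry wrapper.  The name retry_on_eintr is made up for this 
example; it is not a standard library function.  It assumes the interrupted 
call raises an EnvironmentError subclass (IOError, OSError, and, on 2.6 and 
later, socket.error) with errno set to EINTR, and the "except ... as" syntax 
needs Python 2.6 or later.  Newer Pythons (3.5 and up, per PEP 475) retry 
most such calls on their own, so there the loop is just a harmless safety 
net.

```python
import errno

def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), re-issuing it if a signal interrupts it.

    Hypothetical helper for illustration only.
    """
    while True:
        try:
            return func(*args, **kwargs)
        except EnvironmentError as e:
            # Anything other than EINTR is a real error; let it propagate.
            if e.errno != errno.EINTR:
                raise
            # EINTR: a signal handler ran while we were blocked.  Try again.
```

With a socket s, usage would look like data = retry_on_eintr(s.recv, 1024).  
Blindly retrying is only right if you really don't care about the signal; 
if it means "shut down", the handler should record that somewhere the loop 
can check.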

Here's a short Python program which shows how this works (tested on 
Mac OS X 10.6, but should be portable to just about any POSIX box):

-----------------------------------------------------
#!/usr/bin/env python

import socket
import signal
import os

def handler(sig_num, stack_frame):
    return

print "my pid is", os.getpid()
signal.signal(signal.SIGUSR1, handler)
s = socket.socket(type=socket.SOCK_DGRAM)
s.bind(("127.0.0.1", 0))
s.recv(1024)
-----------------------------------------------------

Run this in one window.  It should print out its process number, then 
block on the recv() call.  In another window, send it a SIGUSR1 (for 
example, "kill -USR1 6969", using whatever pid it printed).  You 
should get something like:

play$ ./intr.py 
my pid is 6969
Traceback (most recent call last):
  File "./intr.py", line 14, in <module>
    s.recv(1024)
socket.error: [Errno 4] Interrupted system call
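
If you'd rather send the signal from Python than from the shell, a sketch 
like the following should do it.  The helper name send_usr1 is invented for 
this example; os.kill() and signal.SIGUSR1 are the real standard library 
pieces, and both are POSIX-only.

```python
import os
import signal

def send_usr1(pid):
    """Send SIGUSR1 to the process with the given pid (hypothetical helper)."""
    os.kill(pid, signal.SIGUSR1)
```

In the second window you would call send_usr1(6969), substituting the pid 
the first program printed.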

> Is it necessary to catch this exception
> and manually retry the Queue operation?  Thanks.

That's a deeper question which I can't answer.  My guess is the 
interrupted system call is the Queue trying to acquire a lock, but 
there's no predicting what the signal is.  I'm tempted to say that it's 
a bug in Queue that it doesn't catch this exception internally, but 
people who know more about the Queue implementation than I do should 
chime in.
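
In the meantime, if you decide the signal really is harmless, a manual retry 
around the get might look like the sketch below.  This is my guess at a 
workaround, not how Queue is meant to be used: the name get_retrying is 
invented, and it assumes the interruption surfaces as an IOError or OSError 
with errno EINTR, as in your traceback (the "except ... as" syntax needs 2.6 
or later).  It tracks a deadline so that repeated interruptions can't 
stretch the timeout indefinitely.

```python
import errno
import time

def get_retrying(q, timeout=None):
    """Blocking q.get() that is re-issued if a signal interrupts it.

    Hypothetical helper; q is any object with a Queue-style
    get(block, timeout) method.
    """
    deadline = None if timeout is None else time.time() + timeout
    while True:
        if deadline is None:
            remaining = None
        else:
            # Only wait for whatever is left of the original timeout.
            remaining = max(0.0, deadline - time.time())
        try:
            # The queue's own Empty exception passes through on timeout.
            return q.get(True, remaining)
        except EnvironmentError as e:
            if e.errno != errno.EINTR:
                raise
            # Interrupted by a signal: loop around and wait again.
```

You would then write item = get_retrying(q) in place of item = q.get().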

> We have some Python 2.5 and 2.6 machines that have run this program
> for many 1,000 hours with no errors.  But we have one 2.5 machine and
> one 2.7 machine that seem to get the error very often.

Yup, that's the nature of signal delivery race conditions in 
multithreaded programs.  Every machine will behave a little bit 
differently, with no rhyme or reason.  Google "undefined behavior" for 
more details :-)  The whole POSIX signal delivery mechanism dates back 
to the earliest Unix implementations, long before there were threads or 
networks.  At this point, it's got many layers of duct tape.


