interrupted system call w/ Queue.get

Dan Stromberg drsalists at gmail.com
Fri Feb 18 03:49:23 CET 2011


On Thu, Feb 17, 2011 at 5:46 PM, Philip Winston <pwinston at gmail.com> wrote:

> We have a multiprocess Python program that uses Queue to communicate
> between processes.  Recently we've seen some errors while blocked
> waiting on Queue.get:
>
> IOError: [Errno 4] Interrupted system call
>
> What causes the exception?  Is it necessary to catch this exception
> and manually retry the Queue operation?  Thanks.
>
> We have some Python 2.5 and 2.6 machines that have run this program
> for many 1,000 hours with no errors.  But we have one 2.5 machine and
> one 2.7 machine that seem to get the error very often.
>

You're getting this:

#define  EINTR     4 /* Interrupted system call */

It most likely means that a signal is interrupting a system call (an
interaction with the kernel, not just os.system).

Google for EINTR for more info.
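
The number in the traceback can also be checked from Python itself.  A
minimal sketch, assuming a Unix-like system where EINTR has the value 4 as
in the #define above:

```python
import errno
import os

# Map the number in "[Errno 4]" back to its symbolic name.
name = errno.errorcode[4]

# Ask the C library for the message text that goes with EINTR.
message = os.strerror(errno.EINTR)

print(name)
print(message)
```

The message should match the text shown after "[Errno 4]" in the traceback.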

Whatever callable is getting this error should, at root, probably be
retrying the interrupted system call itself.  However, the retry can
sometimes be added higher in the call stack, as a sort of band-aid.

Sadly, this is frequently ignored in application programming.  It's a
favorite complaint of some kernel developers about us.

One good way of dealing with this is to use the following function, wrapped
around any system calls:

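import errno

def retry_on_eintr(function, *args, **kw):
    """Call function(*args, **kw), retrying while it fails with EINTR."""
    # Queue.get raised IOError above, not OSError, so catch both.
    # (On Python 2.5, write "except (OSError, IOError), e:" instead.)
    while True:
        try:
            return function(*args, **kw)
        except (OSError, IOError) as e:
            if e.errno == errno.EINTR:
                continue  # interrupted by a signal: retry the call
            raise  # any other error is passed on unchanged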
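
For the Queue.get case in the question, the wrapper might be applied as in
the sketch below.  This is only an illustration: get_retrying is a
hypothetical helper name, the "job-1" payload and the 5 second timeout are
made up, and the wrapper is repeated so that the example runs on its own.

```python
import errno
import multiprocessing


def retry_on_eintr(function, *args, **kw):
    # Same idea as above: loop until the call stops failing with EINTR.
    while True:
        try:
            return function(*args, **kw)
        except (OSError, IOError) as e:
            if e.errno != errno.EINTR:
                raise  # not an interrupted call: pass the error on


def get_retrying(q, timeout=None):
    # Hypothetical helper: a blocking q.get() that survives EINTR.
    # Positional arguments are (block, timeout).
    return retry_on_eintr(q.get, True, timeout)


if __name__ == "__main__":
    q = multiprocessing.Queue()
    q.put("job-1")
    print(get_retrying(q, timeout=5))
```

Note that each retry starts the timeout over again, so a call that keeps
getting interrupted can wait longer in total than the timeout suggests.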

If you find standard library code that's failing to deal with EINTR, it's
legitimate to submit a bug report about it.

Another way around the issue is to avoid signals.  But that's not always
practical.
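
Where signals cannot be avoided, a related option (not the same thing as
avoiding them, and an assumption on my part about what fits your program)
is to ask the kernel to restart interrupted calls for a given signal, using
signal.siginterrupt from the standard library (Unix only, Python 2.6 and
later).  A minimal sketch; the helper name and the choice of SIGUSR1 are
illustrative only:

```python
import signal


def install_restarting_handler(signum, handler):
    # Hypothetical helper, for illustration only.
    # signal.signal() puts the signal back into "interrupting" mode,
    # so siginterrupt() has to be called after it, not before.
    signal.signal(signum, handler)
    signal.siginterrupt(signum, False)


def on_usr1(signum, frame):
    # Keep handlers short; this one deliberately does nothing.
    pass


if __name__ == "__main__":
    install_restarting_handler(signal.SIGUSR1, on_usr1)
```

Not every system call is restarted this way (calls that take a timeout,
such as select, typically are not), so a retry wrapper may still be needed.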

How can you tell what's using system calls?  It can be hard, but you can
usually use strace or truss or ktrace or par or trace to find out.  You can
also add a little code to log all your tracebacks somewhere, inspect the log
periodically, and add a retry_on_eintr as needed in the places identified.
Also, most *ix's will have a .h file somewhere listing the system calls - on
one of my Ubuntu systems, it's at /usr/include/bits/syscall.h.
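
The traceback logging mentioned above could look something like the sketch
below.  It is only one possible approach: the decorator name log_tracebacks
and the logger name "eintr_audit" are made up for the example, and where the
log ends up depends on how you configure the logging module.

```python
import functools
import logging

log = logging.getLogger("eintr_audit")  # hypothetical logger name


def log_tracebacks(function):
    """Log the full traceback of any exception, then re-raise it."""
    @functools.wraps(function)
    def wrapper(*args, **kw):
        try:
            return function(*args, **kw)
        except Exception:
            # log.exception records the message plus the traceback.
            log.exception("unhandled error in %s", function.__name__)
            raise
    return wrapper
```

Decorating the entry point of each worker process with @log_tracebacks, and
then searching the log for "Errno 4", should show which calls need a
retry_on_eintr around them.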