[issue22331] test_io.test_interrupted_write_text() hangs on the buildbot FreeBSD 7.2

Mon May 30 08:02:49 EDT 2016

Martin Panter added the comment:

This recently hung AMD64 FreeBSD 9.x 3.5. The stack trace was different, and there is only one thread:

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.5/builds/828/steps/test/logs/stdio
[398/398] test_io
Timeout (0:15:00)!
Thread 0x0000000801807400 (most recent call first):
  File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/unittest/case.py", line 176 in handle
  File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/unittest/case.py", line 727 in assertRaises
  File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/test/test_io.py", line 3714 in check_interrupted_write
  File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/test/test_io.py", line 3743 in test_interrupted_write_text

Also, x86 Ubuntu Shared 2.7 hung, but the only information I have is that it was running test_io.

In my FreeBSD case, the write() call is stuck, but in Victor’s original case, the background read() call is stuck. I could explain both cases as a race condition with the alarm signal being delivered:

Victor’s case: SIGALRM delivered somewhere inside assertRaises(), but before the write() system call, and Python has a chance to call the Python signal handler. No data is ever written, so the background read() hangs.

My case: SIGALRM delivered just before or as write() starts, so Python has no chance to interrupt the system call and call its own Python handler.

I wonder if we can change the test to only deliver a signal after first reading one byte, i.e. send the signal directly from the background thread. I think that should cover 99% of cases, though in theory it is possible for something else to interrupt write(), and our signal to be delivered while write() was restarting, causing it to hang. But the combination of those two events would be so unlikely it may not be worth worrying about.

Also, if you close the read end of the pipe before the write end, it should protect against flush() hanging without the EBADF hack by raising EPIPE instead.
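
A minimal standalone sketch of that EPIPE behaviour, outside of test_io. The function name is hypothetical, and it assumes SIGPIPE is ignored, which is what CPython sets up at startup by default:

```python
import errno
import os


def write_after_reader_closed():
    """Return True if writing to a pipe with no reader fails with EPIPE."""
    r, w = os.pipe()
    os.close(r)  # close the read end first
    try:
        os.write(w, b"x")
    except BrokenPipeError as exc:
        # With no reader left, the write fails straight away
        # instead of blocking on a full pipe.
        return exc.errno == errno.EPIPE
    finally:
        os.close(w)
    return False
```

Because the write fails immediately once the read end is gone, a later flush() of buffered data has nothing to block on.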

I suggest rewriting the background thread like:

def _read():
    s = os.read(r, 1)
    read_results.append(s)
    # The main thread is very likely to be inside the write() syscall
    # now, so interrupt it
    os.kill(os.getpid(), signal.SIGALRM)

and cleaning up the pipe like:

finally:
    os.close(r)
    try:
        wio.close()
    except BrokenPipeError:
        pass
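
Putting the two pieces together, here is a rough self-contained sketch of how the reworked test could behave. It is not the actual test_io code: the function name, the handler name and the payload size are made up for illustration, it is Unix-only, and it must run in the main thread. Blocking SIGALRM while the background thread starts is an extra assumption on top of the suggestion above, so that the process-directed signal lands in the main thread. The payload is assumed to be larger than the pipe capacity so that write() blocks.

```python
import os
import signal
import threading


def interrupted_write_demo(payload_size=4 * 1024 * 1024 + 1):
    """Return (handler_exception_seen, bytes_read_by_background_thread)."""
    r, w = os.pipe()
    read_results = []

    def _read():
        # Returns only once the main thread's write() has put data in the pipe
        s = os.read(r, 1)
        read_results.append(s)
        # The main thread is very likely inside write() now, so interrupt it
        os.kill(os.getpid(), signal.SIGALRM)

    def on_alarm(signum, frame):
        1 / 0  # raise ZeroDivisionError from the Python-level handler

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    t = threading.Thread(target=_read, daemon=True)
    wio = open(w, "wb")  # buffered writer that owns the write end
    raised = False
    try:
        # Start the thread with SIGALRM blocked (the mask is inherited),
        # then unblock it in the main thread only.
        signal.pthread_sigmask(signal.SIG_BLOCK, [signal.SIGALRM])
        t.start()
        signal.pthread_sigmask(signal.SIG_UNBLOCK, [signal.SIGALRM])
        try:
            wio.write(b"x" * payload_size)
        except ZeroDivisionError:
            raised = True
        t.join()
    finally:
        signal.signal(signal.SIGALRM, old_handler)
        os.close(r)  # read end first, so a pending flush cannot block
        try:
            wio.close()
        except BrokenPipeError:
            pass
    return raised, read_results
```

Since the signal is only sent after one byte has been read, it cannot arrive before write() has started, which is the window behind both hangs described above.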

----------
nosy: +martin.panter
status: closed -> open
versions: +Python 2.7, Python 3.6 -Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22331>
_______________________________________