[Python-Dev] test_multiprocessing: test_listener_client flakiness

Amaury Forgeot d'Arc amauryfa at gmail.com
Wed Jun 18 18:35:26 CEST 2008


2008/6/18 Trent Nelson <tnelson at onresolve.com>:
> I gave my Windows buildbots a little bit of TLC last night.  This little chestnut in test_multiprocessing.py around line 1346 is causing my buildbots to wedge more often than not:
>    def test_listener_client(self):
>        for family in self.connection.families:
>            l = self.connection.Listener(family=family)
>            p = self.Process(target=self._test, args=(l.address,))
>            p.set_daemon(True)
>            p.start()
>            conn = l.accept()
>            self.assertEqual(conn.recv(), 'hello')
>            p.join()
>            l.close()
> The wedging will be a result of that accept() call.  Not knowing anything about the module or the test suite, I can only assume that there's a race condition introduced between when the subprocess attempts to connect to the listener, versus when the l.accept() call is actually entered.  (On the basis that a race condition would explain why sometimes it wedges and sometimes it doesn't.)
> Just FYI, the error in the buildbot log (http://www.python.org/dev/buildbot/all/x86%20W2k8%20trunk/builds/810/step-test/0) when this occurs is as follows:
> test_multiprocessing
> command timed out: 1200 seconds without output
> SIGKILL failed to kill process
> using fake rc=-1
> program finished with exit code -1
> remoteFailed: [Failure instance: Traceback from remote host -- Traceback (most recent call last):
> Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process
> ]
> (The fact it can't be killed cleanly is a bug in Twisted's signalProcess('KILL') method, which doesn't work against Python processes that have entered accept() calls on Windows (which present the 'wedged' behaviour and have to be forcibly killed with OpenProcess/TerminateProcess).)
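Trent's race-condition guess can be checked with plain sockets. Once listen() has been called, the OS completes incoming TCP handshakes and queues them in the backlog. A client that connects before the server enters accept() is therefore not lost. By default accept() also blocks forever, which is what turns a failed connection into a wedged buildbot. The sketch below is a minimal illustration, not the multiprocessing implementation. It assumes Python 3 and the IPv4 loopback interface, and `connect_before_accept` and `accept_with_timeout` are hypothetical helper names.

```python
import socket


def connect_before_accept(message=b"hello"):
    """Connect and send *before* the listener calls accept().

    The kernel queues the finished connection in the listen() backlog,
    so accept() can pick it up later without losing anything.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        listener.settimeout(5)            # fail fast rather than wedge
        address = listener.getsockname()

        # The "client" connects and sends while nobody is blocked in accept().
        with socket.create_connection(address, timeout=5) as client:
            client.sendall(message)
            # Only now does the "server" enter accept().
            conn, _ = listener.accept()
            with conn:
                data = b""
                while len(data) < len(message):
                    chunk = conn.recv(len(message) - len(data))
                    if not chunk:
                        break
                    data += chunk
                return data


def accept_with_timeout(seconds):
    """Show that a timeout turns an endless accept() into an error."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(seconds)
        try:
            listener.accept()             # nobody ever connects
        except socket.timeout:
            return "timed out"
        return "accepted"
```

Calling `connect_before_accept()` returns `b"hello"` even though the send happens before accept(). Calling `accept_with_timeout(0.2)` gives up after about 0.2 seconds instead of hanging.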

I just found the cause of the problem ten minutes ago:
It seems that when a socket listens on the address "localhost" (the
loopback interface only), another process cannot connect to it using
the machine's name, even from the same machine.
The best fix seems to be to listen on the empty address "", which
binds to all interfaces.
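The difference between the two addresses is visible in what the listening socket is actually bound to. The sketch below is minimal and assumes Python 3 and IPv4. `bound_address` and `can_connect` are hypothetical helpers, not part of multiprocessing. It binds to ("", 0) or ("localhost", 0), reports the address the OS chose, and tries a connection to a given host name.

```python
import socket


def bound_address(host):
    """Bind a TCP listener on (host, 0) and report the address the OS chose."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        s.listen(1)
        return s.getsockname()


def can_connect(listen_host, connect_host):
    """Listen on listen_host and try to connect to it via connect_host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((listen_host, 0))
        srv.listen(1)
        port = srv.getsockname()[1]
        try:
            # The connection sits in the backlog; no accept() is needed.
            with socket.create_connection((connect_host, port), timeout=2):
                return True
        except OSError:                   # refused, unreachable, or unresolvable
            return False
```

Here `bound_address("")` reports `0.0.0.0` (all interfaces), while `bound_address("localhost")` reports a loopback address. The result of `can_connect("localhost", socket.gethostname())` depends on local name resolution. It should be False when the machine name resolves to an address other than 127.0.0.1, which matches the behaviour described above. `can_connect("", socket.gethostname())` should succeed whenever the name resolves to one of the machine's own addresses.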

Index: Lib/multiprocessing/connection.py
--- Lib/multiprocessing/connection.py   (revision 64374)
+++ Lib/multiprocessing/connection.py   (working copy)
@@ -49,7 +49,7 @@
     Return an arbitrary free address for the given family
     if family == 'AF_INET':
-        return ('localhost', 0)
+        return ('', 0)
     elif family == 'AF_UNIX':
         return tempfile.mktemp(prefix='listener-', dir=get_temp_dir())
     elif family == 'AF_PIPE':

And the test started to pass for me.
Can you please check this in if it works; I don't have svn access for
the moment.

Amaury Forgeot d'Arc
