Race condition deadlock in communicate when threading?

Atherun atherun at gmail.com
Tue Sep 27 03:07:31 CEST 2011


In Python 2.6.4 I have a fairly complex system running (copying and
pasting it would be quite difficult).  At its core there are builders
that inherit from threading.Thread.  Some builders run external tasks
via subprocess.Popen and read the output using communicate.  Any
number of builders can run in parallel.  All of this uses logging as
well: each builder creates its own logger from the base logger, i.e.
logging.getLogger("Logger.%s" % self.Name).
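
To make the setup concrete, here is a minimal sketch of the structure described above. It is not the real code: the Builder class body, the command being run, and the run_builders helper are hypothetical stand-ins, and only threading.Thread, subprocess.Popen, communicate and the logger naming come from the actual system.

```python
import logging
import subprocess
import sys
import threading

logging.basicConfig(level=logging.INFO)


class Builder(threading.Thread):
    """Hypothetical builder: runs one external command and captures its output."""

    def __init__(self, name, command):
        threading.Thread.__init__(self)
        self.Name = name
        self.command = command
        # Each builder derives its own logger from the base "Logger" logger.
        self.logger = logging.getLogger("Logger.%s" % self.Name)
        self.output = None

    def run(self):
        proc = subprocess.Popen(
            self.command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        # communicate() reads both pipes until EOF and waits for the child.
        out, err = proc.communicate()
        self.output = out
        self.logger.info("exit code %s", proc.returncode)


def run_builders(count):
    """Start `count` builders in parallel and wait for all of them."""
    # Assumption: a trivial child process stands in for the real external task.
    command = [sys.executable, "-c", "print('built')"]
    builders = [Builder("builder%d" % i, command) for i in range(count)]
    for builder in builders:
        builder.start()
    for builder in builders:
        builder.join()
    return builders
```

Calling run_builders(4) starts four such threads at once, which is the situation in which the hang shows up for me.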

When several threads run in parallel and each calls Popen and
communicate, Python hangs about 95% of the time.  Everything stops
responding: the parent threads and any threads I started before
reaching this step.  Stepping through it with pdb to find the deadlock
always makes it work.  Killing the external app I called via Popen,
after it has hung, also makes it move along.  I looked at the stack
traces using the code found at
http://code.activestate.com/recipes/577334-how-to-debug-deadlocked-multi-threaded-programs/
and the threads stop running with the following stacks.  This is the
last output from the tracer before it stops responding:

File: "c:\src\extern\python\lib\threading.py", line 497, in __bootstrap
  self.__bootstrap_inner()
File: "c:\src\extern\python\lib\threading.py", line 525, in __bootstrap_inner
  self.run()
File: "c:\src\extern\python\lib\threading.py", line 477, in run
  self.__target(*self.__args, **self.__kwargs)
File: "c:\src\extern\python\lib\subprocess.py", line 877, in _readerthread
  buffer.append(fh.read())

And

  out, err = proc.communicate("change: new\ndescription: %s\n"%changelistDesc)
File: "c:\src\extern\python\lib\subprocess.py", line 689, in communicate
  return self._communicate(input)
File: "c:\src\extern\python\lib\subprocess.py", line 903, in _communicate
  stdout_thread.join()
File: "c:\src\extern\python\lib\threading.py", line 637, in join
  self.__block.wait()
File: "c:\src\extern\python\lib\threading.py", line 237, in wait
  waiter.acquire()
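
For anyone who wants to capture similar traces, the sketch below shows one way to dump the stack of every live thread with sys._current_frames(). It is a simplified illustration and not the ActiveState recipe itself; the function name dump_thread_stacks is my own.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a text report with the current stack of every live thread."""
    # Map thread ids to names so the report is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread %s (%s)" % (thread_id, names.get(thread_id, "unknown")))
        # format_stack() yields "File ..., line ..., in ..." entries for the frame.
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)
```

Calling this periodically from a separate watchdog thread and writing the result to a file gives the last known stacks before the process stops responding.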

I'm trying to track this down so I can eliminate it for good as it
pops up in multiple places from time to time.

Any tips would be appreciated.


