subprocess.Popen stalls

Mon Jan 12 09:50:27 EST 2009

On Mon, 12 Jan 2009 06:37:35 -0800 (PST), "psaffrey at googlemail.com" <psaffrey at googlemail.com> wrote:
>I'm building a bioinformatics application using the ipcress tool:
>
>http://www.ebi.ac.uk/~guy/exonerate/ipcress.man.html
>
>I'm using subprocess.Popen to execute ipcress, which takes a group of
>files full of DNA sequences and returns some analysis on them. Here's
>a code fragment:
>
>from subprocess import Popen, PIPE
>
>cmd = "/usr/bin/ipcress ipcresstmp.txt --sequence /home/pzs/genebuilds/human/*.fasta"
>print "checking with ipcress using command", cmd
>p = Popen(cmd, shell=True, bufsize=100, stdout=PIPE, stderr=PIPE)
>retcode = p.wait()
>if retcode != 0:
>	print "ipcress failed with error code:", retcode
>	raise Exception
>output = p.stdout.read()
>
>If I run the command at my shell, it finishes successfully. It takes
>30 seconds - it uses 100% of one core and several hundred MB of memory
>during this time. The output is 220KB of text.
>
>However, running it through Python as per the above code, it stalls
>after 5 seconds not using any processor at all. I've tried leaving it
>for a few minutes with no change. If I interrupt it, it's at the
>"retcode = p.wait()" line.
>
>I've tried making the bufsize really large and that doesn't seem to
>help. I'm a bit stuck - any suggestions? This same command has worked
>fine on other ipcress runs. This one might generate more output than
>the others, but 220KB isn't that much, is it?

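You have to read the output while the process is running.  The pipe
connected to the child's stdout has a small, fixed-size buffer in the
operating system (64 KB by default on Linux).  Once the child has written
that much without anyone reading it, its next write blocks, so it never
exits and p.wait() waits for it forever.  That is also why a bigger
bufsize doesn't help: bufsize only controls Python's buffering on its
end of the pipe, not the capacity of the pipe itself.  Presumably your
earlier ipcress runs produced little enough output to fit in the buffer.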
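A minimal sketch of both the stall and the usual subprocess-only fix, written for Python 3.3+ (wait(timeout=...) and TimeoutExpired did not exist in 2009, although Popen.communicate() did).  Instead of ipcress it runs a hypothetical stand-in child that writes 220,000 bytes to stdout; the helper names are made up for illustration.  The first helper reproduces the hang by calling wait() without reading; the second uses communicate(), which reads stdout and stderr until EOF and only then waits for the process to exit.

```python
import subprocess
import sys

# Hypothetical stand-in for ipcress: a child Python process that writes
# 220,000 bytes to stdout, more than a typical OS pipe buffer holds.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 220000)"]


def wait_without_reading(args, timeout=2):
    """Reproduce the stall: wait() while nobody drains the pipes."""
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        p.wait(timeout=timeout)  # the child blocks writing to a full pipe
        return False             # finished in time: no stall
    except subprocess.TimeoutExpired:
        return True              # still running after the timeout: stalled
    finally:
        p.kill()                 # stop the blocked child
        p.communicate()          # reap it and close our ends of the pipes


def run_and_capture(args):
    """Run args, reading stdout and stderr while the child runs."""
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() drains both pipes until EOF, then waits for exit,
    # so the child never blocks on a full pipe.
    output, errors = p.communicate()
    if p.returncode != 0:
        raise RuntimeError("command failed with code %d: %r"
                           % (p.returncode, errors))
    return output


stalled = wait_without_reading(CHILD)
output = run_and_capture(CHILD)
print(stalled, len(output))  # True 220000
```

For the original command, the same pattern applies: keep the cmd string with shell=True (so the shell still expands *.fasta) and call communicate() in place of wait() followed by p.stdout.read().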

If you use Twisted's process API instead, the reading will be done for
you (without any of the race conditions that are likely when using the
subprocess module), and things will probably "just work".

Jean-Paul