[Tutor] Subprocess how to use?

Thu Nov 6 00:57:52 CET 2014

On 05Nov2014 14:05, jarod_v6 at libero.it <jarod_v6 at libero.it> wrote:
>I need to use external program from my scirpt.
>code = "intersectBed -a %s -b /database/Refseq_Gene2.bed  -wa -wb|cut -f 4,8|uniq > tmp.tmp1"%("gene5.tmp.bed")
>    code2 = "intersectBed -a %s -b /database/Refseq_Gene2.bed -wa -wb|cut -f 4,8|uniq > tmp.tmp2"%("gene3.tmp.bed")
>    proc = []
>    p = subprocess.Popen(code,shell=True)
>    proc.append(p)
>    time.sleep(1) # seconds
>    p = subprocess.Popen(code2,shell=True)
>    proc.append(p)
>    time.sleep(1)
>    print >>sys.stderr,'-------------------------------------------------------------------------'
>    print >>sys.stderr,"Waiting for Star Fusion Annotate to finish running..."
>    for p in proc:
>        p.communicate()

First remark: as written above, your shell pipelines do not read any input and 
all their output goes to your temp files. Therefore using p.communicate() is a 
waste of time. Just use p.wait(); there is no I/O to do from the point of view 
of your Python code.

Second remark: it is generally a bad idea to use Python's % operator to embed 
strings in shell commands (or other "parsed by someone else" text, such as SQL) 
because unless you are always very careful about the inserted string (eg 
"gene5.tmp.bed") you may insert punctuation, leading to bad behaviour. In your 
example code above you are sort of ok, and we can address this as a separate 
issue after you have things working.

>    What append I don't habve any error however the file are truncated.

You need to find out why.

The shell code _will_ truncate "tmp.tmp1", but then we expect the output of 
"uniq" to appear there.

Start by stripping back the shell pipeline.

If you remove the "|uniq", is there anything in "tmp.tmp1"?

If there is not, strip off the "|cut -f 4,8" as well and check again. At that 
point we're asking: does intersectBed actually emit any output?

You can test that on the command line directly, and then build up the shell 
pipeline to what you have above by hand. When working, _then_ put it into your 
Python program.

>    So I try to use subprocess.call
>
>     p = subprocess.call(shlex.split(code),stdout = subprocess.PIPE, stderr = subprocess.STDOUT, shell = False)
>     but with p.comunicate I have this error:
>
>     ('\n***** ERROR: Unrecognized parameter: -wb|cut *****\n***** ERROR: Unrecognized parameter: >
[...]

This is a standard mistake. shlex is  _not_ a full shell syntax parser. It is a 
simple parser that understands basic shell string syntax (quotes etc) but NOT 
redirections. It is really a tool for using with your own minilanguages, as you 
might if you were writing an interactive program that accepts user commands.

You can't use shlex for shell pipelines or shell redirections.

What is happening above is that all the "words" are being gathered up and 
passed to the "intersectBed" command as arguments; no pipelines! And 
"intersectBed" complains bitterly as you show.

You can see this in the example you post below:

>shlex.split(code)
>Out[102]:
>['intersectBed',
> '-a',
> 'gene5.tmp.bed',
> '-b',
> '/home/maurizio/database/Refseq_Gene2.bed',
> '-wa',
> '-wb|cut',
> '-f',
> '4,8|uniq',
> '>',
> 'tmp.tmp1']
>
> So what can I do? which system do you suggest to my Problem?

First, forget about shlex. It does not do what you want.

Then there are three approaches:

1: Fix up your shell pipeline. You original code should essentially work: there 
is just something wrong with your pipeline. Strip it back until you can see the 
error.

2: Construct the pipeline using Popen. This is complicated for someone new to 
Popen and subprocess. Essentially you would make each pipeline as 3 distinct 
Popen calls, using stdout=PIPE and attached that to the stdin of the next 
Popen.

3: The "cut" and "uniq" commands are very simple operations, and easy write 
directly in Python. Use Popen to dispatch _just_ "intersectBed" with 
stdout=PIPE. Then read from the pipe and do the cut and uniq yourself.

A completely untested and incomplete example of option 3 might look like this:

  P = Popen([ "intersectBed", ...other arguments... ], stdout=PIPE, shell-=False)
  old_fields = []
  for line in P.stdout:
    fields = line.split()   # read a line from "intersectBed" as words
    cut_fields = fields[3:8] # get columns 4..8 (counting from zero, not one)
    if cut_fields != old_fields:    # this does what "uniq" does
      print cut_fields
      old_fields = cut_fields
  P.wait()                  # finished reading output, wait for process and tidy up

Hopefully this suggests some ways forward.

Cheers,
Cameron Simpson <cs at zip.com.au>