[Tutor] Subprocess how to use?
Cameron Simpson
cs at zip.com.au
Thu Nov 6 00:57:52 CET 2014
On 05Nov2014 14:05, jarod_v6 at libero.it <jarod_v6 at libero.it> wrote:
>I need to use external program from my scirpt.
>code = "intersectBed -a %s -b /database/Refseq_Gene2.bed -wa -wb|cut -f 4,8|uniq > tmp.tmp1"%("gene5.tmp.bed")
> code2 = "intersectBed -a %s -b /database/Refseq_Gene2.bed -wa -wb|cut -f 4,8|uniq > tmp.tmp2"%("gene3.tmp.bed")
> proc = []
> p = subprocess.Popen(code,shell=True)
> proc.append(p)
> time.sleep(1) # seconds
> p = subprocess.Popen(code2,shell=True)
> proc.append(p)
> time.sleep(1)
> print >>sys.stderr,'-------------------------------------------------------------------------'
> print >>sys.stderr,"Waiting for Star Fusion Annotate to finish running..."
> for p in proc:
> p.communicate()
First remark: as written above, your shell pipelines do not read any input and
all their output goes to your temp files. Therefore using p.communicate() is a
waste of time. Just use p.wait(); there is no I/O to do from the point of view
of your Python code.
Second remark: it is generally a bad idea to use Python's % operator to embed
strings in shell commands (or other "parsed by someone else" text, such as SQL)
because unless you are always very careful about the inserted string (eg
"gene5.tmp.bed") you may insert punctuation, leading to bad behaviour. In your
example code above you are sort of ok, and we can address this as a separate
issue after you have things working.
> What append I don't habve any error however the file are truncated.
You need to find out why.
The shell code _will_ truncate "tmp.tmp1", but then we expect the output of
"uniq" to appear there.
Start by stripping back the shell pipeline.
If you remove the "|uniq", is there anything in "tmp.tmp1"?
If there is not, strip off the "|cut -f 4,8" as well and check again. At that
point we're asking: does intersectBed actually emit any output?
You can test that on the command line directly, and then build up the shell
pipeline to what you have above by hand. When working, _then_ put it into your
Python program.
> So I try to use subprocess.call
>
> p = subprocess.call(shlex.split(code),stdout = subprocess.PIPE, stderr = subprocess.STDOUT, shell = False)
> but with p.comunicate I have this error:
>
> ('\n***** ERROR: Unrecognized parameter: -wb|cut *****\n***** ERROR: Unrecognized parameter: >
[...]
This is a standard mistake. shlex is _not_ a full shell syntax parser. It is a
simple parser that understands basic shell string syntax (quotes etc) but NOT
redirections. It is really a tool for using with your own minilanguages, as you
might if you were writing an interactive program that accepts user commands.
You can't use shlex for shell pipelines or shell redirections.
What is happening above is that all the "words" are being gathered up and
passed to the "intersectBed" command as arguments; no pipelines! And
"intersectBed" complains bitterly as you show.
You can see this in the example you post below:
>shlex.split(code)
>Out[102]:
>['intersectBed',
> '-a',
> 'gene5.tmp.bed',
> '-b',
> '/home/maurizio/database/Refseq_Gene2.bed',
> '-wa',
> '-wb|cut',
> '-f',
> '4,8|uniq',
> '>',
> 'tmp.tmp1']
>
> So what can I do? which system do you suggest to my Problem?
First, forget about shlex. It does not do what you want.
Then there are three approaches:
1: Fix up your shell pipeline. You original code should essentially work: there
is just something wrong with your pipeline. Strip it back until you can see the
error.
2: Construct the pipeline using Popen. This is complicated for someone new to
Popen and subprocess. Essentially you would make each pipeline as 3 distinct
Popen calls, using stdout=PIPE and attached that to the stdin of the next
Popen.
3: The "cut" and "uniq" commands are very simple operations, and easy write
directly in Python. Use Popen to dispatch _just_ "intersectBed" with
stdout=PIPE. Then read from the pipe and do the cut and uniq yourself.
A completely untested and incomplete example of option 3 might look like this:
P = Popen([ "intersectBed", ...other arguments... ], stdout=PIPE, shell-=False)
old_fields = []
for line in P.stdout:
fields = line.split() # read a line from "intersectBed" as words
cut_fields = fields[3:8] # get columns 4..8 (counting from zero, not one)
if cut_fields != old_fields: # this does what "uniq" does
print cut_fields
old_fields = cut_fields
P.wait() # finished reading output, wait for process and tidy up
Hopefully this suggests some ways forward.
Cheers,
Cameron Simpson <cs at zip.com.au>
More information about the Tutor
mailing list