[Tutor] pipes and redirecting

Wed May 28 04:35:02 CEST 2014

On 27May2014 21:01, Adam Gold <awg1 at gmx.com> wrote:
>I'm trying to run the following unix command from within Python as
>opposed to calling an external Bash script (the reason being I'm doing
>it multiple times within a for loop which is running through a list):
>
>"dd if=/home/adam/1 bs=4k conv=noerror,notrunc,sync | pbzip2 > 1.img.bz2"

First off, one expedient way to do this is to generate a shell script and pipe 
into "sh" (or "sh -uex", my preferred error sensitive invocation).

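   import subprocess

   # text mode, so that print() can write str to the pipe
   p1 = subprocess.Popen(["sh", "-uex"], stdin=subprocess.PIPE,
                         universal_newlines=True)
   for num in range(1,11):
     print("dd if=/home/adam/%d bs=4k conv=noerror,notrunc,sync | pbzip2 > %d.img.bz2"
           % (num, num), file=p1.stdin)
   p1.stdin.close()
   p1.wait()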

Any quoting issues aside, this is surprisingly useful. Let the shell do what it 
is good at.

And NOTHING you've said here requires using bash. Use "sh" and say "sh", it is 
very portable and bash is rarely needed for most stuff.
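
On the quoting issues mentioned above: a minimal sketch, assuming Python 3.3+ 
(where shlex.quote exists), of building each script line so that a path 
containing spaces or shell punctuation reaches dd and the redirection intact. 
The helper name script_line is made up for illustration.

```python
import shlex

def script_line(src, dst):
    """Return one shell command line for the dd | pbzip2 pipeline.

    Hypothetical helper: both paths are quoted so the shell passes them
    through literally, whatever characters they contain.
    """
    return "dd %s bs=4k conv=noerror,notrunc,sync | pbzip2 > %s" % (
        shlex.quote("if=" + src),
        shlex.quote(dst),
    )

# A plain path needs no quoting; one with a space gets single quotes.
print(script_line("/home/adam/1", "1.img.bz2"))
print(script_line("/home/adam/my disk", "my disk.img.bz2"))
```

Each such line can be printed to the "sh" subprocess's stdin exactly as in the 
loop above.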

However, I gather beyond expediency, you want to know how to assemble pipelines 
using subprocess anyway. So...

>The first thing I do is break it into two assignments (I know this isn't
>strictly necessary but it makes the code easier to deal with):
>
>ddIf = shlex.split("dd if=/home/adam/1 bs=4k conv=noerror,notrunc,sync")
>compress = shlex.split("pbzip2 > /home/adam/1.img.bz2")

This is often worth doing regardless. Longer lines are harder to read.

>I have looked at the docs here (and the equivalent for Python 3)
>https://docs.python.org/2/library/subprocess.html.  I can get a 'simple'
>pipe like the following to work:
>
>p1 = subprocess.Popen(["ps"], stdout=PIPE)
>p2 = subprocess.Popen(["grep", "ssh"], stdin=p1.stdout, stdout=subprocess.PIPE)
>p1.stdout.close()
>output = p2.communicate()[0]

If you don't care about the stdout of p2 (and you don't, based on your 
"dd|pbzip2" example above) and you have left p2's stdout alone so that it goes 
to your normal stdout (eg the terminal) then you don't need to waste time with 
.communicate. I almost never use it myself. The deadlock the doco warns about 
comes from calling .wait() while a PIPE is left undrained, which is what 
.communicate exists to avoid; with no pipe to collect from there is nothing for 
it to do. I prefer to just do the right thing explicitly myself, as needed.

>I then try to adapt it to my example:
>
>p1 = subprocess.Popen(ddIf, stdout=subprocess.PIPE)
>p2 = subprocess.Popen(compress, stdin=p1.stdout, stdout=subprocess.PIPE)
>p1.stdout.close()
>output = p2.communicate()[0]
>
>I get the following error:
>
>pbzip2: *ERROR: File [>] NOT found!  Skipping...
>-------------------------------------------
>pbzip2: *ERROR: Input file [/home/adam/1.img.bz2] already has a .bz2
>extension!  Skipping
>
>I think that the '>' redirect needs to be dealt with using the
>subprocess module as well but I can't quite put the pieces together.
>I'd appreciate any guidance.  Thanks.

It is as you expect. Consider what the shell does with:

   pbzip2 > 1.img.bz2

It invokes the command "pbzip2" (no arguments) with its output attached to the 
file "1.img.bz2".

So first up: stay away from "shlex". It does _not_ do what you need.

Shlex knows about shell string quoting. It does not know about redirections.
It is handy for parsing minilanguages of your own concoction where you want to 
be able to quote strings with spaces. It is not a full on shell parser.
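
To see why pbzip2 complained, here is a small sketch of what shlex.split 
actually hands back for the "compress" string from the original post. The ">" 
comes through as an ordinary word, so pbzip2 receives it and the output 
filename as two input file arguments, which matches the two errors quoted 
above.

```python
import shlex

# shlex handles quoting, so a quoted string with a space stays one word...
quoted = shlex.split('cp "my file" dest')
print(quoted)

# ...but it has no idea that ">" means "redirect": it is just another word.
compress = shlex.split("pbzip2 > /home/adam/1.img.bz2")
print(compress)
```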

So it (may) serve you well for the dd invocation because there are no 
redirections. But for your usage, so would the .split() method on a string, or 
even better: don't you already know the arguments for your "dd"? Just fill them 
out directly rather than backtracking from a string.
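
Filling the arguments out directly might look like the sketch below. The 
function name dd_argv is my own invention; the paths and dd options are the 
ones from the original post.

```python
def dd_argv(num):
    """Build the dd argument list for image number `num` (hypothetical helper)."""
    return [
        "dd",
        "if=/home/adam/%d" % num,      # input file, one per loop iteration
        "bs=4k",
        "conv=noerror,notrunc,sync",
    ]

print(dd_argv(1))
```

No string parsing is involved, so there is nothing to go wrong if a path 
happens to contain a space.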

However, your recipe is very close. Change:

   p2 = subprocess.Popen(compress, stdin=p1.stdout, stdout=subprocess.PIPE)

into:

   p2 = subprocess.Popen(["pbzip2"], stdin=p1.stdout, stdout=open("1.img.bz2", "wb"))

Because p2 is writing to "1.img.bz2" you don't need to muck about with 
.communicate either. No output to collect, no input to supply.
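
Putting the pieces together for the loop, a rough sketch (not the only way to 
do it) could look like this. The names pipe_to_file and compress_images are 
made up for illustration, and checking for a (0, 0) exit status is my 
assumption about what counts as success.

```python
import subprocess

def pipe_to_file(producer_argv, filter_argv, out_path):
    """Run `producer | filter > out_path` without involving a shell.

    Returns the exit statuses as (producer_status, filter_status).
    """
    with open(out_path, "wb") as out:
        p1 = subprocess.Popen(producer_argv, stdout=subprocess.PIPE)
        p2 = subprocess.Popen(filter_argv, stdin=p1.stdout, stdout=out)
        # Close our copy of the pipe: only p2 should hold the read end, so
        # that p1 sees a broken pipe if p2 exits early.
        p1.stdout.close()
        filter_status = p2.wait()
        producer_status = p1.wait()
    return producer_status, filter_status

def compress_images(nums):
    """Compress /home/adam/<num> to <num>.img.bz2 for each num (paths from the post)."""
    for num in nums:
        statuses = pipe_to_file(
            ["dd", "if=/home/adam/%d" % num, "bs=4k", "conv=noerror,notrunc,sync"],
            ["pbzip2"],
            "%d.img.bz2" % num,
        )
        if statuses != (0, 0):
            raise RuntimeError("pipeline failed for %d: %r" % (num, statuses))
```

Calling compress_images(range(1, 11)) would then cover the same ten files as 
the shell script version at the top. Neither process has a pipe connected to 
Python once p1.stdout is closed, so plain .wait() is safe here.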

See where that takes you.

Cheers,
Cameron Simpson <cs at zip.com.au>