bash-style pipes in python?

Dan Stromberg - Datallegro dstromberg at datallegro.com
Thu Jul 12 21:57:18 CEST 2007


On Wed, 11 Jul 2007 20:55:48 -0700, faulkner wrote:

> On Jul 11, 8:56 pm, Dan Stromberg - Datallegro
> <dstromb... at datallegro.com> wrote:
>> I'm constantly flipping back and forth between bash and python.
>>
>> Sometimes, I'll start a program in one, and end up recoding in the
>> other, or including a bunch of python inside my bash scripts, or snippets
>> of bash in my python.
>>
>> But what if python had more of the power of bash-style pipes?  I might not
>> need to flip back and forth so much.  I could code almost entirely in python.
>>
>> The kind of thing I do over and over in bash looks like:
>>
>>         #!/usr/bin/env bash
>>
>>         # exit on errors, like python. Exit on undefined variables, like python.
>>         set -eu
>>
>>         # give the true/false value of the last false command in a pipeline,
>>         # not the true/false value of the last command in the pipeline - like
>>         # nothing I've seen
>>         set -o pipefail
>>
>>         # save output in "output", but only echo it to the screen if the command fails
>>         if ! output=$(foo | bar 2>&1)
>>         then
>>                 echo "$0: foo | bar failed" 1>&2
>>                 echo "$output" 1>&2
>>                 exit 1
>>         fi
>>
>> Sometimes I use $PIPESTATUS too, but not that much.
>>
>> I'm aware that python has a variety of pipe handling support in its
>> standard library.
>>
>> But is there a similarly-simple way already, in python, of hooking the stdout of
>> process foo to the stdin of process bar, saving the stdout and errors from both
>> in a variable, and still having convenient access to process exit values?
>>
>> Would it be possible to overload | (pipe) in python to have the same behavior as in
>> bash?
>>
>> I could deal with slightly more cumbersome syntax, like:
>>
>>         (stdout, stderrs, exit_status) = proc('foo') | proc('bar')
>>
>> ...if the basic semantics were there.
>>
>> How about it?  Has someone already done this?
> 
> import subprocess
> 
> class P(subprocess.Popen):
>     def __or__(self, otherp):
>         # copy all of our output into the next process's stdin, then close it
>         otherp.stdin.write(self.stdout.read())
>         otherp.stdin.close()
>         return otherp
>     def __init__(self, cmd, *a, **kw):
>         # default all three standard streams to pipes
>         for s in ['out', 'in', 'err']: kw.setdefault('std' + s, subprocess.PIPE)
>         subprocess.Popen.__init__(self, cmd.split(), *a, **kw)
> 
> print((P('cat /etc/fstab') | P('grep x')).stdout.read())
> 
> of course, you don't need to overload __init__ at all, and you can
> return otherp.stdout.read() instead of otherp, and you can make
> __gt__, __lt__ read and write files. unfortunately, you can't really
> fudge &>, >>, |&, or any of the more useful pipes, but you can make
> more extensive use of __or__:
> 
> class Pipe:
>     def __or__(self, other):
>         if isinstance(other, Pipe): return ...
>         elif isinstance(other, P): return ...
>     def __init__(self, pipe_type): ...
> 
> k = Pipe(foo)
> m = Pipe(bar)
> 
> P() |k| P()
> P() |m| P()

This is quite cool.

I have to ask though - isn't this going to read everything from foo until
foo closes its stdout, and then write that result to bar - rather than
doing it a block at a time like a true pipe?
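
For comparison, here is a minimal sketch of a block-at-a-time hookup, where
the operating system moves the data and python never touches it.  It assumes
only the standard subprocess module.  The run_pipeline and pipefail_status
helpers are hypothetical names, and the two sys.executable one-liners are
stand-ins for foo and bar.  As in the bash example's "bar 2>&1", only the
second command's stderr is folded into the captured output.

```python
import subprocess
import sys


def run_pipeline(first_cmd, second_cmd):
    """Run first_cmd | second_cmd joined by a real OS pipe.

    Returns (output, statuses): the second command's stdout and stderr
    as bytes, and the exit statuses of both commands in pipeline order.
    """
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT)
    # Drop our copy of the read end so the first command sees a broken
    # pipe if the second one exits early.
    first.stdout.close()
    output, _ = second.communicate()
    return output, [first.wait(), second.returncode]


def pipefail_status(statuses):
    """Mimic 'set -o pipefail': rightmost nonzero status, else 0."""
    for status in reversed(statuses):
        if status != 0:
            return status
    return 0


# Hypothetical stand-ins for foo and bar.
producer = [sys.executable, "-c", "print('alpha'); print('beta')"]
consumer = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]

output, statuses = run_pipeline(producer, consumer)
if pipefail_status(statuses) != 0:
    sys.stderr.write("pipeline failed\n")
    sys.stderr.write(output.decode(errors="replace"))
```

The list of statuses plays roughly the role of $PIPESTATUS, and the output
is only echoed when something in the pipeline fails.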

I continued thinking about this last night after I sent my post, and
started wondering if it might be possible to come up with something where
you could freely intermix bash pipes and python generators, by faking the
pipes using generators and some form of concurrent function composition -
perhaps threads or subprocesses for the concurrency.  Of course, that
imposes some extra I/O, CPU and context switch overhead, since you'd have
python shuttling data from foo to bar all the time instead of foo sending
data directly to bar, but the ability to mix python and pipes might be
worth it.
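
A rough sketch of the intermixable variant, assuming threads for the
concurrency: a generator that pushes any iterable of byte lines through an
external command and yields the command's output lines, so it can sit
between ordinary python generators.  The names through_process and numbered
are hypothetical, and the sorter one-liner is a stand-in for a real
external filter.  The exit status is dropped here for brevity, and a
consumer that stops iterating early is not cleaned up after.

```python
import subprocess
import sys
import threading


def through_process(cmd, lines):
    """Feed an iterable of bytes lines to cmd; yield its stdout lines."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)

    def feed():
        # Runs in a background thread so that writing to the child and
        # reading from it cannot deadlock on full pipe buffers.
        try:
            for line in lines:
                proc.stdin.write(line)
            proc.stdin.close()
        except BrokenPipeError:
            pass  # the child stopped reading before we finished

    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()
    for line in proc.stdout:
        yield line
    proc.stdout.close()
    feeder.join()
    proc.wait()


def numbered(lines):
    """An ordinary python generator stage: prefix each line with a count."""
    for n, line in enumerate(lines, 1):
        yield b"%d: %s" % (n, line)


# python generator -> external process -> python generator
source = (word + b"\n" for word in [b"pear", b"apple", b"fig"])
sorter = [sys.executable, "-c",
          "import sys; sys.stdout.writelines(sorted(sys.stdin))"]
result = list(numbered(through_process(sorter, source)))
```

Every byte passes through python twice here, which is the extra overhead
described above.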

Even better might be to have a choice between intermixable and fast.

Comments anyone?



