[Tutor] Interacting with stderr

Cameron Simpson cs at zip.com.au
Thu Aug 28 04:59:19 CEST 2014

On 27Aug2014 18:56, Danny Yoo <dyoo at hashcollision.org> wrote:
>> Crude and incomplete and untested example:
>>   from subprocess import Popen, PIPE
>>   P = Popen("avconv ... lots of arguments...", shell=True, stderr=PIPE)
>>   for line in P.stderr:
>>       ... examine the line from stderr ...
>>   # ok, we have read all of stderr now
>>   xit = P.wait()
>>   if xit != 0:
>>       ... command was unsuccessful, complain, maybe abort ...
>The subprocess documentation has a few good examples of pipelines that
>should apply to this scenario.  I'd recommend the original questioner
>look at the documentation here closely, because he or she is using a
>feature of 'subprocess' that is a bit iffy, namely, the use of
>"shell=True".  Try to avoid "shell=True" unless you really have no

Yes, but I was deliberately avoiding that aspect until the OP had their stderr 
issue worked out.

>Rather than construct the pipeline through the shell, do it through
>Python if you can.  See:
>    https://docs.python.org/2/library/subprocess.html#replacing-shell-pipeline

But his use case is not a shell pipeline, so that section is irrelevant. It just 
makes things more complex for him.

>Also, prefer the use of communicate() rather than wait() in the
>scenario above.  Otherwise, the code is susceptible to PIPEs getting
>overflowed.  See:
>    https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate

Again, disagree. In this specific case, disagree strongly.

Firstly, the OP is not feeding stuff to stdin nor collecting stdout.

Secondly, the OP has made it clear that they're not in a position to wait for 
the command to finish; they need to read stderr as it occurs because the run 
time is very long and they need to act earlier than process completion.
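
A minimal sketch of acting on stderr before the command finishes, written for 
Python 3. The child is a stand-in Python one-liner rather than avconv, and the 
"error" marker, the function name and the decision to terminate the process 
are all assumptions made for illustration:

```python
import subprocess
import sys

# Hypothetical stand-in for a long-running command: it writes a few lines to
# stderr and then would keep running for 30 seconds if nobody stopped it.
CHILD = (
    "import sys, time\n"
    "for msg in ('frame 1', 'error: bad input', 'frame 2'):\n"
    "    sys.stderr.write(msg + '\\n')\n"
    "    sys.stderr.flush()\n"
    "time.sleep(30)\n"
)

def watch_stderr(argv, marker="error"):
    """Run argv, read its stderr line by line, stop the process early on marker."""
    seen = []
    # An argument list is passed directly, so no shell is involved.
    proc = subprocess.Popen(argv, stderr=subprocess.PIPE, universal_newlines=True)
    for line in proc.stderr:
        line = line.rstrip("\n")
        seen.append(line)
        if marker in line:
            # Act now instead of waiting for the command to complete.
            proc.terminate()
            break
    proc.stderr.close()
    return seen, proc.wait()

if __name__ == "__main__":
    lines, exit_code = watch_stderr([sys.executable, "-c", CHILD])
    print(lines, exit_code)
```

The loop sees each line once the child has written it, which is the point: 
.communicate() would not return anything until the child had exited.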

Thirdly, I generally consider advice to use .communicate bad advice.

.communicate has many potential issues:

Primarily, it hides all the mechanics. He will learn nothing.

Next, if .call() is not doing what he needs (and it is not), then .communicate 
will not help either.

Also, .communicate() reads in all of stdout and stderr. This has issues. He's 
already suggested stdout has to go somewhere else, and that he needs to read 
stderr in a streaming fashion. Also, by reading in all of stdout and stderr, 
.communicate can consume an unbounded amount of memory: either of these two 
streams may be more than fits in the machine's memory, or more than one wishes 
to use. Particularly in audio/video processing (as he is doing) there is a lot 
of scope for output streams to be very large.
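
One way to keep memory bounded, sketched under assumptions: send stdout 
straight to a file and handle stderr a line at a time, so neither stream is 
ever held in memory whole. The names run_streaming and demo, the callback, and 
the stand-in child are hypothetical, not anything from the OP's code:

```python
import os
import subprocess
import sys
import tempfile

def run_streaming(argv, stdout_path, on_stderr_line):
    """Run argv with stdout written to stdout_path; pass each stderr line to a callback."""
    with open(stdout_path, "wb") as out:
        # stdout goes to the file via the OS; it never passes through this process.
        proc = subprocess.Popen(argv, stdout=out, stderr=subprocess.PIPE)
        for raw in proc.stderr:  # one line at a time, never the whole stream
            on_stderr_line(raw.decode("utf-8", "replace").rstrip("\r\n"))
        proc.stderr.close()
        return proc.wait()

def demo():
    """Run a stand-in child; return (exit code, stderr lines, size of stdout file)."""
    child = (
        "import sys\n"
        "sys.stdout.write('x' * 100000)\n"
        "sys.stderr.write('progress 50%\\nprogress 100%\\n')\n"
    )
    seen = []
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        code = run_streaming([sys.executable, "-c", child], path, seen.append)
        return code, seen, os.path.getsize(path)
    finally:
        os.remove(path)
```

Here the callback just collects lines in a list; in real use it could inspect 
each line and discard it, so memory use stays flat however long the run is.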

Further, the doco on .communicate does not say _how_ stdout and stderr are 
read. Is one read, and then the other? Are threads spawned and both read in 
parallel? Is there some awful select/epoll based event loop involved? None of 
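these things is specified. If one is read and then the other, there is scope 
for deadlock or just arbitrary stallage: the child can block writing to the 
pipe nobody is reading yet, while the parent waits on the other one.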
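
If both streams do have to be piped, one way to sidestep the ordering problem 
is to read them at the same time. The sketch below uses one thread per pipe; 
it is an illustration with hypothetical names (drain, run_both, CHILD), not a 
description of what .communicate does internally. The stand-in child writes 
several hundred kilobytes to stderr before writing anything to stdout, which 
is the pattern that can stall a parent reading stdout first, since a pipe 
buffer is finite (its size is platform dependent):

```python
import subprocess
import sys
import threading

# Hypothetical stand-in child: fills stderr first, then stdout.
CHILD = (
    "import sys\n"
    "for i in range(5000):\n"
    "    sys.stderr.write('e' * 60 + '\\n')\n"
    "for i in range(5000):\n"
    "    sys.stdout.write('o' * 60 + '\\n')\n"
)

def drain(stream, sink):
    """Append every line from stream to sink until EOF, then close the stream."""
    for line in stream:
        sink.append(line)
    stream.close()

def run_both(argv):
    """Run argv, reading stdout and stderr concurrently so neither pipe fills up."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_lines, err_lines = [], []
    readers = [
        threading.Thread(target=drain, args=(proc.stdout, out_lines)),
        threading.Thread(target=drain, args=(proc.stderr, err_lines)),
    ]
    for t in readers:
        t.start()
    for t in readers:
        t.join()  # both pipes have reached EOF once these return
    return proc.wait(), out_lines, err_lines

if __name__ == "__main__":
    code, out, err = run_both([sys.executable, "-c", CHILD])
    print(code, len(out), len(err))
```

The lists are only there to keep the example short; collecting everything 
reintroduces the memory concern above, so a real reader would process and drop 
each line instead.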

Personally, I pretty much never have a use for .call or .communicate. Using 
Popen with PIPE directly keeps the requirements on me clear and leaves me 
maximum flexibility to handle I/O and process completion as I see fit.

As you can see from my example code, it is hardly difficult to use Popen 
directly in the OP's use case, and arguably better.

Cameron Simpson <cs at zip.com.au>

