How to read large amounts of output via popen

Fri Aug 6 08:44:48 EDT 2010

On Fri, 06 Aug 2010 02:06:29 -0700, loial wrote:

> I need to read a large amount of data that is being returned in
> standard output by a shell script I am calling.
> 
> (I think the script should really be writing to a file but I have no
> control over that)

If the script is writing to stdout, you get to decide whether its stdout
is a pipe, file, tty, etc.

> Currently I have the following code. It seems to work, however I
> suspect this may not work with large amounts of standard output.

> process=subprocess.Popen(['myscript', 'param1'],
> shell=False,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
> 
> cmdoutput=process.communicate()

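It's certainly not the best way to read large amounts of output:
communicate() accumulates all of stdout and stderr in memory and only
returns them once the script has exited. Unfortunately, better solutions
get complicated when you need to read more than one of stdout and stderr,
or if you also need to write to stdin.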

If you only need stdout, you can just read from process.stdout in a loop.
You can leave stderr going to wherever the script's stderr goes (e.g. the
terminal), or redirect it to a file.
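A minimal sketch of the stdout-only loop, assuming Python 3 and binary
output. The function name copy_stdout_to_file and the 64 KiB chunk size
are my own choices for illustration, not anything from your code:

```python
import subprocess

def copy_stdout_to_file(cmd, path, chunk_size=64 * 1024):
    """Run cmd, stream its stdout into path chunk by chunk, return the exit status."""
    # stderr is not redirected, so it goes wherever this program's stderr goes.
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    with open(path, 'wb') as outfile:
        while True:
            chunk = process.stdout.read(chunk_size)
            if not chunk:  # an empty read means the child closed its stdout
                break
            outfile.write(chunk)
    process.stdout.close()
    return process.wait()
```

You would call it as copy_stdout_to_file(['myscript', 'param1'],
'outputfile'). Only one chunk is held in memory at a time, so the size of
the output doesn't matter.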

If you really do need both stdout and stderr, then you either need to
enable non-blocking I/O, or use a separate thread for each stream, or
redirect at least one of them to a file.
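As a rough illustration of the thread-per-stream option (again assuming
Python 3; drain and run_with_both_streams are hypothetical helper names),
one stream can be drained on a helper thread while the main thread drains
the other:

```python
import subprocess
import threading

def drain(stream, sink, chunk_size=64 * 1024):
    """Copy everything from stream to sink, then close stream."""
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        sink.write(chunk)
    stream.close()

def run_with_both_streams(cmd, out_path, err_path):
    """Run cmd, saving stdout and stderr to separate files; return the exit status."""
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    with open(out_path, 'wb') as out, open(err_path, 'wb') as err:
        # Drain stderr on a helper thread so that neither pipe can fill up
        # and stall the child while we are blocked reading the other one.
        err_thread = threading.Thread(target=drain, args=(process.stderr, err))
        err_thread.start()
        drain(process.stdout, out)
        err_thread.join()
    return process.wait()
```

Reading the two pipes one after the other in a single thread is what
risks deadlock: the child can block writing to the pipe you aren't
reading yet.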

FWIW, Popen.communicate() multiplexes the pipes with select()/poll() on
Unix and uses separate threads on Windows (the standard library doesn't
include a mechanism to enable non-blocking I/O on Windows).

> What is the best way to read a large amount of data from standard
> output and write to a file?

For this case, the best way is to just redirect stdout to a file, rather
than passing it through your Python program, i.e.:

	import subprocess

	outfile = open('outputfile', 'w')
	# call() waits for the script to finish and returns its exit status
	returncode = subprocess.call(..., stdout=outfile)
	outfile.close()