Tuning a select() loop for os.popen3()
Christopher DeMarco
cmd at alephant.net
Fri Dec 30 14:27:16 EST 2005
Hi all...
I've written a class to provide an interface to popen; I've included
the actual select() loop below. I'm finding that "sometimes" popen'd
processes take "a really long time" to complete and "other times" I
get incomplete stdout.
E.g.:
- on boxA ffmpeg returns in ~25s; on boxB (comparable hardware,
identical OS) ~5m.
- ``ls'' on a directory with 15 nodes returns full stdout; ``ls -R''
on that same directory (with ~32K nodes beneath) stops after
4097KB of output.
The code in question is running on Linux 2.6.x; no cross-platform
portability desired. popen'd commands will never be interactive; I
just wanna read stdout/stderr and perhaps feed a one-shot string via
stdin.
Here's the relevant code (stripped of comments and various OO
setup/output stuff):
# # ## ### ##### ######## ############# #####################
# cut here
def run(self):
    import os, select, syslog
    (_stdin, _stdout, _stderr) = os.popen3(self.command)
    stdoutChunks = []; stderrChunks = []
    readList = [_stdout, _stderr];
    if self.stdinString is not "": writeList = [_stdin]
    else: writeList = []
    readStderr = False; readStdout = False
    i = 0
    while True:
        i += 1
        (r, w, x) = select.select(readList, writeList, [], 1)
        read = ""
        if self.stdinString is not "":
            if w:
                bytesWritten = os.write(_stdin.fileno(), self.stdinString)
                writeList.remove(_stdin)
                _stdin.close()
                continue
        if r:
            if _stderr in r:
                readStderr = True
                read = os.read(_stderr.fileno(), 16384)
                if read: stderrChunks.append(read)
                else: readList.remove(_stderr)
                continue
            elif _stdout in r:
                readStdout = True
                read = os.read(_stdout.fileno(), 16384)
                if read:
                    stdoutChunks.append(read)
                    syslog.syslog("Command instance read %d from stdout" % len(read))
                else: readList.remove(_stdout)
                continue
        else:
            if \
               (readStderr and self.dieOnStderr) \
               or \
               readStdout:
                syslog.syslog("Command instance finished")
                break
    return
# cut here
# # ## ### ##### ######## ############# #####################
Tweaking (a) the os.read() buffer size and (b) the select() timeout
and testing with ``ls -R'' on a directory with ~ 32K nodes beneath, I
find the following trends:
1. With a very small os.read() buffer, I get full stdout, but running
time is rather long. Running time increases as select() timeout
increases.
2. With a very large os.read() buffer, I get incomplete stdout (but
running time is *very* fast). As select() timeout increases, I get
better and better results - with a select() timeout of 0.2 I seem to
get reliably full stdout.
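As background for the buffer-size trends above, here is a small sketch of how os.read() behaves on a pipe. It uses a bare os.pipe() instead of a popen'd command (an assumption made only to keep the example self-contained), and the variable names are illustrative. The size argument is an upper bound: os.read() returns whatever is available up to that many bytes, and it returns an empty string only at EOF, once the writer has closed its end.

```python
import os

# A bare pipe stands in for the child's stdout (illustrative assumption).
r_fd, w_fd = os.pipe()

os.write(w_fd, b"x" * 100)
first = os.read(r_fd, 16384)   # only 100 bytes are available, so 100 come back

os.write(w_fd, b"y" * 100)
second = os.read(r_fd, 30)     # capped by the requested size: 30 bytes
third = os.read(r_fd, 16384)   # the remaining 70 bytes

os.close(w_fd)                 # writer goes away
eof = os.read(r_fd, 16384)     # empty result signals EOF, not "no data yet"
os.close(r_fd)

print(len(first), len(second), len(third), eof)
```

Seen this way, the buffer size mostly changes how many loop iterations it takes to drain the pipe; it does not by itself say when the child has finished.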
The values used in the code I've pasted above - large buffer, large
select() timeout - seem to perform "well enough"; none of the
previously described problems manifest. However, ``ls -lR /'' (way
more than 32K nodes) "sometimes" gives incomplete stdout.
My first question, then, is paranoid: I've run all these benchmarks
because the application using this code saw a HUGE performance hit
when we started using popen'd commands which generated "lots of"
output.
Is there anything wrong with the logic in my code?!
Will I see severe performance degradation (or worse, incomplete
stdout/stderr) as system variables change (e.g. system load increases,
popen'd program changes, popen'd program increases workload, etc.)?
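For comparison on the logic question, here is a hedged sketch of a loop that treats EOF on both pipes as the exit condition, so a select() timeout only means "nothing yet, wait again". It is not the class described above: it assumes Python 3 and subprocess.Popen in place of os.popen3(), and run_command with its parameters is a hypothetical name chosen for illustration. It also does not handle a child that closes stdin early.

```python
import os
import select
import subprocess

def run_command(command, stdin_string=b"", bufsize=16384, timeout=1.0):
    """Hypothetical helper: run a shell command, return (stdout, stderr) bytes."""
    proc = subprocess.Popen(command, shell=True, bufsize=0,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out_fd, err_fd = proc.stdout.fileno(), proc.stderr.fileno()
    chunks = {out_fd: [], err_fd: []}
    read_fds = [out_fd, err_fd]
    write_fds = [proc.stdin.fileno()] if stdin_string else []
    if not stdin_string:
        proc.stdin.close()           # nothing to send: give the child EOF now
    pending = memoryview(stdin_string)

    while read_fds or write_fds:
        r, w, _ = select.select(read_fds, write_fds, [], timeout)
        for fd in w:
            # Write at most PIPE_BUF bytes so a writable pipe cannot block us.
            sent = os.write(fd, pending[:select.PIPE_BUF])
            pending = pending[sent:]
            if not pending:
                write_fds.remove(fd)
                proc.stdin.close()
        for fd in r:
            data = os.read(fd, bufsize)
            if data:
                chunks[fd].append(data)
            else:
                read_fds.remove(fd)  # empty read means EOF on this pipe
        # If select() timed out, r and w are empty and we simply wait again.

    proc.stdout.close()
    proc.stderr.close()
    proc.wait()
    return b"".join(chunks[out_fd]), b"".join(chunks[err_fd])
```

With this shape, the timeout and buffer size affect only how often the loop wakes up and how many reads it takes; neither decides when the output is considered complete.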
Next question - how do I tune the select() timeout and the os.read()
buffer correctly? Is it *really* per-command, per-system,
per-phase-of-moon voodoo? Is there a Recommended Setup for such a
select() loop?
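One setup that avoids tuning either knob, offered as a sketch and not as a definitive answer, is to let the standard library's subprocess module do the multiplexing: Popen.communicate() feeds the one-shot stdin string and reads stdout and stderr until EOF. The wrapper name run_simple is hypothetical, and the whole output is held in memory, which may not suit very large outputs.

```python
import subprocess

def run_simple(command, stdin_string=b""):
    """Hypothetical wrapper: returns (exit status, stdout bytes, stderr bytes)."""
    proc = subprocess.Popen(command, shell=True,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    # communicate() writes the input, closes stdin, and reads both pipes
    # to EOF before waiting for the child to exit.
    out, err = proc.communicate(stdin_string)
    return proc.returncode, out, err

status, out, err = run_simple("cat", b"one-shot string")
print(status, out, err)
```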
Thanks in advance, for insight as well as for tolerating my
long-windedness...
--
Christopher DeMarco <cmd at alephant.net>
Alephant Systems (http://alephant.net)
PGP public key at http://pgp.alephant.net
+1-412-708-9660