On Wed, 10 Nov 2004, John P Speno wrote:
Hi, sorry for the delayed response.
> While using subprocess (aka popen5), I came across one potential gotcha. I've had
> exceptions ending like this:
>
> File "test.py", line 5, in test
> cmd = popen5.Popen(args, stdout=PIPE)
> File "popen5.py", line 577, in __init__
> data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
> OSError: [Errno 4] Interrupted system call
>
> (on Solaris 9)
>
> Would it make sense for subprocess to use a more robust read() function
> which can handle these cases, i.e. when the parent's read on the pipe
> to the child's stderr is interrupted by a signal and fails with EINTR?
> I imagine it could catch EINTR and EAGAIN and retry the failed read().
I assume you are using signals in your application? The os.read above is
not the only system call that can fail with EINTR. subprocess.py is full
of other system calls that can fail, and I suspect that many other Python
modules are as well.
I've made a patch (attached) to subprocess.py (and test_subprocess.py)
that should guard against EINTR, but I haven't committed it yet. It's
quite large.
Are Python modules supposed to handle EINTR? Why not let the C code handle
this? Or, perhaps the signal module should provide a sigaction function,
so that users can use SA_RESTART.
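For anyone who hasn't run into EINTR before, here is a minimal
illustration (not part of the patch; the alarm and pipe are just
scaffolding for the demo) of how a handled signal interrupts a
blocking read:

import errno
import os
import signal

def handler(signum, frame):
    pass                     # a no-op handler is enough to interrupt the call

signal.signal(signal.SIGALRM, handler)   # handler installed without SA_RESTART
signal.alarm(1)                          # deliver SIGALRM in one second

r, w = os.pipe()
try:
    os.read(r, 1024)         # blocks; the interrupted read is not restarted
except OSError, e:
    assert e.errno == errno.EINTR
    print "read failed with EINTR, as in the traceback above"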
Index: subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/subprocess.py,v
retrieving revision 1.8
diff -u -r1.8 subprocess.py
--- subprocess.py 7 Nov 2004 14:30:34 -0000 1.8
+++ subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -888,6 +888,50 @@
             pass

+    def _read_no_intr(self, fd, buffersize):
+        """Like os.read, but retries on EINTR"""
+        while True:
+            try:
+                return os.read(fd, buffersize)
+            except OSError, e:
+                if e.errno == errno.EINTR:
+                    continue
+                else:
+                    raise
+
+
+    def _read_all(self, fd, buffersize):
+        """Like os.read, but retries on EINTR, and reads until EOF"""
+        all = ""
+        while True:
+            data = self._read_no_intr(fd, buffersize)
+            all += data
+            if data == "":
+                return all
+
+
+    def _write_no_intr(self, fd, s):
+        """Like os.write, but retries on EINTR"""
+        while True:
+            try:
+                return os.write(fd, s)
+            except OSError, e:
+                if e.errno == errno.EINTR:
+                    continue
+                else:
+                    raise
+
+    def _waitpid_no_intr(self, pid, options):
+        """Like os.waitpid, but retries on EINTR"""
+        while True:
+            try:
+                return os.waitpid(pid, options)
+            except OSError, e:
+                if e.errno == errno.EINTR:
+                    continue
+                else:
+                    raise
+
     def _execute_child(self, args, executable, preexec_fn, close_fds,
                        cwd, env, universal_newlines,
                        startupinfo, creationflags, shell,
@@ -963,7 +1007,7 @@
                                              exc_value,
                                              tb)
                     exc_value.child_traceback = ''.join(exc_lines)
-                    os.write(errpipe_write, pickle.dumps(exc_value))
+                    self._write_no_intr(errpipe_write, pickle.dumps(exc_value))

                 # This exitcode won't be reported to applications, so it
                 # really doesn't matter what we return.
@@ -979,7 +1023,7 @@
             os.close(errwrite)

             # Wait for exec to fail or succeed; possibly raising exception
-            data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
+            data = self._read_all(errpipe_read, 1048576) # Exceptions limited to 1 MB
             os.close(errpipe_read)
             if data != "":
                 child_exception = pickle.loads(data)
@@ -1003,7 +1047,7 @@
                attribute."""
             if self.returncode == None:
                 try:
-                    pid, sts = os.waitpid(self.pid, os.WNOHANG)
+                    pid, sts = self._waitpid_no_intr(self.pid, os.WNOHANG)
                     if pid == self.pid:
                         self._handle_exitstatus(sts)
                 except os.error:

@@ -1015,7 +1059,7 @@
             """Wait for child process to terminate. Returns returncode
                attribute."""
             if self.returncode == None:
-                pid, sts = os.waitpid(self.pid, 0)
+                pid, sts = self._waitpid_no_intr(self.pid, 0)
                 self._handle_exitstatus(sts)
             return self.returncode
@@ -1049,27 +1093,33 @@
         stderr = []

         while read_set or write_set:
-            rlist, wlist, xlist = select.select(read_set, write_set, [])
+            try:
+                rlist, wlist, xlist = select.select(read_set, write_set, [])
+            except select.error, e:
+                if e[0] == errno.EINTR:
+                    continue
+                else:
+                    raise

             if self.stdin in wlist:
                 # When select has indicated that the file is writable,
                 # we can write up to PIPE_BUF bytes without risk
                 # blocking. POSIX defines PIPE_BUF >= 512
-                bytes_written = os.write(self.stdin.fileno(), input[:512])
+                bytes_written = self._write_no_intr(self.stdin.fileno(), input[:512])
                 input = input[bytes_written:]
                 if not input:
                     self.stdin.close()
                     write_set.remove(self.stdin)

             if self.stdout in rlist:
-                data = os.read(self.stdout.fileno(), 1024)
+                data = self._read_no_intr(self.stdout.fileno(), 1024)
                 if data == "":
                     self.stdout.close()
                     read_set.remove(self.stdout)
                 stdout.append(data)

             if self.stderr in rlist:
-                data = os.read(self.stderr.fileno(), 1024)
+                data = self._read_no_intr(self.stderr.fileno(), 1024)
                 if data == "":
                     self.stderr.close()
                     read_set.remove(self.stderr)
Index: test/test_subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_subprocess.py,v
retrieving revision 1.14
diff -u -r1.14 test_subprocess.py
--- test/test_subprocess.py 12 Nov 2004 15:51:48 -0000 1.14
+++ test/test_subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -7,6 +7,7 @@
 import tempfile
 import time
 import re
+import errno

 mswindows = (sys.platform == "win32")

@@ -35,6 +36,16 @@
         fname = tempfile.mktemp()
         return os.open(fname, os.O_RDWR|os.O_CREAT), fname

+    def read_no_intr(self, obj):
+        while True:
+            try:
+                return obj.read()
+            except IOError, e:
+                if e.errno == errno.EINTR:
+                    continue
+                else:
+                    raise
+
     #
     # Generic tests
     #
@@ -123,7 +134,7 @@
         p = subprocess.Popen([sys.executable, "-c",
                           'import sys; sys.stdout.write("orange")'],
                          stdout=subprocess.PIPE)
-        self.assertEqual(p.stdout.read(), "orange")
+        self.assertEqual(self.read_no_intr(p.stdout), "orange")

     def test_stdout_filedes(self):
         # stdout is set to open file descriptor

@@ -151,7 +162,7 @@
         p = subprocess.Popen([sys.executable, "-c",
                           'import sys; sys.stderr.write("strawberry")'],
                          stderr=subprocess.PIPE)
-        self.assertEqual(remove_stderr_debug_decorations(p.stderr.read()),
+        self.assertEqual(remove_stderr_debug_decorations(self.read_no_intr(p.stderr)),
                          "strawberry")

     def test_stderr_filedes(self):

@@ -186,7 +197,7 @@
                           'sys.stderr.write("orange")'],
                          stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT)
-        output = p.stdout.read()
+        output = self.read_no_intr(p.stdout)
         stripped = remove_stderr_debug_decorations(output)
         self.assertEqual(stripped, "appleorange")

@@ -220,7 +231,7 @@
                          stdout=subprocess.PIPE,
                          cwd=tmpdir)
         normcase = os.path.normcase
-        self.assertEqual(normcase(p.stdout.read()), normcase(tmpdir))
+        self.assertEqual(normcase(self.read_no_intr(p.stdout)), normcase(tmpdir))

     def test_env(self):
         newenv = os.environ.copy()

@@ -230,7 +241,7 @@
                           'sys.stdout.write(os.getenv("FRUIT"))'],
                          stdout=subprocess.PIPE,
                          env=newenv)
-        self.assertEqual(p.stdout.read(), "orange")
+        self.assertEqual(self.read_no_intr(p.stdout), "orange")

     def test_communicate(self):
         p = subprocess.Popen([sys.executable, "-c",

@@ -305,7 +316,8 @@
                           'sys.stdout.write("\\nline6");'],
                          stdout=subprocess.PIPE,
                          universal_newlines=1)
-        stdout = p.stdout.read()
+
+        stdout = self.read_no_intr(p.stdout)
         if hasattr(open, 'newlines'):
             # Interpreter with universal newline support
             self.assertEqual(stdout,

@@ -343,7 +355,7 @@
     def test_no_leaking(self):
         # Make sure we leak no resources
-        max_handles = 1026 # too much for most UNIX systems
+        max_handles = 10 # too much for most UNIX systems
         if mswindows:
             max_handles = 65 # a full test is too slow on Windows
         for i in range(max_handles):

@@ -424,7 +436,7 @@
                           'sys.stdout.write(os.getenv("FRUIT"))'],
                          stdout=subprocess.PIPE,
                          preexec_fn=lambda: os.putenv("FRUIT", "apple"))
-        self.assertEqual(p.stdout.read(), "apple")
+        self.assertEqual(self.read_no_intr(p.stdout), "apple")

     def test_args_string(self):
         # args is a string

@@ -457,7 +469,7 @@
         p = subprocess.Popen(["echo $FRUIT"], shell=1,
                          stdout=subprocess.PIPE,
                          env=newenv)
-        self.assertEqual(p.stdout.read().strip(), "apple")
+        self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")

     def test_shell_string(self):
         # Run command through the shell (string)

@@ -466,7 +478,7 @@
         p = subprocess.Popen("echo $FRUIT", shell=1,
                          stdout=subprocess.PIPE,
                          env=newenv)
-        self.assertEqual(p.stdout.read().strip(), "apple")
+        self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")

     def test_call_string(self):
         # call() function with string argument on UNIX

@@ -525,7 +537,7 @@
         p = subprocess.Popen(["set"], shell=1,
                          stdout=subprocess.PIPE,
                          env=newenv)
-        self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+        self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)

     def test_shell_string(self):
         # Run command through the shell (string)

@@ -534,7 +546,7 @@
         p = subprocess.Popen("set", shell=1,
                          stdout=subprocess.PIPE,
                          env=newenv)
-        self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+        self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)

     def test_call_string(self):
         # call() function with string argument on Windows
/Peter Åstrand <astrand(a)lysator.liu.se>
I've been thinking about this a lot, but haven't made much
progress. Here's a brain dump.
I've been thinking about integrating PEP 325 (Resource-Release Support
for Generators) into the for-loop code, so that you could replace
the_lock.acquire()
try:
    BODY
finally:
    the_lock.release()

with

for dummy in synchronized(the_lock):
    BODY

or perhaps even (making "for VAR" optional in the for-loop syntax)
with

in synchronized(the_lock):
    BODY

Then synchronized() could be written cleanly as follows:

def synchronized(lock):
    lock.acquire()
    try:
        yield None
    finally:
        lock.release()
But then every for-loop would have to contain an extra try-finally
clause; the translation of
for VAR in EXPR:
    BODY

would become

__it = iter(EXPR)
try:
    while True:
        try:
            VAR = __it.next()
        except StopIteration:
            break
        BODY
finally:
    if hasattr(__it, "close"):
        __it.close()
which I don't particularly like: most for-loops DON'T need this, since
they don't use a generator but some other form of iterator, or even if
they use a generator, not all generators have a try/finally clause. But
the bytecode compiler can't know that, so it will always have to
generate this code. It also changes the semantics of using a
generator in a for-loop slightly: if you break out of the for-loop
before the generator is exhausted you will still get the close() call.
It's also a bit funny to see this approach used with the only other
use case for try/finally we've looked at, which requires passing a
variable into the block: the "with_file" use case. We now can write
with_file as a nice and clean generator:
def with_file(filename):
    f = open(filename)
    try:
        yield f
    finally:
        f.close()

but the use looks very odd because it is syntactically a for-loop but
there's only one iteration:

for f in with_file("/etc/passwd"):
    for line in f:
        print line[:line.find(":")]
Seeing this example makes me cringe -- why two nested for loops to
loop over the lines of one file???
So I think that this is probably not the right thing to pursue, and we
might be better off with something along the lines of PEP 310. The
authors of PEP 310 agree; under Open Issues they wrote:
There are some similarities in concept between 'with ...' blocks
and generators, which have led to proposals that for loops could
implement the with block functionality[3]. While neat on some
levels, we think that for loops should stick to being loops.
(Footnote [3] references the thread that originated PEP 325.)
Perhaps the most important lesson we've learned in this thread is that
the 'with' keyword proposed in PEP 310 is redundant -- the syntax
could just be
[VAR '=']* EXPR ':'
    BODY
IOW the regular assignment / expression statement gets an optional
colon-plus-suite at the end.
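For illustration, the passwd example above would then read something
like this ('opened' being a hypothetical PEP 310-style object with
enter/exit methods, and the syntax itself of course not implemented):

f = opened("/etc/passwd"):
    for line in f:
        print line[:line.find(":")]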
So now let's assume we accept PEP 310 with this change. Does this
leave any use cases for anonymous blocks uncovered? Ruby's each()
pattern is covered by generators; personally I prefer Python's
for var in seq: ...
over Ruby's much-touted
seq.each() {|var| ...}
The try/finally use case is covered by PEP 310. (If you want to
combine this with a for-loop in a single operation, you'll need PEP
325.)
The use cases where the block actually returns a value are probably
callbacks for things like sort() or map(); I have to admit that I'd
rather keep lambda for these (and use named functions for longer
blocks) than introduce an anonymous block syntax that can return
values! I also note that if you *already* have a comparison function,
Ruby's Array sort method doesn't let you pass it in as a function
argument; you have to give it a block that calls the comparison
function, because blocks are not the same as callables (and I'm not
sure that Ruby even *has* callables -- everything seems to be a
block).
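For example, with Python 2's cmp-style sort, a short callback stays a
lambda, and an existing comparison function passes straight in with no
wrapping block needed:

words = ["pear", "Apple", "banana"]
words.sort(lambda a, b: cmp(a.lower(), b.lower()))  # short callback: lambda
print words                     # ['Apple', 'banana', 'pear']
words.sort(cmp)                 # existing function: passed in directly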
My tentative conclusion remains: Python doesn't need Ruby blocks.
Brian Sabbey ought to come up with more examples rather than arguments
why his preferred syntax and semantics are best.
--Guido van Rossum (home page: http://www.python.org/~guido/)
How about, instead of trying to emphasize how different a
block-statement is from a for-loop, we emphasize their similarity?
A regular old loop over a sequence or iterable is written as:
for VAR in EXPR:
    BLOCK

A variation on this with somewhat different semantics swaps the keywords:

in EXPR for VAR:
    BLOCK

If you don't need the variable, you can leave the "for VAR" part out:

in EXPR:
    BLOCK
Too cute? :-)
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
I don't know whether it's true for all the PEP 340 use cases, but all the
current examples would read very naturally if the block-template could be
specified in an extended try statement:
> 1. A template for ensuring that a lock, acquired at the start of a
> block, is released when the block is left:
try with_lock(myLock):
    # Code here executes with myLock held. The lock is
    # guaranteed to be released when the block is left (even
    # if by an uncaught exception).

> 2. A template for opening a file that ensures the file is closed
>    when the block is left:

try opening("/etc/passwd") as f:
    for line in f:
        print line.rstrip()

> 3. A template for committing or rolling back a database
>    transaction:

try transaction(mydb):

> 4. A template that tries something up to n times:

try auto_retry(3):
    f = urllib.urlopen("http://python.org/peps/pep-0340.html")
    print f.read()

> 5. It is possible to nest blocks and combine templates:

try with_lock(myLock):
    try opening("/etc/passwd") as f:
        for line in f:
            print line.rstrip()
Michael
I haven't heard back from Greg Stein, Jim Fulton, or Paul Prescod.
If anyone can get in touch with them, that would be great.
I suspect that Jim may want to keep the commit privileges active
and that Paul and Greg are done with commits for the time being.
Raymond Hettinger
Brian Sabbey:
> It is possible to implement thunks without them creating their own
> frame. They can reuse the frame of the surrounding function ...
> The implementation just needs to take care
> to save and restore members of the frame that get clobbered when the
> thunk is running.
Michael Hudson:
> Woo. That's cute.
It *sounds* horrendous, but is actually pretty reasonable.
Conceptually, a thunk replaces a suite in the caller.
Most frame members are intended to be shared, and changes
should be visible -- so they don't have to (and shouldn't) be restored.
The only members that need special attention are (f_code, f_lasti)
and possibly (f_blockstack, f_iblock).
(f_code, f_lasti) would need to be replaced with a stack of pairs.
Finishing a code string would mean popping this stack, rather
than popping the whole frame.
Since a completed suite leaves the blockstack where it started,
(f_blockstack, f_iblock) *can* be ignored, though debugging and
CO_MAXBLOCKS both *suggest* replacing the pair with a stack of
pairs.
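A toy model of that bookkeeping (illustrative Python only, not CPython
source; all the names here are made up):

class Frame(object):
    """Model of a frame whose (f_code, f_lasti) is a stack of pairs."""
    def __init__(self, code):
        self.code_stack = [(code, 0)]    # (code string, last instruction)

    def start_thunk(self, thunk_code):
        # entering a thunk pushes a pair instead of a whole new frame
        self.code_stack.append((thunk_code, 0))

    def finish_code_string(self):
        self.code_stack.pop()            # pop the pair, keep the frame

f = Frame("<caller's code>")
f.start_thunk("<thunk's code>")
f.finish_code_string()                   # back in the caller, same frame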
-jJ
Michael Spencer:
> I don't know whether it's true for all the PEP 340 use cases, but all the
> current examples would read very naturally if the block-template could be
> specified in an extended try statement:
>> 1. A template for ensuring that a lock, acquired at the start of a
>> block, is released when the block is left:
> try with_lock(myLock):
>     # Code here executes with myLock held. The lock is
>     # guaranteed to be released when the block is left (even
>     # if by an uncaught exception).
So we would have
try ... finally, try ... except, and try (no close).
It works for me, and should be backwards-compatible.
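Spelled out, with work(), cleanup(), recover() and update_records() as
placeholders, the family would be (the third form being the proposed
extension, not valid syntax today):

try:                           # existing: cleanup
    work()
finally:
    cleanup()

try:                           # existing: exception handling
    work()
except ValueError:
    recover()

try transaction(mydb):         # proposed: a template manages the suite
    update_records()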
The cases where it doesn't work as well are
(1) You want to insert several different suites.
But the anonymous yield syntax doesn't work well for that either.
(That is one of the arguments for thunks instead of generator abuse.)
(2) You really do want to loop over the suite. Try doesn't imply a loop.
But this is a *good* thing. Resources are not loops, and you can always
make the loop explicit as iteration over the resource
def opener(file):
    f = open(file)
    try:
        yield f
    finally:
        f.close()

try opener(file) as f:
    for line in f:
        process(line)
Nick Coghlan:
> Python offers two variants on the basic iterative loop.
> "for NAME in EXPR:" skips the finalisation step. At loop
> completion, a well-behaved iterator may still contain additional values.
> "for NAME from EXPR:" enforces finalisation of the iterator.
> ... At loop completion, a well-behaved [finalizing] iterator is
> always completely exhausted.
(nitpick):
"from" isn't really different from "in". Perhaps
for NAME inall EXPR:
for NAME draining EXPR:
for NAME finalizing EXPR: # too hard to spell, because of s/z?
(substance):
"finalized or not" is a very useful distinction, but I'm not sure it
is something the user should have to worry about. Realistically,
most of my loops intend to drain the iterator (which the compiler
knows because I have no "break"). Regardless of whether I
use a break, I still want the iterator cleaned up if it is drained.
The only thing this second loop form does is set a flag saying
"No, I won't continue -- and I happen to know that no one else
ever will either, even if they do have a reference that prevents
garbage collection. I'm *sure* they won't use it."
That strikes me as a dangerous thing to get in the habit of saying.
Why not just aggressively run the finalization on both forms when the
reference count permits?
> This form supports block management operations,
And this seems unrelated to finalization. I understand that as an
implementation detail, you need to define the finalizers somehow.
But the decision to aggressively finalize (in some manner) and
desire to pass a block (that could be for finalization) seem like
orthogonal issues.
-jJ
Guido van Rossum:
> -- but it's more efficient, since calling yield doesn't create a frame.
Neither should a thunk.
> The other problem with thunks is that once we think of them as the
> anonymous functions they are, we're pretty much forced to say that a
> return statement in a thunk returns from the thunk rather than from
> the containing function.
Why should a thunk be a function? We already have first class
functions. What we're missing is a way to pass around a suite.
def foo(a):
    if a > 4:
        b = a
        c = process(a)   # thunk line 1
        print a          # thunk line 2
        return           # thunk line 3
    else:
        a.something()
We don't have a good way to package up "c = process(a); print a; return"
The return should exit the whole function, not just (part of) the if clause.
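Contrast the closest spelling with a first-class function, where the
return quietly does less than intended (process() is stubbed in, and
the else branch simplified, to make this runnable):

def process(x):
    return x * 2              # stub for the example's process()

def thunk(a):
    c = process(a)
    print a
    return                    # exits only the thunk, not its caller

def foo(a):
    if a > 4:
        b = a
        thunk(a)
        print "foo keeps going after the thunk's return"
    else:
        print "else branch"

foo(5)                        # prints 5, then "foo keeps going ..."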
Greg:
>> I'd like to reconsider a thunk implementation. It
>> would be a lot simpler, doing just what is required
>> without any jiggery pokery with exceptions and
>> break/continue/return statements. It would be easy
>> to explain what it does and why it's useful.
> I don't know. In order to obtain the required local variable sharing
> between the thunk and the containing function I believe that every
> local variable used or set in the thunk would have to become a 'cell'
> (our mechanism for sharing variables between nested scopes).
Cells only work if you have a complete set of names at compile-time.
Your own resource-example added "item" to the namespace inside
a block. If you don't know which blocks could be used with a pattern,
cells are out.
That said, the compiler code is already two-pass. Once to find names,
and another time to resolve them. This just means that for thunks
(and functions that call them) the adjustment will be to LOAD_NAME
instead of getting a LOAD_FAST index.
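For reference, the existing cell mechanism works precisely because the
compiler sees the complete set of shared names up front:

def outer():
    x = 1                     # 'x' is known at compile time, so it gets a cell
    def inner():
        return x              # compiled as a cell lookup, not a dict lookup
    return inner

print outer()()               # prints 1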
-jJ
Guido van Rossum:
> One [of many separate ideas in PEP 340] is turning generators
> into more general coroutines: continue EXPR passes the expression
> to the iterator's next() method ...
I would have been very happy with that a week ago. Seeing the
specific implementation changed my mind.
The caller shouldn't know what state the generator is in, so the
passed-in-message will be the same regardless of which yield
accepts it. Unless I have a single-yield generator, this means
I end up writing boilerplate code to accept and process the arg
at each yield. I don't want more boilerplate.
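The boilerplate would look roughly like this, sketched with
yield-expression semantics along PEP 340's lines (hypothetical syntax
at the time of writing; handle() is just a placeholder):

def handle(arg):
    print "processing", arg

def template():
    arg = yield "step one"
    if arg is not None:       # every yield repeats this accept-and-process dance
        handle(arg)
    arg = yield "step two"
    if arg is not None:
        handle(arg)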
> Even without a block-statement, these two changes make yield look a
> lot like invoking a thunk
Though it feels backwards to me; yield is returning control to something
that already had to coordinate the thunks itself.
-jJ