Hello everyone!
We have been encountering several deadlocks in a threaded Python
application which calls subprocess.Popen (i.e. fork()) in some of its
threads.
This has occurred on Python 2.4.1 on a 2.4.27 Linux kernel.
Preliminary analysis of the hang shows that the child process blocks
upon entering the execvp function, in which the import_lock is acquired
due to the following line:
def _ execvpe(file, args, env=None):
from errno import ENOENT, ENOTDIR
...
It is known that when forking from a pthreaded application, acquisition
attempts on locks which were already locked by other threads while
fork() was called will deadlock.
Due to these oddities we were wondering if it would be better to extract
the above import line from the execvpe call, to prevent lock
acquisition attempts in such cases.
Another workaround could be re-assigning a new lock to import_lock
(such a thing is done with the global interpreter lock) at PyOS_AfterFork or
pthread_atfork.
We'd appreciate any opinions you might have on the subject.
Thanks in advance,
Yair and Rotem
On Wed, 10 Nov 2004, John P Speno wrote:
Hi, sorry for the delayed response.
> While using subprocess (aka popen5), I came across one potential gotcha. I've had
> exceptions ending like this:
>
> File "test.py", line 5, in test
> cmd = popen5.Popen(args, stdout=PIPE)
> File "popen5.py", line 577, in __init__
> data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
> OSError: [Errno 4] Interrupted system call
>
> (on Solaris 9)
>
> Would it make sense for subprocess to use a more robust read() function
> which can handle these cases, i.e. when the parent's read on the pipe
> to the child's stderr is interrupted by a system call, and returns EINTR?
> I imagine it could catch EINTR and EAGAIN and retry the failed read().
I assume you are using signals in your application? The os.read above is
not the only system call that can fail with EINTR. subprocess.py is full
of other system calls that can fail, and I suspect that many other Python
modules are as well.
I've made a patch (attached) to subprocess.py (and test_subprocess.py)
that should guard against EINTR, but I haven't committed it yet. It's
quite large.
Are Python modules supposed to handle EINTR? Why not let the C code handle
this? Or, perhaps the signal module should provide a sigaction function,
so that users can use SA_RESTART.
Index: subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/subprocess.py,v
retrieving revision 1.8
diff -u -r1.8 subprocess.py
--- subprocess.py 7 Nov 2004 14:30:34 -0000 1.8
+++ subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -888,6 +888,50 @@
pass
+ def _read_no_intr(self, fd, buffersize):
+ """Like os.read, but retries on EINTR"""
+ while True:
+ try:
+ return os.read(fd, buffersize)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
+
+ def _read_all(self, fd, buffersize):
+ """Like os.read, but retries on EINTR, and reads until EOF"""
+ all = ""
+ while True:
+ data = self._read_no_intr(fd, buffersize)
+ all += data
+ if data == "":
+ return all
+
+
+ def _write_no_intr(self, fd, s):
+ """Like os.write, but retries on EINTR"""
+ while True:
+ try:
+ return os.write(fd, s)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
+ def _waitpid_no_intr(self, pid, options):
+ """Like os.waitpid, but retries on EINTR"""
+ while True:
+ try:
+ return os.waitpid(pid, options)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
def _execute_child(self, args, executable, preexec_fn, close_fds,
cwd, env, universal_newlines,
startupinfo, creationflags, shell,
@@ -963,7 +1007,7 @@
exc_value,
tb)
exc_value.child_traceback = ''.join(exc_lines)
- os.write(errpipe_write, pickle.dumps(exc_value))
+ self._write_no_intr(errpipe_write, pickle.dumps(exc_value))
# This exitcode won't be reported to applications, so it
# really doesn't matter what we return.
@@ -979,7 +1023,7 @@
os.close(errwrite)
# Wait for exec to fail or succeed; possibly raising exception
- data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
+ data = self._read_all(errpipe_read, 1048576) # Exceptions limited to 1 MB
os.close(errpipe_read)
if data != "":
child_exception = pickle.loads(data)
@@ -1003,7 +1047,7 @@
attribute."""
if self.returncode == None:
try:
- pid, sts = os.waitpid(self.pid, os.WNOHANG)
+ pid, sts = self._waitpid_no_intr(self.pid, os.WNOHANG)
if pid == self.pid:
self._handle_exitstatus(sts)
except os.error:
@@ -1015,7 +1059,7 @@
"""Wait for child process to terminate. Returns returncode
attribute."""
if self.returncode == None:
- pid, sts = os.waitpid(self.pid, 0)
+ pid, sts = self._waitpid_no_intr(self.pid, 0)
self._handle_exitstatus(sts)
return self.returncode
@@ -1049,27 +1093,33 @@
stderr = []
while read_set or write_set:
- rlist, wlist, xlist = select.select(read_set, write_set, [])
+ try:
+ rlist, wlist, xlist = select.select(read_set, write_set, [])
+ except select.error, e:
+ if e[0] == errno.EINTR:
+ continue
+ else:
+ raise
if self.stdin in wlist:
# When select has indicated that the file is writable,
# we can write up to PIPE_BUF bytes without risk
# blocking. POSIX defines PIPE_BUF >= 512
- bytes_written = os.write(self.stdin.fileno(), input[:512])
+ bytes_written = self._write_no_intr(self.stdin.fileno(), input[:512])
input = input[bytes_written:]
if not input:
self.stdin.close()
write_set.remove(self.stdin)
if self.stdout in rlist:
- data = os.read(self.stdout.fileno(), 1024)
+ data = self._read_no_intr(self.stdout.fileno(), 1024)
if data == "":
self.stdout.close()
read_set.remove(self.stdout)
stdout.append(data)
if self.stderr in rlist:
- data = os.read(self.stderr.fileno(), 1024)
+ data = self._read_no_intr(self.stderr.fileno(), 1024)
if data == "":
self.stderr.close()
read_set.remove(self.stderr)
Index: test/test_subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_subprocess.py,v
retrieving revision 1.14
diff -u -r1.14 test_subprocess.py
--- test/test_subprocess.py 12 Nov 2004 15:51:48 -0000 1.14
+++ test/test_subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -7,6 +7,7 @@
import tempfile
import time
import re
+import errno
mswindows = (sys.platform == "win32")
@@ -35,6 +36,16 @@
fname = tempfile.mktemp()
return os.open(fname, os.O_RDWR|os.O_CREAT), fname
+ def read_no_intr(self, obj):
+ while True:
+ try:
+ return obj.read()
+ except IOError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
#
# Generic tests
#
@@ -123,7 +134,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stdout.write("orange")'],
stdout=subprocess.PIPE)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_stdout_filedes(self):
# stdout is set to open file descriptor
@@ -151,7 +162,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stderr.write("strawberry")'],
stderr=subprocess.PIPE)
- self.assertEqual(remove_stderr_debug_decorations(p.stderr.read()),
+ self.assertEqual(remove_stderr_debug_decorations(self.read_no_intr(p.stderr)),
"strawberry")
def test_stderr_filedes(self):
@@ -186,7 +197,7 @@
'sys.stderr.write("orange")'],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT)
- output = p.stdout.read()
+ output = self.read_no_intr(p.stdout)
stripped = remove_stderr_debug_decorations(output)
self.assertEqual(stripped, "appleorange")
@@ -220,7 +231,7 @@
stdout=subprocess.PIPE,
cwd=tmpdir)
normcase = os.path.normcase
- self.assertEqual(normcase(p.stdout.read()), normcase(tmpdir))
+ self.assertEqual(normcase(self.read_no_intr(p.stdout)), normcase(tmpdir))
def test_env(self):
newenv = os.environ.copy()
@@ -230,7 +241,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_communicate(self):
p = subprocess.Popen([sys.executable, "-c",
@@ -305,7 +316,8 @@
'sys.stdout.write("\\nline6");'],
stdout=subprocess.PIPE,
universal_newlines=1)
- stdout = p.stdout.read()
+
+ stdout = self.read_no_intr(p.stdout)
if hasattr(open, 'newlines'):
# Interpreter with universal newline support
self.assertEqual(stdout,
@@ -343,7 +355,7 @@
def test_no_leaking(self):
# Make sure we leak no resources
- max_handles = 1026 # too much for most UNIX systems
+ max_handles = 10 # too much for most UNIX systems
if mswindows:
max_handles = 65 # a full test is too slow on Windows
for i in range(max_handles):
@@ -424,7 +436,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
preexec_fn=lambda: os.putenv("FRUIT", "apple"))
- self.assertEqual(p.stdout.read(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout), "apple")
def test_args_string(self):
# args is a string
@@ -457,7 +469,7 @@
p = subprocess.Popen(["echo $FRUIT"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_shell_string(self):
# Run command through the shell (string)
@@ -466,7 +478,7 @@
p = subprocess.Popen("echo $FRUIT", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_call_string(self):
# call() function with string argument on UNIX
@@ -525,7 +537,7 @@
p = subprocess.Popen(["set"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_shell_string(self):
# Run command through the shell (string)
@@ -534,7 +546,7 @@
p = subprocess.Popen("set", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_call_string(self):
# call() function with string argument on Windows
/Peter Åstrand <astrand(a)lysator.liu.se>
At 02:47 PM 2/24/2007 -0600, Tarek Ziadé wrote:
>I have created a setup.py file for distirbution and I bumped into
>a small bug when i tried to set my name in the contact field (Tarek Ziadé)
>
>Using string (utf8 file):
>
>setup(
> maintainer="Tarek Ziadé"
>)
>
>leads to:
>
> File ".../lib/python2.5/distutils/command/register.py", line 162, in
> send_metadata
> auth)
> File ".../lib/python2.5/distutils/command/register.py", line 257, in
> post_to_server
> value = unicode(value).encode("utf-8")
>UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10:
>ordinal not in range(128)
>
>
>Using unicode:
>
>setup(
> maintainer=u"Tarek Ziadé"
>)
>
>leads to:
>
> File ".../lib/python2.5/distutils/dist.py", line 1094, in write_pkg_file
> file.write('Author: %s\n' % self.get_contact() )
>UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
>position 18: ordinal not in range(128)
>
>I would propose a patch for this problem but i don't know what would be
>the best input (i guess unicode
> for names)
At 05:45 PM 2/24/2007 -0500, Tres Seaver wrote:
>Don't you still need to tell Python about the encoding of your string
>literals [1] [2] ? E.g.::
That's not the problem, it's that the code that writes the PKG-INFO file
doesn't handle Unicode. See
distutils.dist.DistributionMetadata.write_pkg_info(). It needs to use a
file with encoding support, if it's doing unicode
However, there's currently no standard, as far as I know, for what encoding
the PKG-INFO file should use. Meanwhile, the 'register' command accepts
Unicode, but is broken in handling it.
Essentially, the problem is that Python 2.5 broke this by adding a unicode
*requirement* to the "register" command. Previously, register simply sent
whatever you gave it, and the PKG-INFO writing code still
does. Unfortunately, this means that there is no longer any one value that
you can use for your name that will be accepted by both "register" and
anything that writes a PKG-INFO file.
Both register and write_pkg_info() are arguably broken here, and should be
able to work with either strings or unicode, and degrade gracefully in the
event of non-ASCII characters in a string. (Because even though "register"
is only run by the package's author, users may run other commands that
require a PKG-INFO, so a package prepared using Python <2.5 must still be
usable with Python 2.5 distutils, and Python <2.5 allows 8-bit maintainer
names.)
Unfortunately, this isn't fixable until there's a new 2.5.x release. For
previous Python versions, both register and write_pkg_info() accepted 8-bit
strings and passed them on as-is, so the only workaround for this issue at
the moment is to revert to Python 2.4 or less.
This may seem like it's coming out of left field for a minute, but
bear with me.
There is no doubt that Ruby's success is a concern for anyone who
sees it as diminishing Python's status. One of the reasons for
Ruby's success is certainly the notion (originally advocated by Bruce
Tate, if I'm not mistaken) that it is the "next Java" -- the language
and environment that mainstream Java developers are, or will, look to
as a natural next step.
One thing that would help Python in this "debate" (or, perhaps simply
put it in the running, at least as a "next Java" candidate) would be
if Python had an easier migration path for Java developers that
currently rely upon various third-party libraries. The wealth of
third-party libraries available for Java has always been one of its
great strengths. Ergo, if Python had an easy-to-use, recommended way
to use those libraries within the Python environment, that would be a
significant advantage to present to Java developers and those who
would choose Ruby over Java. Platform compatibility is always a huge
motivator for those looking to migrate or upgrade.
In that vein, I would point to JPype (http://jpype.sourceforge.net).
JPype is a module that gives "python programs full access to java
class libraries". My suggestion would be to either:
(a) include JPype in the standard library, or barring that,
(b) make a very strong push to support JPype
(a) might be difficult or cumbersome technically, as JPype does need
to build against Java headers, which may or may not be possible given
the way that Python is distributed, etc.
However, (b) is very feasible. I can't really say what "supporting
JPype" means exactly -- maybe GvR and/or other heavyweights in the
Python community make public statements regarding its existence and
functionality, maybe JPype gets a strong mention or placement on
python.org....all those details are obviously not up to me, and I
don't know the workings of the "official" Python organizations enough
to make serious suggestions.
Regardless of the form of support, I think raising people's awareness
of JPype and what it adds to the Python environment would be a Good
Thing (tm).
For our part, we've used JPype to make PDFTextStream (our previously
Java-only PDF text extraction library) available and supported for
Python. You can read some about it here:
http://snowtide.com/PDFTextStream.Python
And I've blogged about how PDFTextStream.Python came about, and how
we worked with Steve Ménard, the maintainer of JPype, to make it all
happen (watch out for this URL wrapping):
http://blog.snowtide.com/2006/08/21/working-together-pythonjava-open-
sourcecommercial
Cheers,
Chas Emerick
Founder, Snowtide Informatics Systems
Enterprise-class PDF content extraction
cemerick(a)snowtide.com
http://snowtide.com | +1 413.519.6365
Phillip.eby wrote:
> Author: phillip.eby
> Date: Tue Apr 18 02:59:55 2006
> New Revision: 45510
>
> Modified:
> python/trunk/Lib/pkgutil.py
> python/trunk/Lib/pydoc.py
> Log:
> Second phase of refactoring for runpy, pkgutil, pydoc, and setuptools
> to share common PEP 302 support code, as described here:
>
> http://mail.python.org/pipermail/python-dev/2006-April/063724.html
Shouldn't this new module be named "pkglib" to be in line with
the naming scheme used for all the other utility modules, e.g. httplib,
imaplib, poplib, etc. ?
> pydoc now supports PEP 302 importers, by way of utility functions in
> pkgutil, such as 'walk_packages()'. It will properly document
> modules that are in zip files, and is backward compatible to Python
> 2.3 (setuptools installs for Python <2.5 will bundle it so pydoc
> doesn't break when used with eggs.)
Are you saying that the installation of setuptools in Python 2.3
and 2.4 will then overwrite the standard pydoc included with
those versions ?
I think that's the wrong way to go if not made an explicit
option in the installation process or a separate installation
altogether.
I bothered by the fact that installing setuptools actually changes
the standard Python installation by either overriding stdlib modules
or monkey-patching them at setuptools import time.
> What has not changed is that pydoc command line options do not support
> zip paths or other importer paths, and the webserver index does not
> support sys.meta_path. Those are probably okay as limitations.
>
> Tasks remaining: write docs and Misc/NEWS for pkgutil/pydoc changes,
> and update setuptools to use pkgutil wherever possible, then add it
> to the stdlib.
Add setuptools to the stdlib ? I'm still missing the PEP for this
along with the needed discussion touching among other things,
the change of the distutils standard "python setup.py install"
to install an egg instead of a site package.
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Apr 18 2006)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
Python allows arbitrary sequences after * in calls, but an expression
following ** must be a (subclass of) dict. I believe * and ** should
be treated similarly and since f(*UserList(..)) is valid,
f(**UserDict(..)) should be valid as well.
Of course, I can work around this limitation by writing
f(**dict(UserDict(..))), but this is unnatural given that I don't have
to do f(*tuple(UserList(..))).
I have submitted a patch on SF that makes Python accept arbitrary
mappings after **.
See
<https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1686487&group_…>.
I'm interested in how builtins could be more efficient. I've read over
some of the PEPs having to do with making global variables more
efficient (search for "global"):
http://www.python.org/doc/essays/pepparade.html
But I think the problem can be simplified by focusing strictly on
builtins.
One of my assumptions is that only a small fractions of modules override
the default builtins with something like:
import mybuiltins
__builtins__ = mybuiltins
As you probably know each access of a builtin requires two hash table
lookups. First, the builtin is not found in the list of globals. It is
then found in the list of builtins.
Why not have a means of referencing the default builtins with some sort
of index the way the LOAD_FAST op code currently works? In other words,
by default each module gets the default set of builtins indexed (where
the index indexes into an array) in a certain order. The version stored
in the pyc file would be bumped each time the set of default builtins
is changed.
I don't have very strong feelings whether things like True = (1 == 1)
would be a syntax error, but assigning to a builtin could just do the
equivalent of STORE_FAST. I also don't have very strong feelings about
whether the array of default builtins would be shared between modules.
To simulate the current behavior where attempting to assign to builtin
actually alters that module's global hashtable a separate array of
builtins could be used for each module.
As to assigning to __builtins__ (like I mentioned at the beginning of
this post) perhaps it could assign to the builtin array for those items
that have a name that matches a default builtin (such as "True" or
"len"). Those items that don't match a default builtin would just
create global variables.
Perhaps what I'm suggesting isn't feasible for reasons that have already
been discussed. But it seems like it should be possible to make "while
True" as efficient as "while 1".
--
-----------------------------------------------------------------------
| Steven Elliott | selliott4(a)austin.rr.com |
-----------------------------------------------------------------------
The lib ref claims that minidom supports DOM Level 1. Does anyone
know what parts of Level 2 are not implemented? I wasn't able to find
anything offhand. It seems to be more a matter of what's not
documented, or what's not covered by the regression tests.
So. I'd be happy to do some diffing between the implementation,
documentation, tests, and the Recommendation, and submit patches for
whatever needs it. If anyone thinks that's worthwhile. Anyone?
-j
urllib2.py, after receiving an HTTP response, decides if it was an error
and raises an Exception, or it just returns the info.
For example, you make ``urllib2.urlopen("http://www.google.com")``. If
you receive 200, it's ok; if you receive 500, you get an exception
raised.
How it decides? Function HTTPErrorProcessor, line 490, actually says:
class HTTPErrorProcessor(BaseHandler):
...
if code not in (200, 206):
# it prepares an error response
...
Why only 200 and 206? A coworker of mine found this (he was receiving
202, "Accepted").
In RFC 2616 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html) it
says about codes "2xx"...
This class of status code indicates that the client's request was
successfully received, understood, and accepted.
I know it's no difficult to work this around (you have to catch all the
exceptions, and check for the code), but I was wondering the reasoning
of this.
IMHO, "2xx" should not raise an exception. If you also think it's a bug,
I can fix it.
Regards,
--
. Facundo
.
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Has anyone found a way of persuading distutils to
pass MacOSX -framework options properly when linking?
Using extra_link_args doesn't work, because they need
to be at the beginning of the command line to work
properly, but extra_link_args gets put at the end.
And extra_compile_args is only used when compiling,
not linking.
--
Greg