Hello everyone!
We have been encountering several deadlocks in a threaded Python
application which calls subprocess.Popen (i.e. fork()) in some of its
threads.
This has occurred on Python 2.4.1 on a 2.4.27 Linux kernel.
Preliminary analysis of the hang shows that the child process blocks
upon entering the execvp function, in which the import_lock is acquired
due to the following line:
    def _execvpe(file, args, env=None):
        from errno import ENOENT, ENOTDIR
        ...
It is known that when forking from a pthreaded application, acquisition
attempts on locks which were already locked by other threads while
fork() was called will deadlock.
Due to these oddities we were wondering if it would be better to extract
the above import line from the execvpe call, to prevent lock
acquisition attempts in such cases.
Another workaround could be re-assigning a new lock to import_lock
(such a thing is done with the global interpreter lock) at PyOS_AfterFork or
pthread_atfork.
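The first suggestion above (moving the import out of the function) can be sketched as follows. This is only an illustration in modern Python 3 syntax, not the actual os module code; `candidate_paths` and `is_missing` are hypothetical helpers introduced for this example. The point is that nothing executed between fork() and exec needs the import lock, because the errno names are bound once, when the module is first imported.

```python
import os
from errno import ENOENT, ENOTDIR  # bound at module import time, in the parent


def candidate_paths(file, env=None):
    """Return the paths an execvpe-style PATH search would try, in order.

    Hypothetical helper: it performs no imports, so calling it in a
    freshly forked child never touches the import lock.
    """
    env = os.environ if env is None else env
    if os.path.dirname(file):
        # An explicit directory component disables the PATH search.
        return [file]
    search_path = env.get("PATH", os.defpath)
    return [os.path.join(d, file) for d in search_path.split(os.pathsep)]


def is_missing(err):
    """True if an OSError only means 'try the next PATH entry'."""
    return err.errno in (ENOENT, ENOTDIR)
```

The same effect could be had by keeping the function as it is and simply hoisting the `from errno import ...` line to module level; the helpers here only make the sketch testable on its own.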
We'd appreciate any opinions you might have on the subject.
Thanks in advance,
Yair and Rotem
On Wed, 10 Nov 2004, John P Speno wrote:
Hi, sorry for the delayed response.
> While using subprocess (aka popen5), I came across one potential gotcha. I've had
> exceptions ending like this:
>
> File "test.py", line 5, in test
> cmd = popen5.Popen(args, stdout=PIPE)
> File "popen5.py", line 577, in __init__
> data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
> OSError: [Errno 4] Interrupted system call
>
> (on Solaris 9)
>
> Would it make sense for subprocess to use a more robust read() function
> which can handle these cases, i.e. when the parent's read on the pipe
> to the child's stderr is interrupted by a system call, and returns EINTR?
> I imagine it could catch EINTR and EAGAIN and retry the failed read().
I assume you are using signals in your application? The os.read above is
not the only system call that can fail with EINTR. subprocess.py is full
of other system calls that can fail, and I suspect that many other Python
modules are as well.
I've made a patch (attached) to subprocess.py (and test_subprocess.py)
that should guard against EINTR, but I haven't committed it yet. It's
quite large.
Are Python modules supposed to handle EINTR? Why not let the C code handle
this? Or, perhaps the signal module should provide a sigaction function,
so that users can use SA_RESTART.
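The retry-on-EINTR idea can also be written once as a generic wrapper instead of one helper per system call. A minimal sketch in Python 3 syntax, assuming the wrapped call is safe to repeat; `retry_on_eintr` is a hypothetical name and not part of subprocess.py:

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), repeating it while it fails with EINTR."""
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as e:
            if e.errno != errno.EINTR:
                # Any other error is a real failure and must propagate.
                raise
```

Usage would look like `data = retry_on_eintr(os.read, fd, 1024)`. Whether such a loop belongs in each Python module or in the C layer is exactly the open question raised above.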
Index: subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/subprocess.py,v
retrieving revision 1.8
diff -u -r1.8 subprocess.py
--- subprocess.py 7 Nov 2004 14:30:34 -0000 1.8
+++ subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -888,6 +888,50 @@
pass
+        def _read_no_intr(self, fd, buffersize):
+            """Like os.read, but retries on EINTR"""
+            while True:
+                try:
+                    return os.read(fd, buffersize)
+                except OSError, e:
+                    if e.errno == errno.EINTR:
+                        continue
+                    else:
+                        raise
+
+
+        def _read_all(self, fd, buffersize):
+            """Like os.read, but retries on EINTR, and reads until EOF"""
+            all = ""
+            while True:
+                data = self._read_no_intr(fd, buffersize)
+                all += data
+                if data == "":
+                    return all
+
+
+        def _write_no_intr(self, fd, s):
+            """Like os.write, but retries on EINTR"""
+            while True:
+                try:
+                    return os.write(fd, s)
+                except OSError, e:
+                    if e.errno == errno.EINTR:
+                        continue
+                    else:
+                        raise
+
+        def _waitpid_no_intr(self, pid, options):
+            """Like os.waitpid, but retries on EINTR"""
+            while True:
+                try:
+                    return os.waitpid(pid, options)
+                except OSError, e:
+                    if e.errno == errno.EINTR:
+                        continue
+                    else:
+                        raise
+
def _execute_child(self, args, executable, preexec_fn, close_fds,
cwd, env, universal_newlines,
startupinfo, creationflags, shell,
@@ -963,7 +1007,7 @@
exc_value,
tb)
exc_value.child_traceback = ''.join(exc_lines)
- os.write(errpipe_write, pickle.dumps(exc_value))
+ self._write_no_intr(errpipe_write, pickle.dumps(exc_value))
# This exitcode won't be reported to applications, so it
# really doesn't matter what we return.
@@ -979,7 +1023,7 @@
os.close(errwrite)
# Wait for exec to fail or succeed; possibly raising exception
- data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
+ data = self._read_all(errpipe_read, 1048576) # Exceptions limited to 1 MB
os.close(errpipe_read)
if data != "":
child_exception = pickle.loads(data)
@@ -1003,7 +1047,7 @@
attribute."""
if self.returncode == None:
try:
- pid, sts = os.waitpid(self.pid, os.WNOHANG)
+ pid, sts = self._waitpid_no_intr(self.pid, os.WNOHANG)
if pid == self.pid:
self._handle_exitstatus(sts)
except os.error:
@@ -1015,7 +1059,7 @@
"""Wait for child process to terminate. Returns returncode
attribute."""
if self.returncode == None:
- pid, sts = os.waitpid(self.pid, 0)
+ pid, sts = self._waitpid_no_intr(self.pid, 0)
self._handle_exitstatus(sts)
return self.returncode
@@ -1049,27 +1093,33 @@
stderr = []
while read_set or write_set:
- rlist, wlist, xlist = select.select(read_set, write_set, [])
+ try:
+ rlist, wlist, xlist = select.select(read_set, write_set, [])
+ except select.error, e:
+ if e[0] == errno.EINTR:
+ continue
+ else:
+ raise
if self.stdin in wlist:
# When select has indicated that the file is writable,
# we can write up to PIPE_BUF bytes without risk
# blocking. POSIX defines PIPE_BUF >= 512
- bytes_written = os.write(self.stdin.fileno(), input[:512])
+ bytes_written = self._write_no_intr(self.stdin.fileno(), input[:512])
input = input[bytes_written:]
if not input:
self.stdin.close()
write_set.remove(self.stdin)
if self.stdout in rlist:
- data = os.read(self.stdout.fileno(), 1024)
+ data = self._read_no_intr(self.stdout.fileno(), 1024)
if data == "":
self.stdout.close()
read_set.remove(self.stdout)
stdout.append(data)
if self.stderr in rlist:
- data = os.read(self.stderr.fileno(), 1024)
+ data = self._read_no_intr(self.stderr.fileno(), 1024)
if data == "":
self.stderr.close()
read_set.remove(self.stderr)
Index: test/test_subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_subprocess.py,v
retrieving revision 1.14
diff -u -r1.14 test_subprocess.py
--- test/test_subprocess.py 12 Nov 2004 15:51:48 -0000 1.14
+++ test/test_subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -7,6 +7,7 @@
import tempfile
import time
import re
+import errno
mswindows = (sys.platform == "win32")
@@ -35,6 +36,16 @@
fname = tempfile.mktemp()
return os.open(fname, os.O_RDWR|os.O_CREAT), fname
+    def read_no_intr(self, obj):
+        while True:
+            try:
+                return obj.read()
+            except IOError, e:
+                if e.errno == errno.EINTR:
+                    continue
+                else:
+                    raise
+
#
# Generic tests
#
@@ -123,7 +134,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stdout.write("orange")'],
stdout=subprocess.PIPE)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_stdout_filedes(self):
# stdout is set to open file descriptor
@@ -151,7 +162,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stderr.write("strawberry")'],
stderr=subprocess.PIPE)
- self.assertEqual(remove_stderr_debug_decorations(p.stderr.read()),
+ self.assertEqual(remove_stderr_debug_decorations(self.read_no_intr(p.stderr)),
"strawberry")
def test_stderr_filedes(self):
@@ -186,7 +197,7 @@
'sys.stderr.write("orange")'],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT)
- output = p.stdout.read()
+ output = self.read_no_intr(p.stdout)
stripped = remove_stderr_debug_decorations(output)
self.assertEqual(stripped, "appleorange")
@@ -220,7 +231,7 @@
stdout=subprocess.PIPE,
cwd=tmpdir)
normcase = os.path.normcase
- self.assertEqual(normcase(p.stdout.read()), normcase(tmpdir))
+ self.assertEqual(normcase(self.read_no_intr(p.stdout)), normcase(tmpdir))
def test_env(self):
newenv = os.environ.copy()
@@ -230,7 +241,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_communicate(self):
p = subprocess.Popen([sys.executable, "-c",
@@ -305,7 +316,8 @@
'sys.stdout.write("\\nline6");'],
stdout=subprocess.PIPE,
universal_newlines=1)
- stdout = p.stdout.read()
+
+ stdout = self.read_no_intr(p.stdout)
if hasattr(open, 'newlines'):
# Interpreter with universal newline support
self.assertEqual(stdout,
@@ -343,7 +355,7 @@
def test_no_leaking(self):
# Make sure we leak no resources
- max_handles = 1026 # too much for most UNIX systems
+ max_handles = 10 # too much for most UNIX systems
if mswindows:
max_handles = 65 # a full test is too slow on Windows
for i in range(max_handles):
@@ -424,7 +436,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
preexec_fn=lambda: os.putenv("FRUIT", "apple"))
- self.assertEqual(p.stdout.read(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout), "apple")
def test_args_string(self):
# args is a string
@@ -457,7 +469,7 @@
p = subprocess.Popen(["echo $FRUIT"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_shell_string(self):
# Run command through the shell (string)
@@ -466,7 +478,7 @@
p = subprocess.Popen("echo $FRUIT", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_call_string(self):
# call() function with string argument on UNIX
@@ -525,7 +537,7 @@
p = subprocess.Popen(["set"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_shell_string(self):
# Run command through the shell (string)
@@ -534,7 +546,7 @@
p = subprocess.Popen("set", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_call_string(self):
# call() function with string argument on Windows
/Peter Åstrand <astrand(a)lysator.liu.se>
During the PyCon sprint I tried to make BaseException accept only a single
argument and bind it to BaseException.message . I was successful (see the
p3yk_no_args_on_exc branch), but it was very painful to pull off as anyone
who sat around me the last three days of the sprint will tell you as they
had to listen to me curse incessantly.
Because of the pain that I went through in the transition and thus the
lessons learned, Guido and I discussed it and we think it would be best to
give up on forcing BaseException to accept only a single argument. I think
it is still doable, but requires a multi-release transition period and not
the one that 2.6 -> 3.0 is offering. And so Guido and I plan on deprecating
BaseException.message as its entire point in existence was to help
transition to what we are not going to have happen. =)
Now that means BaseException.message might hold the record for shortest
lived feature as it was only introduced in 2.5 and is now to be deprecated
in 2.6 and removed in 2.7/3.0. =)
Below is PEP 352, revised to reflect the removal of BaseException.message
and for letting the 'args' attribute stay (along with suggesting one should
only pass a single argument to BaseException). Basically the interface for
exceptions doesn't really change in 3.0 except for the removal of
__getitem__.
--------------------------------------------------------------------------
PEP: 352
Title: Required Superclass for Exceptions
Version: $Revision: 53592 $
Last-Modified: $Date: 2007-01-28 21:54:11 -0800 (Sun, 28 Jan 2007) $
Author: Brett Cannon <brett(a)python.org>
Guido van Rossum <guido(a)python.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Oct-2005
Post-History:
Abstract
========
In Python 2.4 and before, any (classic) class can be raised as an
exception. The plan for 2.5 was to allow new-style classes, but this
makes the problem worse -- it would mean *any* class (or
instance) can be raised! This is a problem as it prevents any
guarantees from being made about the interface of exceptions.
This PEP proposes introducing a new superclass that all raised objects
must inherit from. Imposing the restriction will allow a standard
interface for exceptions to exist that can be relied upon. It also
leads to a known hierarchy for all exceptions to adhere to.
One might counter that requiring a specific base class for a
particular interface is unPythonic. However, in the specific case of
exceptions there's a good reason (which has generally been agreed to
on python-dev): requiring hierarchy helps code that wants to *catch*
exceptions by making it possible to catch *all* exceptions explicitly
by writing ``except BaseException:`` instead of
``except *:``. [#hierarchy-good]_
Introducing a new superclass for exceptions also gives us the chance
to rearrange the exception hierarchy slightly for the better. As it
currently stands, all exceptions in the built-in namespace inherit
from Exception. This is a problem since this includes two exceptions
(KeyboardInterrupt and SystemExit) that often need to be excepted from
the application's exception handling: the default behavior of shutting
the interpreter down without a traceback is usually more desirable than
whatever the application might do (with the possible exception of
applications that emulate Python's interactive command loop with
``>>>`` prompt). Changing it so that these two exceptions inherit
from the common superclass instead of Exception will make it easy for
people to write ``except`` clauses that are not overreaching and not
catch exceptions that should propagate up.
This PEP is based on previous work done for PEP 348 [#pep348]_.
Requiring a Common Superclass
=============================
This PEP proposes introducing a new exception named BaseException that
is a new-style class and has a single attribute, ``args``. Below
is the code as the exception will work in Python 3.0 (how it will
work in Python 2.x is covered in the `Transition Plan`_ section)::
    class BaseException(object):

        """Superclass representing the base of the exception hierarchy.

        Provides an 'args' attribute that contains all arguments passed
        to the constructor.  Suggested practice, though, is that only a
        single string argument be passed to the constructor, so that
        the string representation gives useful details in the
        traceback.

        """

        def __init__(self, *args):
            """Set the 'args' attribute."""
            self.args = args

        def __str__(self):
            """Return the str of args[0] or args, depending on length."""
            if len(self.args) == 1:
                return str(self.args[0])
            else:
                return str(self.args)

        def __repr__(self):
            return "%s(*%s)" % (self.__class__.__name__, repr(self.args))
No restriction is placed upon what may be passed in for ``args``
for backwards-compatibility reasons. In practice, though, only
a single string argument should be used. This keeps the string
representation of the exception to be a useful message about the
exception that is human-readable; this is why the ``__str__`` method
special-cases on length-1 ``args`` value. Including programmatic
information (e.g., an error code number) should be stored as a
separate attribute in a subclass.
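As an illustration of this recommendation, here is a minimal sketch in Python 3 syntax. ``ConfigError`` and its ``code`` attribute are hypothetical names chosen for the example and are not part of this PEP.

```python
class ConfigError(Exception):
    """Hypothetical error type used only for this illustration."""

    def __init__(self, message, code):
        # Only the human-readable message goes into args ...
        super().__init__(message)
        # ... while programmatic details live in their own attribute.
        self.code = code


err = ConfigError("missing key 'port'", code=42)
```

With a single argument in ``args``, ``str(err)`` is just the message, while handlers that need the number read ``err.code`` instead of unpacking ``args``.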
The ``raise`` statement will be changed to require that any object
passed to it must inherit from BaseException. This will make sure
that all exceptions fall within a single hierarchy that is anchored at
BaseException [#hierarchy-good]_. This also guarantees a basic
interface that is inherited from BaseException. The change to
``raise`` will be enforced starting in Python 3.0 (see the `Transition
Plan`_ below).
With BaseException being the root of the exception hierarchy,
Exception will now inherit from it.
Exception Hierarchy Changes
===========================
With the exception hierarchy now even more important since it has a
basic root, a change to the existing hierarchy is called for. As it
stands now, if one wants to catch all exceptions that signal an error
*and* do not mean the interpreter should be allowed to exit, you must
specify all but two exceptions specifically in an ``except`` clause
or catch the two exceptions separately and then re-raise them and
have all other exceptions fall through to a bare ``except`` clause::
except (KeyboardInterrupt, SystemExit):
raise
except:
...
That is needlessly explicit. This PEP proposes moving
KeyboardInterrupt and SystemExit to inherit directly from
BaseException.
::
- BaseException
|- KeyboardInterrupt
|- SystemExit
|- Exception
|- (all other current built-in exceptions)
Doing this makes catching Exception more reasonable. It would catch
only exceptions that signify errors. Exceptions that signal that the
interpreter should exit will not be caught and thus be allowed to
propagate up and allow the interpreter to terminate.
KeyboardInterrupt has been moved since users typically expect an
application to exit when they press the interrupt key (usually Ctrl-C).
If people have overly broad ``except`` clauses the expected behaviour
does not occur.
SystemExit has been moved for similar reasons. Since the exception is
raised when ``sys.exit()`` is called the interpreter should normally
be allowed to terminate. Unfortunately overly broad ``except``
clauses can prevent the explicitly requested exit from occurring.
To make sure that people catch Exception most of the time, various
parts of the documentation and tutorials will need to be updated to
strongly suggest that Exception be what programmers want to use. Bare
``except`` clauses or catching BaseException directly should be
discouraged based on the fact that KeyboardInterrupt and SystemExit
almost always should be allowed to propagate up.
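The effect of the reorganized hierarchy can be sketched as follows (Python 3 syntax; ``run_guarded``, ``failing`` and ``exiting`` are hypothetical names used only for illustration). A handler written against Exception deals with ordinary errors but lets an exit request pass through untouched.

```python
def run_guarded(task):
    """Run task(), turning ordinary errors into a message string."""
    try:
        return task()
    except Exception as exc:  # does not match KeyboardInterrupt or SystemExit
        return "handled: %s" % exc


def failing():
    raise ValueError("bad input")


def exiting():
    raise SystemExit(3)
```

Here ``run_guarded(failing)`` returns a string, whereas ``run_guarded(exiting)`` lets SystemExit propagate to the caller, which is the behaviour this section argues for.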
Transition Plan
===============
Since semantic changes to Python are being proposed, a transition plan
is needed. The goal is to end up with the new semantics being used in
Python 3.0 while providing a smooth transition for 2.x code. All
deprecations mentioned in the plan will lead to the removal of the
semantics starting in the version following the initial deprecation.
Here is BaseException as implemented in the 2.x series::
    class BaseException(object):

        """Superclass representing the base of the exception hierarchy.

        The __getitem__ method is provided for backwards-compatibility
        and will be deprecated at some point.

        """

        def __init__(self, *args):
            """Set the 'args' attribute."""
            self.args = args

        def __str__(self):
            """Return the str of args[0] or args, depending on length."""
            return str(self.args[0]
                       if len(self.args) == 1
                       else self.args)

        def __repr__(self):
            func_args = repr(self.args) if self.args else "()"
            return self.__class__.__name__ + func_args

        def __getitem__(self, index):
            """Index into arguments passed in during instantiation.

            Provided for backwards-compatibility and will be
            deprecated.

            """
            return self.args[index]
Deprecation of features in Python 2.9 is optional. This is because it
is not known at this time if Python 2.9 (which is slated to be the
last version in the 2.x series) will actively deprecate features that
will not be in 3.0 . It is conceivable that no deprecation warnings
will be used in 2.9 since there could be such a difference between 2.9
and 3.0 that it would make 2.9 too "noisy" in terms of warnings. Thus
the proposed deprecation warnings for Python 2.9 will be revisited
when development of that version begins to determine if they are still
desired.
* Python 2.5 [done]
- all standard exceptions become new-style classes
- introduce BaseException
- Exception, KeyboardInterrupt, and SystemExit inherit from BaseException
- deprecate raising string exceptions
* Python 2.6
- deprecate catching string exceptions
- deprecate ``message`` attribute (see `Retracted Ideas`_)
* Python 2.7
- deprecate raising exceptions that do not inherit from BaseException
- remove ``message`` attribute
* Python 2.8
- deprecate catching exceptions that do not inherit from BaseException
* Python 2.9
- deprecate ``__getitem__`` (optional)
* Python 3.0 [done]
- drop everything that was deprecated above:
+ string exceptions (both raising and catching)
+ all exceptions must inherit from BaseException
+ drop ``__getitem__``
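For code that currently indexes an exception directly, the last item means going through ``args`` instead. A minimal sketch in Python 3 syntax; ``first_arg`` is a hypothetical helper, not something this PEP adds.

```python
def first_arg(exc):
    """Return the first constructor argument of an exception, or None."""
    # exc[0] relied on __getitem__; exc.args[0] works with and without it.
    return exc.args[0] if exc.args else None
```

Writing ``exc.args[0]`` in 2.x code already keeps it valid once ``__getitem__`` is gone.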
Retracted Ideas
===============
A previous version of this PEP that was implemented in Python 2.5
included a 'message' attribute on BaseException. Its purpose was to
begin a transition to BaseException accepting only a single argument.
This was to tighten the interface and to force people to use
attributes in subclasses to carry arbitrary information with an
exception instead of cramming it all into ``args``.
Unfortunately, while implementing the removal of the ``args``
attribute in Python 3.0 at the PyCon 2007 sprint
[#pycon2007-sprint-email]_, it was discovered that the transition was
very painful, especially for C extension modules. It was decided that
it would be better to deprecate the ``message`` attribute in
Python 2.6 (and remove in Python 2.7 and Python 3.0) and consider a
more long-term transition strategy in Python 3.0 to remove
multiple-argument support in BaseException in preference of accepting
only a single argument. Thus the introduction of ``message`` and the
original deprecation of ``args`` has been retracted.
References
==========
.. [#pep348] PEP 348 (Exception Reorganization for Python 3.0)
http://www.python.org/peps/pep-0348.html
.. [#hierarchy-good] python-dev Summary for 2004-08-01 through 2004-08-15
http://www.python.org/dev/summary/2004-08-01_2004-08-15.html#an-exception-i…
.. [#SF_1104669] SF patch #1104669 (new-style exceptions)
http://www.python.org/sf/1104669
.. [#pycon2007-sprint-email] python-3000 email ("How far to go with
cleaning up exceptions")
http://mail.python.org/pipermail/python-3000/2007-March/005911.html
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
At 02:47 PM 2/24/2007 -0600, Tarek Ziadé wrote:
>I have created a setup.py file for distribution and I bumped into
>a small bug when I tried to set my name in the contact field (Tarek Ziadé)
>
>Using string (utf8 file):
>
>setup(
> maintainer="Tarek Ziadé"
>)
>
>leads to:
>
> File ".../lib/python2.5/distutils/command/register.py", line 162, in
> send_metadata
> auth)
> File ".../lib/python2.5/distutils/command/register.py", line 257, in
> post_to_server
> value = unicode(value).encode("utf-8")
>UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10:
>ordinal not in range(128)
>
>
>Using unicode:
>
>setup(
> maintainer=u"Tarek Ziadé"
>)
>
>leads to:
>
> File ".../lib/python2.5/distutils/dist.py", line 1094, in write_pkg_file
> file.write('Author: %s\n' % self.get_contact() )
>UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
>position 18: ordinal not in range(128)
>
>I would propose a patch for this problem but I don't know what would be
>the best input (I guess unicode for names)
At 05:45 PM 2/24/2007 -0500, Tres Seaver wrote:
>Don't you still need to tell Python about the encoding of your string
>literals [1] [2] ? E.g.::
That's not the problem; it's that the code that writes the PKG-INFO file
doesn't handle Unicode. See
distutils.dist.DistributionMetadata.write_pkg_info(). It needs to use a
file with encoding support if it is going to write unicode.
However, there's currently no standard, as far as I know, for what encoding
the PKG-INFO file should use. Meanwhile, the 'register' command accepts
Unicode, but is broken in handling it.
Essentially, the problem is that Python 2.5 broke this by adding a unicode
*requirement* to the "register" command. Previously, register simply sent
whatever you gave it, and the PKG-INFO writing code still
does. Unfortunately, this means that there is no longer any one value that
you can use for your name that will be accepted by both "register" and
anything that writes a PKG-INFO file.
Both register and write_pkg_info() are arguably broken here, and should be
able to work with either strings or unicode, and degrade gracefully in the
event of non-ASCII characters in a string. (Because even though "register"
is only run by the package's author, users may run other commands that
require a PKG-INFO, so a package prepared using Python <2.5 must still be
usable with Python 2.5 distutils, and Python <2.5 allows 8-bit maintainer
names.)
Unfortunately, this isn't fixable until there's a new 2.5.x release. For
previous Python versions, both register and write_pkg_info() accepted 8-bit
strings and passed them on as-is, so the only workaround for this issue at
the moment is to revert to Python 2.4 or less.
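To make "work with either strings or unicode, and degrade gracefully" concrete, here is a rough sketch in Python 3 syntax, where bytes and str play the roles of 2.x str and unicode. It is not the distutils code: `to_text` and `write_contact` are hypothetical names, and UTF-8 is purely an assumption, since as noted above there is no standard encoding for PKG-INFO.

```python
import io


def to_text(value, encoding="utf-8"):
    """Accept either bytes or text and always return text.

    Undecodable bytes are replaced rather than raising, so a legacy
    8-bit maintainer name degrades gracefully instead of aborting.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return value


def write_contact(stream, name, encoding="utf-8"):
    """Write one header line to a binary stream in a fixed encoding."""
    line = "Author: %s\n" % to_text(name, encoding)
    stream.write(line.encode(encoding))
```

For example, `write_contact(io.BytesIO(), b"Tarek Ziad\xc3\xa9")` and `write_contact(io.BytesIO(), "Tarek Ziadé")` produce the same bytes, which is the property that neither command has at the moment.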
This may seem like it's coming out of left field for a minute, but
bear with me.
There is no doubt that Ruby's success is a concern for anyone who
sees it as diminishing Python's status. One of the reasons for
Ruby's success is certainly the notion (originally advocated by Bruce
Tate, if I'm not mistaken) that it is the "next Java" -- the language
and environment that mainstream Java developers are looking to, or will
look to, as a natural next step.
One thing that would help Python in this "debate" (or, perhaps simply
put it in the running, at least as a "next Java" candidate) would be
if Python had an easier migration path for Java developers that
currently rely upon various third-party libraries. The wealth of
third-party libraries available for Java has always been one of its
great strengths. Ergo, if Python had an easy-to-use, recommended way
to use those libraries within the Python environment, that would be a
significant advantage to present to Java developers and those who
would choose Ruby over Java. Platform compatibility is always a huge
motivator for those looking to migrate or upgrade.
In that vein, I would point to JPype (http://jpype.sourceforge.net).
JPype is a module that gives "python programs full access to java
class libraries". My suggestion would be to either:
(a) include JPype in the standard library, or barring that,
(b) make a very strong push to support JPype
(a) might be difficult or cumbersome technically, as JPype does need
to build against Java headers, which may or may not be possible given
the way that Python is distributed, etc.
However, (b) is very feasible. I can't really say what "supporting
JPype" means exactly -- maybe GvR and/or other heavyweights in the
Python community make public statements regarding its existence and
functionality, maybe JPype gets a strong mention or placement on
python.org....all those details are obviously not up to me, and I
don't know the workings of the "official" Python organizations enough
to make serious suggestions.
Regardless of the form of support, I think raising people's awareness
of JPype and what it adds to the Python environment would be a Good
Thing (tm).
For our part, we've used JPype to make PDFTextStream (our previously
Java-only PDF text extraction library) available and supported for
Python. You can read some about it here:
http://snowtide.com/PDFTextStream.Python
And I've blogged about how PDFTextStream.Python came about, and how
we worked with Steve Ménard, the maintainer of JPype, to make it all
happen (watch out for this URL wrapping):
http://blog.snowtide.com/2006/08/21/working-together-pythonjava-open-
sourcecommercial
Cheers,
Chas Emerick
Founder, Snowtide Informatics Systems
Enterprise-class PDF content extraction
cemerick(a)snowtide.com
http://snowtide.com | +1 413.519.6365
Hi,
I noticed lately that quite a few projects are implementing their own
subclasses of `dict` that retain the order of the key/value pairs.
However, half of the implementations I came across do not implement
the whole dict interface, which leads to weird bugs. Also, the performance
of a Python implementation is not that great.
To fight that problem I want to propose a new class in "collections"
called odict, which is a dict that keeps the items in insertion order,
similar to a PHP array.
The interface would be fully compatible with dict and implemented as
a dict subclass. Updates to existing keys do not change the order of
a key, but new keys are inserted at the end.
Additionally it would support slicing where a list of key, value tuples
is returned and sort/reverse/index methods that work like their list
equivalents. Index based lookup could work via odict.byindex().
An implementation of that exists as part of the ordereddict implementation
which however goes beyond that and is pretty much a fork of the python
dict[1].
Some reasons why ordered dicts are a useful feature:
- in XML/HTML processing it's often desired to keep the attributes of
a tag ordered during processing, so that input ordering is the
same as the output ordering.
- Form data transmitted via HTTP is usually ordered by the position
of the input/textarea/select field in the HTML document. That
information is currently lost in most Python web applications /
frameworks.
- Easier transition of code from Ruby/PHP, which have ordered
associative arrays / hashmaps.
- Having an ordered dict in the standard library would allow other
libraries to support them. For example a PHP serializer could return
odicts rather than dicts, which drop the ordering information.
XML libraries such as etree could add support for it when creating
elements or return attribute dicts.
Regards,
Armin
[1]: http://www.xs4all.nl/~anthon/Python/ordereddict/
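To make the proposed semantics concrete, here is a rough sketch in Python 3 syntax. It is not the proposed implementation: the proposal is for a dict subclass, while this sketch builds on collections.abc.MutableMapping, and the class name `OrderedMapping` is made up for the example. Only `byindex` is taken from the proposal.

```python
from collections.abc import MutableMapping


class OrderedMapping(MutableMapping):
    """Mapping that remembers the order in which keys were first inserted."""

    def __init__(self, items=()):
        self._data = {}    # key -> value
        self._order = []   # keys in insertion order
        for key, value in items:
            self[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if key not in self._data:
            # New keys go to the end; updates keep their position.
            self._order.append(key)
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]
        self._order.remove(key)

    def __iter__(self):
        return iter(self._order)

    def __len__(self):
        return len(self._order)

    def byindex(self, index):
        """Return the (key, value) pair at the given position."""
        key = self._order[index]
        return key, self._data[key]
```

Building on MutableMapping means update(), pop(), setdefault() and friends are all derived from the five methods above, which avoids the partial-interface bugs mentioned earlier. The price is that the object is not a real dict and deletion is linear in the number of keys.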
Dear fellow Python developers!
Ten minutes ago I raised a concern about speed differences between the
old style % formatting and the new .format() code. Some quick
benchmarking from Benjamin and me showed that it's even worse than I
expected.
$ ./python -m timeit "'%s'.format('nothing')"
100000 loops, best of 3: 2.63 usec per loop
$ ./python -m timeit "'%s' % 'nothing'"
10000000 loops, best of 3: 0.188 usec per loop
$ ./python -m timeit "'some text with {0}'.format('nothing')"
100000 loops, best of 3: 4.34 usec per loop
$ ./python -m timeit "'some text with %s' % 'nothing'"
100000 loops, best of 3: 2.04 usec per loop
$ ./python -m timeit "'some text with {0} {1}'.format('nothing', 'more')"
100000 loops, best of 3: 6.77 usec per loop
$ ./python -m timeit "'some text with %s %s' % ('nothing', 'more')"
100000 loops, best of 3: 2.22 usec per loop
As you can clearly see the new .format() code is *much* slower than the
old style % code. I recommend we spend some time on optimizing common
code paths of the new .format() code.
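For anyone who wants to repeat the comparison from a script rather than the shell, a small sketch using the timeit module follows. The function name `compare` is made up for this example, and the absolute numbers will of course differ between machines and interpreter versions.

```python
import timeit


def compare(number=100000):
    """Time both formatting styles; returns (percent_seconds, format_seconds)."""
    # Pass the value in via setup instead of as a literal, so the result
    # does not depend on how the compiler treats constant expressions.
    setup = "arg = 'nothing'"
    percent = timeit.timeit("'some text with %s' % arg", setup=setup, number=number)
    fmt = timeit.timeit("'some text with {0}'.format(arg)", setup=setup, number=number)
    return percent, fmt


if __name__ == "__main__":
    percent, fmt = compare()
    print("%%-formatting: %.3f s, .format(): %.3f s" % (percent, fmt))
```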
As a first step I propose to move the __format__ method to a new type
slot. __format__() is called for every object. My gut feeling says that
a slot method is going to speed up the most common usage
"{0}".format(some_string).
Christian
I can't find any PEP with detailed 2.6 -> 3000 migration guidelines,
especially in the extension module (C code) area. Yes, I know about the
2to3 tool, but I'm interested in updating my 2.x code so that the
(automatic, via "2to3") difference between the 2.x and 3.x codebases stays
as small as possible. Also, 2to3 doesn't handle migration of C modules.
Since I need to port bsddb3 to py3k, what do I need to know? Is there any
*updated* document out there?
PS: My plan is to keep working on the Python side under 2.x, and to handle
Python 3.0 via "2to3", for a long time. On the C side, I plan to keep
the same codebase, with conditional compilation. Ideas?
--
Jesus Cea Avion _/_/ _/_/_/ _/_/_/
jcea(a)jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/
jabber / xmpp:jcea@jabber.org _/_/ _/_/ _/_/_/_/_/
~ _/_/ _/_/ _/_/ _/_/ _/_/
"Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/
"My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
The parser module exports each function and type twice, once with "AST" in
the name, once with "ST". Since "AST" now has a different meaning for
Python code compilation, I propose to deprecate the "ast" variants in 2.6
and remove them in Python 3000.
(Also, all keyword arguments are called "ast"; that cannot change in 2.6
but should in 3.0.)
I'm currently changing the documentation of parser to refer only to
the "st" variants.
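In pure Python, deprecating one name of such a pair usually looks like the sketch below. The parser module itself is written in C, so this only illustrates the pattern; `deprecated_alias` is a hypothetical helper, and the body of `st2list` here is a stand-in rather than the real function.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that warns, then forwards to new_func."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "%s() is deprecated; use %s() instead" % (old_name, new_func.__name__),
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return new_func(*args, **kwargs)
    wrapper.__name__ = old_name
    return wrapper


def st2list(st):
    """Stand-in body for illustration only; not the parser module's code."""
    return list(st)


ast2list = deprecated_alias(st2list, "ast2list")
```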
Georg
--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
Hi all,
Jesus, apologies that this has taken so long for me to get back to, I've been completely and utterly swamped with client work the past few weeks. However, thanks to a couple of hours spare at Detroit airport yesterday, I was finally able to make some progress on updating the Windows Berkeley DB build to 4.7.25. I've checked in the work I've done so far to branches/tnelson-trunk-bsddb-47-upgrade. One thing I wanted to double check with you is the following change:
Modified: python/branches/tnelson-trunk-bsddb-47-upgrade/Lib/bsddb/test/test_replication.py
==============================================================================
--- python/branches/tnelson-trunk-bsddb-47-upgrade/Lib/bsddb/test/test_replication.py (original)
+++ python/branches/tnelson-trunk-bsddb-47-upgrade/Lib/bsddb/test/test_replication.py Wed Jun 18 06:13:44 2008
@@ -94,7 +94,7 @@
# The timeout is necessary in BDB 4.5, since DB_EVENT_REP_STARTUPDONE
# is not generated if the master has no new transactions.
# This is solved in BDB 4.6 (#15542).
- timeout = time.time()+2
+ timeout = time.time()+10
while (time.time()<timeout) and not (self.confirmed_master and self.client_startupdone) :
time.sleep(0.02)
if db.version() >= (4,6) :
Basically, when using +2, the test failed every so often when running the entire test_bsddb3 suite. I picked 10 arbitrarily; it improves things, but it's still not 100%. I still encounter the following failure every so often:
======================================================================
ERROR: test01_basic_replication (bsddb.test.test_replication.DBReplicationManager)
----------------------------------------------------------------------
Traceback (most recent call last):
File "s:\src\svn+ssh\pythondev@svn.python.org\python\branches\tnelson-trunk-bsddb-47-upgrade\lib\bsddb\test\test_replication.py", line 101, in setUp
self.assertTrue(time.time()<timeout)
AssertionError
Can you comment on this?
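For reference, the polling loop in setUp could be factored into a small helper along these lines. This is only a sketch in Python 3 syntax with a made-up name (`wait_until`); it does not explain why the startup event sometimes arrives late, it just makes the wait and the assertion agree with each other.

```python
import time


def wait_until(predicate, timeout, interval=0.02):
    """Poll predicate() until it is true or timeout seconds have passed.

    Returns the final truth value, so the caller can assert on it.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check, in case the condition became true during the final sleep.
    return bool(predicate())
```

Asserting on the returned value, rather than on `time.time() < timeout`, avoids a spurious failure when the condition becomes true just as the deadline passes.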
Apart from this small issue, the other 311 tests pass on x86 and x64 with flying colours, so nice work, whatever you've been doing ;-)
Regards,
Trent.