Hello everyone!
We have been encountering several deadlocks in a threaded Python
application which calls subprocess.Popen (i.e. fork()) in some of its
threads.
This has occurred on Python 2.4.1 on a 2.4.27 Linux kernel.
Preliminary analysis of the hang shows that the child process blocks
upon entering the execvp function, in which the import_lock is acquired
due to the following line:
    def _execvpe(file, args, env=None):
        from errno import ENOENT, ENOTDIR
        ...
It is known that when forking from a pthreaded application, acquisition
attempts on locks which were already locked by other threads while
fork() was called will deadlock.
Due to these oddities we were wondering if it would be better to extract
the above import line from the execvpe call, to prevent lock
acquisition attempts in such cases.
Another workaround could be re-assigning a new lock to import_lock
(such a thing is done with the global interpreter lock) at PyOS_AfterFork or
pthread_atfork.
We'd appreciate any opinions you might have on the subject.
Thanks in advance,
Yair and Rotem
On Wed, 10 Nov 2004, John P Speno wrote:
Hi, sorry for the delayed response.
> While using subprocess (aka popen5), I came across one potential gotcha. I've had
> exceptions ending like this:
>
> File "test.py", line 5, in test
> cmd = popen5.Popen(args, stdout=PIPE)
> File "popen5.py", line 577, in __init__
> data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
> OSError: [Errno 4] Interrupted system call
>
> (on Solaris 9)
>
> Would it make sense for subprocess to use a more robust read() function
> which can handle these cases, i.e. when the parent's read on the pipe
> to the child's stderr is interrupted by a signal and fails with EINTR?
> I imagine it could catch EINTR and EAGAIN and retry the failed read().
I assume you are using signals in your application? The os.read above is
not the only system call that can fail with EINTR. subprocess.py is full
of other system calls that can fail, and I suspect that many other Python
modules are as well.
I've made a patch (attached) to subprocess.py (and test_subprocess.py)
that should guard against EINTR, but I haven't committed it yet. It's
quite large.
Are Python modules supposed to handle EINTR? Why not let the C code handle
this? Or, perhaps the signal module should provide a sigaction function,
so that users can use SA_RESTART.
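For readers who want the retry idea in isolation, here is a minimal sketch of a generic wrapper. It is written in Python 3 syntax (`except ... as`), unlike the Python 2 patch below, and `retry_on_eintr` is a hypothetical name, not part of subprocess. Note that Python 3.5 and later already retry most system calls on EINTR, so a wrapper like this mainly matters on older interpreters.

```python
import errno


def retry_on_eintr(func, *args):
    """Call func(*args), repeating the call while it fails with EINTR."""
    while True:
        try:
            return func(*args)
        except OSError as exc:
            # Any error other than EINTR is a real failure: re-raise it.
            if exc.errno != errno.EINTR:
                raise


# Demonstration with a hypothetical function that is "interrupted" twice.
attempts = []


def flaky_read(value):
    attempts.append(value)
    if len(attempts) < 3:
        raise OSError(errno.EINTR, "Interrupted system call")
    return value * 2


result = retry_on_eintr(flaky_read, 21)
```

A single wrapper keeps the retry policy in one place, at the cost of an extra function call per system call.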
Index: subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/subprocess.py,v
retrieving revision 1.8
diff -u -r1.8 subprocess.py
--- subprocess.py 7 Nov 2004 14:30:34 -0000 1.8
+++ subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -888,6 +888,50 @@
pass
+ def _read_no_intr(self, fd, buffersize):
+ """Like os.read, but retries on EINTR"""
+ while True:
+ try:
+ return os.read(fd, buffersize)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
+
+ def _read_all(self, fd, buffersize):
+ """Like os.read, but retries on EINTR, and reads until EOF"""
+ all = ""
+ while True:
+ data = self._read_no_intr(fd, buffersize)
+ all += data
+ if data == "":
+ return all
+
+
+ def _write_no_intr(self, fd, s):
+ """Like os.write, but retries on EINTR"""
+ while True:
+ try:
+ return os.write(fd, s)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
+ def _waitpid_no_intr(self, pid, options):
+ """Like os.waitpid, but retries on EINTR"""
+ while True:
+ try:
+ return os.waitpid(pid, options)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
def _execute_child(self, args, executable, preexec_fn, close_fds,
cwd, env, universal_newlines,
startupinfo, creationflags, shell,
@@ -963,7 +1007,7 @@
exc_value,
tb)
exc_value.child_traceback = ''.join(exc_lines)
- os.write(errpipe_write, pickle.dumps(exc_value))
+ self._write_no_intr(errpipe_write, pickle.dumps(exc_value))
# This exitcode won't be reported to applications, so it
# really doesn't matter what we return.
@@ -979,7 +1023,7 @@
os.close(errwrite)
# Wait for exec to fail or succeed; possibly raising exception
- data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
+ data = self._read_all(errpipe_read, 1048576) # Exceptions limited to 1 MB
os.close(errpipe_read)
if data != "":
child_exception = pickle.loads(data)
@@ -1003,7 +1047,7 @@
attribute."""
if self.returncode == None:
try:
- pid, sts = os.waitpid(self.pid, os.WNOHANG)
+ pid, sts = self._waitpid_no_intr(self.pid, os.WNOHANG)
if pid == self.pid:
self._handle_exitstatus(sts)
except os.error:
@@ -1015,7 +1059,7 @@
"""Wait for child process to terminate. Returns returncode
attribute."""
if self.returncode == None:
- pid, sts = os.waitpid(self.pid, 0)
+ pid, sts = self._waitpid_no_intr(self.pid, 0)
self._handle_exitstatus(sts)
return self.returncode
@@ -1049,27 +1093,33 @@
stderr = []
while read_set or write_set:
- rlist, wlist, xlist = select.select(read_set, write_set, [])
+ try:
+ rlist, wlist, xlist = select.select(read_set, write_set, [])
+ except select.error, e:
+ if e[0] == errno.EINTR:
+ continue
+ else:
+ raise
if self.stdin in wlist:
# When select has indicated that the file is writable,
# we can write up to PIPE_BUF bytes without risk
# blocking. POSIX defines PIPE_BUF >= 512
- bytes_written = os.write(self.stdin.fileno(), input[:512])
+ bytes_written = self._write_no_intr(self.stdin.fileno(), input[:512])
input = input[bytes_written:]
if not input:
self.stdin.close()
write_set.remove(self.stdin)
if self.stdout in rlist:
- data = os.read(self.stdout.fileno(), 1024)
+ data = self._read_no_intr(self.stdout.fileno(), 1024)
if data == "":
self.stdout.close()
read_set.remove(self.stdout)
stdout.append(data)
if self.stderr in rlist:
- data = os.read(self.stderr.fileno(), 1024)
+ data = self._read_no_intr(self.stderr.fileno(), 1024)
if data == "":
self.stderr.close()
read_set.remove(self.stderr)
Index: test/test_subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_subprocess.py,v
retrieving revision 1.14
diff -u -r1.14 test_subprocess.py
--- test/test_subprocess.py 12 Nov 2004 15:51:48 -0000 1.14
+++ test/test_subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -7,6 +7,7 @@
import tempfile
import time
import re
+import errno
mswindows = (sys.platform == "win32")
@@ -35,6 +36,16 @@
fname = tempfile.mktemp()
return os.open(fname, os.O_RDWR|os.O_CREAT), fname
+ def read_no_intr(self, obj):
+ while True:
+ try:
+ return obj.read()
+ except IOError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
#
# Generic tests
#
@@ -123,7 +134,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stdout.write("orange")'],
stdout=subprocess.PIPE)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_stdout_filedes(self):
# stdout is set to open file descriptor
@@ -151,7 +162,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stderr.write("strawberry")'],
stderr=subprocess.PIPE)
- self.assertEqual(remove_stderr_debug_decorations(p.stderr.read()),
+ self.assertEqual(remove_stderr_debug_decorations(self.read_no_intr(p.stderr)),
"strawberry")
def test_stderr_filedes(self):
@@ -186,7 +197,7 @@
'sys.stderr.write("orange")'],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT)
- output = p.stdout.read()
+ output = self.read_no_intr(p.stdout)
stripped = remove_stderr_debug_decorations(output)
self.assertEqual(stripped, "appleorange")
@@ -220,7 +231,7 @@
stdout=subprocess.PIPE,
cwd=tmpdir)
normcase = os.path.normcase
- self.assertEqual(normcase(p.stdout.read()), normcase(tmpdir))
+ self.assertEqual(normcase(self.read_no_intr(p.stdout)), normcase(tmpdir))
def test_env(self):
newenv = os.environ.copy()
@@ -230,7 +241,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_communicate(self):
p = subprocess.Popen([sys.executable, "-c",
@@ -305,7 +316,8 @@
'sys.stdout.write("\\nline6");'],
stdout=subprocess.PIPE,
universal_newlines=1)
- stdout = p.stdout.read()
+
+ stdout = self.read_no_intr(p.stdout)
if hasattr(open, 'newlines'):
# Interpreter with universal newline support
self.assertEqual(stdout,
@@ -343,7 +355,7 @@
def test_no_leaking(self):
# Make sure we leak no resources
- max_handles = 1026 # too much for most UNIX systems
+ max_handles = 10 # too much for most UNIX systems
if mswindows:
max_handles = 65 # a full test is too slow on Windows
for i in range(max_handles):
@@ -424,7 +436,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
preexec_fn=lambda: os.putenv("FRUIT", "apple"))
- self.assertEqual(p.stdout.read(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout), "apple")
def test_args_string(self):
# args is a string
@@ -457,7 +469,7 @@
p = subprocess.Popen(["echo $FRUIT"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_shell_string(self):
# Run command through the shell (string)
@@ -466,7 +478,7 @@
p = subprocess.Popen("echo $FRUIT", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_call_string(self):
# call() function with string argument on UNIX
@@ -525,7 +537,7 @@
p = subprocess.Popen(["set"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_shell_string(self):
# Run command through the shell (string)
@@ -534,7 +546,7 @@
p = subprocess.Popen("set", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_call_string(self):
# call() function with string argument on Windows
/Peter Åstrand <astrand(a)lysator.liu.se>
During the PyCon sprint I tried to make BaseException accept only a single
argument and bind it to BaseException.message. I was successful (see the
p3yk_no_args_on_exc branch), but it was very painful to pull off as anyone
who sat around me the last three days of the sprint will tell you as they
had to listen to me curse incessantly.
Because of the pain that I went through in the transition and thus the
lessons learned, Guido and I discussed it and we think it would be best to
give up on forcing BaseException to accept only a single argument. I think
it is still doable, but requires a multi-release transition period and not
the one that 2.6 -> 3.0 is offering. And so Guido and I plan on deprecating
BaseException.message as its entire point in existence was to help
transition to what we are not going to have happen. =)
Now that means BaseException.message might hold the record for shortest
lived feature as it was only introduced in 2.5 and is now to be deprecated
in 2.6 and removed in 2.7/3.0. =)
Below is PEP 352, revised to reflect the removal of BaseException.message
and to let the 'args' attribute stay (along with suggesting that one should
only pass a single argument to BaseException). Basically the interface for
exceptions doesn't really change in 3.0 except for the removal of
__getitem__.
--------------------------------------------------------------------------
PEP: 352
Title: Required Superclass for Exceptions
Version: $Revision: 53592 $
Last-Modified: $Date: 2007-01-28 21:54:11 -0800 (Sun, 28 Jan 2007) $
Author: Brett Cannon <brett(a)python.org>
Guido van Rossum <guido(a)python.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Oct-2005
Post-History:
Abstract
========
In Python 2.4 and before, any (classic) class can be raised as an
exception. The plan for 2.5 was to allow new-style classes, but this
makes the problem worse -- it would mean *any* class (or
instance) can be raised! This is a problem as it prevents any
guarantees from being made about the interface of exceptions.
This PEP proposes introducing a new superclass that all raised objects
must inherit from. Imposing the restriction will allow a standard
interface for exceptions to exist that can be relied upon. It also
leads to a known hierarchy for all exceptions to adhere to.
One might counter that requiring a specific base class for a
particular interface is unPythonic. However, in the specific case of
exceptions there's a good reason (which has generally been agreed to
on python-dev): requiring hierarchy helps code that wants to *catch*
exceptions by making it possible to catch *all* exceptions explicitly
by writing ``except BaseException:`` instead of
``except *:``. [#hierarchy-good]_
Introducing a new superclass for exceptions also gives us the chance
to rearrange the exception hierarchy slightly for the better. As it
currently stands, all exceptions in the built-in namespace inherit
from Exception. This is a problem since this includes two exceptions
(KeyboardInterrupt and SystemExit) that often need to be excepted from
the application's exception handling: the default behavior of shutting
the interpreter down without a traceback is usually more desirable than
whatever the application might do (with the possible exception of
applications that emulate Python's interactive command loop with
``>>>`` prompt). Changing it so that these two exceptions inherit
from the common superclass instead of Exception will make it easy for
people to write ``except`` clauses that are not overreaching and not
catch exceptions that should propagate up.
This PEP is based on previous work done for PEP 348 [#pep348]_.
Requiring a Common Superclass
=============================
This PEP proposes introducing a new exception named BaseException that
is a new-style class and has a single attribute, ``args``. Below
is the code as the exception will work in Python 3.0 (how it will
work in Python 2.x is covered in the `Transition Plan`_ section)::

    class BaseException(object):

        """Superclass representing the base of the exception hierarchy.

        Provides an 'args' attribute that holds all arguments passed to
        the constructor.  When exactly one argument is given, it is used
        as the string representation of the exception, so that it
        provides the extra details in the traceback.
        """

        def __init__(self, *args):
            """Set the 'args' attribute."""
            self.args = args

        def __str__(self):
            """Return the str of args[0] or args, depending on length."""
            if len(self.args) == 1:
                return str(self.args[0])
            else:
                return str(self.args)

        def __repr__(self):
            return "%s(*%s)" % (self.__class__.__name__, repr(self.args))

No restriction is placed upon what may be passed in for ``args``
for backwards-compatibility reasons. In practice, though, only
a single string argument should be used. This keeps the string
representation of the exception a useful, human-readable message about
the exception; this is why the ``__str__`` method special-cases a
length-1 ``args`` value. Programmatic information (e.g., an error code
number) should be stored as a separate attribute in a subclass.
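The suggested practice can be sketched as follows. The class name ``ConfigError`` and its ``code`` attribute are hypothetical, chosen only for illustration, and the sketch uses Python 3 syntax.

```python
class ConfigError(Exception):
    """Hypothetical subclass: message in args, error code as an attribute."""

    def __init__(self, message, code):
        # Pass only the human-readable message up to the base class, so
        # that str(exc) stays a plain message.
        super().__init__(message)
        self.code = code


try:
    raise ConfigError("port out of range", 22)
except ConfigError as exc:
    # The message comes from args[0]; the code is read from its own attribute.
    summary = "%s (code %d)" % (exc, exc.code)
```

Handlers can then branch on ``exc.code`` without parsing the message text.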
The ``raise`` statement will be changed to require that any object
passed to it must inherit from BaseException. This will make sure
that all exceptions fall within a single hierarchy that is anchored at
BaseException [#hierarchy-good]_. This also guarantees a basic
interface that is inherited from BaseException. The change to
``raise`` will be enforced starting in Python 3.0 (see the `Transition
Plan`_ below).
With BaseException being the root of the exception hierarchy,
Exception will now inherit from it.
Exception Hierarchy Changes
===========================
With the exception hierarchy now even more important since it has a
basic root, a change to the existing hierarchy is called for. As it
stands now, if one wants to catch all exceptions that signal an error
*and* do not mean the interpreter should be allowed to exit, one must
specify all but two exceptions specifically in an ``except`` clause
or catch the two exceptions separately and then re-raise them and
have all other exceptions fall through to a bare ``except`` clause::

    except (KeyboardInterrupt, SystemExit):
        raise
    except:
        ...

That is needlessly explicit. This PEP proposes moving
KeyboardInterrupt and SystemExit to inherit directly from
BaseException.
::

    - BaseException
      |- KeyboardInterrupt
      |- SystemExit
      |- Exception
         |- (all other current built-in exceptions)

Doing this makes catching Exception more reasonable. It would catch
only exceptions that signify errors. Exceptions that signal that the
interpreter should exit will not be caught and thus be allowed to
propagate up and allow the interpreter to terminate.
KeyboardInterrupt has been moved since users typically expect an
application to exit when they press the interrupt key (usually Ctrl-C).
If people have overly broad ``except`` clauses the expected behaviour
does not occur.
SystemExit has been moved for similar reasons. Since the exception is
raised when ``sys.exit()`` is called the interpreter should normally
be allowed to terminate. Unfortunately overly broad ``except``
clauses can prevent the explicitly requested exit from occurring.
To make sure that people catch Exception most of the time, various
parts of the documentation and tutorials will need to be updated to
strongly suggest that Exception be what programmers want to use. Bare
``except`` clauses or catching BaseException directly should be
discouraged based on the fact that KeyboardInterrupt and SystemExit
almost always should be allowed to propagate up.
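A short sketch of the recommended pattern under the new hierarchy, in Python 3 syntax. The helper name ``run_guarded`` is hypothetical.

```python
import sys


def run_guarded(task):
    """Run task(), turning ordinary errors into a message string."""
    try:
        return task()
    except Exception as exc:
        # KeyboardInterrupt and SystemExit do not inherit from Exception,
        # so they pass through this handler untouched.
        return "error: %s" % exc


# An ordinary error is caught and reported.
message = run_guarded(lambda: 1 / 0)

# A requested exit is not swallowed; it propagates to the caller.
try:
    run_guarded(sys.exit)
    exit_propagated = False
except SystemExit:
    exit_propagated = True
```

With a bare ``except`` in place of ``except Exception``, the ``sys.exit`` call would have been swallowed as well.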
Transition Plan
===============
Since semantic changes to Python are being proposed, a transition plan
is needed. The goal is to end up with the new semantics being used in
Python 3.0 while providing a smooth transition for 2.x code. All
deprecations mentioned in the plan will lead to the removal of the
semantics starting in the version following the initial deprecation.
Here is BaseException as implemented in the 2.x series::

    class BaseException(object):

        """Superclass representing the base of the exception hierarchy.

        The __getitem__ method is provided for backwards-compatibility
        and will be deprecated at some point.
        """

        def __init__(self, *args):
            """Set the 'args' attribute."""
            self.args = args

        def __str__(self):
            """Return the str of args[0] or args, depending on length."""
            # Only index args when it holds exactly one item; indexing an
            # empty tuple would raise IndexError.
            return str(self.args[0]
                       if len(self.args) == 1
                       else self.args)

        def __repr__(self):
            func_args = repr(self.args) if self.args else "()"
            return self.__class__.__name__ + func_args

        def __getitem__(self, index):
            """Index into arguments passed in during instantiation.

            Provided for backwards-compatibility and will be
            deprecated.
            """
            return self.args[index]

Deprecation of features in Python 2.9 is optional. This is because it
is not known at this time if Python 2.9 (which is slated to be the
last version in the 2.x series) will actively deprecate features that
will not be in 3.0. It is conceivable that no deprecation warnings
will be used in 2.9 since there could be such a difference between 2.9
and 3.0 that it would make 2.9 too "noisy" in terms of warnings. Thus
the proposed deprecation warnings for Python 2.9 will be revisited
when development of that version begins to determine if they are still
desired.
* Python 2.5 [done]

  - all standard exceptions become new-style classes
  - introduce BaseException
  - Exception, KeyboardInterrupt, and SystemExit inherit from BaseException
  - deprecate raising string exceptions

* Python 2.6

  - deprecate catching string exceptions
  - deprecate ``message`` attribute (see `Retracted Ideas`_)

* Python 2.7

  - deprecate raising exceptions that do not inherit from BaseException
  - remove ``message`` attribute

* Python 2.8

  - deprecate catching exceptions that do not inherit from BaseException

* Python 2.9

  - deprecate ``__getitem__`` (optional)

* Python 3.0 [done]

  - drop everything that was deprecated above:

    + string exceptions (both raising and catching)
    + all exceptions must inherit from BaseException
    + drop ``__getitem__``

Retracted Ideas
===============
A previous version of this PEP that was implemented in Python 2.5
included a 'message' attribute on BaseException. Its purpose was to
begin a transition to BaseException accepting only a single argument.
This was to tighten the interface and to force people to use
attributes in subclasses to carry arbitrary information with an
exception instead of cramming it all into ``args``.
Unfortunately, while implementing the removal of the ``args``
attribute in Python 3.0 at the PyCon 2007 sprint
[#pycon2007-sprint-email]_, it was discovered that the transition was
very painful, especially for C extension modules. It was decided that
it would be better to deprecate the ``message`` attribute in
Python 2.6 (and remove in Python 2.7 and Python 3.0) and consider a
more long-term transition strategy in Python 3.0 to remove
multiple-argument support in BaseException in preference of accepting
only a single argument. Thus the introduction of ``message`` and the
original deprecation of ``args`` has been retracted.
References
==========
.. [#pep348] PEP 348 (Exception Reorganization for Python 3.0)
   http://www.python.org/peps/pep-0348.html

.. [#hierarchy-good] python-dev Summary for 2004-08-01 through 2004-08-15
   http://www.python.org/dev/summary/2004-08-01_2004-08-15.html#an-exception-i…

.. [#SF_1104669] SF patch #1104669 (new-style exceptions)
   http://www.python.org/sf/1104669

.. [#pycon2007-sprint-email] python-3000 email ("How far to go with
   cleaning up exceptions")
   http://mail.python.org/pipermail/python-3000/2007-March/005911.html

Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
At 02:47 PM 2/24/2007 -0600, Tarek Ziadé wrote:
>I have created a setup.py file for distribution and I bumped into
>a small bug when I tried to set my name in the contact field (Tarek Ziadé)
>
>Using string (utf8 file):
>
>setup(
> maintainer="Tarek Ziadé"
>)
>
>leads to:
>
> File ".../lib/python2.5/distutils/command/register.py", line 162, in
> send_metadata
> auth)
> File ".../lib/python2.5/distutils/command/register.py", line 257, in
> post_to_server
> value = unicode(value).encode("utf-8")
>UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10:
>ordinal not in range(128)
>
>
>Using unicode:
>
>setup(
> maintainer=u"Tarek Ziadé"
>)
>
>leads to:
>
> File ".../lib/python2.5/distutils/dist.py", line 1094, in write_pkg_file
> file.write('Author: %s\n' % self.get_contact() )
>UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
>position 18: ordinal not in range(128)
>
>I would propose a patch for this problem but I don't know what would be
>the best input (I guess unicode for names)
At 05:45 PM 2/24/2007 -0500, Tres Seaver wrote:
>Don't you still need to tell Python about the encoding of your string
>literals [1] [2] ? E.g.::
That's not the problem; it's that the code that writes the PKG-INFO file
doesn't handle Unicode. See
distutils.dist.DistributionMetadata.write_pkg_info(). It needs to use a
file with encoding support if it is going to write unicode.
However, there's currently no standard, as far as I know, for what encoding
the PKG-INFO file should use. Meanwhile, the 'register' command accepts
Unicode, but is broken in handling it.
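As a rough illustration of "a file with encoding support", here is a minimal sketch. The helper name `write_contact_line` is hypothetical, and UTF-8 is purely an assumption made for the example, since (as noted above) no standard encoding for PKG-INFO is known. It is not the distutils implementation.

```python
import io
import os
import tempfile


def write_contact_line(path, name, encoding="utf-8"):
    """Write an 'Author:' header line using an explicit encoding."""
    # io.open encodes unicode text on write instead of relying on the
    # implicit ASCII conversion that raised UnicodeEncodeError above.
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(u"Author: %s\n" % name)


# Usage: write a non-ASCII name to a scratch file and read the raw bytes back.
scratch_dir = tempfile.mkdtemp()
scratch_path = os.path.join(scratch_dir, "PKG-INFO")
write_contact_line(scratch_path, u"Tarek Ziad\xe9")
with open(scratch_path, "rb") as handle:
    raw = handle.read()
```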
Essentially, the problem is that Python 2.5 broke this by adding a unicode
*requirement* to the "register" command. Previously, register simply sent
whatever you gave it, and the PKG-INFO writing code still
does. Unfortunately, this means that there is no longer any one value that
you can use for your name that will be accepted by both "register" and
anything that writes a PKG-INFO file.
Both register and write_pkg_info() are arguably broken here, and should be
able to work with either strings or unicode, and degrade gracefully in the
event of non-ASCII characters in a string. (Because even though "register"
is only run by the package's author, users may run other commands that
require a PKG-INFO, so a package prepared using Python <2.5 must still be
usable with Python 2.5 distutils, and Python <2.5 allows 8-bit maintainer
names.)
Unfortunately, this isn't fixable until there's a new 2.5.x release. For
previous Python versions, both register and write_pkg_info() accepted 8-bit
strings and passed them on as-is, so the only workaround for this issue at
the moment is to revert to Python 2.4 or less.
This may seem like it's coming out of left field for a minute, but
bear with me.
There is no doubt that Ruby's success is a concern for anyone who
sees it as diminishing Python's status. One of the reasons for
Ruby's success is certainly the notion (originally advocated by Bruce
Tate, if I'm not mistaken) that it is the "next Java" -- the language
and environment that mainstream Java developers are looking to, or will
look to, as a natural next step.
One thing that would help Python in this "debate" (or, perhaps simply
put it in the running, at least as a "next Java" candidate) would be
if Python had an easier migration path for Java developers that
currently rely upon various third-party libraries. The wealth of
third-party libraries available for Java has always been one of its
great strengths. Ergo, if Python had an easy-to-use, recommended way
to use those libraries within the Python environment, that would be a
significant advantage to present to Java developers and those who
would choose Ruby over Java. Platform compatibility is always a huge
motivator for those looking to migrate or upgrade.
In that vein, I would point to JPype (http://jpype.sourceforge.net).
JPype is a module that gives "python programs full access to java
class libraries". My suggestion would be to either:
(a) include JPype in the standard library, or barring that,
(b) make a very strong push to support JPype
(a) might be difficult or cumbersome technically, as JPype does need
to build against Java headers, which may or may not be possible given
the way that Python is distributed, etc.
However, (b) is very feasible. I can't really say what "supporting
JPype" means exactly -- maybe GvR and/or other heavyweights in the
Python community make public statements regarding its existence and
functionality, maybe JPype gets a strong mention or placement on
python.org....all those details are obviously not up to me, and I
don't know the workings of the "official" Python organizations enough
to make serious suggestions.
Regardless of the form of support, I think raising people's awareness
of JPype and what it adds to the Python environment would be a Good
Thing (tm).
For our part, we've used JPype to make PDFTextStream (our previously
Java-only PDF text extraction library) available and supported for
Python. You can read some about it here:
http://snowtide.com/PDFTextStream.Python
And I've blogged about how PDFTextStream.Python came about, and how
we worked with Steve Ménard, the maintainer of JPype, to make it all
happen:
http://blog.snowtide.com/2006/08/21/working-together-pythonjava-open-sourcecommercial
Cheers,
Chas Emerick
Founder, Snowtide Informatics Systems
Enterprise-class PDF content extraction
cemerick(a)snowtide.com
http://snowtide.com | +1 413.519.6365
Hi all,
I hope the cross-post is appropriate.
I've started playing with getting the pywin32 extensions building under
the AMD64 architecture. I started building with Visual Studio 8 (it was
what I had handy) and I struck a few issues relating to the compiler version
that I thought worth sharing.
* In trying to build x64 from a 32bit VS7 (ie, cross-compiling via the
PCBuild directory), the python.exe project fails with:
pythoncore fatal error LNK1112: module machine type 'X86' conflicts with
target machine type 'AMD64'
is this a known issue, or am I doing something wrong?
* The PCBuild8 project files appear to work without modification (I only
tried native compilation here though, not a cross-compile) - however, unlike
the PCBuild directory, they place all binaries in a 'PCBuild8/x64'
directory. While this means that it's possible to build for multiple
architectures from the same source tree, it makes life harder for tools like
'distutils' - eg, distutils already knows about the 'PCBuild' directory, but
it knows nothing about either PCBuild8 or PCBuild8/x64.
A number of other build processes also know to look inside a PCBuild
directory (eg, Mozilla), so instead of formalizing PCBuild8, I think we
should merge PCBuild8 into PCBuild. This could mean PCBuild/vs7 and
PCBuild/vs8 directories with the "project" files, but binaries would still
be generated in the 'PCBuild' (or PCBuild/x64) directory. This would mean
the same tree isn't capable of hosting 2 builds from different VS compilers,
but I think that is reasonable (if it's a problem, just have multiple source
directories). I understand that PCBuild8 is not "official", but in the
assumption that future versions of Python will use a compiler later than
VS7, it makes sense to me to clean this up now - what are others' opinions on
this?
* Re the x64 directory used by the PCBuild8 process. IMO, it makes sense to
generate x64 binaries to their own directory - my expectation is that
cross-compiling between platforms is a reasonable use-case, and we should
support multiple architectures for the same compiler version. This would
mean formalizing the x64 directory in both 'PCBuild' and distutils, and
leaving other external build processes to update as they support x64 builds.
Does this make sense? Would this fatally break other scripts used for
packaging (eg, the MSI framework)?
* Wide characters in VS8: PC/pyconfig.h defines PY_UNICODE_TYPE as 'unsigned
short', which corresponds with both 'WCHAR' and 'wchar' in previous compiler
versions. VS8 defines this as wchar_t, which I'm struggling to find a
formal definition for beyond being 2 bytes. My problem is that code which
assumes a 'Py_UNICODE *' could be used in place of a 'WCHAR *' now fails. I
believe the intent on Windows has always been "Py_UNICODE == 'native
unicode'" - should PC/pyconfig.h reflect this (ie, should pyconfig.h grow a
version-specific definition of Py_UNICODE as wchar_t)?
* Finally, as something from left-field which may well take 12 months or
more to pull off - but would there be any interest in moving the Windows
build process to a cygwin environment based on the existing autoconf
scripts? I know a couple of projects are doing this successfully, including
Mozilla, so it has precedent. It does impose a greater burden on people
trying to build on Windows, but I'd suggest that in recent times, many
people who are likely to want to build Python on Windows are already likely
to have a cygwin environment. Simpler mingw builds and nuking MSVC specific
build stuff are among the advantages this would bring. It is not worth
adding this as "yet another windows build option" - so IMO it is only worth
progressing with if it became the "blessed" build process for windows - if
there is support for this, I'll work on it as the opportunity presents
itself...
I'm (obviously) only suggesting we do this on the trunk and am happy to make
all agreed changes - but I welcome all suggestions or criticisms of this
endeavour...
Cheers,
Mark
Phillip.eby wrote:
> Author: phillip.eby
> Date: Tue Apr 18 02:59:55 2006
> New Revision: 45510
>
> Modified:
> python/trunk/Lib/pkgutil.py
> python/trunk/Lib/pydoc.py
> Log:
> Second phase of refactoring for runpy, pkgutil, pydoc, and setuptools
> to share common PEP 302 support code, as described here:
>
> http://mail.python.org/pipermail/python-dev/2006-April/063724.html
Shouldn't this new module be named "pkglib" to be in line with
the naming scheme used for all the other utility modules, e.g. httplib,
imaplib, poplib, etc. ?
> pydoc now supports PEP 302 importers, by way of utility functions in
> pkgutil, such as 'walk_packages()'. It will properly document
> modules that are in zip files, and is backward compatible to Python
> 2.3 (setuptools installs for Python <2.5 will bundle it so pydoc
> doesn't break when used with eggs.)
Are you saying that the installation of setuptools in Python 2.3
and 2.4 will then overwrite the standard pydoc included with
those versions ?
I think that's the wrong way to go if not made an explicit
option in the installation process or a separate installation
altogether.
I'm bothered by the fact that installing setuptools actually changes
the standard Python installation by either overriding stdlib modules
or monkey-patching them at setuptools import time.
> What has not changed is that pydoc command line options do not support
> zip paths or other importer paths, and the webserver index does not
> support sys.meta_path. Those are probably okay as limitations.
>
> Tasks remaining: write docs and Misc/NEWS for pkgutil/pydoc changes,
> and update setuptools to use pkgutil wherever possible, then add it
> to the stdlib.
Add setuptools to the stdlib ? I'm still missing the PEP for this
along with the needed discussion touching among other things,
the change of the distutils standard "python setup.py install"
to install an egg instead of a site package.
--
Marc-Andre Lemburg
eGenix.com
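For readers unfamiliar with the helper named in the quoted log message, here is a minimal sketch of how `pkgutil.walk_packages()` can enumerate the modules below a package. The helper name `list_submodules` and the use of the stdlib `json` package are illustrative assumptions, not part of the commit under discussion.

```python
import json      # any importable package works; json is only an illustration
import pkgutil


def list_submodules(package):
    """Return the dotted names of all modules found below `package`."""
    prefix = package.__name__ + "."
    # walk_packages() yields (finder, name, ispkg) triples for every module
    # reachable through the package's __path__, which is how PEP 302
    # importers such as zip files get included.
    return sorted(name for _finder, name, _ispkg
                  in pkgutil.walk_packages(package.__path__, prefix))


if __name__ == "__main__":
    for name in list_submodules(json):
        print(name)
```

Passing the prefix keeps the reported names fully qualified, which is what a documentation tool would need in order to import them later.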
I'm sure you've heard most/all of this before but..it..just..seems..so..true...
Finally this week I've "written" (ported from sh) a bunch of Python, 2000 sparse lines.
While it sure beats Perl, it has some glaring flaws, more glaring due to
its overall goodness.
I feel compelled to voice my opinion, as if we don't live in a benevolent dictatorship :),
and as if the weight of existing code was zero.
Much of what I dislike cannot be changed without massive breakage.
Below is what I get from:
jbook:/dev2/cm3/scripts/python/flaws jay$ ls -l
total 72
-rw-r--r-- 1 jay admin 834 Dec 29 16:02 1_not_lexically_scoped.py
-rw-r--r-- 1 jay admin 238 Dec 29 16:02 2_reads_scoped_writes_not.py
-rw-r--r-- 1 jay admin 593 Dec 29 16:02 3_lambda_is_neutered.py
-rw-r--r-- 1 jay admin 377 Dec 29 16:03 4_assignment_is_not_expression.py
-rw-r--r-- 1 jay admin 760 Dec 29 16:02 5_for_loop_off_by_one.py
-rw-r--r-- 1 jay admin 412 Dec 29 16:01 6_docs_good_but_a_complaint.txt
-rw-r--r-- 1 jay admin 254 Dec 29 16:02 7_print_is_wierd.py
-rw-r--r-- 1 jay admin 286 Dec 29 16:06 8_eval_vs_exec.py
-rw-r--r-- 1 jay admin 824 Dec 29 16:14 9_typo_on_read_error_but_write_ok.py
jbook:/dev2/cm3/scripts/python/flaws jay$ cat * > all.txt
jbook:/dev2/cm3/scripts/python/flaws jay$ edit all.txt
Each "---" separates a file and they are executable Python.
- Jay
#----------------------------------------------------------
# flaw #1
# not lexically scoped
# Functions have their own locals, but other blocks do not.
# This is true both for "normal" variables and lexically nested functions.
#
#
# This is probably largely an outcome of not declaring variables?
#
A = "A1:global"
def F1():
    A = "A1:in first F1"
    print "F1:global"
if True:
    A = "A1:in if"
    def F1():
        A = "A1:in if.F1" # This does the right thing.
        print "F1:in if"
    F1() # This does the right thing.
# This should go to "global" but it goes to "in if"
F1()
def F2():
    A = "A1:in F2"
    def F1():
        A = "A1:in F2.F1"
        print "F1:in F2"
# Given how if behaved, you'd expect this to go to F2.F1 but it does not.
F1()
# This should be "global" but is "in if".
print("A is " + A)
#----------------------------------------------------------
# flaw #2
#
# In functions, reads are scoped. Writes are not.
#
A = "1"
def F1():
    A = "2" # not an error
def F2():
    #B = A # error
    A = "3"
F1()
F2()
print(A)
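The asymmetry in flaw #2 comes from the rule that an assignment inside a function binds a local name unless that name is declared `global`. A minimal sketch of the rule follows; the names `counter`, `bump_local`, and `bump_global` are hypothetical and chosen only for illustration.

```python
counter = 0


def bump_local():
    # Assignment binds a new local name; the module-level counter is untouched.
    counter = 1
    return counter


def bump_global():
    # The global statement makes the assignment target the module-level name.
    global counter
    counter = counter + 1
    return counter


if __name__ == "__main__":
    bump_local()
    print(counter)   # the module-level value is unchanged by bump_local()
    bump_global()
    print(counter)   # the module-level value was rebound by bump_global()
```

The same rule explains the commented-out `B = A` line: once a function assigns to `A` anywhere in its body, every use of `A` in that function refers to the local name.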
#----------------------------------------------------------
# flaw #3:
# lambda is neutered
# It allows just one expression, and no statements
#
# This should work:
import os
#os.path.walk(
#    "..",
#    lambda(a, b, c):
#        print(b)
#        print(b)
#    None)
# Instead I must say:
def Callback(a, b, c):
    print(b)
    print(b)
os.path.walk(
    "..",
    Callback,
    None)
#
# Moving callbacks away from their point of use hurts readability.
# This is probably mitigated by lexical scoping of functions, but
# notice that other blocks are not lexically scoped.
#
#----------------------------------------------------------
# flaw #4:
# assignment is not an expression
#
# This should work (seen in Perl code, nice idiom):
#A = [1,2]
#while (B = A.pop()):
#    print(B)
# instead I must say:
A = [1,2]
while (A):
    B = A.pop()
    print(B)
# Even if you reject popping an empty list, ok
# there are PLENTY of applications of this.
#----------------------------------------------------------
# flaw #5
#
# for loops suck
#
# It should be easy to iterate from 0 to n, not 0 to n - 1,
# thereby knowing why the loop terminated
#
#This should work:
# print the first even number, if there are any
A = [1, 3]
for i in range(0, len(A)):
    if ((A[i] % 2) == 0):
        print("even number found")
        break;
if (i == len(A)):
    print("no even numbers found")
# instead I must write:
i = 0
while (i != len(A)):
    if ((A[i] % 2) == 0):
        print("even number found")
        break;
    i += 1
if (i == len(A)):
    print("no even numbers found")
# with the attendant problem of not being able to "continue" ever, since
# the end of the loop body must be run in order to proceed
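Python's `for` statement accepts an `else` clause that runs only when the loop finishes without hitting `break`, so the reason for termination is available without comparing an index against the length. A minimal sketch of the same search written that way; the function name `report_first_even` is hypothetical.

```python
def report_first_even(values):
    """Describe whether `values` contains an even number."""
    for value in values:
        if value % 2 == 0:
            result = "even number found"
            break
    else:
        # Reached only when the loop ran to completion without break.
        result = "no even numbers found"
    return result


if __name__ == "__main__":
    print(report_first_even([1, 3]))
    print(report_first_even([1, 4, 3]))
```

Because the loop variable is advanced by the `for` statement itself, `continue` can be used freely in the body.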
Flaw #6
The docs are very good.
However the reference doesn't give much in the way
of semantic description.
It is surprising that an experienced programmer MUST
depend on the tutorial for any examples or semantics.
Light on examples, ok for reference.
The language reference is little more than the grammar in parts.
There need to be links from the reference back to the tutorial.
Perhaps merge the indices for Tutorial, Language Reference, Library Reference.
Or maybe that's what search is for.
On the other hand, way better than Perl.
#----------------------------------------------------------
# Flaw #7
#
# This is a compatibility issue.
# print is both a statement and a function, or something.
#
print() # should print just a newline, but prints two parens
# workaround:
print("")
#----------------------------------------------------------
# flaw #8
#
# Having to eval expressions but exec statements feels wrong.
# There should just be eval.
#
# eval("print(1)") # error
exec("print(1)") # not an error
exec("1 + 2") # not an error?
eval("1 + 2") # not an error
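The split in flaw #8 can be hidden behind one entry point, because `compile()` takes a mode argument and `eval()` also accepts code objects compiled in `"exec"` mode (returning `None` for them). A minimal sketch under that assumption; the helper name `evaluate` is hypothetical and not a built-in.

```python
def evaluate(source, namespace=None):
    """Run `source` as an expression if possible, else as statements."""
    if namespace is None:
        namespace = {}
    try:
        code = compile(source, "<string>", "eval")
    except SyntaxError:
        # Not an expression; compile it as a sequence of statements instead.
        code = compile(source, "<string>", "exec")
    # eval() runs code objects from either mode; for "exec" mode code the
    # return value is None.
    return eval(code, namespace)


if __name__ == "__main__":
    scope = {}
    print(evaluate("1 + 2", scope))
    evaluate("x = 5", scope)
    print(scope["x"])
```

Source that is invalid in both modes still raises `SyntaxError` from the second `compile()` call.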
#----------------------------------------------------------
# flaw #9
#
# Python protects me, via early checking, from typos
# when I read, but not when I write. It should do both.
# That is, I should declare variables.
#
Correct = 1
# proposed typo is Corect
#A = Corect # error, good
Corect = 1 # no error, bad
# For classes, by default, same thing:
class Foo(object):
    def __init__(self):
        self.Correct =1
F = Foo()
# print(F.Corect) # error, good
F.Corect = 1 # no error, bad
# __slots__ fixes this, but is not the default
class Bar(object):
    __slots__ = ["Correct"]
    def __init__(self):
        self.Correct =1
B = Bar()
# print(B.Corect) # error, good
#B.Corect = 1 # error, good
# __slots__ should be the default and then some other syntax
# for "expandable" types, like __slots__ = [ "*" ]
When I build from scratch and run most tests (regrtest.py -uall) I get
some strange failures with test_sys.py:
test test_sys failed -- Traceback (most recent call last):
File "/usr/local/google/home/guido/python/py3kd/Lib/test/test_sys.py",
line 302, in test_43581
self.assertEqual(sys.__stdout__.encoding, sys.__stderr__.encoding)
AssertionError: 'ascii' != 'ISO-8859-1'
The same test doesn't fail when run in isolation.
Interestingly, I saw this with 2.5 as well as 3.0, but not with 2.6!
Any ideas?
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
This is a VERY VERY rough draft of a PEP. The idea is that there should be
some formal way that reST parsers can differentiate (in docstrings) between
variable/function names and identical English words, within comments.
PEP: XXX
Title: Catching unmarked identifiers in docstrings
Version: 0.0.0.0.1
Last-Modified: 23-Aug-2007
Author: Jameson Quinn <firstname dot lastname at gmail>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 23-Aug-2007
Post-History: 30-Aug-2002
Abstract
========
This PEP makes explicit some additional ways to parse docstrings and
comments for Python identifiers. These are intended to be implementable on
their own or as extensions to reST, and to make as many existing docstrings
as possible usable by tools that change the visible representation of
identifiers, such as translating (non-English) code editors or visual
programming environments. Docstrings in widely-used modules are
encouraged to use \`explicit backquotes\` to mark identifiers which are not
caught by these cases.
THIS IS AN EARLY DRAFT OF THIS PEP FOR DISCUSSION PURPOSES ONLY. ALL LOGIC
IS INTENTIONALLY DEFINED ONLY BY EXAMPLES AND THERE IS NO REFERENCE
IMPLEMENTATION UNTIL THERE ARE AT LEAST GLIMMERINGS OF CONSENSUS ON THE
RULE SET.
Rationale
=========
Python, like most computer languages, is based on English. This can
represent a hurdle to those who do not speak English. Work is underway
on Bityi_, a code viewer/editor which translates code to another language
on load and save. Among the many design issues in Bityi is that of
identifiers in docstrings. A view which translates the identifiers in
code, but leaves the untranslated identifier in the docstrings, makes
the docstrings worse than useless, even if the programmer has a
rudimentary grasp of English. Yet if all identifiers in docstrings are
translated, there is the problem of overtranslation in either direction.
It is necessary to distinguish between the variable named "variable",
which should be translated, and the comment that something is "highly
variable", which should not.
.. _Bityi: http://wiki.laptop.org/go/Bityi
Note that this is just one use-case; syntax coloring and docstring
hyperlinks are others. This PEP is not the place for a discussion of
all the pros and cons of a translating viewer.
PEP 287 standardizes reST as an optional way to markup docstrings.
This includes the possibility of using \`backquotes\` to flag Python
identifiers. However, as PEP 287 is purely optional, there are many
cases of identifiers in docstrings which are not flagged as such.
Moreover, many of these unflagged cases could be caught programmatically.
This would reduce the task of making a module internationally-viewable,
or hyperlinkable, considerably.
This syntax is kept relatively open to allow for reuse with
other programming languages.
Common cases of identifiers in docstrings
=========================================
The most common case is that of lists of argument or
method names. We call these "identifier lists"::
    def register(func, *targs, **kargs):
        """register a function to be executed someday
        func - function to be called
        targs - optional arguments to pass
        kargs - optional keyword arguments to pass
        """
    #func, targs, and kargs would be recognized as identifiers in the above.
    class MyClass(object):
        """Just a silly demonstration, with some methods:
        thisword : is a class method and you can call
            it - it may even return a value.
            As with reST, the associated text can have
            several paragraphs.
            BUT - you can't nest this construct, so BUT isn't counted.
        anothermethod: is another method.
        eventhis -- is counted as a method.
        anynumber --- of dashes are allowed in this syntax
        But consider: two words are NOT counted as an identifier.
        things(that,look,like,functions): are functions (see below)
        Also, the docstring may have explanatory text, below or by
        itself: so we have to deal with that.
        Thus, any paragraph which is NOT preceded by an empty line
        or another identifier list - like "itself" above - does not count
        as an identifier.
        """
    #thisword, anothermethod, eventhis, anynumber, and things would be
    #recognized as identifiers in the above.
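Since the draft defines its logic only by example, the following is just one possible reading of the "identifier list" rule, sketched under stated assumptions: an entry is a single word followed by a colon or a run of dashes; it counts only when it follows a blank line or another entry at the same indentation; and more deeply indented text is never an entry. The names `ENTRY`, `find_listed_identifiers`, and `SAMPLE` are hypothetical, and the function-like form such as `things(...)` is deliberately left out.

```python
import re

# One identifier, then ":" or a run of dashes, then whitespace or end of line.
ENTRY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*(?::|-+)(?:\s|$)")


def find_listed_identifiers(docstring):
    """Return the words that head identifier-list entries in `docstring`."""
    found = []
    list_indent = None   # indentation of the list in progress, if any
    after_blank = True   # the start of the docstring counts as a break
    for line in docstring.expandtabs().splitlines():
        text = line.strip()
        if not text:
            after_blank = True
            continue
        indent = len(line) - len(line.lstrip())
        match = ENTRY.match(text)
        if list_indent is not None and indent > list_indent:
            # Nested text belongs to the current entry and is never an entry.
            pass
        elif match and (after_blank or list_indent == indent):
            list_indent = indent
            found.append(match.group(1))
        else:
            # Ordinary prose at list level ends the list.
            list_indent = None
        after_blank = False
    return found


SAMPLE = """Just a silly demonstration, with some methods:

thisword : is a class method and you can call
    it - it may even return a value.

    BUT - nested entries are not counted.
anothermethod: is another method.
eventhis -- is counted as a method.
But consider: two words are NOT counted as an identifier.

Also, the docstring may have explanatory text, below or by
itself: so we have to deal with that.
"""

if __name__ == "__main__":
    print(find_listed_identifiers(SAMPLE))
```

Requiring whitespace after the separator keeps hyphenated prose words such as "well-known" from being mistaken for entries.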
Another case is things which look like functions, lists, indexes, or
dicts::
    """
    afunction(is,a,word,with,parentheses)
    [a,list,is,a,bunch,of,words,in,brackets]
    anindex[is, like, a, cross, between, the, above]
    {adict:is,just:words,in:curly, brackets: likethis}
    """
    #all of the above would be recognized as identifiers.
The "syntax" of what goes inside these is very loose.
    identifier_list ::= [<initial_word>]<opening_symbol> <content_word>
                        {<separator_symbol> <content_word>} <closing_symbol>
with no whitespace after initial_word, and where separator_symbol is any of
the symbols ".,<>{}[]+-*^%=|/()[]{}" MINUS closing_symbol. content_word
could maybe be a quoted string, too.
In the "function name", no whitespace
is allowed, but the symbols ".,*^=><-" are. Thus::
    """
    this.long=>function.*name(counts, and: so |do| these {so-called] arguments)
    {but,you - cant|use[nested]brackets{so,these,are.identifiers}but,these,arent}
    {heres.an.example.of."a string, no identifiers in here",but.out.here.yes}
    { even.one.pair.of.words.with.no symbols.means.nothing.here.is.an.identifier}
    Any of these structures that open on one line {but.close.on.
    the.next} are NOT counted as identifiers.
    """
    #in the above: lines 1, 2, and the parts of 3 outside the quotes
    #would be recognized as identifiers
The above flexibility is intended to cover the various possibilities for
argument lists in a fair subset of other languages. Languages which use only
whitespace for argument separation are not covered by these rules.
The final case is words that are in some_kind of mixedCase. These are
optionally counted as identifiers, but only if they are also present as an
identifier OUTSIDE the comments somewhere in the same file.
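The mixedCase rule can be sketched with the stdlib `tokenize` module, which reports names used in code separately from strings and comments. This is only an illustration under assumptions: "mixedCase" is taken to mean a word with an interior underscore or a lowercase letter directly followed by an uppercase one, and the helper names (`code_identifiers`, `looks_mixed`, `mixed_case_identifiers`) are hypothetical. It is written for Python 3.

```python
import io
import re
import tokenize

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def code_identifiers(source):
    """Collect every NAME token that appears in the code itself."""
    names = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Docstrings are STRING tokens and comments are COMMENT tokens,
        # so neither contributes to the set of names.
        if tok[0] == tokenize.NAME:
            names.add(tok[1])
    return names


def looks_mixed(word):
    """True for words like some_kind or mixedCase (assumed definition)."""
    return "_" in word.strip("_") or re.search(r"[a-z][A-Z]", word) is not None


def mixed_case_identifiers(source, text):
    """Return mixed-case words in `text` that also occur as names in `source`."""
    names = code_identifiers(source)
    return sorted({word for word in WORD.findall(text)
                   if looks_mixed(word) and word in names})


if __name__ == "__main__":
    source = (
        "def loadFile(file_name):\n"
        '    """Read file_name via loadFile; otherWord is prose."""\n'
        "    return open(file_name).read()\n"
    )
    doc = "Read file_name via loadFile; otherWord is prose."
    print(mixed_case_identifiers(source, doc))
```

A word such as `otherWord` looks like an identifier but is rejected here because it never appears as a name in the code.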
Doctest and preformatted reST sections should be considered as 100% python
code and treated as identifiers (or keywords).
Recommended use
===============
The rules above are designed to catch the large majority of identifiers
already present in docstrings, while applying only extremely rarely to words
that should properly be considered as natural language. However, they are
inevitably imperfect. All docstrings of modules intended for wide use should
manually fix all cases in which these rules fail. If the rules underapply,
you can use either \`back quotes\` or parentheses() to mark words as
identifiers; if they overapply and reformatting tricks don't fix the
problem, <SOME DIRECTIVE TO TURN OFF ALL THIS LOGIC FOR A STRING>
Optional use inside comments or non-docstring strings
=====================================================
Comments
--------
Comments or blocks of comments alone on consecutive lines should be able,
optionally, to use these same tricks to spotlight identifiers.
Other strings
-------------
I'm not sure yet what the rules should be here. One option I'm considering
is to be able to turn on all the above logic with some evil hack such
as '' 'a string like this, concatenated with an empty string'.
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: