Hello everyone!
We have been encountering several deadlocks in a threaded Python
application which calls subprocess.Popen (i.e. fork()) in some of its
threads.
This has occurred on Python 2.4.1 on a 2.4.27 Linux kernel.
Preliminary analysis of the hang shows that the child process blocks
upon entering the execvp function, in which the import_lock is acquired
due to the following line:
def _ execvpe(file, args, env=None):
from errno import ENOENT, ENOTDIR
...
It is known that when forking from a pthreaded application, acquisition
attempts on locks which were already locked by other threads while
fork() was called will deadlock.
Due to these oddities we were wondering if it would be better to extract
the above import line from the execvpe call, to prevent lock
acquisition attempts in such cases.
Another workaround could be re-assigning a new lock to import_lock
(such a thing is done with the global interpreter lock) at PyOS_AfterFork or
pthread_atfork.
We'd appreciate any opinions you might have on the subject.
Thanks in advance,
Yair and Rotem
On Wed, 10 Nov 2004, John P Speno wrote:
Hi, sorry for the delayed response.
> While using subprocess (aka popen5), I came across one potential gotcha. I've had
> exceptions ending like this:
>
> File "test.py", line 5, in test
> cmd = popen5.Popen(args, stdout=PIPE)
> File "popen5.py", line 577, in __init__
> data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
> OSError: [Errno 4] Interrupted system call
>
> (on Solaris 9)
>
> Would it make sense for subprocess to use a more robust read() function
> which can handle these cases, i.e. when the parent's read on the pipe
> to the child's stderr is interrupted by a system call, and returns EINTR?
> I imagine it could catch EINTR and EAGAIN and retry the failed read().
I assume you are using signals in your application? The os.read above is
not the only system call that can fail with EINTR. subprocess.py is full
of other system calls that can fail, and I suspect that many other Python
modules are as well.
I've made a patch (attached) to subprocess.py (and test_subprocess.py)
that should guard against EINTR, but I haven't committed it yet. It's
quite large.
Are Python modules supposed to handle EINTR? Why not let the C code handle
this? Or, perhaps the signal module should provide a sigaction function,
so that users can use SA_RESTART.
Index: subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/subprocess.py,v
retrieving revision 1.8
diff -u -r1.8 subprocess.py
--- subprocess.py 7 Nov 2004 14:30:34 -0000 1.8
+++ subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -888,6 +888,50 @@
pass
+ def _read_no_intr(self, fd, buffersize):
+ """Like os.read, but retries on EINTR"""
+ while True:
+ try:
+ return os.read(fd, buffersize)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
+
+ def _read_all(self, fd, buffersize):
+ """Like os.read, but retries on EINTR, and reads until EOF"""
+ all = ""
+ while True:
+ data = self._read_no_intr(fd, buffersize)
+ all += data
+ if data == "":
+ return all
+
+
+ def _write_no_intr(self, fd, s):
+ """Like os.write, but retries on EINTR"""
+ while True:
+ try:
+ return os.write(fd, s)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
+ def _waitpid_no_intr(self, pid, options):
+ """Like os.waitpid, but retries on EINTR"""
+ while True:
+ try:
+ return os.waitpid(pid, options)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
def _execute_child(self, args, executable, preexec_fn, close_fds,
cwd, env, universal_newlines,
startupinfo, creationflags, shell,
@@ -963,7 +1007,7 @@
exc_value,
tb)
exc_value.child_traceback = ''.join(exc_lines)
- os.write(errpipe_write, pickle.dumps(exc_value))
+ self._write_no_intr(errpipe_write, pickle.dumps(exc_value))
# This exitcode won't be reported to applications, so it
# really doesn't matter what we return.
@@ -979,7 +1023,7 @@
os.close(errwrite)
# Wait for exec to fail or succeed; possibly raising exception
- data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
+ data = self._read_all(errpipe_read, 1048576) # Exceptions limited to 1 MB
os.close(errpipe_read)
if data != "":
child_exception = pickle.loads(data)
@@ -1003,7 +1047,7 @@
attribute."""
if self.returncode == None:
try:
- pid, sts = os.waitpid(self.pid, os.WNOHANG)
+ pid, sts = self._waitpid_no_intr(self.pid, os.WNOHANG)
if pid == self.pid:
self._handle_exitstatus(sts)
except os.error:
@@ -1015,7 +1059,7 @@
"""Wait for child process to terminate. Returns returncode
attribute."""
if self.returncode == None:
- pid, sts = os.waitpid(self.pid, 0)
+ pid, sts = self._waitpid_no_intr(self.pid, 0)
self._handle_exitstatus(sts)
return self.returncode
@@ -1049,27 +1093,33 @@
stderr = []
while read_set or write_set:
- rlist, wlist, xlist = select.select(read_set, write_set, [])
+ try:
+ rlist, wlist, xlist = select.select(read_set, write_set, [])
+ except select.error, e:
+ if e[0] == errno.EINTR:
+ continue
+ else:
+ raise
if self.stdin in wlist:
# When select has indicated that the file is writable,
# we can write up to PIPE_BUF bytes without risk
# blocking. POSIX defines PIPE_BUF >= 512
- bytes_written = os.write(self.stdin.fileno(), input[:512])
+ bytes_written = self._write_no_intr(self.stdin.fileno(), input[:512])
input = input[bytes_written:]
if not input:
self.stdin.close()
write_set.remove(self.stdin)
if self.stdout in rlist:
- data = os.read(self.stdout.fileno(), 1024)
+ data = self._read_no_intr(self.stdout.fileno(), 1024)
if data == "":
self.stdout.close()
read_set.remove(self.stdout)
stdout.append(data)
if self.stderr in rlist:
- data = os.read(self.stderr.fileno(), 1024)
+ data = self._read_no_intr(self.stderr.fileno(), 1024)
if data == "":
self.stderr.close()
read_set.remove(self.stderr)
Index: test/test_subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_subprocess.py,v
retrieving revision 1.14
diff -u -r1.14 test_subprocess.py
--- test/test_subprocess.py 12 Nov 2004 15:51:48 -0000 1.14
+++ test/test_subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -7,6 +7,7 @@
import tempfile
import time
import re
+import errno
mswindows = (sys.platform == "win32")
@@ -35,6 +36,16 @@
fname = tempfile.mktemp()
return os.open(fname, os.O_RDWR|os.O_CREAT), fname
+ def read_no_intr(self, obj):
+ while True:
+ try:
+ return obj.read()
+ except IOError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
#
# Generic tests
#
@@ -123,7 +134,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stdout.write("orange")'],
stdout=subprocess.PIPE)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_stdout_filedes(self):
# stdout is set to open file descriptor
@@ -151,7 +162,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stderr.write("strawberry")'],
stderr=subprocess.PIPE)
- self.assertEqual(remove_stderr_debug_decorations(p.stderr.read()),
+ self.assertEqual(remove_stderr_debug_decorations(self.read_no_intr(p.stderr)),
"strawberry")
def test_stderr_filedes(self):
@@ -186,7 +197,7 @@
'sys.stderr.write("orange")'],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT)
- output = p.stdout.read()
+ output = self.read_no_intr(p.stdout)
stripped = remove_stderr_debug_decorations(output)
self.assertEqual(stripped, "appleorange")
@@ -220,7 +231,7 @@
stdout=subprocess.PIPE,
cwd=tmpdir)
normcase = os.path.normcase
- self.assertEqual(normcase(p.stdout.read()), normcase(tmpdir))
+ self.assertEqual(normcase(self.read_no_intr(p.stdout)), normcase(tmpdir))
def test_env(self):
newenv = os.environ.copy()
@@ -230,7 +241,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_communicate(self):
p = subprocess.Popen([sys.executable, "-c",
@@ -305,7 +316,8 @@
'sys.stdout.write("\\nline6");'],
stdout=subprocess.PIPE,
universal_newlines=1)
- stdout = p.stdout.read()
+
+ stdout = self.read_no_intr(p.stdout)
if hasattr(open, 'newlines'):
# Interpreter with universal newline support
self.assertEqual(stdout,
@@ -343,7 +355,7 @@
def test_no_leaking(self):
# Make sure we leak no resources
- max_handles = 1026 # too much for most UNIX systems
+ max_handles = 10 # too much for most UNIX systems
if mswindows:
max_handles = 65 # a full test is too slow on Windows
for i in range(max_handles):
@@ -424,7 +436,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
preexec_fn=lambda: os.putenv("FRUIT", "apple"))
- self.assertEqual(p.stdout.read(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout), "apple")
def test_args_string(self):
# args is a string
@@ -457,7 +469,7 @@
p = subprocess.Popen(["echo $FRUIT"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_shell_string(self):
# Run command through the shell (string)
@@ -466,7 +478,7 @@
p = subprocess.Popen("echo $FRUIT", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_call_string(self):
# call() function with string argument on UNIX
@@ -525,7 +537,7 @@
p = subprocess.Popen(["set"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_shell_string(self):
# Run command through the shell (string)
@@ -534,7 +546,7 @@
p = subprocess.Popen("set", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_call_string(self):
# call() function with string argument on Windows
/Peter Åstrand <astrand(a)lysator.liu.se>
During the PyCon sprint I tried to make BaseException accept only a single
argument and bind it to BaseException.message . I was successful (see the
p3yk_no_args_on_exc branch), but it was very painful to pull off as anyone
who sat around me the last three days of the sprint will tell you as they
had to listen to me curse incessantly.
Because of the pain that I went through in the transition and thus the
lessons learned, Guido and I discussed it and we think it would be best to
give up on forcing BaseException to accept only a single argument. I think
it is still doable, but requires a multi-release transition period and not
the one that 2.6 -> 3.0 is offering. And so Guido and I plan on deprecating
BaseException.message as its entire point in existence was to help
transition to what we are not going to have happen. =)
Now that means BaseException.message might hold the record for shortest
lived feature as it was only introduced in 2.5 and is now to be deprecated
in 2.6 and removed in 2.7/3.0. =)
Below is PEP 352, revised to reflect the removal of
BaseException.messageand for letting the 'args' attribute stay (along
with suggesting one should
only pass a single argument to BaseException). Basically the interface for
exceptions doesn't really change in 3.0 except for the removal of
__getitem__.
--------------------------------------------------------------------------
PEP: 352
Title: Required Superclass for Exceptions
Version: $Revision: 53592 $
Last-Modified: $Date: 2007-01-28 21:54:11 -0800 (Sun, 28 Jan 2007) $
Author: Brett Cannon <brett(a)python.org>
Guido van Rossum <guido(a)python.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Oct-2005
Post-History:
Abstract
========
In Python 2.4 and before, any (classic) class can be raised as an
exception. The plan for 2.5 was to allow new-style classes, but this
makes the problem worse -- it would mean *any* class (or
instance) can be raised! This is a problem as it prevents any
guarantees from being made about the interface of exceptions.
This PEP proposes introducing a new superclass that all raised objects
must inherit from. Imposing the restriction will allow a standard
interface for exceptions to exist that can be relied upon. It also
leads to a known hierarchy for all exceptions to adhere to.
One might counter that requiring a specific base class for a
particular interface is unPythonic. However, in the specific case of
exceptions there's a good reason (which has generally been agreed to
on python-dev): requiring hierarchy helps code that wants to *catch*
exceptions by making it possible to catch *all* exceptions explicitly
by writing ``except BaseException:`` instead of
``except *:``. [#hierarchy-good]_
Introducing a new superclass for exceptions also gives us the chance
to rearrange the exception hierarchy slightly for the better. As it
currently stands, all exceptions in the built-in namespace inherit
from Exception. This is a problem since this includes two exceptions
(KeyboardInterrupt and SystemExit) that often need to be excepted from
the application's exception handling: the default behavior of shutting
the interpreter down without a traceback is usually more desirable than
whatever the application might do (with the possible exception of
applications that emulate Python's interactive command loop with
``>>>`` prompt). Changing it so that these two exceptions inherit
from the common superclass instead of Exception will make it easy for
people to write ``except`` clauses that are not overreaching and not
catch exceptions that should propagate up.
This PEP is based on previous work done for PEP 348 [#pep348]_.
Requiring a Common Superclass
=============================
This PEP proposes introducing a new exception named BaseException that
is a new-style class and has a single attribute, ``args``. Below
is the code as the exception will work in Python 3.0 (how it will
work in Python 2.x is covered in the `Transition Plan`_ section)::
class BaseException(object):
"""Superclass representing the base of the exception hierarchy.
Provides a 'message' attribute that contains either the single
argument to the constructor or the empty string. This attribute
is used in the string representation for the
exception. This is so that it provides the extra details in the
traceback.
"""
def __init__(self, *args):
"""Set the 'message' attribute'"""
self.args = args
def __str__(self):
"""Return the str of 'message'"""
if len(self.args) == 1:
return str(self.args[0])
else:
return str(self.args)
def __repr__(self):
return "%s(*%s)" % (self.__class__.__name__, repr(self.args))
No restriction is placed upon what may be passed in for ``args``
for backwards-compatibility reasons. In practice, though, only
a single string argument should be used. This keeps the string
representation of the exception to be a useful message about the
exception that is human-readable; this is why the ``__str__`` method
special-cases on length-1 ``args`` value. Including programmatic
information (e.g., an error code number) should be stored as a
separate attribute in a subclass.
The ``raise`` statement will be changed to require that any object
passed to it must inherit from BaseException. This will make sure
that all exceptions fall within a single hierarchy that is anchored at
BaseException [#hierarchy-good]_. This also guarantees a basic
interface that is inherited from BaseException. The change to
``raise`` will be enforced starting in Python 3.0 (see the `Transition
Plan`_ below).
With BaseException being the root of the exception hierarchy,
Exception will now inherit from it.
Exception Hierarchy Changes
===========================
With the exception hierarchy now even more important since it has a
basic root, a change to the existing hierarchy is called for. As it
stands now, if one wants to catch all exceptions that signal an error
*and* do not mean the interpreter should be allowed to exit, you must
specify all but two exceptions specifically in an ``except`` clause
or catch the two exceptions separately and then re-raise them and
have all other exceptions fall through to a bare ``except`` clause::
except (KeyboardInterrupt, SystemExit):
raise
except:
...
That is needlessly explicit. This PEP proposes moving
KeyboardInterrupt and SystemExit to inherit directly from
BaseException.
::
- BaseException
|- KeyboardInterrupt
|- SystemExit
|- Exception
|- (all other current built-in exceptions)
Doing this makes catching Exception more reasonable. It would catch
only exceptions that signify errors. Exceptions that signal that the
interpreter should exit will not be caught and thus be allowed to
propagate up and allow the interpreter to terminate.
KeyboardInterrupt has been moved since users typically expect an
application to exit when the press the interrupt key (usually Ctrl-C).
If people have overly broad ``except`` clauses the expected behaviour
does not occur.
SystemExit has been moved for similar reasons. Since the exception is
raised when ``sys.exit()`` is called the interpreter should normally
be allowed to terminate. Unfortunately overly broad ``except``
clauses can prevent the explicitly requested exit from occurring.
To make sure that people catch Exception most of the time, various
parts of the documentation and tutorials will need to be updated to
strongly suggest that Exception be what programmers want to use. Bare
``except`` clauses or catching BaseException directly should be
discouraged based on the fact that KeyboardInterrupt and SystemExit
almost always should be allowed to propagate up.
Transition Plan
===============
Since semantic changes to Python are being proposed, a transition plan
is needed. The goal is to end up with the new semantics being used in
Python 3.0 while providing a smooth transition for 2.x code. All
deprecations mentioned in the plan will lead to the removal of the
semantics starting in the version following the initial deprecation.
Here is BaseException as implemented in the 2.x series::
class BaseException(object):
"""Superclass representing the base of the exception hierarchy.
The __getitem__ method is provided for backwards-compatibility
and will be deprecated at some point.
"""
def __init__(self, *args):
"""Set the 'args' attribute."""
self.args = args
def __str__(self):
"""Return the str of args[0] or args, depending on length."""
return str(self.args[0]
if len(self.args) <= 1
else self.args)
def __repr__(self):
func_args = repr(self.args) if self.args else "()"
return self.__class__.__name__ + func_args
def __getitem__(self, index):
"""Index into arguments passed in during instantiation.
Provided for backwards-compatibility and will be
deprecated.
"""
return self.args[index]
Deprecation of features in Python 2.9 is optional. This is because it
is not known at this time if Python 2.9 (which is slated to be the
last version in the 2.x series) will actively deprecate features that
will not be in 3.0 . It is conceivable that no deprecation warnings
will be used in 2.9 since there could be such a difference between 2.9
and 3.0 that it would make 2.9 too "noisy" in terms of warnings. Thus
the proposed deprecation warnings for Python 2.9 will be revisited
when development of that version begins to determine if they are still
desired.
* Python 2.5 [done]
- all standard exceptions become new-style classes
- introduce BaseException
- Exception, KeyboardInterrupt, and SystemExit inherit from BaseException
- deprecate raising string exceptions
* Python 2.6
- deprecate catching string exceptions
- deprecate ``message`` attribute (see `Retracted Ideas`_)
* Python 2.7
- deprecate raising exceptions that do not inherit from BaseException
- remove ``message`` attribute
* Python 2.8
- deprecate catching exceptions that do not inherit from BaseException
* Python 2.9
- deprecate ``__getitem__`` (optional)
* Python 3.0 [done]
- drop everything that was deprecated above:
+ string exceptions (both raising and catching)
+ all exceptions must inherit from BaseException
+ drop ``__getitem__``
Retracted Ideas
===============
A previous version of this PEP that was implemented in Python 2.5
included a 'message' attribute on BaseException. Its purpose was to
begin a transition to BaseException accepting only a single argument.
This was to tighten the interface and to force people to use
attributes in subclasses to carry arbitrary information with an
exception instead of cramming it all into ``args``.
Unfortunately, while implementing the removal of the ``args``
attribute in Python 3.0 at the PyCon 2007 sprint
[#pycon2007-sprint-email]_, it was discovered that the transition was
very painful, especially for C extension modules. It was decided that
it would be better to deprecate the ``message`` attribute in
Python 2.6 (and remove in Python 2.7 and Python 3.0) and consider a
more long-term transition strategy in Python 3.0 to remove
multiple-argument support in BaseException in preference of accepting
only a single argument. Thus the introduction of ``message`` and the
original deprecation of ``args`` has been retracted.
References
==========
.. [#pep348] PEP 348 (Exception Reorganization for Python 3.0)
http://www.python.org/peps/pep-0348.html
.. [#hierarchy-good] python-dev Summary for 2004-08-01 through 2004-08-15
http://www.python.org/dev/summary/2004-08-01_2004-08-15.html#an-exception-i…
.. [#SF_1104669] SF patch #1104669 (new-style exceptions)
http://www.python.org/sf/1104669
.. [#pycon2007-sprint-email] python-3000 email ("How far to go with
cleaning up exceptions")
http://mail.python.org/pipermail/python-3000/2007-March/005911.html
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
At 02:47 PM 2/24/2007 -0600, Tarek Ziadé wrote:
>I have created a setup.py file for distirbution and I bumped into
>a small bug when i tried to set my name in the contact field (Tarek Ziadé)
>
>Using string (utf8 file):
>
>setup(
> maintainer="Tarek Ziadé"
>)
>
>leads to:
>
> File ".../lib/python2.5/distutils/command/register.py", line 162, in
> send_metadata
> auth)
> File ".../lib/python2.5/distutils/command/register.py", line 257, in
> post_to_server
> value = unicode(value).encode("utf-8")
>UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10:
>ordinal not in range(128)
>
>
>Using unicode:
>
>setup(
> maintainer=u"Tarek Ziadé"
>)
>
>leads to:
>
> File ".../lib/python2.5/distutils/dist.py", line 1094, in write_pkg_file
> file.write('Author: %s\n' % self.get_contact() )
>UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
>position 18: ordinal not in range(128)
>
>I would propose a patch for this problem but i don't know what would be
>the best input (i guess unicode
> for names)
At 05:45 PM 2/24/2007 -0500, Tres Seaver wrote:
>Don't you still need to tell Python about the encoding of your string
>literals [1] [2] ? E.g.::
That's not the problem, it's that the code that writes the PKG-INFO file
doesn't handle Unicode. See
distutils.dist.DistributionMetadata.write_pkg_info(). It needs to use a
file with encoding support, if it's doing unicode
However, there's currently no standard, as far as I know, for what encoding
the PKG-INFO file should use. Meanwhile, the 'register' command accepts
Unicode, but is broken in handling it.
Essentially, the problem is that Python 2.5 broke this by adding a unicode
*requirement* to the "register" command. Previously, register simply sent
whatever you gave it, and the PKG-INFO writing code still
does. Unfortunately, this means that there is no longer any one value that
you can use for your name that will be accepted by both "register" and
anything that writes a PKG-INFO file.
Both register and write_pkg_info() are arguably broken here, and should be
able to work with either strings or unicode, and degrade gracefully in the
event of non-ASCII characters in a string. (Because even though "register"
is only run by the package's author, users may run other commands that
require a PKG-INFO, so a package prepared using Python <2.5 must still be
usable with Python 2.5 distutils, and Python <2.5 allows 8-bit maintainer
names.)
Unfortunately, this isn't fixable until there's a new 2.5.x release. For
previous Python versions, both register and write_pkg_info() accepted 8-bit
strings and passed them on as-is, so the only workaround for this issue at
the moment is to revert to Python 2.4 or less.
This may seem like it's coming out of left field for a minute, but
bear with me.
There is no doubt that Ruby's success is a concern for anyone who
sees it as diminishing Python's status. One of the reasons for
Ruby's success is certainly the notion (originally advocated by Bruce
Tate, if I'm not mistaken) that it is the "next Java" -- the language
and environment that mainstream Java developers are, or will, look to
as a natural next step.
One thing that would help Python in this "debate" (or, perhaps simply
put it in the running, at least as a "next Java" candidate) would be
if Python had an easier migration path for Java developers that
currently rely upon various third-party libraries. The wealth of
third-party libraries available for Java has always been one of its
great strengths. Ergo, if Python had an easy-to-use, recommended way
to use those libraries within the Python environment, that would be a
significant advantage to present to Java developers and those who
would choose Ruby over Java. Platform compatibility is always a huge
motivator for those looking to migrate or upgrade.
In that vein, I would point to JPype (http://jpype.sourceforge.net).
JPype is a module that gives "python programs full access to java
class libraries". My suggestion would be to either:
(a) include JPype in the standard library, or barring that,
(b) make a very strong push to support JPype
(a) might be difficult or cumbersome technically, as JPype does need
to build against Java headers, which may or may not be possible given
the way that Python is distributed, etc.
However, (b) is very feasible. I can't really say what "supporting
JPype" means exactly -- maybe GvR and/or other heavyweights in the
Python community make public statements regarding its existence and
functionality, maybe JPype gets a strong mention or placement on
python.org....all those details are obviously not up to me, and I
don't know the workings of the "official" Python organizations enough
to make serious suggestions.
Regardless of the form of support, I think raising people's awareness
of JPype and what it adds to the Python environment would be a Good
Thing (tm).
For our part, we've used JPype to make PDFTextStream (our previously
Java-only PDF text extraction library) available and supported for
Python. You can read some about it here:
http://snowtide.com/PDFTextStream.Python
And I've blogged about how PDFTextStream.Python came about, and how
we worked with Steve Ménard, the maintainer of JPype, to make it all
happen (watch out for this URL wrapping):
http://blog.snowtide.com/2006/08/21/working-together-pythonjava-open-
sourcecommercial
Cheers,
Chas Emerick
Founder, Snowtide Informatics Systems
Enterprise-class PDF content extraction
cemerick(a)snowtide.com
http://snowtide.com | +1 413.519.6365
Hi,
Would it possible to get more permissions on Python bugtracker, especially to
add keywords, close a duplicate bug, etc.?
I also would like an SVN account for Python trunk and Python 3000 branch. I
know that both branches are near "frozen" and commit are only allowed after
patch review and/or discussion (on the mailing list or the bug tracker. So I
will after the final versions (2.6 and 3.0) to commit. Or would it be
possible to have a "mentor"? Someone to ask questions about Python SVN/bug
tracker, or someone who reviews my patchs/commits.
--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
Hi,
I would like to know if a Python security team does exist. I sent an email
about an imageop issue, and I didn't get any answer. Later I learned that a
security ticket was created, I don't have access to it.
First, I would like to access to these informations. Not only this issue, but
all security related issues. I have some knowledges about security and I can
help to resolve issues and/or estimate the criticity of an issue.
Second, I would like to help to fix all Python security issues. It looks like
Python community isn't very reactive (proactive?) about security. Eg. a DoS
was reported in smtpd server (integrated to Python)... 15 months ago. A patch
is available but it's not applied in Python trunk.
Third, I'm also looking for a document explaining "how Python is secure" (!).
If an user can run arbitrary Python code, we know that it can do anything
(read/remove any file, create/kill any process, read/write anywhere in
memory, etc.). Brett wrote a paper about CPython sandboxing. PyPy is also
working on sandboxing using two interpreters: one has high priviledge and
execute instructions from the second interpreter (after checking the
permissions and arguments). So is there somewhere a document to explain to
current status of Python security?
--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
On 12:47 am, victor.stinner(a)haypocalc.com wrote:
This is the most sane contribution I've seen so far :).
>See attached patch: python3_bytes_filename.patch
>
>Using the patch, you will get:
>- open() support bytes
>- listdir(unicode) -> only unicode, *skip* invalid filenames
> (as asked by Guido)
Forgive me for being a bit dense, but I couldn't find this hunk in the
patch. Do I understand properly that (listdir(bytes) -> bytes)?
If so, this seems basically sane to me, since it provides text behavior
where possible and allows more sophisticated filesystem wrappers (i.e.
Twisted's FilePath, Will McGugan's "FS") to do more tricky things,
separating filenames for display to the user and filenames for exchange
with the FS.
>- remove os.getcwdu()
>- create os.getcwdb() -> bytes
>- glob.glob() support bytes
>- fnmatch.filter() support bytes
>- posixpath.join() and posixpath.split() support bytes
It sounds like maybe there should be some 2to3 fixers in here somewhere,
too? Not necessarily as part of this patch, but somewhere related? I
don't know what they would do, but it does seem quite likely that code
which was previously correct under 2.6 (using bytes) would suddenly be
mixing bytes and unicode with these APIs.
>> On Windows, we might reject bytes filenames for all file operations: open(),
>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
>
> Since I've seen no objections to this yet: please no. If we offer a
> "lower-level" bytes filename API, it should work for all platforms.
Unfortunately, it can't. You cannot represent all possible file names
in a byte string in Windows (just as you can't do so in a Unicode
string on Unix).
So using byte strings on Windows would work for some files, but fail
for others. In particular, listdir might give you a list of file names
which you then can't open/stat/recurse into.
(of course, you could use UTF-8 as the file system encoding on Windows,
but then you will have to rewrite a lot of C code first)
Regards,
Martin