I've received some enthusiastic emails from someone who wants to
revive restricted mode. He started out with a bunch of patches to the
CPython runtime using ctypes, which he attached to an App Engine bug:
Based on his code (the file secure.py is all you need, included in
secure.tar.gz) it seems he believes the only security leaks are
__subclasses__, gi_frame and gi_code. (I have since convinced him that
if we add "restricted" guards to these attributes, he doesn't need the
functions added to sys.)
I don't recall the exploits that Samuele once posted that caused the
death of rexec.py -- does anyone recall, or have a pointer to the
--Guido van Rossum (home page: http://www.python.org/~guido/)
Alright, I will re-submit with the contents pasted. I never use double
backquotes as I think them rather ugly; that is the work of an editor
or some automated program in the chain. Plus, it also messed up my
line formatting and now I have lines with one word on them... Anyway,
the contents of PEP 3145:
Title: Asynchronous I/O For subprocess.Popen
Author: (James) Eric Pruitt, Charles R. McCreary, Josiah Carlson
Type: Standards Track
In its present form, the subprocess.Popen implementation is prone to
deadlocking and blocking of the parent Python script while waiting on data
from the child process.
A search for "python asynchronous subprocess" will turn up numerous
accounts of people wanting to execute a child process and communicate with
it from time to time, reading only the data that is available instead of
blocking to wait for the program to produce data. The current
behavior of the subprocess module is that when a user sends or receives
data via the stdin, stderr and stdout file objects, deadlocks are common
and documented. While communicate can be used to alleviate some of
the buffering issues, it will still cause the parent process to block while
attempting to read data when none is available to be read from the child
process.
There is a documented need for asynchronous, non-blocking functionality in
subprocess.Popen. Inclusion of the code would improve the
utility of the Python standard library that can be used on Unix-based and
Windows builds of Python. Practically every I/O object in Python has a
file-like wrapper of some sort. Sockets already act as such and for
strings there is StringIO. Popen can be made to act like a file by simply
using the methods attached to the subprocess.Popen.stderr, stdout and
stdin file-like objects. But when using the read and write methods of
those objects, you do not have the benefit of asynchronous I/O. In the
proposed solution the wrapper wraps the asynchronous methods to mimic a
file object.
I have been maintaining a Google Code repository that contains all of my
changes, including tests and documentation, as well as a blog detailing
the problems I have come across in the development process.
I have been working on implementing non-blocking asynchronous I/O in
subprocess.Popen as well as a wrapper class for subprocess.Popen
that makes it so that an executed process can take the place of a file by
duplicating all of the methods and attributes that file objects have.
There are two base functions that have been added to the subprocess.Popen
class: Popen.send and Popen._recv, each with two separate implementations,
one for Windows and one for Unix-based systems. The Windows
implementation uses ctypes to access the functions needed to control pipes
in the kernel32 DLL in an asynchronous manner. On Unix-based systems,
the Python interface for file control (the fcntl module) serves the same
purpose. The different implementations of Popen.send and Popen._recv have
identical arguments to make code that uses these functions work across
multiple platforms.
When calling the Popen._recv function, the pipe name must be
passed as an argument, so there is a Popen.recv function that
selects stdout as the pipe for Popen._recv by default. Popen.recv_err
selects stderr as the pipe by default. "Popen.recv" and "Popen.recv_err"
are much easier to read and understand than "Popen._recv('stdout' ..." and
"Popen._recv('stderr' ..." respectively.
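On the Unix side, the non-blocking reads and writes can be sketched with the fcntl module by setting O_NONBLOCK on the pipe's file descriptor. The functions below are hypothetical free-function versions of send, _recv, recv and recv_err that take a Popen object; they are not the code in the project's repository, and returning None at end of file (as opposed to b'' for "nothing yet") is an assumed convention.

```python
import fcntl
import os


def _set_nonblocking(pipe):
    """Switch a pipe's file descriptor to O_NONBLOCK via fcntl; return the fd."""
    fd = pipe.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    return fd


def send(proc, data):
    """Write as much of `data` to proc.stdin as fits without blocking.

    Returns the number of bytes actually written (0 if the pipe is full).
    """
    fd = _set_nonblocking(proc.stdin)
    try:
        return os.write(fd, data)
    except BlockingIOError:
        return 0


def _recv(proc, which, maxsize=4096):
    """Read up to `maxsize` bytes from proc.<which> ('stdout' or 'stderr').

    Returns b'' if nothing is available right now and None once the child
    has closed its end of the pipe (end of file).
    """
    fd = _set_nonblocking(getattr(proc, which))
    try:
        data = os.read(fd, maxsize)
    except BlockingIOError:
        return b""
    return data or None


def recv(proc, maxsize=4096):
    return _recv(proc, "stdout", maxsize)


def recv_err(proc, maxsize=4096):
    return _recv(proc, "stderr", maxsize)
```

Note that this send() writes to the file descriptor directly and so bypasses proc.stdin's own buffer; it should not be mixed with proc.stdin.write() on the same pipe.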
Since the Popen._recv function does not wait on data to be produced
before returning a value, it may return empty bytes. Popen.asyncread
handles this issue by returning all data read over a given time interval.
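The "collect for a while" behaviour can be sketched with select() on Unix: wait for the pipe to become readable until a deadline, read whatever is there, and stop early at end of file. This is a free-function sketch with an assumed signature (timeout in seconds, pipe name as for Popen._recv), not the PEP's code; select() only supports sockets on Windows, so this version is Unix-only.

```python
import os
import select
import time


def asyncread(proc, timeout=0.1, which="stdout", maxsize=4096):
    """Collect everything proc.<which> produces within `timeout` seconds.

    Returns early at end of file; returns b'' if nothing arrived in time.
    """
    fd = getattr(proc, which).fileno()
    deadline = time.monotonic() + timeout
    chunks = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            break  # timed out with nothing more to read
        data = os.read(fd, maxsize)  # does not block: select reported readability
        if not data:
            break  # end of file: the child closed its end of the pipe
        chunks.append(data)
    return b"".join(chunks)
```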
The ProcessIOWrapper class uses the asyncread and asyncwrite functions to
allow a process to act like a file so that there are no blocking issues
that can arise from using the stdout and stdin file objects produced from
a subprocess.Popen call.
- [ python-Feature Requests-1191964 ] asynchronous Subprocess
- Daily Life in an Ivory Basement : /feb-07/problems-with-subprocess
- How can I run an external command asynchronously from Python? - Stack Overflow
- 18.1. subprocess - Subprocess management - Python v2.6.2 documentation
- Issue 1191964: asynchronous Subprocess - Python tracker
- Module to allow Asynchronous subprocess use on Windows and Posix platforms - ActiveState Code
- subprocess.rst - subprocdev - Project Hosting on Google Code
- subprocdev - Project Hosting on Google Code
- Python Subprocess Dev
This P.E.P. is licensed under the Open Publication License;
On Tue, Sep 8, 2009 at 22:56, Benjamin Peterson <benjamin(a)python.org> wrote:
> 2009/9/7 Eric Pruitt <eric.pruitt(a)gmail.com>:
>> Hello all,
>> I have been working on adding asynchronous I/O to the Python
>> subprocess module as part of my Google Summer of Code project. Now
>> that I have finished documenting and pruning the code, I present PEP
>> 3145 for its inclusion into the Python core code. Any and all feedback
>> on the PEP (http://www.python.org/dev/peps/pep-3145/) is appreciated.
> Hi Eric,
> One of the reasons you're not getting many responses is that you've not
> pasted the contents of the PEP in this message. That makes it really
> easy for people to comment on various sections.
> BTW, it seems like you were trying to use reST formatting with the
> text PEP layout. Double backquotes only mean something in reST.
In reviewing a fix for the metaclass calculation in __build_class__,
I realised that PEP 3115 poses a potential problem for the common
practice of using "type(name, bases, ns)" for dynamic class creation.
Specifically, if one of the base classes has a metaclass with a
significant __prepare__() method, then the current idiom will do the
wrong thing (and most likely fail as a result), since "ns" will
probably be an ordinary dictionary instead of whatever __prepare__()
would have returned.
Initially I was going to suggest making __build_class__ part of the
language definition rather than a CPython implementation detail, but
then I realised that various CPython specific elements in its
signature made that a bad idea.
Instead, I'm thinking along the lines of an
"operator.prepare(metaclass, bases)" function that does the metaclass
calculation dance, invoking __prepare__() and returning the result if
it exists, otherwise returning an ordinary dict. Under the hood we
would refactor this so that operator.prepare and __build_class__ were
using a shared implementation of the functionality at the C level - it
may even be advisable to expose that implementation via the C API as
well.
The correct idiom for dynamic type creation in a PEP 3115 world would then be:
    from operator import prepare
    cls = type(name, bases, prepare(type, bases))
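A small demonstration of the failure mode described above, together with the spelling that later landed in the standard library: Python 3.3 added types.prepare_class() and types.new_class(), which do the metaclass calculation and call __prepare__() before creating the class, so the sketch uses those rather than the operator.prepare() proposed here. RecordingNamespace, RecordingMeta, Base and body are hypothetical names.

```python
import types


class RecordingNamespace(dict):
    """Class namespace that remembers the order in which names are assigned."""

    def __init__(self):
        super().__init__()
        self.order = []

    def __setitem__(self, key, value):
        self.order.append(key)
        super().__setitem__(key, value)


class RecordingMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        return RecordingNamespace()

    def __new__(mcls, name, bases, ns, **kwds):
        # Relies on __prepare__() having run: a plain dict is rejected.
        if not isinstance(ns, RecordingNamespace):
            raise TypeError(f"{name}: namespace did not come from __prepare__()")
        cls = super().__new__(mcls, name, bases, dict(ns), **kwds)
        cls._order = list(ns.order)
        return cls


class Base(metaclass=RecordingMeta):
    pass


def body(ns):
    """Populate the namespace the way a class statement's body would."""
    ns["x"] = 1
    ns["y"] = 2


# The type(name, bases, ns) idiom hands RecordingMeta a plain dict and fails.
try:
    type("Child", (Base,), {})
except TypeError as exc:
    print("type():", exc)

# types.new_class() finds RecordingMeta and calls its __prepare__() first.
Child = types.new_class("Child", (Base,), exec_body=body)
print(Child._order)
```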
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
While there is at least some interest in incorporating my
optimizations, response has still been low. I figure that the changes
are probably too much for a single big incorporation step. On a recent
flight, I thought about cutting it down to make it more easily
digestible. The basic idea is to remove the optimized interpreter
dispatch loop and advanced instruction format and use the existing
ones. Currently (rev. ca8a0dfb2176), opcode.h uses 109 of the 255
potentially available instructions in the current instruction format.
Hence, up to 149 instruction opcodes could be given to optimized
instruction derivatives. Consequently, a possible change would require
a) opcode.h to add new instruction opcodes,
b) ceval.c to include the new instruction opcodes in PyEval_EvalFrameEx,
c) abstract.c, object.c (possibly other files) to add the
quickening/rewriting function calls.
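"Quickening" here means an instruction rewriting itself into a specialized derivative once it has seen its operand types. The toy interpreter below is only a sketch of that idea in Python with made-up opcode names and numbers; it is not ceval.c and not the author's implementation, and in Python it can model the rewriting but not the speed-up.

```python
# Hypothetical opcodes for a toy stack machine (not CPython's opcode numbers).
LOAD_CONST, BINARY_ADD, INT_ADD, RETURN_VALUE = range(4)


def run(code, consts):
    """Interpret `code`, a list of mutable [opcode, arg] pairs.

    BINARY_ADD rewrites itself ("quickens") into the INT_ADD derivative once it
    sees two ints; INT_ADD rewrites itself back if that guess stops holding.
    """
    stack = []
    for instr in code:
        op, arg = instr
        if op == LOAD_CONST:
            stack.append(consts[arg])
        elif op in (BINARY_ADD, INT_ADD):
            right = stack.pop()
            left = stack.pop()
            both_ints = type(left) is int and type(right) is int
            if op == BINARY_ADD and both_ints:
                instr[0] = INT_ADD  # quicken: later runs dispatch to the derivative
            elif op == INT_ADD and not both_ints:
                instr[0] = BINARY_ADD  # guard failed: fall back to the generic opcode
            # In C, INT_ADD would add machine integers directly instead of going
            # through the generic number protocol; here both paths use `+`.
            stack.append(left + right)
        elif op == RETURN_VALUE:
            return stack.pop()
    raise ValueError("code fell off the end without RETURN_VALUE")


code = [[LOAD_CONST, 0], [LOAD_CONST, 1], [BINARY_ADD, None], [RETURN_VALUE, None]]
print(run(code, (2, 3)), code[2][0] == INT_ADD)
```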
If this is more interesting, I could start evaluating which
instruction opcodes should be allocated to which derivatives to get
the biggest benefit. This is a lot easier to implement (because I can
re-use the existing instruction implementations) and can easily be
made conditionally compilable, similar to the computed-gotos
option. Since the changes are minimal it is also simpler to understand
and deal with for everybody else, too. On the "downside", however, not
all optimizations are possible and/or make sense in the given limit of
instructions (no data-object inlining and no reference-count
How does that sound?
Have a nice day,
Currently, if you work in the console, define a function and then
immediately call it, it fails with a SyntaxError.
For example, copy and paste this completely valid Python script into the console:
There is an issue for that which was just closed by Eric. However, I'd
like to know whether people here agree that if you paste a
valid Python script into the console, it should work without changes.
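The difference comes from how the code is compiled: the interactive console hands each statement to the compiler in "single" mode, while a script file is compiled as a whole in "exec" mode. The check below reproduces that split with the built-in compile(). SCRIPT is a made-up stand-in, since the script from the original message is not included here, and compile() only approximates what the console's line-by-line reader does.

```python
# A hypothetical stand-in: define a function, then call it with no blank line.
SCRIPT = (
    "def greet():\n"
    "    return 'hello'\n"
    "greet()\n"
)


def compiles(source, mode):
    """Return True if `source` compiles in the given compile() mode."""
    try:
        compile(source, "<paste>", mode)
    except SyntaxError:
        return False
    return True


print("as a script ('exec'):", compiles(SCRIPT, "exec"))
print("as one interactive statement ('single'):", compiles(SCRIPT, "single"))
```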
> branch: 2.7
> user: Jason R. Coombs <jaraco(a)jaraco.com>
> date: Thu Nov 17 18:03:24 2011 -0500
> PDB now will properly escape backslashes in the names of modules it executes. Fixes #7750
> diff --git a/Lib/test/test_pdb.py b/Lib/test/test_pdb.py
> +class Tester7750(unittest.TestCase):
I think we have an unwritten rule that test class and method names
should tell something about what they test. (We do have things like
TestWeirdBugs and test_12345, but I don’t think it’s a useful pattern to
follow :) Not a big deal anyway.
> + # if the filename has something that resolves to a python
> + # escape character (such as \t), it will fail
> + test_fn = '.\\test7750.py'
> + msg = "issue7750 only applies when os.sep is a backslash"
> + @unittest.skipUnless(os.path.sep == '\\', msg)
> + def test_issue7750(self):
> + with open(self.test_fn, 'w') as f:
> + f.write('print("hello world")')
> + cmd = [sys.executable, '-m', 'pdb', self.test_fn,]
> + proc = subprocess.Popen(cmd,
> + stdout=subprocess.PIPE,
> + stdin=subprocess.PIPE,
> + stderr=subprocess.STDOUT,
> + )
> + stdout, stderr = proc.communicate('quit\n')
> + self.assertNotIn('IOError', stdout, "pdb munged the filename")
Why not check for assertIn(filename, stdout)? (In other words, check
for intended behavior rather than implementation of the erstwhile bug.)
BTW, I’ve just tested that when giving a message argument to assertNotIn (the
third argument), unittest still displays the other arguments to allow
for easier debugging. I didn’t know that, it’s cool!
> + def tearDown(self):
> + if os.path.isfile(self.test_fn):
> + os.remove(self.test_fn)
In my own tests, I’ve become fond of using “self.addCleanup(os.remove,
filename)”: It’s shorter than a tearDown and is right there on the line
that follows or precedes the file creation.
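Putting both suggestions together, the test might look something like this. It is a sketch, not the committed code: it uses Python 3's subprocess.run, writes the script into a fresh temporary directory (so on Windows the path contains a sequence such as "\test7750.py"), registers cleanups with addCleanup, and checks that the file name shows up in pdb's output instead of checking for the absence of IOError. The class and method names are made up, and the Windows-only skip decorator is left out so the check also runs on POSIX, where it passes trivially.

```python
import os
import subprocess
import sys
import tempfile
import unittest


class PdbFilenameEscapingTest(unittest.TestCase):
    """Issue #7750: pdb must not treat backslashes in a script path as escapes."""

    def test_script_path_survives_pdb(self):
        tmpdir = tempfile.mkdtemp()
        # Cleanups run last-in, first-out: the file goes before its directory.
        self.addCleanup(os.rmdir, tmpdir)
        script = os.path.join(tmpdir, "test7750.py")
        with open(script, "w") as f:
            f.write('print("hello world")\n')
        self.addCleanup(os.remove, script)

        proc = subprocess.run(
            [sys.executable, "-m", "pdb", script],
            input="quit\n",
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=60,
        )
        # Intended behaviour: pdb's location line shows the file name intact.
        self.assertIn(os.path.basename(script), proc.stdout)
```

Run it with `python -m unittest` as usual.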
> if __name__ == '__main__':
> + unittest.main()
This looks strange.
Given GCC's announcement that Intel's STM will be an extension for C
and C++ in GCC 4.7, what does this mean for Python, and the GIL?
I've seen efforts made to make STM available as a context, and for use
in user code. I've also read about the "old attempts way back" that
attempted to use finer-grained locking. They understandably failed due to
the heavy costs involved in both the locking mechanisms used, and the
overhead of a reference counting garbage collection system.
However given advances in locking and garbage collection in the last
decade, what attempts have been made recently to try these new ideas
out? In particular, how unlikely is it that all the thread-safe
primitives, global contexts, and reference counting functions could be made
__transaction_atomic, with magical parallelism performance boosts as a result?
I'm aware that C89, platforms without STM/GCC, and single threaded
performance are concerns. Please ignore these for the sake of
discussion about possibilities.
I apologize in advance for the length of this mail.
When a script or a module is executed by invoking python with proper
arguments, sys.path is extended. When a path to script is given, the
directory containing the script is prepended. When '-m' or '-c' is used,
$CWD is prepended. This is documented in
http://docs.python.org/dev/using/cmdline.html, so far ok.
sys.path and $PYTHONPATH are like $PATH -- if you can convince someone to
put a directory under your control in any of them, you can execute code
as this someone. Therefore, sys.path is dangerous and important.
Unfortunately, sys.path manipulations are only described very briefly,
and without any commentary, in the online documentation. The python(1)
manpage doesn't even mention them.
The problem: each of the commands below is insecure:
    python /tmp/script.py   (when script.py is safe by itself)
      ('/tmp' is added to sys.path, so an attacker can override any
      module imported in /tmp/script.py by writing to /tmp/module.py)
    cd /tmp && python -mtimeit -s 'import numpy' 'numpy.test()'
      (UNIX users are accustomed to being able to safely execute
      programs in any directory, e.g. ls, or gcc, or something.
      Here '' is added to sys.path, so it is not secure to run
      python in other-user-writable directories.)
    cd /tmp/ && python -c 'import numpy; print(numpy.version.version)'
      (The same as above, '' is added to sys.path.)
    cd /tmp && python
      (The same as above.)
IMHO, if this (long-lived) behaviour is necessary, it should at least be
prominently documented. Also in the manpage.
Before adding a directory to sys.path as described above, Python
actually runs os.path.realpath over it. This means that if the path to a
script given on the command line is actually a symlink, the directory
containing the real file will be added to sys.path. This behaviour is not really
documented (the documentation only says "the directory containing that
file is added to the start of sys.path"), but since the integrity of
sys.path is so important, it should be, IMHO.
Using realpath instead of the (expected) path specified by the user
breaks imports of non-pure-python (mixed .py and .so) modules from
modules executed as scripts on Debian. This is because Debian installs
architecture-independent python files in /usr/share/pyshared, and
symlinks those files into /usr/lib/pymodules/pythonX.Y/. The
architecture-dependent .so and python-version-dependent .pyc files are
installed in /usr/lib/pymodules/pythonX.Y/. When a script, e.g.
/usr/lib/pymodules/pythonX.Y/script.py, is executed, the directory
/usr/share/pyshared is prepended to sys.path. If the script tries to
import a module which has architecture-dependent parts (e.g. numpy) it
first sees the incomplete module in /usr/share/pyshared and fails.
This happens, for example, in Parallel Python
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620551) and recently
when packaging CellProfiler for Debian.
Again, if this is on purpose, it should be documented.
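The behaviours described above (an empty string for -c, the script's directory for a script, and symlink resolution for a symlinked script) can be observed directly by asking a child interpreter for its sys.path[0]. This is a sketch for a POSIX system with a current Python 3, not something from the original report; SHOW_PATH0, path0 and demo are made-up names. It also tries the -P switch, which Python 3.11 later added (together with the PYTHONSAFEPATH environment variable) to stop this entry from being prepended.

```python
import os
import subprocess
import sys
import tempfile

SHOW_PATH0 = "import sys; print(sys.path[0])"


def path0(args, cwd):
    """Run a fresh interpreter with `args` in `cwd` and return its sys.path[0]."""
    # Drop PYTHONSAFEPATH so we observe the default behaviour described above.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    proc = subprocess.run([sys.executable, *args], cwd=cwd, env=env,
                          capture_output=True, text=True, check=True)
    return proc.stdout.rstrip("\n")


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as base:
        real_dir = os.path.join(base, "real")
        link_dir = os.path.join(base, "link")
        os.mkdir(real_dir)
        os.mkdir(link_dir)
        script = os.path.join(real_dir, "script.py")
        with open(script, "w") as f:
            f.write(SHOW_PATH0 + "\n")

        # -c prepends '', which means "the current working directory".
        results["-c"] = path0(["-c", SHOW_PATH0], cwd=base)
        # A script argument prepends the directory containing the script.
        results["script"] = os.path.realpath(path0([script], cwd=base))
        results["real_dir"] = os.path.realpath(real_dir)
        # A symlinked script prepends the directory of the link's *target*.
        link = os.path.join(link_dir, "script.py")
        try:
            os.symlink(script, link)
        except (OSError, NotImplementedError):
            pass  # symlinks may be unavailable, e.g. unprivileged Windows
        else:
            results["symlink"] = os.path.realpath(path0([link], cwd=base))
            results["link_dir"] = os.path.realpath(link_dir)
        if sys.version_info >= (3, 11):
            # -P (added in Python 3.11) skips the potentially unsafe entry.
            results["-P -c"] = path0(["-P", "-c", SHOW_PATH0], cwd=base)
    return results


for key, value in demo().items():
    print(f"{key:8} {value!r}")
```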
PEP 395 (Qualified Names for Modules)
PEP 395 proposes another sys.path manipulation. When running a script,
the directory tree will be walked upwards as long as there are
__init__.py files, and then the first directory without will be added.
This is of course a fine idea, but it makes a scenario, which was
previously safe, insecure. More precisely, when executing a script in a
package directory (one containing __init__.py) whose parent directory is
writable by other users, that parent directory will be added to sys.path.
So the (safe) operation of downloading an archive with a package,
unzipping it in /tmp, changing into the created directory, checking that
the script doesn't do anything bad, and running a script is now insecure
if there is __init__.py in the archive root.
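The upward walk is easy to state as code. The function below is a hypothetical sketch of the rule as summarised in this message (keep going up while there is an __init__.py, stop at the first directory without one), not code from PEP 395 itself.

```python
import os


def pep395_path_entry(script):
    """Sketch of the directory PEP 395 would add to sys.path for `script`.

    Walk upwards from the script's directory for as long as each directory
    contains an __init__.py; the first directory without one is the result.
    """
    directory = os.path.dirname(os.path.abspath(script))
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            break
        directory = parent
    return directory
```

With /tmp/unpacked/__init__.py and /tmp/unpacked/tools/__init__.py present, running /tmp/unpacked/tools/run.py under this rule yields /tmp, which is exactly the other-user-writable directory described above.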
I guess that it would be useful to have an option to turn off those
sys.path manipulations.
I’ve read PEP 402 and would like to offer comments.
I know a bit about the import system, but not down to the nitty-gritty
details of PEP 302 and __path__ computations and all this fun stuff (by
which I mean, not fun at all). As such, I can’t find nasty issues in
dark corners, but I can offer feedback as a user. I think it’s a very
well-written explanation of a very useful feature: +1 from me. If it is
accepted, the docs will certainly be much more concise, but the PEP as a
thought process is a useful document to read.
> When new users come to Python from other languages, they are often
> confused by Python's packaging semantics.
Minor: I would reserve “packaging” for
packaging/distribution/installation/deployment matters, not Python
modules. I suggest “Python package semantics”.
> On the negative side, however, it is non-intuitive for beginners, and
> requires a more complex step to turn a module into a package. If
> ``Foo`` begins its life as ``Foo.py``, then it must be moved and
> renamed to ``Foo/__init__.py``.
Minor: In the UNIX world, or with version control tools, moving and
renaming are one and the same thing (hg mv spam.py spam/__init__.py for
example). Also, if you turn a module into a package, you may want to
move code around, change imports, etc., so I’m not sure the renaming
part is such a big step. Anyway, if the import-sig people say that
users think it’s a complex or costly operation, I can believe it.
> (By the way, both of these additions to the import protocol (i.e. the
> dynamically-added ``__path__``, and dynamically-created modules)
> apply recursively to child packages, using the parent package's
> ``__path__`` in place of ``sys.path`` as a basis for generating a
> child ``__path__``. This means that self-contained and virtual
> packages can contain each other without limitation, with the caveat
> that if you put a virtual package inside a self-contained one, it's
> gonna have a really short ``__path__``!)
I don’t understand the caveat or its implications.
> In other words, we don't allow pure virtual packages to be imported
> directly, only modules and self-contained packages. (This is an
> acceptable limitation, because there is no *functional* value to
> importing such a package by itself. After all, the module object
> will have no *contents* until you import at least one of its
> subpackages or submodules!)
> Once ``zc.buildout`` has been successfully imported, though, there
> *will* be a ``zc`` module in ``sys.modules``, and trying to import it
> will of course succeed. We are only preventing an *initial* import
> from succeeding, in order to prevent false-positive import successes
> when clashing subdirectories are present on ``sys.path``.
I find that limitation acceptable. After all, there is no zc project,
and no zc module, just a zc namespace. I’ll just regret that it’s not
possible to provide a module docstring to inform that this is a
namespace package used for X and Y.
> The resulting list (whether empty or not) is then stored in a
> ``sys.virtual_package_paths`` dictionary, keyed by module name.
This was probably said on import-sig, but here I go: yet another import
artifact in the sys module! I hope we get ImportEngine in 3.3 to clean
up all this.
> * A new ``extend_virtual_paths(path_entry)`` function, to extend
> existing, already-imported virtual packages' ``__path__`` attributes
> to include any portions found in a new ``sys.path`` entry. This
> function should be called by applications extending ``sys.path``
> at runtime, e.g. when adding a plugin directory or an egg to the
Let’s imagine my application Spam has a namespace spam.ext for plugins.
To use a custom directory where plugins are stored, or a zip file with
plugins (I don’t use eggs, so let me talk about zip files here), I’d
have to call sys.path.append *and* pkgutil.extend_virtual_paths?
> * ``ImpImporter.iter_modules()`` should be changed to also detect and
> yield the names of modules found in virtual packages.
Is there any value in providing an argument to get the pre-PEP behavior?
Or to look at it from a different place, how can Python code know that
some module is a virtual or pure virtual package, if that is even a
useful thing to know?
> Last, but not least, the ``imp`` module (or ``importlib``, if
> appropriate) should expose the algorithm described in the `virtual
> paths`_ section above, as a
> ``get_virtual_path(modulename, parent_path=None)`` function, so that
> creators of ``__import__`` replacements can use it.
If I’m not mistaken, the rule of thumb these days is that imp is edited
when it’s absolutely necessary, otherwise code goes into importlib (more
easily written, read and maintained).
I wonder if importlib.import_module could implement the new import
semantics all by itself, so that we can benefit from this PEP in older
Pythons (importlib is on PyPI).
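To make the virtual-path idea concrete for plugin-style use (spam.ext, zc.*), here is a deliberately simplified, filesystem-only sketch of a get_virtual_path(modulename, parent_path=None) function. The name and signature come from the quoted PEP text, but the body is an approximation: as far as I understand the PEP, the real algorithm asks each path entry's importer, so zip files and other importers are simply ignored here.

```python
import os
import sys


def get_virtual_path(modulename, parent_path=None):
    """Filesystem-only sketch of a PEP 402-style virtual __path__ computation.

    For each entry of parent_path (sys.path for a top-level name), include the
    subdirectory named after the last component of `modulename` if it exists.
    Only plain directories are checked; zip files and custom importers are not.
    """
    if parent_path is None:
        parent_path = sys.path
    last = modulename.rpartition(".")[2]
    portions = []
    for entry in parent_path:
        candidate = os.path.join(entry or os.curdir, last)
        if os.path.isdir(candidate):
            portions.append(candidate)
    return portions
```

A child package is handled by passing the parent's computed path as parent_path, and re-running the computation after a sys.path.append() picks up portions in the new entry, which is roughly the job extend_virtual_paths() would do for already-imported packages.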
> * If you are changing a currently self-contained package into a
> virtual one, it's important to note that you can no longer use its
> ``__file__`` attribute to locate data files stored in a package
> directory. Instead, you must search ``__path__`` or use the
> ``__file__`` of a submodule adjacent to the desired files, or
> of a self-contained subpackage that contains the desired files.
Wouldn’t pkgutil.get_data help here?
Besides, putting data files in a Python package is viewed very poorly by
some (mostly people following the Filesystem Hierarchy Standard), and in
distutils2/packaging, we (will) have a resources system that’s as
convenient for users and more flexible for OS packagers. Using __file__
for more than information on the module is frowned upon for other
reasons anyway (I talked with a Debian developer about this one day but
forgot the details), so I think the limitation is okay.
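For reference, pkgutil.get_data(package, resource) returns the contents of a resource stored next to a package as bytes. Its documentation describes it, for filesystem packages, as roughly equivalent to reading os.path.join(os.path.dirname(sys.modules[package].__file__), resource), so whether it helps for a pure virtual package depends on what __file__ ends up being. The sketch builds a throwaway regular package (spam_demo and read_packaged_resource are made-up names) just to show the call.

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def read_packaged_resource():
    """Build a throwaway package with a data file and read it via pkgutil.get_data."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "spam_demo")  # hypothetical package name
        os.mkdir(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, "defaults.cfg"), "wb") as f:
            f.write(b"[spam]\neggs = 3\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()  # make sure the new directory is seen
        try:
            # The resource name is '/'-separated and relative to the package.
            return pkgutil.get_data("spam_demo", "defaults.cfg")
        finally:
            sys.path.remove(root)
            sys.modules.pop("spam_demo", None)


print(read_packaged_resource())
```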
> * XXX what is the __file__ of a "pure virtual" package? ``None``?
> Some arbitrary string? The path of the first directory with a
> trailing separator? No matter what we put, *some* code is
> going to break, but the last choice might allow some code to
> accidentally work. Is that good or bad?
A pure virtual package having no source file, I think it should have no
__file__ at all. I don’t know if that would break more code than using
an empty string for example, but it feels righter.
> For those implementing PEP 302 importer objects:
Minor: Here I think a link would not be a nuisance (IOW remove the