I've received some enthusiastic emails from someone who wants to
revive restricted mode. He started out with a bunch of patches to the
CPython runtime using ctypes, which he attached to an App Engine bug:
Based on his code (the file secure.py is all you need, included in
secure.tar.gz) it seems he believes the only security leaks are
__subclasses__, gi_frame and gi_code. (I have since convinced him that
if we add "restricted" guards to these attributes, he doesn't need the
functions added to sys.)
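For context, the kind of leak __subclasses__ enables looks like this (output
abridged; the exact list varies by Python version):

>>> ().__class__.__bases__[0].__subclasses__()
[<class 'type'>, <class 'weakref'>, ...]

From object's subclass list, a sandboxed expression can reach classes that
were never handed to it directly, which is why guarding that attribute matters.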
I don't recall the exploits that Samuele once posted that caused the
death of rexec.py -- does anyone recall, or have a pointer to the thread?
--Guido van Rossum (home page: http://www.python.org/~guido/)
Alright, I will re-submit with the contents pasted. I never use double
backquotes as I think them rather ugly; that is the work of an editor
or some automated program in the chain. Plus, it also messed up my
line formatting and now I have lines with one word on them... Anyway,
the contents of PEP 3145:
Title: Asynchronous I/O For subprocess.Popen
Author: (James) Eric Pruitt, Charles R. McCreary, Josiah Carlson
Type: Standards Track
In its present form, the subprocess.Popen implementation is prone to
deadlocking and blocking of the parent Python script while waiting on data
from the child process.
A search for "python asynchronous subprocess" will turn up numerous
accounts of people wanting to execute a child process and communicate with
it from time to time reading only the data that is available instead of
blocking to wait for the program to produce data   . The current
behavior of the subprocess module is that when a user sends or receives
data via the stdin, stderr and stdout file objects, dead locks are common
and documented  . While communicate can be used to alleviate some of
the buffering issues, it will still cause the parent process to block while
attempting to read data when none is available to be read from the child
There is a documented need for asynchronous, non-blocking functionality in
subprocess.Popen. Inclusion of the code would improve the utility of the
Python standard library that can be used on Unix based and Windows builds
of Python. Practically every I/O object in Python has a file-like wrapper
of some sort. Sockets already act as such and for strings there is
StringIO. Popen can be made to act like a file by simply using the methods
attached to the subprocess.Popen.stderr, stdout and stdin file-like
objects. But when using the read and write methods of those objects, you
do not have the benefit of asynchronous I/O. In the proposed solution the
wrapper wraps the asynchronous methods to mimic a file object.
I have been maintaining a Google Code repository that contains all of my
changes, including tests and documentation, as well as a blog detailing
the problems I have come across in the development process.
I have been working on implementing non-blocking asynchronous I/O in the
subprocess.Popen module as well as a wrapper class for subprocess.Popen
that makes it so that an executed process can take the place of a file by
duplicating all of the methods and attributes that file objects have.
There are two base functions that have been added to the subprocess.Popen
class: Popen.send and Popen._recv, each with two separate implementations,
one for Windows and one for Unix based systems. The Windows
implementation uses ctypes to access the functions needed to control pipes
in the kernel32 DLL in an asynchronous manner. On Unix based systems,
the Python interface for file control serves the same purpose. The
different implementations of Popen.send and Popen._recv have identical
arguments to make code that uses these functions work across multiple
platforms.
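As a rough illustration of the Unix-side technique (a sketch using the fcntl
module, not the PEP's actual code): put the pipe into non-blocking mode and
read whatever happens to be available.

    import fcntl
    import os
    import subprocess

    def set_nonblocking(fileobj):
        # Switch the pipe's file descriptor into non-blocking mode.
        fd = fileobj.fileno()
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

    def read_available(fileobj, bufsize=4096):
        # Return whatever the child has written so far; b'' if nothing yet.
        try:
            return os.read(fileobj.fileno(), bufsize)
        except OSError:          # EAGAIN: no data ready right now
            return b''

    p = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    set_nonblocking(p.stdout)
    p.stdin.write(b'hello\n')
    p.stdin.flush()
    print(read_available(p.stdout))   # may print b'' if the child has not echoed yet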
The Popen._recv function requires the pipe name to be passed as an
argument, so there exists the Popen.recv function, which
selects stdout as the pipe for Popen._recv by default. Popen.recv_err
selects stderr as the pipe by default. "Popen.recv" and "Popen.recv_err"
are much easier to read and understand than "Popen._recv('stdout' ..." and
"Popen._recv('stderr' ..." respectively.
Since the Popen._recv function does not wait for data to be produced
before returning a value, it may return empty bytes. Popen.asyncread
handles this issue by returning all data read over a given time interval.
The ProcessIOWrapper class uses the asyncread and asyncwrite functions to
allow a process to act like a file so that there are no blocking issues
that can arise from using the stdout and stdin file objects produced from
a subprocess.Popen call.
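A sketch of the intended usage (the method names come from the PEP text
above; the exact signatures are my guesses, not the final API):

    import subprocess

    p = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p.send(b'hello\n')            # proposed: write to stdin without blocking
    chunk = p.recv()              # proposed: read whatever stdout has, possibly b''
    errors = p.recv_err()         # proposed: the same, but for stderr
    everything = p.asyncread()    # proposed: read all data arriving within a time window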
 [ python-Feature Requests-1191964 ] asynchronous Subprocess
 Daily Life in an Ivory Basement : /feb-07/problems-with-subprocess
 How can I run an external command asynchronously from Python? - Stack Overflow
 18.1. subprocess - Subprocess management - Python v2.6.2 documentation
 18.1. subprocess - Subprocess management - Python v2.6.2 documentation
 Issue 1191964: asynchronous Subprocess - Python tracker
 Module to allow Asynchronous subprocess use on Windows and Posix platforms - ActiveState Code
 subprocess.rst - subprocdev - Project Hosting on Google Code
 subprocdev - Project Hosting on Google Code
 Python Subprocess Dev
This P.E.P. is licensed under the Open Publication License;
On Tue, Sep 8, 2009 at 22:56, Benjamin Peterson <benjamin(a)python.org> wrote:
> 2009/9/7 Eric Pruitt <eric.pruitt(a)gmail.com>:
>> Hello all,
>> I have been working on adding asynchronous I/O to the Python
>> subprocess module as part of my Google Summer of Code project. Now
>> that I have finished documenting and pruning the code, I present PEP
>> 3145 for its inclusion into the Python core code. Any and all feedback
>> on the PEP (http://www.python.org/dev/peps/pep-3145/) is appreciated.
> Hi Eric,
> One of the reasons you're not getting many responses is that you've not
> pasted the contents of the PEP in this message. That makes it really
> easy for people to comment on various sections.
> BTW, it seems like you were trying to use reST formatting with the
> text PEP layout. Double backquotes only mean something in reST.
Which I noticed since it's cited in the BeOpen license we still refer
to in LICENSE. Since pythonlabs.com itself is still up, it probably
isn't much work to make the logos.html URI work again, but I don't know
who maintains that page.
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
[I've got no response from python-ideas, so I am forwarding to python-dev.]
With the addition of the fixed offset timezone class and the timezone.utc
instance, it is easy to get UTC time as an aware datetime:

>>> from datetime import datetime, timedelta, timezone
>>> datetime.now(timezone.utc)
datetime.datetime(2010, 8, 3, 14, 16, 10, 670308, tzinfo=datetime.timezone.utc)

However, if you want to keep time in your local timezone, getting an
aware datetime is almost a catch-22. If you know your timezone's UTC
offset, you can do
>>> EDT = timezone(timedelta(hours=-4))
>>> datetime.now(EDT)
datetime.datetime(2010, 8, 3, 10, 20, 23, 769537,
 tzinfo=datetime.timezone(datetime.timedelta(-1, 72000)))

but the problem is that there is no obvious or even correct way to
find the local timezone's UTC offset.
In a comment on issue #5094 ("datetime lacks concrete tzinfo
implementation for UTC"), I proposed to address this problem in a
localtime([t]) function that would return current time (or time
corresponding to the optional datetime argument) as an aware datetime
object carrying local timezone information in a tzinfo set to an
appropriate timezone instance. This solution is attractive by its
simplicity, but there are several problems:
1. An aware datetime cannot carry all information that system
localtime() supplies in a time tuple. Specifically, the is_dst flag
is lost. This is not a problem for most applications as long as
timezone UTC offset and timezone name are available, but may be an
issue when interoperability with the time module is required.
2. Datetime's tzinfo interface was designed with the idea that
<2010-11-06 12:00 EDT> + <1 day> = <2010-11-07 12:00 EST>, not
<2010-11-07 12:00 EDT>. In other words, if I have lunch with someone
at noon (12:00 EDT) on Saturday, the day before the first Sunday in
November, and want to meet again "at the same time tomorrow", I mean
12:00 EST, not 24 hours later. With localtime() returning a datetime
with tzinfo set to a fixed offset timezone, however, localtime() +
timedelta(1) will mean exactly 24 hours later, and the result will be
expressed in a timezone that is unusual for the given location.
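Concretely (a small illustration of point 2; the exact repr of the result
varies by Python version):

>>> from datetime import datetime, timedelta, timezone
>>> EDT = timezone(timedelta(hours=-4), 'EDT')
>>> lunch = datetime(2010, 11, 6, 12, 0, tzinfo=EDT)   # noon on Saturday
>>> lunch + timedelta(1)
datetime.datetime(2010, 11, 7, 12, 0, tzinfo=datetime.timezone(datetime.timedelta(-1, 72000), 'EDT'))

The result is still labelled EDT, i.e. exactly 24 hours later; by Sunday the
local zone is EST, so this instant is 11:00 on the local clock, not "noon
tomorrow".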
An alternative approach is the one recommended in the Python manual:
one could implement a LocalTimezone class with utcoffset(),
tzname() and dst() extracting information from the system mktime and
localtime calls. This approach has its own shortcomings:
1. While it is natural to expect automatic timezone adjustments when
adding an integral number of days to a datetime in a business setting,
it is not as clear-cut when adding hours or minutes.
2. The tzinfo.utcoffset() interface that expects *standard* local time
as an argument is confusing to many users. Even the "official"
example in the Python manual gets it wrong.
3. datetime(..., tzinfo=LocalTimezone()) is ambiguous during the
"repeated hour", when the local clock is set back from DST to standard time.
As far as I can tell, the only way to resolve the last problem is to
add an is_dst flag to the datetime object, which would also be the
only way to achieve full interoperability between datetime objects and
time tuples.
The traditional answer to calls for improvement of timezone support in
the datetime module has been: "this is up to 3rd parties to implement."
Unfortunately, stdlib is asking 3rd parties to implement an impossible
interface without giving access to the necessary data. The
impossibility comes from the requirement that the dst() method should find
out whether local time represents DST or standard time while there is
an hour each year when the same local time can be either. The missing
data is the system UTC offset when it changes historically. The time
module only gives access to the current UTC offset.
My preference is to implement the first alternative -- localtime([t])
returning an aware datetime with a fixed offset timezone. This will solve
the problem of Python's lack of access to the universally available
system facilities that are necessary to implement any kind of aware
local time support.
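A rough sketch of what such a localtime() could look like, built on the time
module (this illustrates the idea, not the proposed patch; for simplicity it
takes an optional POSIX timestamp rather than a datetime):

    from datetime import datetime, timedelta, timezone
    import time

    def localtime(t=None):
        """Current time (or the time for POSIX timestamp t) as an aware
        datetime carrying a fixed offset timezone."""
        if t is None:
            t = time.time()
        lt = time.localtime(t)
        if lt.tm_isdst > 0 and time.daylight:
            offset = timedelta(seconds=-time.altzone)
        else:
            offset = timedelta(seconds=-time.timezone)
        return datetime.fromtimestamp(t, timezone(offset))

Note that this only sees the *current* rules via time.timezone and
time.altzone; it cannot recover historical UTC offsets, which is exactly the
missing data described above.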
I see several problems with the two hex-conversion function pairs that
Python offers:
1. binascii.hexlify and binascii.unhexlify
2. bytes.fromhex and bytes.hex
Problem #1: bytes.hex is not implemented, although it was specified in PEP 358.
This means there is no symmetrical function to accompany bytes.fromhex.
Problem #2: Both pairs perform the same function, although The Zen of Python
suggests "There should be one-- and preferably only one --obvious way to do it."
I do not understand why PEP 358 specified the bytes function pair although
it mentioned the binascii pair...
Problem #3: bytes.fromhex may receive spaces in the input string, although
binascii.unhexlify may not.
I see no good reason for these two functions to have different features.
Problem #4: binascii.unhexlify may receive both input types, strings or bytes,
whereas bytes.fromhex raises an exception when given a bytes parameter.
Again, there is no reason for these functions to be different.
Problem #5: binascii.hexlify returns a bytes type, although ideally converting
to hex should always return string types and converting from hex should always
return bytes. IMO there is no meaning to bytes as the output of hexlify, since
the output is a textual representation of other bytes.
This is also the suggested behavior of bytes.hex in PEP 358.
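For illustration, the asymmetries described above look roughly like this in
an interactive session (exact error text may differ between versions):

>>> import binascii
>>> binascii.hexlify(b'\x01\xff')     # returns bytes, not a string (problem #5)
b'01ff'
>>> bytes.fromhex('01 ff')            # spaces accepted here... (problem #3)
b'\x01\xff'
>>> binascii.unhexlify('01 ff')       # ...but not here
Traceback (most recent call last):
  ...
binascii.Error: Non-hexadecimal digit found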
Problems #4 and #5 call for a decision about the input and output of the
functions being discussed:
Option A: Strict input and output
unhexlify (and bytes.fromhex) may only receive strings and may only return
bytes; hexlify (and bytes.hex) may only receive bytes and may only return
strings.
Option B: Robust input and strict output
unhexlify (and bytes.fromhex) may receive bytes and strings and may only
return bytes; hexlify (and bytes.hex) may receive bytes or strings and may
only return strings.
Of course we may also consider a third option, which would allow all
functions to be robust in both input and return type (perhaps selected by a
keyword argument), but as I wrote in the description of problem #5, I see no
sense in that.
Note that PEP 3137 describes "... the more strict definitions of encoding
and decoding in Python 3000: encoding always takes a Unicode string and
returns a bytes sequence, and decoding always takes a bytes sequence and
returns a Unicode string." -- suggesting Option A.
To repeat problems #4 and #5, the current behavior does not match any of
these options:
* The return type of binascii.hexlify should be string, and this is not the
current behavior.
As for the input:
* Option A is not the current behavior because binascii.unhexlify may
receive both input types.
* Option B is not the current behavior because bytes.fromhex does not allow
bytes as input.
To fix these issues, three changes should be applied:
1. Deprecate bytes.fromhex. This fixes the following problems:
   #4 (go with option B and remove the function that does not allow bytes input)
   #2 (the binascii functions will be the only way to "do it")
   #1 (bytes.hex should not be implemented)
2. In order to keep the functionality that bytes.fromhex has over unhexlify,
the latter function should be able to handle spaces in its input (fix #3).
3. binascii.hexlify should return a string as its return type (fix #5).
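With those changes applied, usage might look like this (hypothetical
behavior, not current Python):

>>> binascii.unhexlify('01 ff')    # spaces tolerated after change 2
b'\x01\xff'
>>> binascii.hexlify(b'\x01\xff')  # a str rather than bytes after change 3
'01ff'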
I have now started an initial patch for PEP 384, in the pep-0384 branch.
This has the following features:
- modules can be compiled under Py_LIMITED_API
- Tools/scripts/abitype.py converts C code containing static
PyTypeObject definitions to use the new API for type definitions.
The following aspects are still missing:
- there is no support for generating python3.dll on Windows yet
- there has been no validation whether the API is actually feasible
to use in extension modules.
I started looking into porting the sqlite extension, and ran into two issues:
- certain fields of PyTypeObject are accessed directly
- PyObject_Print is used, but can't be supported, as it uses a FILE* argument
For the first issue, it would be possible to provide a generic
accessor function that fetches fields from a type object. Alternatively,
each case could be considered individually, suggesting alternative code for
the specific use.
I'll be off the net for the next two weeks most of the time, so
I might not be able to respond quickly.
Anybody interested in advancing that patch, feel free to commit
changes into the branch.
This is a follow up to PEP 3147. That PEP, already implemented in Python 3.2,
allows for Python source files from different Python versions to live together
in the same directory. It does this by putting a magic tag in the .pyc file
name and placing the .pyc file in a __pycache__ directory.
Distros such as Debian and Ubuntu will use this to greatly simplify
deploying Python, and Python applications and libraries. Debian and Ubuntu
usually ship more than one version of Python, and currently have to play
complex games with symlinks to make this work. PEP 3147 will go a long way to
eliminating the need for extra directories and symlinks.
One more thing I've found we need, though, is a way to handle shared libraries
for extension modules. Just as we can get name collisions on foo.pyc, we can
get collisions on foo.so. We obviously cannot install foo.so built for Python
3.2 and foo.so built for Python 3.3 in the same location. So symlink
nightmare's mini-me is back.
I have a fairly simple fix for this. I'd actually be surprised if this hasn't
been discussed before, but teh Googles hasn't turned up anything.
The idea is to put the Python version number in the shared library file name,
and extend .so lookup to find these extended file names. So for example, we'd
see foo.3.2.so instead, and Python would know how to dynload both that and the
traditional foo.so file too (for backward compatibility).
(On file naming: the original patch used foo.so.3.2 and that works just as
well, but I thought there might be tools that expect exactly a '.so' suffix,
so I changed it to put the Major.Minor version number to the left of the
extension. The exact naming scheme is of course open to debate.)
This is a much simpler patch than PEP 3147, though I'm not 100% sure it's the
right approach. The way this works is by modifying the configure and
Makefile.pre.in to put the version number in the $SO make variable. Python
parses its (generated) Makefile to find $SO and it uses this deep in the
bowels of distutils to decide what suffix to use when writing shared libraries
built by 'python setup.py build_ext'.
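For reference, that suffix is already visible from Python through distutils,
so the effect of the change can be pictured like this (a sketch; the
versioned value is what the proposal would produce, not shipped behavior):

    from distutils import sysconfig

    # Unpatched Python reports the plain suffix; with the proposed change
    # the same query would return something like '.3.2.so'.
    print(sysconfig.get_config_var('SO'))    # '.so' today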
This means the patched Python only writes versioned .so files by default. I
personally don't see that as a problem, and it does not affect the test suite,
with the exception of one easily tweaked test. I don't know if third party
tools will care. The fact that traditional foo.so shared libraries will still
satisfy the import should be enough, I think.
The patch is currently Linux only, since I need this for Debian and Ubuntu and
wanted to keep the change narrow.
Other possible approaches:
* Extend the distutils API so that the .so file extension can be passed in,
instead of being essentially hardcoded to what Python's Makefile contains.
* Keep the dynload_shlib.c change, but modify the Debian/Ubuntu build
environment to pass in $SO to make (though the configure.in warning and
sleep is a little annoying).
* Add a ./configure option to enable this, which Debuntu's build would use.
The patch is available here:
and my working branch is here:
Please let me know what you think. I'm happy to just commit this to the py3k
branch if there are no objections <wink>. I don't think a new PEP is in
order, but an update to PEP 3147 might make sense.
While the EuroPython sprints are still going on, I am back home, and
after a somewhat restful night of sleep, I have some thoughts I'd like
to share before I get distracted. Note, I am jumping wildly between
- Commit privileges: Maybe we've been too careful with only giving
commit privileges to experienced and trusted new developers. I
spoke to Ezio Melotti and from his experience with getting commit
privileges, it seems to be a case of "the lion is much more afraid of
you than you are afraid of the lion". I.e. having got privileges he
was very concerned about doing something wrong, worried about the
complexity of SVN, and so on. Since we've got lots of people watching
the commit stream, I think that there really shouldn't need to be a
worry at all about a new committer doing something malicious, and
there shouldn't be much worry about honest beginners' mistakes either
-- the main worry remains that new committers don't use their
privileges enough. So, my recommendation (which surely is a
turn-around of my *own* attitude in the past) is to give out more
commit privileges sooner.
- Concurrency and parallelism: Russel Winder and Sarah Mount pushed
the idea of CSP (Communicating Sequential Processes) in several talks
at the conference. They (at least Russel) emphasized
the difference between concurrency (interleaved event streams) and
parallelism (using many processors to speed things up). Their
prediction is that as machines with many processing cores become more
prevalent, the relevant architecture will change from cores sharing a
single coherent memory (the model on which threads are based) to one
where each core has a limited amount of private memory, and
communication is done via message passing between the cores. This
gives them (and me :-) hope that the GIL won't be a problem as long as
we adopt a parallel processing model. Two competing models are the
Actor model, which is based on asynchronous communication, and CSP,
which is synchronous (when a writer writes to a channel, it blocks
until a reader reads that value -- a rendezvous). At least Sarah
suggested that both models are important. She also mentioned that a
merger is under consideration between the two major CSP-for-Python
packages, Py-CSP and Python-CSP. I also believe that the merger will
be based on the stdlib multiprocessing package, but I'm not sure. I do
expect that we may get some suggestions from that corner to make some
minor changes to details of multiprocessing (and perhaps threading),
and I think we should be open to those (I expect these will be good
suggestions for small tweaks, not major overhauls).
- After seeing Raymond's talk about monocle (search for it on PyPI) I
am getting excited again about PEP 380 (yield from, return values from
generators). Having read the PEP on the plane back home I didn't see
anything wrong with it, so it could just be accepted in its current
form. Implementation will still have to wait for Python 3.3 because of
the moratorium. (Although I wouldn't mind making an exception to get
it into 3.2.)
- This made me think of how the PEP process should evolve so as to not
require my personal approval for every PEP. I think the model for
future PEPs should be the one we used for PEP 3148 (futures, which was
just approved by Jesse): the discussion is led and moderated by one
designated "PEP handler" (a different one for each PEP) and the PEP
handler, after reviewing the discussion, decides when the PEP is
approved. A PEP handler should be selected for each PEP as soon as
possible; without a PEP handler, discussing a PEP is not all that
useful. The PEP handler should be someone respected by the community
with an interest in the subject of the PEP but at arm's length (at
least) from the PEP author. The PEP handler will have to moderate
feedback, separating useful comments from (too much) bikeshedding,
repetitious lines of questioning, and other forms of obstruction. The
PEP handler should also set and try to maintain a schedule for the
discussion. Note that a schedule should not be used to break a tie --
it should be used to stop bikeshedding and repeat discussions, while
giving all interested parties a chance to comment. (I should say that
this is probably similar to the role of an IETF working group director
with respect to RFCs.)
- Specifically, if Raymond is interested, I wouldn't mind seeing him
as the PEP handler for PEP 380. For some of Martin von Löwis's PEPs
(382, 384) I think a PEP handler is sorely lacking -- from the
language summit it appeared as if nobody besides Martin understands them.
- A lot of things seem to be happening to make PyPI better. Is this
being summarized somewhere? Based on some questions I received during
my keynote Q&A (http://bit.ly/bdflqa) I think not enough people are
aware of what we are already doing in this area. Frankly, I'm not sure
I do, either: I think I've heard of a GSOC student and of plans to
take over pypi.appspot.com (with the original developer's permission)
to become a full and up-to-date mirror. Mirroring apparently also
requires some client changes. Oh, and there's a proposed solution for
the "register user" problem where apparently the clients had been
broken by a unilateral change to the server to require a certain "yes
I agree" checkbox.
For a hopefully eventually exhaustive overview of what was
accomplished at EuroPython, go to http://wiki.europython.eu/After --
and if you know some blog about EuroPython not yet listed, please add it.
--Guido van Rossum (python.org/~guido)
I have two somewhat unrelated thoughts about PEPs.
* Accepted: header
When PEP 3147 was accepted, I had a few folks ask that this be recorded in the
PEP by including a link to the BDFL pronouncement email. I realized that
there's no formal way to express this in a PEP, and many PEPs in fact don't
record more than the fact that it was accepted. I'd like to propose
officially adding an Accepted: header which should include a URL to the email
or other web resource where the PEP is accepted. I've come as close as
possible to this (without modifying the supporting scripts or PEP 1) in PEP
3147. I'd be willing to update things if there are no objections.
* EOL schedule for older releases.
We have semi-formal policies for the lifetimes of Python releases, though I'm
not sure this policy is written down in any of the existing informational
PEPs. However, we have release schedule PEPs going back to Python 1.6. It
seems reasonable to me that we include end-of-life information in those PEPs.
For example, we could state that Python 2.4 is no longer even being maintained
for security, and we could state the projected date that Python 2.6 will go
into security-only maintenance mode.
I would not mandate that we go back and update all previous PEPs for either of
these ideas. We'd adopt them moving forward and allow anyone who's motivated
to backfill information opportunistically.
Is there any kind of internal file descriptor counter that can be
queried to debug issues with leaking resources?
It could be used in tests to check that all tests finish with 0 leaked
descriptors.
It would be very useful while porting Python applications from Unix to
Windows. Unix is more tolerant of open files and can overwrite them
and do other nasty things. See the thread from comment #17 -
https://bugs.edge.launchpad.net/dulwich/+bug/557585/ - there is an
example of mmap holding a file descriptor somewhere long
before an error occurs. How could one debug this?
Right now I have to use FileMon. It includes information about the
filenames operated on, but no info about the source code where this happens.
It would be nice to have some kind of counter with filename information
inside Python, so that it is possible to get a full log of
events without manually messing with external, system-specific tools.
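For what it's worth, on Linux one can approximate such a counter from user
code by listing /proc/self/fd (a rough sketch; /proc is Linux-specific and
there is no portable stdlib equivalent, which is part of the point of this
request):

    import os

    def open_fd_count():
        # Number of file descriptors currently open in this process
        # (Linux only: each entry in /proc/self/fd is an open descriptor).
        return len(os.listdir('/proc/self/fd'))

    before = open_fd_count()
    # ... run the code under test ...
    assert open_fd_count() == before, "file descriptors leaked"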