PEP 255 ("Simple Generators") closes with:
> Q. Then why not allow an expression on "return" too?
> A. Perhaps we will someday. In Icon, "return expr" means both "I'm
> done", and "but I have one final useful value to return too, and
> this is it". At the start, and in the absence of compelling uses
> for "return expr", it's simply cleaner to use "yield" exclusively
> for delivering values.
Now that Python 2.5 has gained enhanced generators (multitudes rejoice!), I think
there is a compelling use for valued return statements in cooperative
multitasking code, of the kind:
Data = yield Client.read()
MoreData = yield Client.read()
Result = yield foo()
For generators written in this style, "yield" means "suspend execution of the
current call until the requested result/resource can be provided", and
"return" regains its full conventional meaning of "terminate the current call
with a given result".
The simplest / most straightforward implementation would be for "return Foo"
to translate to "raise StopIteration, Foo". This is consistent with "return"
translating to "raise StopIteration", and does not break any existing code.
(Another way to think about this change is that if a plain StopIteration means
"the iterator terminated", then a valued StopIteration, by extension, means
"the iterator terminated with the given value".)
Motivation by real-world example:
One system that could benefit from this change is Christopher Armstrong's
defgen.py for Twisted, which he recently reincarnated (as newdefgen.py) to
use enhanced generators; the saga is summarized here:
The resulting code is much cleaner than before, and closer to the
conventional synchronous style of writing.
However, because enhanced generators have no way to differentiate their
intermediate results from their "real" result, the current solution is a
somewhat confusing compromise: the last value yielded by the generator
implicitly becomes the result returned by the call. Thus, to return
something, in general, requires the idiom "yield Foo; return". If valued
returns are allowed, this would become "return Foo" (and the code implementing
defgen itself would probably end up simpler, as well).
Okay, basic principle first. You start with a sandboxed thread that
has access to nothing. No modules, no builtins, *nothing*. This
means it can run without the GIL but it can't do any work. To make it
do something useful we need to give it two things: first, immutable
types that can be safely accessed without locks, and second a
thread-safe queue to coordinate. With those you can bring modules and
builtins back into the picture, either by making them immutable or
using a proxy that handles all the methods in a single thread.
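To make the proxy idea concrete, here's a rough sketch (plain Python, nothing
to do with the refcounting patch mentioned below) of routing every method call
on a wrapped object through a single worker thread by way of a thread-safe
queue:

    import Queue
    import threading

    class SingleThreadProxy:
        # All method calls on the wrapped object are executed by one worker
        # thread, so only that thread ever touches the object itself.

        def __init__(self, obj):
            self._obj = obj
            self._calls = Queue.Queue()
            worker = threading.Thread(target=self._serve)
            worker.setDaemon(True)
            worker.start()

        def _serve(self):
            while True:
                name, args, kwargs, reply = self._calls.get()
                try:
                    reply.put((True, getattr(self._obj, name)(*args, **kwargs)))
                except Exception, exc:
                    reply.put((False, exc))

        def __getattr__(self, name):
            def call(*args, **kwargs):
                reply = Queue.Queue(1)
                self._calls.put((name, args, kwargs, reply))
                ok, value = reply.get()
                if ok:
                    return value
                raise value
            return call

    # All list mutations happen in the proxy's worker thread:
    shared = SingleThreadProxy([])
    shared.append(42)
    print shared.pop()    # prints 42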
Unfortunately Python has a problem with immutable types. For the most
part it uses an honor system, trusting programmers not to make a class
that claims to be immutable yet changes state anyway. We need more
than that, but "freezing" a dict would work well enough, so that's not
the real problem. The real problem is the reference counting: even if
we do it "safely", all the memory writes just kill performance, so we
need to avoid it completely.
It turns out to be quite easy, and it doesn't harm the performance of
existing code or require modifying it (though a recompile is necessary).
The idea is to clean up these shared immutable objects using only the
cyclic garbage collector, which means disabling reference counting for
them. That requires modifying Py_INCREF and Py_DECREF to be no-ops if
ob_refcnt is set to a magic constant (probably a negative value).
That's all it takes. Modify Py_INCREF and Py_DECREF to check for a
magic constant. Ahh, but the performance? See for yourself.
rhamph@factor:~/src/Python-2.4.1$ ./python Lib/test/pystone.py 500000
Pystone(1.1) time for 500000 passes = 13.34
This machine benchmarks at 37481.3 pystones/second
Modified Py_INCREF/Py_DECREF with magic constant
rhamph@factor:~/src/Python-2.4.1-sandbox$ ./python Lib/test/pystone.py 500000
Pystone(1.1) time for 500000 passes = 13.38
This machine benchmarks at 37369.2 pystones/second
The numbers aren't significantly different. In fact the second one is
often slightly faster, which shows the difference is smaller than the
noise in the benchmark.
So to sum up: by prohibiting mutable objects from being transferred
between sandboxes we can achieve scalability on multiple CPUs, make
threaded programming easier and more reliable, get secure sandboxes as
a bonus, and do all that while maintaining single-threaded performance
and requiring minimal changes to existing C modules.
A "proof of concept" patch to Py_INCREF/Py_DECREF (only demonstrates
performance effects, does not create or utilize any new functionality)
can be found here:
 We need to remove any backdoor methods of getting to mutable
objects outside of your sandbox, which gets us most of the way towards
a restricted execution environment.
Adam Olsen, aka Rhamphoryncus
Based on Jason's comments regarding decimal.Context, and to explicitly cover
the terminology agreed on during the documentation discussion back in July,
I'm proposing a number of changes to PEP 343. I'll be updating the checked in
PEP assuming there aren't any objections in the next week or so (and assuming
I get CVS access sorted out ;).
The idea of dropping __enter__/__exit__ and defining the with statement solely
in terms of coroutines is *not* included in the suggested changes, but I added
a new item under "Resolved Open Issues" to cover some of the reasons why.
1. Amend the statement specification such that:
        with EXPR as VAR:
            BLOCK

   is translated as:

        abc = (EXPR).__with__()
        exc = (None, None, None)
        VAR = abc.__enter__()
        try:
            try:
                BLOCK
            except:
                exc = sys.exc_info()
                raise
        finally:
            abc.__exit__(*exc)
2. Add the following to the subsequent explanation:
The call to the __with__ method serves a similar purpose to the __iter__
method for iterables and iterators. An object such as threading.Lock may
provide its own __enter__ and __exit__ methods, and simply return 'self'
from its __with__ method. A more complex object such as decimal.Context may
return a distinct context manager which takes care of setting and restoring
the appropriate decimal context in the thread.
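As an aside (this is illustration only, not part of the proposed PEP text),
the two patterns might look roughly like this, with a module-level variable
standing in for decimal's per-thread current context:

    _current_context = None      # stand-in for the thread's current context

    class SimpleLock(object):
        # An object that manages itself: __with__ just returns self.
        def __with__(self):
            return self
        def __enter__(self):
            # acquire the lock here
            return self
        def __exit__(self, exc_type, exc_value, traceback):
            # release the lock here
            return False

    class Context(object):
        # An object whose __with__ hands back a distinct manager.
        def __with__(self):
            return _ContextSwitcher(self)

    class _ContextSwitcher(object):
        # Saves the current context on entry and restores it on exit.
        def __init__(self, ctx):
            self._ctx = ctx
        def __enter__(self):
            global _current_context
            self._saved = _current_context
            _current_context = self._ctx
            return self._ctx
        def __exit__(self, exc_type, exc_value, traceback):
            global _current_context
            _current_context = self._saved
            return False

    ctx = Context()
    mgr = ctx.__with__()            # a distinct manager object
    mgr.__enter__()
    assert _current_context is ctx
    mgr.__exit__(None, None, None)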
3. Update ContextWrapper in the "Generator Decorator" section to include:
4. Add a paragraph to the end of the "Generator Decorator" section:
By applying the @contextmanager decorator to a context's __with__ method,
it is as easy to write a generator-based context manager for the context as
it is to write a generator-based iterator for an iterable (see the
decimal.Context example below).
5. Add three items under "Resolved Open Issues":
2. After this PEP was originally approved, a subsequent discussion on
python-dev settled on the term "context manager" for objects which
provide __enter__ and __exit__ methods, and "context management
protocol" for the protocol itself. With the addition of the __with__
method to the protocol, a natural extension is to call objects which
provide only a __with__ method "contexts" (or "manageable contexts" in
situations where the general term "context" would be ambiguous).
The distinction between a context and a context manager is very
similar to the distinction between an iterable and an iterator.
3. The originally approved version of this PEP did not include a __with__
method - the method was only added to the PEP after Jason Orendorff
pointed out the difficulty of writing appropriate __enter__ and __exit__
methods for decimal.Context.
This approach allows a class to use the @contextmanager decorator
to define a native context manager using generator syntax. It also
allows a class to use an existing independent context manager as its
native context manager by applying the independent context manager to
'self' in its __with__ method. It even allows a class written in C to
use a coroutine based context manager written in Python.
The __with__ method parallels the __iter__ method which forms part of
the iterator protocol.
4. The suggestion was made by Jason Orendorff that the __enter__ and
__exit__ methods could be removed from the context management protocol,
and the protocol instead defined directly in terms of the coroutine
interface described in PEP 342 (or a cleaner version of that interface
with start() and finish() convenience methods).
Guido rejected this idea. The following are some of the benefits of
keeping the __enter__ and __exit__ methods:
- it makes it easy to implement a simple context manager in C
without having to rely on a separate coroutine builder
- it makes it easy to provide a low-overhead implementation for
context managers which don't need to maintain any special state
between the __enter__ and __exit__ methods (having to use a
coroutine for these would impose unnecessary overhead without any
compensating benefit; see the sketch after this list)
- it makes it possible to understand how the with statement works
without having to first understand the concept of a coroutine
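For instance, a stateless context manager needs nothing beyond the two
methods themselves; a quick sketch (the class name is made up):

    from __future__ import with_statement   # Python 2.5

    class suppress_errors(object):
        # No state is carried between __enter__ and __exit__, so no
        # generator/coroutine machinery is needed.
        def __with__(self):
            return self
        def __enter__(self):
            return None
        def __exit__(self, exc_type, exc_value, traceback):
            return True      # swallow any exception raised by the block

    with suppress_errors():
        1 / 0                # silently ignored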
6. Add new references:
7. Update Example 4 to include a __with__ method:
8. Replace Example 9 with the following example:
9. Here's a proposed native context manager for decimal.Context:
        # This would be a new decimal.Context method
        @contextmanager
        def __with__(self):
            # We set the thread context to a copy of this context
            # to ensure that changes within the block are kept
            # local to the block. This also gives us thread safety
            # and supports nested usage of a given context.
            newctx = self.copy()
            oldctx = decimal.getcontext()
            decimal.setcontext(newctx)
            try:
                yield newctx
            finally:
                decimal.setcontext(oldctx)

   It would be used like this:

        def sin(x):
            with decimal.getcontext() as ctx:
                ctx.prec += 2
                # Rest of sin calculation algorithm
                # uses a precision 2 greater than normal
            return +s # Convert result to normal precision

   or, using the standard Extended context instead of the current context:

        def sin(x):
            with decimal.ExtendedContext:
                # Rest of sin calculation algorithm
                # uses the Extended Context from the
                # General Decimal Arithmetic Specification
            return +s # Convert result to normal context
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
My name is Martin Maly and I am a developer at Microsoft, working on the
IronPython project with Jim Hugunin. I am spending a lot of time making
IronPython compatible with Python to the extent possible.
I came across a case which I am not sure is by design or a bug in Python
(Python 2.4.1 (#65, Mar 30 2005, 09:13:57)). Consider the following Python
code:

    # module begin
    "module doc"

    class c:
        print __doc__
        __doc__ = "class doc"     (1)
        print __doc__
    # module end

When run, it prints:

    module doc
    class doc
Based on the binding rules described in the Python documentation, I
would expect the code to throw because binding created on the line (1)
is local to the class block and all the other __doc__ uses should
reference that binding. Apparently, it is not the case.
Is this bug in Python or are __doc__ strings in classes subject to some
special scoping rules?
At 12:15 PM 10/7/2005 -0700, Martin Maly wrote:
>Based on the binding rules described in the Python documentation, I
>would expect the code to throw because binding created on the line (1)
>is local to the class block and all the other __doc__ uses should
>reference that binding. Apparently, it is not the case.
Correct - the scoping rules about local bindings causing a symbol to be
local only apply to *function* scopes. Class scopes are able to refer to
module-level names until the name is shadowed in the class scope.
>Is this bug in Python or are __doc__ strings in classes subject to some
>special scoping rules?
Neither; the behavior you're seeing doesn't have anything to do with
docstrings per se, it's just normal Python binding behavior, coupled with
the fact that the class' docstring isn't set until the class suite has
finished executing.
It's currently acceptable (if questionable style) to do things like this in
a module:

    X = 1

    class X:
        X = X + 1

    print X.X # this will print "2"
More commonly, and less questionably, this would manifest as something like:

    def function_taking_foo(foo, bar):
        pass  # implementation elided

    class Foo:
        function_taking_foo = function_taking_foo
This makes it possible to call 'function_taking_foo(aFooInstance, someBar)'
or 'aFooInstance.function_taking_foo(someBar)'. I've used this pattern a
couple times myself, and I believe there may actually be cases in the
standard library that do something like this, although maybe not binding
the method under the same name as the function.
At 07:34 PM 10/6/2005 -0700, Guido van Rossum wrote:
>How does this sound to the non-AST-branch developers who have to
>suffer the inevitable post-merge instability? I think it's now or
>never -- waiting longer isn't going to make this thing easier (not
>with several more language changes approved: with-statement, extended
>import, what else...)
Do the AST branch changes affect the interface of the "parser" module? Or
do they just add new functionality?
If type indicates that the object participates in the cyclic garbage
detector, it is added to the detector's set of observed objects.
Is this really correct? I thought you need to invoke PyObject_GC_TRACK
explicitly for the object to actually be tracked?
> Date: Wed, 05 Oct 2005 00:21:20 +0200
> From: "Martin v. Löwis" <martin(a)v.loewis.de>
> Subject: Re: [Python-Dev] Static builds on Windows (continued)
> Cc: python-dev(a)python.org
> Marvin wrote:
>>I built pythoncore and python. The resulting python.exe worked fine, but did
>>indeed fail when I tried to dynamically load anything (Dialog said: the
>>application terminated abnormally)
> Not sure what you are trying to do here. In your case, dynamic loading
> simply cannot work. The extension modules all link with python24.dll,
> which you don't have. It may find some python24.dll, which then gives
> conflicts with the Python interpreter that is already running.
> So what you really should do is disable dynamic loading entirely. To do
> so, remove dynload_win from your project, and #undef
> HAVE_DYNAMIC_LOADING in PC/pyconfig.h.
> Not sure if anybody has recently tested whether this configuration
> actually works - if you find that it doesn't, please post your patches
> to sf.net/projects/python.
> If you really want to provide dynamic loading of some kind, you should
> arrange the extension modules to import the symbols from your .exe.
> Linking the exe should generate an import library, and you should link
> the extensions against that.
I'll try that when I get back to this and report my results. I've figured out
that I can avoid the need for dynamic loading: I wanted to use some existing
extension modules, but the whole point was to use the existing builds, which,
as you point out, are linked against a DLL. So even if I created an .EXE that
exported the symbols, I'd still have to rebuild the extensions.
I posted this question to python-help, but I think I have a better chance
of getting the answer here.
I'm looking for clarification on when NEWLINE tokens are generated during
lexical analysis of Python source code. In particular, I'm confused about
some of the top-level components in Python's grammar (file_input,
interactive_input, and eval_input).
Section 2.1.7 of the reference manual states that blank lines (lines
consisting only of whitespace and possibly a comment) do not generate
NEWLINE tokens. This is supported by the definition of a suite, which
does not allow for standalone or consecutive NEWLINE tokens.
suite ::= stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
Yet the grammar for top-level components seems to suggest that a parsable
input may consist entirely of a single NEWLINE token, or include
consecutive NEWLINE tokens.
file_input ::= (NEWLINE | statement)*
interactive_input ::= [stmt_list] NEWLINE | compound_stmt NEWLINE
eval_input ::= expression_list NEWLINE*
To me this seems to contradict section 2.1.7 in so far as I don't see how
it's possible to generate such a sequence of tokens.
What kind of input would generate NEWLINE tokens in the top-level
components of the grammar?
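For what it's worth, the pure-Python tokenize module can be used to watch
where NEWLINE is emitted: blank and comment-only lines produce the separate
NL (and COMMENT) tokens instead, and NEWLINE appears only at the end of
logical lines. The source string below is just an arbitrary example:

    import tokenize
    from StringIO import StringIO

    src = "x = 1\n\n# a comment-only line\ny = 2\n"
    for tok in tokenize.generate_tokens(StringIO(src).readline):
        print tokenize.tok_name[tok[0]], repr(tok[1])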
Is there a faster way to transcode from 8-bit chars (charmaps) to utf-8
than going through unicode()?
I'm writing a small card-file program. As a test, I use a 53 MB MBox file,
in mac-roman encoding. My program reads and parses the file into messages
in about 3 to 5 seconds (Wow! Go Python!), but takes about 14 seconds to
iterate over the cards and convert them to utf-8:
for i in xrange(len(cards)):
    u = unicode(cards[i], encoding)
    cards[i] = u.encode('utf-8')
The time is nearly all in the unicode() call. It's not so much how much
time it takes, but that it takes 4 times as long as the real work, just to
do table lookups.
Looking at the source (which, if I have it right, is
PyUnicode_DecodeCharmap() in unicodeobject.c), I think it is doing a
dictionary lookup for each character. I would have thought that it would
make and cache a LUT the size of the charmap (and hook the relevant
dictionary stuff to delete the cached LUT if the dictionary is changed).
(You may consider this a request for enhancement. ;)
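In the meantime, such a table can also be built in pure Python, paying the
codec cost once per encoding rather than once per character. A rough sketch
(the function names are made up):

    def make_8bit_to_utf8_table(encoding):
        # One-off cost: precompute the UTF-8 encoding of each of the
        # 256 possible byte values under the given charmap codec.
        return [chr(i).decode(encoding, 'replace').encode('utf-8')
                for i in range(256)]

    def recode(data, table):
        # One list indexing per input byte, no per-character dict lookups.
        return ''.join([table[ord(c)] for c in data])

    table = make_8bit_to_utf8_table('mac-roman')
    print recode('one mac-roman string \x8e', table)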
I thought of using U"".translate(), but the unicode version is defined to
be slow, and anyway I can't find any way to just shove my 8-bit data into a
unicode string without translation. Is there some similar approach? I'm
almost (but not quite) ready to try it in Pyrex.
I'm new to Python. I didn't google anything relevant on python.org or in
groups. I posted this in comp.lang.python yesterday, got a couple of
responses, but I think this may be too sophisticated a question for that
group.
I'm not a member of this list, so please copy me on replies so I don't have
to hunt them down in the archive.