I think it would be a good idea if Python tracebacks could be translated
into languages other than English - and it would set a good example.
For example, using French as my default local language, instead of
>>> 1/0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
I might get something like
>>> 1/0
Suivi d'erreur (appel le plus récent en dernier) :
Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
André
Here's an updated version of the PEP reflecting my
recent suggestions on how to eliminate 'codef'.
PEP: XXX
Title: Cofunctions
Version: $Revision$
Last-Modified: $Date$
Author: Gregory Ewing <greg.ewing(a)canterbury.ac.nz>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 13-Feb-2009
Python-Version: 3.x
Post-History:
Abstract
========
A syntax is proposed for defining and calling a special type of generator
called a 'cofunction'. It is designed to provide a streamlined way of
writing generator-based coroutines, and allow the early detection of
certain kinds of error that are easily made when writing such code, which
otherwise tend to cause hard-to-diagnose symptoms.
This proposal builds on the 'yield from' mechanism described in PEP 380,
and describes some of the semantics of cofunctions in terms of it. However,
it would be possible to define and implement cofunctions independently of
PEP 380 if so desired.
Specification
=============
Cofunction definitions
----------------------
A cofunction is a special kind of generator, distinguished by the presence
of the keyword ``cocall`` (defined below) at least once in its body. It may
also contain ``yield`` and/or ``yield from`` expressions, which behave as
they do in other generators.
From the outside, the distinguishing feature of a cofunction is that it cannot
be called the same way as an ordinary function. An exception is raised if an
ordinary call to a cofunction is attempted.
Cocalls
-------
Calls from one cofunction to another are made by marking the call with
a new keyword ``cocall``. The expression
::
cocall f(*args, **kwds)
is evaluated by first checking whether the object ``f`` implements
a ``__cocall__`` method. If it does, the cocall expression is
equivalent to
::
yield from f.__cocall__(*args, **kwds)
except that the object returned by __cocall__ is expected to be an
iterator, so the step of calling iter() on it is skipped.
If ``f`` does not have a ``__cocall__`` method, or the ``__cocall__``
method returns ``NotImplemented``, then the cocall expression is
treated as an ordinary call, and the ``__call__`` method of ``f``
is invoked.
Objects which implement __cocall__ are expected to return an object
obeying the iterator protocol. Cofunctions respond to __cocall__ the
same way as ordinary generator functions respond to __call__, i.e. by
returning a generator-iterator.
Certain objects that wrap other callable objects, notably bound methods,
will be given __cocall__ implementations that delegate to the underlying
object.
Grammar
-------
The full syntax of a cocall expression is described by the following
grammar lines:
::
atom: cocall | <existing alternatives for atom>
cocall: 'cocall' atom cotrailer* '(' [arglist] ')'
cotrailer: '[' subscriptlist ']' | '.' NAME
Note that this syntax allows cocalls to methods and elements of sequences
or mappings to be expressed naturally. For example, the following are valid:
::
y = cocall self.foo(x)
y = cocall funcdict[key](x)
y = cocall a.b.c[i].d(x)
Also note that the final calling parentheses are mandatory, so that for example
the following is invalid syntax:
::
y = cocall f # INVALID
New builtins, attributes and C API functions
--------------------------------------------
To facilitate interfacing cofunctions with non-coroutine code, there will
be a built-in function ``costart`` whose definition is equivalent to
::
def costart(obj, *args, **kwds):
try:
m = obj.__cocall__
except AttributeError:
result = NotImplemented
else:
result = m(*args, **kwds)
if result is NotImplemented:
raise TypeError("Object does not support cocall")
return result
There will also be a corresponding C API function
::
PyObject *PyObject_CoCall(PyObject *obj, PyObject *args, PyObject *kwds)
It is left unspecified for now whether a cofunction is a distinct type
of object or, like a generator function, is simply a specially-marked
function instance. If the latter, a read-only boolean attribute
``__iscofunction__`` should be provided to allow testing whether a given
function object is a cofunction.
Motivation and Rationale
========================
The ``yield from`` syntax is reasonably self-explanatory when used for the
purpose of delegating part of the work of a generator to another function. It
can also be used to good effect in the implementation of generator-based
coroutines, but it reads somewhat awkwardly when used for that purpose, and
tends to obscure the true intent of the code.
Furthermore, using generators as coroutines is somewhat error-prone. If one
forgets to use ``yield from`` when it should have been used, or uses it when it
shouldn't have, the symptoms that result can be extremely obscure and confusing.
Finally, sometimes there is a need for a function to be a coroutine even though
it does not yield anything, and in these cases it is necessary to resort to
kludges such as ``if 0: yield`` to force it to be a generator.
The ``cocall`` construct address the first issue by making the syntax directly
reflect the intent, that is, that the function being called forms part of a
coroutine.
The second issue is addressed by making it impossible to mix coroutine and
non-coroutine code in ways that don't make sense. If the rules are violated, an
exception is raised that points out exactly what and where the problem is.
Lastly, the need for dummy yields is eliminated by making it possible for a
cofunction to call both cofunctions and ordinary functions with the same syntax,
so that an ordinary function can be used in place of a cofunction that yields
zero times.
Record of Discussion
====================
An earlier version of this proposal required a special keyword ``codef`` to be
used in place of ``def`` when defining a cofunction, and disallowed calling an
ordinary function using ``cocall``. However, it became evident that these
features were not necessary, and the ``codef`` keyword was dropped in the
interests of minimising the number of new keywords required.
The use of a decorator instead of ``codef`` was also suggested, but the current
proposal makes this unnecessary as well.
It has been questioned whether some combination of decorators and functions
could be used instead of a dedicated ``cocall`` syntax. While this might be
possible, to achieve equivalent error-detecting power it would be necessary
to write cofunction calls as something like
::
yield from cocall(f)(args)
making them even more verbose and inelegant than an unadorned ``yield from``.
It is also not clear whether it is possible to achieve all of the benefits of
the cocall syntax using this kind of approach.
Prototype Implementation
========================
An implementation of an earlier version of this proposal in the form of patches
to Python 3.1.2 can be found here:
http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/cofunctions...
If this version of the proposal is received favourably, the implementation will
be updated to match.
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
The problem of the 'default argument hack' and it's use for early
binding and shared state in function definitions is one that has been
bugging me for years. You could rightly say that the degree to which
it irritates me is all out of proportion to the significance of the
use case and frequency with which it arises, and I'd agree with you.
That, at least in part, is what has made it such an interesting
problem for me: the status quo is suboptimal and confusing (there's a
reason the term 'default argument hack' gets thrown around, including
by me), but most proposed solutions have involved fairly significant
changes to the semantics and syntax of the language that cannot be
justified by such a niche use case. The proposed cures (including my
own suggestions) have all been worse than the disease.
I finally have a possible answer I actually *like* ("nonlocal VAR from
EXPR"), but it involves a somewhat novel way of thinking about
closures and lexical scopes for it to make sense. This post is an
attempt to explain that thought process. The history outlined here
will be familiar to many folks on the list, but I hope to make the
case that this approach actually simplifies and unifies a few aspects
of the language rather than adding anything fundamentally new. The
novel aspect lies in recognising and exposing to developers as a
coherent feature something that is already implicit in the operation
of the language as a whole.
== Default Arguments ==
Default arguments have been a part of Python function definitions for
a very long time (since the beginning, even?), so it makes sense to
start with those. At function call time, if the relevant parameters
are not supplied as arguments, they're populated on the frame object
based on the values stored on the function object. Their behaviour is
actually quite like a closure: they define shared state that is common
to all invocations of the function.
== Lexical Scoping ==
The second step in this journey is the original introduction of
lexical scoping by PEP 227 back in Python 2.1 (or 2.2 without a
__future__ statement). This changed Python from its original
locals->globals->builtins lookup mechanism (still used in class scope
to this day), to the closure semantics for nested functions that we're
familiar with. However, at this stage, there was no ability to rebind
names in outer scopes - they were read-only, so you needed to use
other techniques (like 'boxing' in a list) to update immutable values.
== Writing to Outer Scopes ==
PEP 3104 added the ability to write to outer scopes by using the
'nonlocal' statement to declare that a particular variable was not a
local in the current frame, but rather a local in an outer frame which
is alive at the time the inner function definition statement is
executed. It expects the variable to already exist in an outer
lexically nested scope and complains if it can't find one.
== The "__class__" cell reference ==
The final entrant in this game, the "__class__" cell reference was
added to the language by PEP 335 in order to implement the 3.x super()
shorthand. For functions defined within a class body, this effectively
lets the class definition play a role in lexical scoping, as the
compiler and eval loop cooperate to give the function an indirect
reference to the class being defined, even though the function
definition completes first.
== The Status Quo ==
If you go look up the definition of 'closure', you'll find that it
doesn't actually say anything about nested functions. Instead, it will
talk about 'free variables' in the algorithm definition without
placing any restrictions on how those variables are later hooked up to
the appropriate values.
In current Python, ordinary named references can refer to one of 4 namespaces:
- locals (stored on the currently executing frame object)
- closure reference (stored in a cell object by the function that
defined it, kept alive after the frame is recycled by references from
still living inner functions that need it)
- globals (stored on the module object)
- builtins (also stored on a module object, specifically the one for
the builtin namespace)
PEP 335 also creates a closure reference for "__class__", but in a
slightly unusual way. Whereas most targets for closure references are
created by the code in the outer function when it runs [1], this
closure reference is populated implicitly by the type machinery [2].
The important aspect from my point of view is that this implementation
technique starts to break down Python's historical correlation between
"function closure" and "lexically nested scope".
== Conceptual Unification ==
The moment of clarity for me came when I realised that default
arguments, lexically nested scopes and the new super() implementation
can all be seen as just special cases of the broader concept of free
variables and function closures.
Lexically nested scopes have always been talked about in those terms,
so that aspect shouldn't surprise anyone. The new super()
implementation is also fairly obviously a closure, since it uses the
closure machinery to work its magic. The only difference is in the way
the value gets populated in the first place (i.e. by the type
machinery rather than by the execution of an outer function).
Due to history, default argument *values* aren't often thought of as
closure references, but they really are anonymous closures. Instead of
using cells, the references are stored in dedicated attributes that
are known to the argument parsing machinery, but you could quite
easily dispense with that and store everything as cells in the
function closure (you wouldn't, since it would be a waste of time and
energy, I'm just pointing out the conceptual equivalence. A *new*
Python implementation, though, could choose to go down that path).
After I had that realisation, the natural follow-up question seemed to
be: if I wanted to explicitly declare a closure variable, and provide
it with an initial value, without introducing a nested function purely
for that purpose, how should I spell that?
Well, I think PEP 3104 has already given us the answer: by declaring
the variable name as explicitly 'nonlocal', but also providing an
initial value so the compiler knows it is a *new* closure variable,
rather than one from an outer lexically nested scope. This is a far
more useful and meaningful addition than the trivial syntactic sugar
mentioned in the PEP (but ultimately not implemented).
The other question is what scope the initialisation operation should
be executed in, and I think there, default arguments have the answer:
in the containing scope, before the function has been defined.
== Precise Syntax ==
By reusing 'nonlocal', we would make it clear that we're not adding a
new concept to the language, but rather generalising an existing one
(i.e. closure references) to provide additional flexibility in the way
they're used. So I *really* want to use that keyword rather than
adding a new one just for this task. However, I'm less certain about
the spelling of the rest of the statement.
There are at least a few possible alternative spellings:
nonlocal VAR = EXPR # My initial suggestion
nonlocal VAR from EXPR # Strongly indicates there's more than a
simple assignment going on here
nonlocal EXPR as VAR # Parser may struggle with this one
Of the three, 'nonlocal VAR from EXPR' may be the best bet - it's easy
for the compiler to parse, PEP 380 set the precedent for the 'from
EXPR' clause to introduce a subexpression and 'nonlocal VAR = EXPR'
may be too close to 'nonlocal VAR; VAR = EXPR'.
Regards,
Nick.
[1]
Some dis module details regarding the different kinds of name
reference. Most notable for my point is the correspondence between the
'cell variable' in the outer function and the 'free variable' in the
inner function:
>>> def outer():
... closure_ref = 1
... def inner():
... local_ref = 2
... print(local_ref, closure_ref, global_ref, len)
...
>>> global_ref = 3
>>> import dis
>>> dis.show_code(outer)
Name: outer
Filename: <stdin>
Argument count: 0
Kw-only arguments: 0
Number of locals: 1
Stack size: 2
Flags: OPTIMIZED, NEWLOCALS
Constants:
0: None
1: 1
2: <code object inner at 0xee78b0, file "<stdin>", line 3>
Variable names:
0: inner
Cell variables:
0: closure_ref
>>> dis.show_code(outer.__code__.co_consts[2])
Name: inner
Filename: <stdin>
Argument count: 0
Kw-only arguments: 0
Number of locals: 1
Stack size: 5
Flags: OPTIMIZED, NEWLOCALS, NESTED
Constants:
0: None
1: 2
Names:
0: print
1: global_ref
2: len
Variable names:
0: local_ref
Free variables:
0: closure_ref
[2]
Some dis module output to show that there's no corresponding
'__class__' cell variable anywhere when the implicit closure entry is
created by the new super() machinery.
>>> def outer2():
... class C:
... def inner():
... print(__class__)
...
>>> dis.show_code(outer2)
Name: outer2
Filename: <stdin>
Argument count: 0
Kw-only arguments: 0
Number of locals: 1
Stack size: 3
Flags: OPTIMIZED, NEWLOCALS, NOFREE
Constants:
0: None
1: <code object C at 0x1275c68, file "<stdin>", line 2>
2: 'C'
Variable names:
0: C
>>> dis.show_code(outer2.__code__.co_consts[1])
Name: C
Filename: <stdin>
Argument count: 1
Kw-only arguments: 0
Number of locals: 1
Stack size: 2
Flags: NEWLOCALS
Constants:
0: <code object inner at 0x1275608, file "<stdin>", line 3>
Names:
0: __name__
1: __module__
2: inner
Variable names:
0: __locals__
Cell variables:
0: __class__
>>> dis.show_code(outer2.__code__.co_consts[1].co_consts[0])
Name: inner
Filename: <stdin>
Argument count: 0
Kw-only arguments: 0
Number of locals: 0
Stack size: 2
Flags: OPTIMIZED, NEWLOCALS, NESTED
Constants:
0: None
Names:
0: print
Free variables:
0: __class__
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
It seems there could be a cleaner way of reading the first n lines of
a file and additionally not seeking past those lines (ie peek). This
is obviously a trivial task for 1 line ie...
f.readline()
f.seek(0)
but one that I think would make sense to add to the IO implementation,
given that we already have readline, readlines, and peek I think
peekline() or peeklines(n) is only a natural addition. The argument
for doing so (in 3.3 of course), is primarily readability but also
that the maintenance burden *seems* like it would be low. This
addition would also be helpful and more concise where n > 1.
I think readlines() should also take an optional argument for a max
number of lines to read. It seems more common/helpful to me than
'hint' for max bytes. In n>1 case one could do...
f.readlines(maxlines=10)
or for the 'peek' case
f.peeklines(10)
I also didn't find any of the answers from
http://stackoverflow.com/questions/1767513/read-first-n-lines-of-a-file-i...
to be very compelling.
I am more than willing to propose a patch if the idea(s) are supported.
- John
Hey,
not sure how people do this, or if I missed something obvious in the
stdlib, but I often have this pattern:
starts = ('a', 'b', 'c')
somestring = 'acapulco'
for start in starts:
if somestring.startswith(start):
print "yeah"
So what about a startsin() method, that would iterate over a sequence:
if somestring.startsin('a', 'b', 'c'):
print "yeah"
Implementing it in C should be faster as well
same deal with .endswith I guess
Cheers
Tarek
--
Tarek Ziadé | http://ziade.org
I propose adding a basic calculator statistics module to the standard
library, similar to the sorts of functions you would get on a scientific
calculator:
mean (average)
variance (population and sample)
standard deviation (population and sample)
correlation coefficient
and similar. I am volunteering to provide, and support, this module,
written in pure Python so other implementations will be able to use it.
Simple calculator-style statistics seem to me to be a fairly obvious
"battery" to be included, more useful in practice than some functions
already available such as factorial and the hyperbolic functions.
The lack of a standard solution leads people who need basic stats to
roll their own. This seems seductively simple, as the basic stats
formulae are quite simple. Unfortunately doing it *correctly* is much
harder than it seems. Variance, in particular, is prone to serious
inaccuracies. Here is the most obvious algorithm, using the so-called
"computational formula for the variance":
def variance(data):
# σ2 = 1/n**2 * (n*Σ(x**2) - (Σx)**2)
n = len(data)
s1 = sum(x**2 for x in data)
s2 = sum(data)
return (n*s1 - s2**2)/(n*n)
Many stats text books recommend this as the best way to calculate
variance, advice which makes sense when you're talking about hand
calculations of small numbers of moderate sized data, but not for
floating point. It appears to work:
>>> data = [1, 2, 4, 5, 8]
>>> variance(data) # exact value = 6
6.0
but unfortunately it is numerically unstable. Shifting all the data
points by a constant amount shouldn't change the variance, but it does:
>>> data = [x+1e12 for x in data]
>>> variance(data)
171798691.84
Even worse, variance should never be negative:
>>> variance(data*100)
-1266637395.197952
Note that using math.fsum instead of the built-in sum does not fix the
numeric instability problem, and it adds the additional problem that it
coerces the data points to float. (If you use Decimal, this may not be
what you want.)
Here is an example of published code which suffers from exactly this
problem:
https://bitbucket.org/larsyencken/simplestats/src/c42e048a6625/src/basic.py
and here is an example on StackOverflow. Note the most popular answer
given is to use the Computational Formula, which is the wrong answer.
http://stackoverflow.com/questions/2341340/calculate-mean-and-variance-wi...
I would like to add a module to the standard library to solve these
sorts of simple stats problems the right way, once and for all.
Thoughts, comments, objections or words of encouragement are welcome.
--
Steven
I think `strtr`_ in php is also very useful when escaping something.
_ strtr: http://jp.php.net/manual/en/function.strtr.php
For example:
.. code-block:: php
php> = strtr("foo\\\"bar\\'baz\\\\", array("\\\\"=>"\\",
'\\"'=>'"', "\\'"=>"'"));
"foo\"bar'baz\\"
.. code-block:: python
In [1]: "foo\\\"bar\\'baz\\\\".replace('\\"', '"').replace("\\'",
"'").replace('\\\\', '\\')
Out[1]: 'foo"bar\'baz\\'
In Python, lookup of 'replace' method occurs many times and temporary
strings is created many times too.
It makes Python slower than php.
And replacing order may cause very common mistake.
.. code-block:: python
In [4]: "foo\\\"bar\\'baz\\\\'".replace('\\\\',
'\\').replace('\\"', '"').replace("\\'", "'")
Out[4]: 'foo"bar\'baz\''
When I wrote HandlerSocket_ client in pure Python. I use dirty hack for speed.
http://bazaar.launchpad.net/~songofacandy/+junk/pyhandlersocket/view/head...
I believe Pythonic means simple and efficient. My code is not Pythonic at all.
.. _HandlerSocket: https://github.com/ahiguti/HandlerSocket-Plugin-for-MySQL
On Sat, Oct 1, 2011 at 12:30 AM, Tarek Ziadé <ziade.tarek(a)gmail.com> wrote:
> Hey,
>
> not sure how people do this, or if I missed something obvious in the
> stdlib, but I often have this pattern:
>
> starts = ('a', 'b', 'c')
> somestring = 'acapulco'
>
> for start in starts:
> if somestring.startswith(start):
> print "yeah"
>
>
> So what about a startsin() method, that would iterate over a sequence:
>
> if somestring.startsin('a', 'b', 'c'):
> print "yeah"
>
> Implementing it in C should be faster as well
>
> same deal with .endswith I guess
>
> Cheers
> Tarek
>
> --
> Tarek Ziadé | http://ziade.org
> _______________________________________________
> Python-ideas mailing list
> Python-ideas(a)python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
--
INADA Naoki <songofacandy(a)gmail.com>
Here's a draft of an update to PEP 335. It includes a couple of
fully worked and tested examples, plus discussion of some
potential simplifications and ways to optimise the generated
bytecode.
-------------------------------------------------------------
PEP: 335
Title: Overloadable Boolean Operators
Version: $Revision$
Last-Modified: $Date$
Author: Gregory Ewing <greg(a)cosc.canterbury.ac.nz>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Aug-2004
Python-Version: Unspecified
Post-History: 05-Sep-2004
Abstract
========
This PEP proposes an extension to permit objects to define their own
meanings for the boolean operators 'and', 'or' and 'not', and suggests
an efficient strategy for implementation. A prototype of this
implementation is available for download.
Background
==========
Python does not currently provide any '__xxx__' special methods
corresponding to the 'and', 'or' and 'not' boolean operators. In the
case of 'and' and 'or', the most likely reason is that these operators
have short-circuiting semantics, i.e. the second operand is not
evaluated if the result can be determined from the first operand. The
usual technique of providing special methods for these operators
therefore would not work.
There is no such difficulty in the case of 'not', however, and it
would be straightforward to provide a special method for this
operator. The rest of this proposal will therefore concentrate mainly
on providing a way to overload 'and' and 'or'.
Motivation
==========
There are many applications in which it is natural to provide custom
meanings for Python operators, and in some of these, having boolean
operators excluded from those able to be customised can be
inconvenient. Examples include:
1. NumPy, in which almost all the operators are defined on
arrays so as to perform the appropriate operation between
corresponding elements, and return an array of the results. For
consistency, one would expect a boolean operation between two
arrays to return an array of booleans, but this is not currently
possible.
There is a precedent for an extension of this kind: comparison
operators were originally restricted to returning boolean results,
and rich comparisons were added so that comparisons of NumPy
arrays could return arrays of booleans.
2. A symbolic algebra system, in which a Python expression is
evaluated in an environment which results in it constructing a tree
of objects corresponding to the structure of the expression.
3. A relational database interface, in which a Python expression is
used to construct an SQL query.
A workaround often suggested is to use the bitwise operators '&', '|'
and '~' in place of 'and', 'or' and 'not', but this has some
drawbacks. The precedence of these is different in relation to the
other operators, and they may already be in use for other purposes (as
in example 1). There is also the aesthetic consideration of forcing
users to use something other than the most obvious syntax for what
they are trying to express. This would be particularly acute in the
case of example 3, considering that boolean operations are a staple of
SQL queries.
Rationale
=========
The requirements for a successful solution to the problem of allowing
boolean operators to be customised are:
1. In the default case (where there is no customisation), the existing
short-circuiting semantics must be preserved.
2. There must not be any appreciable loss of speed in the default
case.
3. Ideally, the customisation mechanism should allow the object to
provide either short-circuiting or non-short-circuiting semantics,
at its discretion.
One obvious strategy, that has been previously suggested, is to pass
into the special method the first argument and a function for
evaluating the second argument. This would satisfy requirements 1 and
3, but not requirement 2, since it would incur the overhead of
constructing a function object and possibly a Python function call on
every boolean operation. Therefore, it will not be considered further
here.
The following section proposes a strategy that addresses all three
requirements. A `prototype implementation`_ of this strategy is
available for download.
.. _prototype implementation:
http://www.cosc.canterbury.ac.nz/~greg/python/obo//Python_OBO.tar.gz
Specification
=============
Special Methods
---------------
At the Python level, objects may define the following special methods.
=============== ================= ========================
Unary Binary, phase 1 Binary, phase 2
=============== ================= ========================
* __not__(self) * __and1__(self) * __and2__(self, other)
* __or1__(self) * __or2__(self, other)
* __rand2__(self, other)
* __ror2__(self, other)
=============== ================= ========================
The __not__ method, if defined, implements the 'not' operator. If it
is not defined, or it returns NotImplemented, existing semantics are
used.
To permit short-circuiting, processing of the 'and' and 'or' operators
is split into two phases. Phase 1 occurs after evaluation of the first
operand but before the second. If the first operand defines the
relevant phase 1 method, it is called with the first operand as
argument. If that method can determine the result without needing the
second operand, it returns the result, and further processing is
skipped.
If the phase 1 method determines that the second operand is needed, it
returns the special value NeedOtherOperand. This triggers the
evaluation of the second operand, and the calling of a relevant
phase 2 method. During phase 2, the __and2__/__rand2__ and
__or2__/__ror2__ method pairs work as for other binary operators.
Processing falls back to existing semantics if at any stage a relevant
special method is not found or returns NotImplemented.
As a special case, if the first operand defines a phase 2 method but
no corresponding phase 1 method, the second operand is always
evaluated and the phase 2 method called. This allows an object which
does not want short-circuiting semantics to simply implement the
phase 2 methods and ignore phase 1.
Bytecodes
---------
The patch adds four new bytecodes, LOGICAL_AND_1, LOGICAL_AND_2,
LOGICAL_OR_1 and LOGICAL_OR_2. As an example of their use, the
bytecode generated for an 'and' expression looks like this::
.
.
.
evaluate first operand
LOGICAL_AND_1 L
evaluate second operand
LOGICAL_AND_2
L: .
.
.
The LOGICAL_AND_1 bytecode performs phase 1 processing. If it
determines that the second operand is needed, it leaves the first
operand on the stack and continues with the following code. Otherwise
it pops the first operand, pushes the result and branches to L.
The LOGICAL_AND_2 bytecode performs phase 2 processing, popping both
operands and pushing the result.
Type Slots
----------
At the C level, the new special methods are manifested as five new
slots in the type object. In the patch, they are added to the
tp_as_number substructure, since this allows making use of some
existing code for dealing with unary and binary operators. Their
existence is signalled by a new type flag,
Py_TPFLAGS_HAVE_BOOLEAN_OVERLOAD.
The new type slots are::
unaryfunc nb_logical_not;
unaryfunc nb_logical_and_1;
unaryfunc nb_logical_or_1;
binaryfunc nb_logical_and_2;
binaryfunc nb_logical_or_2;
Python/C API Functions
----------------------
There are also five new Python/C API functions corresponding to the
new operations::
PyObject *PyObject_LogicalNot(PyObject *);
PyObject *PyObject_LogicalAnd1(PyObject *);
PyObject *PyObject_LogicalOr1(PyObject *);
PyObject *PyObject_LogicalAnd2(PyObject *, PyObject *);
PyObject *PyObject_LogicalOr2(PyObject *, PyObject *);
Alternatives and Optimisations
==============================
This section discusses some possible variations on the proposal,
and ways in which the bytecode sequences generated for boolean
expressions could be optimised.
Reduced special method set
--------------------------
For completeness, the full version of this proposal includes a
mechanism for types to define their own customised short-circuiting
behaviour. However, the full mechanism is not needed to address the
main use cases put forward here, and it would be possible to
define a simplified version that only includes the phase 2
methods. There would then only be 5 new special methods (__and2__,
__rand2__, __or2__, __ror2__, __not__) with 3 associated type slots
and 3 API functions.
This simplified version could be expanded to the full version
later if desired.
Additional bytecodes
--------------------
As defined here, the bytecode sequence for code that branches on
the result of a boolean expression would be slightly longer than
it currently is. For example, in Python 2.7,
::
if a and b:
statement1
else:
statement2
generates
LOAD_GLOBAL a
POP_JUMP_IF_FALSE false_branch
LOAD_GLOBAL b
POP_JUMP_IF_FALSE false_branch
<code for statement1>
JUMP_FORWARD end_branch
false_branch:
<code for statement2>
end_branch:
Under this proposal as described so far, it would become something like
::
LOAD_GLOBAL a
LOGICAL_AND_1 test
LOAD_GLOBAL b
LOGICAL_AND_2
test:
POP_JUMP_IF_FALSE false_branch
<code for statement1>
JUMP_FORWARD end_branch
false_branch:
<code for statement2>
end_branch:
This involves executing one extra bytecode in the short-circuiting
case and two extra bytecodes in the non-short-circuiting case.
However, by introducing extra bytecodes that combine the logical
operations with testing and branching on the result, it can be
reduced to the same number of bytecodes as the original:
::
LOAD_GLOBAL a
AND1_JUMP true_branch, false_branch
LOAD_GLOBAL b
AND2_JUMP_IF_FALSE false_branch
true_branch:
<code for statement1>
JUMP_FORWARD end_branch
false_branch:
<code for statement2>
end_branch:
Here, AND1_JUMP performs phase 1 processing as above,
and then examines the result. If there is a result, it is popped
from the stack, its truth value is tested and a branch taken to
one of two locations.
Otherwise, the first operand is left on the stack and execution
continues to the next bytecode. The AND2_JUMP_IF_FALSE bytecode
performs phase 2 processing, pops the result and branches if
it tests false
For the 'or' operator, there would be corresponding OR1_JUMP
and OR2_JUMP_IF_TRUE bytecodes.
If the simplified version without phase 1 methods is used, then
early exiting can only occur if the first operand is false for
'and' and true for 'or'. Consequently, the two-target AND1_JUMP and
OR1_JUMP bytecodes can be replaced with AND1_JUMP_IF_FALSE and
OR1_JUMP_IF_TRUE, these being ordinary branch instructions with
only one target.
Optimisation of 'not'
---------------------
Recent versions of Python implement a simple optimisation in
which branching on a negated boolean expression is implemented
by reversing the sense of the branch, saving a UNARY_NOT opcode.
Taking a strict view, this optimisation should no longer be
performed, because the 'not' operator may be overridden to produce
quite different results from usual. However, in typical use cases,
it is not envisaged that expressions involving customised boolean
operations will be used for branching -- it is much more likely
that the result will be used in some other way.
Therefore, it would probably do little harm to specify that the
compiler is allowed to use the laws of boolean algebra to
simplify any expression that appears directly in a boolean
context. If this is inconvenient, the result can always be assigned
to a temporary name first.
This would allow the existing 'not' optimisation to remain, and
would permit future extensions of it such as using De Morgan's laws
to extend it deeper into the expression.
Usage Examples
==============
Example 1: NumPy Arrays
-----------------------
::
#-----------------------------------------------------------------
#
# This example creates a subclass of numpy array to which
# 'and', 'or' and 'not' can be applied, producing an array
# of booleans.
#
#-----------------------------------------------------------------
from numpy import array, ndarray
class BArray(ndarray):
def __str__(self):
return "barray(%s)" % ndarray.__str__(self)
def __and2__(self, other):
return (self & other)
def __or2__(self, other):
return (self & other)
def __not__(self):
return (self == 0)
def barray(*args, **kwds):
return array(*args, **kwds).view(type = BArray)
a0 = barray([0, 1, 2, 4])
a1 = barray([1, 2, 3, 4])
a2 = barray([5, 6, 3, 4])
a3 = barray([5, 1, 2, 4])
print "a0:", a0
print "a1:", a1
print "a2:", a2
print "a3:", a3
print "not a0:", not a0
print "a0 == a1 and a2 == a3:", a0 == a1 and a2 == a3
print "a0 == a1 or a2 == a3:", a0 == a1 or a2 == a3
Example 1 Output
---------------
::
a0: barray([0 1 2 4])
a1: barray([1 2 3 4])
a2: barray([5 6 3 4])
a3: barray([5 1 2 4])
not a0: barray([ True False False False])
a0 == a1 and a2 == a3: barray([False False False True])
a0 == a1 or a2 == a3: barray([False False False True])
Example 2: Database Queries
---------------------------
::
#-----------------------------------------------------------------
#
# This example demonstrates the creation of a DSL for database
# queries allowing 'and' and 'or' operators to be used to
# formulate the query.
#
#-----------------------------------------------------------------
class SQLNode(object):
def __and2__(self, other):
return SQLBinop("and", self, other)
def __rand2__(self, other):
return SQLBinop("and", other, self)
def __eq__(self, other):
return SQLBinop("=", self, other)
class Table(SQLNode):
def __init__(self, name):
self.__tablename__ = name
def __getattr__(self, name):
return SQLAttr(self, name)
def __sql__(self):
return self.__tablename__
class SQLBinop(SQLNode):
def __init__(self, op, opnd1, opnd2):
self.op = op.upper()
self.opnd1 = opnd1
self.opnd2 = opnd2
def __sql__(self):
return "(%s %s %s)" % (sql(self.opnd1), self.op, sql(self.opnd2))
class SQLAttr(SQLNode):
def __init__(self, table, name):
self.table = table
self.name = name
def __sql__(self):
return "%s.%s" % (sql(self.table), self.name)
class SQLSelect(SQLNode):
def __init__(self, targets):
self.targets = targets
self.where_clause = None
def where(self, expr):
self.where_clause = expr
return self
def __sql__(self):
result = "SELECT %s" % ", ".join([sql(target) for target in
self.targets])
if self.where_clause:
result = "%s WHERE %s" % (result, sql(self.where_clause))
return result
def sql(expr):
if isinstance(expr, SQLNode):
return expr.__sql__()
elif isinstance(expr, str):
return "'%s'" % expr.replace("'", "''")
else:
return str(expr)
def select(*targets):
return SQLSelect(targets)
#--------------------------------------------------------------------------------
dishes = Table("dishes")
customers = Table("customers")
orders = Table("orders")
query = select(customers.name, dishes.price, orders.amount).where(
customers.cust_id == orders.cust_id and orders.dish_id == dishes.dish_id
and dishes.name == "Spam, Eggs, Sausages and Spam")
print repr(query)
print sql(query)
Example 2 Output
----------------
::
<__main__.SQLSelect object at 0x1cc830>
SELECT customers.name, dishes.price, orders.amount WHERE
(((customers.cust_id = orders.cust_id) AND (orders.dish_id =
dishes.dish_id)) AND (dishes.name = 'Spam, Eggs, Sausages and Spam'))
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
Based on the testability comments in the closure threads, I created
http://bugs.python.org/issue13062 to propose two new introspection
functions:
inspect.getclosure(func)
Returns a dictionary mapping closure references from the supplied
function to their current values.
inspect.getgeneratorlocals(generator)
Returns the same result as would be reported by calling locals()
in the generator's frame of execution
The former would just involve syncing up the names on the code object
with the cell references on the function object, while the latter
would be equivalent to doing generator.gi_frame.f_locals with some
nice error checking for when the generator's frame is already gone (or
the supplied object isn't a generator iterator).
Cheers,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
(This idea may have been suggested before, because it seems too obvious
to me.)
How about if we remove the requirement that the colon be on the same
line as the function name.
And, what if the colon is then used to split the difference between the
definition time, and call time code in function definitions. So that
every thing before the colon is done at definition time. Everything
after the colon is done at call time.
def foo(...):
""" doc string """
<function body>
Then could become ...
def foo(...)
""" doc string """ # foo.__doc__ = """ doc string """
:
<function body>
I think this represents what is actually happening a bit better.
One possibility for define time code is to have decorators listed that
read in the order they are applied instead of bottom up.
def foo(n)
""" function to be decorated. """
@deco1 # foo = deco1(foo) The '@' notation still works.
@deco2 # foo = deco2(foo)
:
<function body>
Note, that putting the doc string after the decorators may be better as
it would put the doc string on the decorated function instead of the
original.
I think there may be more things possible with this idea than the simple
cases above.
Cheers,
Ron