[Python-ideas] 'Injecting' objects as function-local constants

Nick Coghlan ncoghlan at gmail.com
Fri Jun 17 15:26:43 CEST 2011


On Fri, Jun 17, 2011 at 10:12 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> You have missed a fourth option, which I have been championing: make inject
> an ordinary function, available from the functools module. The
> *implementation* of inject almost certainly will require support from the
> compiler, but that doesn't mean the interface should!

No, I didn't miss it, I left it out on purpose because I think messing
with the runtime name lookup semantics is a terrible idea. You and
others seem fond of it, but namespace semantics are the heart and soul
of why functions are so much faster than module level code and we
shouldn't be touching that logic with a 10 foot pole.

Adding a new cell-based shared namespace that uses the same runtime
lookup semantics as closures to replace *existing* uses of the default
argument hack? Sure, that's a reasonable proposal (it may still get
rejected due to devils in the details, but it has at least as much
going for it as PEP 308 did). Messing with normal locals from outside
a function, or providing an officially sanctioned way to convert
global references to some other kind of reference *after* the function
has already been defined? Hell no, that's a solution looking for a
problem and the concept of eliminating the default argument hack
shouldn't be burdened with that kind of overreaching.

The secret to the speed of functions lies in the fact that the
compiler knows all the names at compile time so it can generate
appropriate load/store operations for the different scopes (array
lookup for locals, cell dereference for closure variables,
global-or-builtin lookup for everything else). This benefits not just
CPython, but all Python implementations: inside a function, they're
allowed to assume that the *only* code changing the state of the
locals is the function code itself. Cell dereferencing allows for the
fact that closure variables might change (but are still reasonably
close to locals in speed, since the *cells* are referenced from an
array), and global and builtin lookup is the slowest of all (since it
involves actually looking up identifiers in namespace dictionaries).
Even a JIT compiler like PyPy can be more aggressive about optimising
local and cell access than it can be about the officially shifting
sands that are the global and builtin namespaces.
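
As a rough illustration of that compile-time classification (a sketch
for CPython only; outer, inner and the variable names are made up for
this example), the code object records which names the compiler
decided to treat as locals, closure variables, or global-or-builtin
lookups:

```python
def outer():
    cell_var = 1

    def inner(arg):
        # arg and local_var are plain locals; cell_var lives in a closure cell
        local_var = arg + cell_var
        # len and str are resolved via global-or-builtin lookup
        return len(str(local_var))

    return inner


inner = outer()
code = inner.__code__

print(code.co_varnames)       # names stored in the frame's locals array
print(code.co_freevars)       # names reached through closure cells
print(sorted(code.co_names))  # names looked up in globals, then builtins
```

All three tuples are fixed when the function is compiled, which is the
property the argument above relies on.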

This is why the nonlocal and global directives exist: to tell the
compiler to change how it treats certain names. Arguments (including
the associated default values) are given additional special treatment
due to their placement in the function header. If we want to create a
new namespace that is given special treatment by the compiler, those
are the two options that are even remotely viable: placement in the
function header (after the ** entry) or flagged via a new compiler
directive (and the precedent of "nonlocal" and "global" suggests that
directive should occur inside the function body rather than anywhere
else). "@def" is primarily a proposal to avoid having to do the from
__future__ dance in defining a new keyword, so I'll modify it to the
more explicit "atdef" to avoid confusion with decorators.
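
To make the "directive changes how the compiler treats a name" point
concrete, here is a small sketch using today's directives (CPython
only; the function names are invented for the example). The same
assignment is compiled as a local store, a global store, or a cell
store depending on the directive present:

```python
def assigns_local():
    total = 1          # no directive: 'total' is a local
    return total


def assigns_global():
    global total       # directive: 'total' is now a module global
    total = 1
    return total


def make_counter():
    count = 0

    def bump():
        nonlocal count  # directive: 'count' is the enclosing cell
        count += 1
        return count

    return bump


bump = make_counter()

print(assigns_local.__code__.co_varnames)   # 'total' is in the locals array
print(assigns_global.__code__.co_varnames)  # no locals at all
print(assigns_global.__code__.co_names)     # 'total' is a global name
print(bump.__code__.co_freevars)            # 'count' is a closure variable
```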

A new compiler directive is my own preference (due to the major
semantic differences between how shared variables will be handled and
how default arguments are handled), and I now believe it makes sense
to use nonlocal, global and default arguments as the model for how
that would work:

  atdef VAR=EXPR [, VAR=EXPR]*

As with nonlocal and global, definition time statements could
technically appear anywhere in the function body (with their full
effect), but style guidelines would recommend placing them at the
beginning of the function, just after the docstring. Parentheses
around the var list would not be permitted - use multiple atdef
statements instead (parentheses would, however, naturally permit the
expressions themselves to span multiple lines).
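
Since "atdef" is only proposed syntax, it can't be run, but the
intended cell-based behaviour can be approximated today with an
explicit closure. This is a sketch under my own assumptions about the
semantics (the expression is evaluated once, when the function is
defined, and rebinding persists across calls); _make_next_id and
next_id are hypothetical names:

```python
# Hypothetical syntax from the proposal (not valid Python):
#
#     def next_id():
#         atdef counter=0
#         counter += 1
#         return counter
#
# Rough equivalent with an explicit closure:

def _make_next_id():
    counter = 0            # evaluated once, at "definition time"

    def next_id():
        nonlocal counter   # same cell-based lookup a closure uses
        counter += 1
        return counter

    return next_id


next_id = _make_next_id()

print(next_id.__code__.co_freevars)  # 'counter' is accessed via a cell
```

The factory function is exactly the boilerplate that a dedicated
statement would remove.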

Such a statement would readily cover the speed enhancement,
early-binding and shared state use cases for the default argument hack
(indeed, the compiler could conceivably detect if a shared value was
never rebound and simply load the cell contents into each frame as a
local variable in that case, avoiding even the cell dereference
overhead relative to the speed hack).
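
For reference, the three existing uses of the default argument hack
mentioned above look roughly like this in current Python (an
illustrative sketch; all function and variable names are made up):

```python
import math


# 1. Speed enhancement: bind a global/builtin lookup to a fast local.
def norm_sum(points, _sqrt=math.sqrt):
    total = 0.0
    for x, y in points:
        total += _sqrt(x * x + y * y)
    return total


# 2. Early binding: capture the loop variable's value at definition time.
late = [lambda: i for i in range(3)]        # every lambda sees the final i
early = [lambda i=i: i for i in range(3)]   # each lambda keeps its own i


# 3. Shared state: a mutable default that persists across calls.
def next_ticket(_state=[0]):
    _state[0] += 1
    return _state[0]


print(norm_sum([(3, 4)]))
print([f() for f in late])
print([f() for f in early])
```

In each case the extra parameter is visible in the signature and can
be overridden by a caller, which is the wart the proposal is trying to
remove.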

The 'atdef' phrasing slightly emphasises the early-binding use case,
but still seems reasonable for the speed enhancement and shared state
use cases. In contrast, a keyword like 'shared', which emphasises the
shared state use case, would feel far more out of place when used for
speed enhancement or early binding (as well as being far more likely
to conflict with existing variable names).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


