
On Fri, Jun 17, 2011 at 10:12 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> You have missed a fourth option, which I have been championing: make inject an ordinary function, available from the functools module. The *implementation* of inject almost certainly will require support from the compiler, but that doesn't mean the interface should!

No, I didn't miss it, I left it out on purpose because I think messing with the runtime name lookup semantics is a terrible idea. You and others seem fond of it, but namespace semantics are the heart and soul of why functions are so much faster than module level code and we shouldn't be touching that logic with a 10 foot pole.

Adding a new cell-based shared namespace that uses the same runtime lookup semantics as closures to replace *existing* uses of the default argument hack? Sure, that's a reasonable proposal (it may still get rejected due to devils in the details, but it has at least as much going for it as PEP 308 did).

Messing with normal locals from outside a function, or providing an officially sanctioned way to convert global references to some other kind of reference *after* the function has already been defined? Hell no, that's a solution looking for a problem and the concept of eliminating the default argument hack shouldn't be burdened with that kind of overreaching.

The secret to the speed of functions lies in the fact that the compiler knows all the names at compile time so it can generate appropriate load/store operations for the different scopes (array lookup for locals, cell dereference for closure variables, global-or-builtin lookup for everything else). This benefits not just CPython, but all Python implementations: inside a function, they're allowed to assume that the *only* code changing the state of the locals is the function code itself. Cell dereferencing allows for the fact that closure variables might change (but are still reasonably close to locals in speed, since the *cells* are referenced from an array), and global and builtin lookup is the slowest of all (since it involves actually looking up identifiers in namespace dictionaries). Even a JIT compiler like PyPy can be more aggressive about optimising local and cell access than it can be about the officially shifting sands that are the global and builtin namespaces.
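
To make the compile-time classification concrete, here is a small sketch for CPython. The names make_adder, add and counter are invented purely for illustration. It inspects the code object of a nested function to show that each name has already been sorted into locals, closure cells, or global-or-builtin lookups before the function ever runs:

```python
import dis

counter = 0  # module-level name: resolved via the globals dict at run time


def make_adder(step):
    def add(value):
        # 'value' and 'total' are locals, 'step' lives in a closure cell,
        # and 'counter' is a global-or-builtin lookup.
        total = value + step
        return total + counter
    return add


add = make_adder(1)
code = add.__code__

local_names = code.co_varnames   # names stored in the fast locals array
cell_names = code.co_freevars    # names reached through closure cells
global_names = code.co_names     # names looked up in globals/builtins


def show_bytecode():
    """Print the bytecode so the per-scope load/store opcodes are visible."""
    dis.dis(add)
```

Calling show_bytecode() prints the actual instructions. The exact opcode names vary between CPython versions, but the three kinds of access remain distinguishable in the output.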

This is why the nonlocal and global directives exist: to tell the compiler to change how it treats certain names. Arguments (including the associated default values) are given additional special treatment due to their placement in the function header. If we want to create a new namespace that is given special treatment by the compiler, those are the two options that are even remotely viable: placement in the function header (after the ** entry) or flagged via a new compiler directive (and the precedent of "nonlocal" and "global" suggests that directive should occur inside the function body rather than anywhere else). ("@def" is primarily a proposal to avoid having to do the from __future__ dance in defining a new keyword, so I'll modify it to the more explicit "atdef" to avoid confusion with decorators.)

A new compiler directive is my own preference (due to the major semantic differences between how shared variables will be handled and how default arguments are handled), and I now believe it makes sense to use nonlocal, global and default arguments as the model for how that would work:

    atdef VAR=EXPR [, VAR=EXPR]*

As with nonlocal and global, definition time statements could technically appear anywhere in the function body (with their full effect), but style guidelines would recommend placing them at the beginning of the function, just after the docstring. Parentheses around the var list would not be permitted - use multiple atdef statements instead (parentheses would, however, naturally permit the expressions themselves to span multiple lines).

Such a statement would readily cover the speed enhancement, early-binding and shared state use cases for the default argument hack (indeed, the compiler could conceivably detect if a shared value was never rebound and simply load the cell contents into each frame as a local variable in that case, avoiding even the cell dereference overhead relative to the speed hack).
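
As a rough illustration of what is being replaced, the sketch below shows the default argument hack for early binding and for shared state, followed by a closure-based emulation of what a hypothetical "atdef count=0" might behave like. The emulation is only an assumption based on the "same runtime lookup semantics as closures" description above. atdef is not valid Python, and the function names here are invented for the example:

```python
# Early binding with the default argument hack: each lambda captures the
# value of i at definition time instead of looking it up when called.
callbacks_hack = [lambda i=i: i for i in range(3)]


# Shared state with the default argument hack: the mutable default is
# created once, at definition time, and reused across calls.
def next_id_hack(_counter=[0]):
    _counter[0] += 1
    return _counter[0]


# Hypothetical atdef version (NOT valid Python):
#
#     def next_id():
#         atdef count=0
#         count += 1
#         return count
#
# Assumed closure-based equivalent: 'count' lives in a cell that is
# initialised once and shared by every call to next_id().
def _make_next_id():
    count = 0

    def next_id():
        nonlocal count
        count += 1
        return count

    return next_id


next_id = _make_next_id()
```

Unlike the default argument version, the emulated function exposes no extra parameter that a caller could accidentally override.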

The 'atdef' phrasing slightly emphasises the early-binding use case, but still seems reasonable for the speed enhancement and shared state use cases. In contrast, a keyword like 'shared', which emphasised the shared state use case, would feel far more out of place when used for speed enhancement or early binding (as well as being far more likely to conflict with existing variable names).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia