'Injecting' objects as function-local constants

== Use cases == A quite common practice is 'injecting' objects into a function as its locals, at def-time, using function arguments with default values... Sometimes to keep state using a mutable container: def do_and_remember(val, verbose=False, mem=collections.Counter()): result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(mem[val], val)) Sometimes, when creating functions dynamically (making use of nested scopes), e.g. to keep some individual function features (usable within that functions): def make_my_callbacks(callback_params): my_callbacks = [] for params in callback_params: def fun1(*args, _params=params, **kwargs): "...do something with args and params..." def fun2(*args, _params=params, **kwargs): "...do something with args and params..." def fun3(*args, _fun1=fun1, _fun2=fun2, **kwargs): """...do something with args and with functions fun1, fun2, for example pass them as callbacks to other functions..." my_callbacks.append((fun1, fun2, fun3)) return my_callbacks Sometimes simply to make critical parts of code optimised... def do_it_quickly(fields, _len=len, _split=str.split, _sth=something): return [_len(f), _split(f), _sth(f) for f in fields] ...or even for readability -- keeping function-specific constants within the function definition: def check_value(val, VAL_REGEX=re.compile('^...$'), VAL_MAX_LEN=38): return len(val) <= VAL_MAX_LEN and VAL_RE.search(val) is not None In all that cases (and probably some other too) that technique appears to be quite useful. == The problem == ...is that it is not very elegant. We add arguments which: a) mess up function signatures (both in the code and in auto-generated docs); b) can be incidentally overriden (especially when a function has an "open" signature with **kwargs). == Proposed solutions == I see three possibilities: 1. To add a new keyword, e.g. `inject': def do_and_remember(val, verbose=False): inject mem = collections.Counter() ... or maybe: def do_and_remember(val, verbose=False): inject collections.Counter() as mem ... 2. (which personally I would prefer) To add `dummy' (or `hidden') keyword arguments, defined after **kwargs (and after bare ** if kwargs are not needed; we have already have keyword-only arguments after *args or bare *): def do_and_remember(val, verbose=False, **, mem=collections.Counter()): ... do_and_remember(val, False, mem='something') would raise TypeError and `mem' shoudn not appear in help() etc. as a function argument. 3. To provide a special decorator, e.g. functools.within: @functools.within(mem=collections.Counter()) def do_and_remember(val, verbose=False): ... Regards. *j

On 6/11/2011 9:30 AM, Jan Kaliszewski wrote:
One problem with trying to 'fix' this is that there can be defaulted args which are not intended to be overwritten by users but which are intended to be replaced in recursive calls.
The body should all be runtime. Deftime expression should be in the header.
I thought of this while reading 'the problem'. It is at least plausible to me.
The decorator would have to modify the code object as well as the function objects, probably in ways not currently allowed. -- Terry Jan Reedy

Terry Reedy wrote:
I think any solution to this would have to be backward compatible. A big NO to anything which changes the behaviour of existing code.
The body should all be runtime. Deftime expression should be in the header.
That's not even the case now. The global and nonlocal keywords are in the body, and they apply at compile-time. I don't like the name inject as shown, but I like the idea of injecting locals into a function from the outside. (Or rather, into a *copy* of the function.) This suggests generalising the idea: take any function, and make a copy of it with the specified names/values defined as locals. The obvious API is a decorator (presumably living in functools). Assume we can write such a decorator, and postpone discussion of any implementation for now. Firstly, this provides a way of setting locals at function definition time without polluting the parameter list and exposing local variables to the caller. Function arguments should be used for arguments, not internal implementation details. @inject(mem=collections.Counter()) def do_and_remember(val, verbose=False): # like do_and_remember(val, verbose=False, mem=...) But more importantly, it has wider applications, like testing, introspection, or adding logging to functions: def my_function(alist): return random.choice(alist) + 1 You might not be able to modify my_function, it may be part of a library you don't control. As written, if you want to test it, you need to monkey-patch the random module, which is a dangerous anti-pattern. Better to do this: class randomchoice_mock: def choice(self, arg): return 0 mock = randomchoice_mock() test_func = inject(random=mock)(my_function) Because test_func is a copy of my_function, you can be sure that you won't break anything. Adding logging is just as easy. This strikes me as the best solution: the decorator is at the head of the function, so it looks like a declaration, and it has its effect at function definition time. But as Terry points out, such a decorator might not be currently possible without language support, or at least messy byte-code hacking. -- Steven

On 11 Jun 2011, at 14:30, Jan Kaliszewski wrote:
That's hard to do as (assuming the function is defined at the global scope), mem will be compiled as a global, meaning that you will have to modify the bytecode. Oh but this makes me think about something I wrote a while ago (see below). 4. Use closures. def factory(mem): def do_and_remember(val, verbose=False) result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(mem[val], val)) .... return do_and_remember do_and_remember = factory(mem=collections.Counter()) Added bonus: you can create many instances of do_and_remember. ---------- Related to this, here's a "localize" decorator that I wrote some time ago for fun (I think it was from a discussion on this list). It was for python 2.x (could easily be modified for 3.x I think, it's a matter of adapting the attribute names of the function object). It "freezes" all non local variables in the function. It's a hack! It may be possible to adapt it. def new_closure(vals): args = ','.join('x%i' % i for i in range(len(vals))) f = eval("lambda %s:lambda:(%s)" % (args, args)) return f(*vals).func_closure def localize(f): f_globals = dict((n, f.func_globals[n]) for n in f.func_code.co_names) f_closure = ( f.func_closure and new_closure([c.cell_contents for c in f.func_closure]) ) return type(f)(f.func_code, f_globals, f.func_name, f.func_defaults, f_closure) # Examples of how localize works: x, y = 1, 2 @localize def f(): return x + y def test(): acc = [] for i in range(10): @localize def pr(): print i acc.append(pr) return acc def lambdatest(): return [localize(lambda: i) for i in range(10)] # These examples will behave as follows:
-- Arnaud

Terry Reedy dixit (2011-06-11, 16:09):
I think this is another case... Although I can imagine that such 'private' arguments could be specified when calling -- after **{...}/bare **, e.g.: fun(1, b=3, **{'c':3}, my_secret_hidden_arg='xyz') fun(1, b=3, **, my_secret_hidden_arg='xyz') Though at the first sight I don't like this (`after-** args in calls') idea so much (contrary to `after-** args in definitions' idea). [...]
Arnaud Delobelle dixit (2011-06-11, 21:47):
Here mem is a keyword argument, not a variable. Though I understand that making it local/closure would need some code/closures hacking... Unless built in to the interpreter.
Yes, but this method makes code longer and more complex. And simple is better :) Consider my multi-factory example: def make_my_callbacks(callback_params): my_callbacks = [] for params in callback_params: def fun1(*args, **kwargs, params=params): "...do something with args and params..." def fun2(*args, **kwargs, params=params): "...do something with args and params..." def fun3(*args, **kwargs, fun1=fun1, fun2=fun2): """...do something with args and with functions fun1, fun2, for example pass them as callbacks to other functions..." my_callbacks.append((fun1, fun2, fun3)) return my_callbacks ...compared to: def make_fun1(params): def fun1(*args, **kwargs): "...do something with args and params..." return fun1 def make_fun2(params): def fun2(*args, **kwargs): "...do something with args and params..." return fun2 def make_fun3(fun1, fun2): def fun3(*args, **kwargs): """...do something with args and with functions fun1, fun2, for example pass them as callbacks to other functions..." return fun3 def make_my_callbacks(callback_params): my_callbacks = [] for params in callback_params: fun1 = make_fun1(params) fun2 = make_fun2(params) fun3 = make_fun3(fun1, fun2) my_callbacks.append((fun1, fun2, fun3)) return my_callbacks Though, maybe it'a a matter of individual taste...
Nice :) (and, as far as I understand, it could be used to implement the decorator I ment). Best regards. *j

I'm -1 on any proposal that somehow tries to make the default-argument hack more acceptable. The main reason people still feel the need to use it is that the for-loop is broken, insofar as it doesn't create a new binding for each iteration. The right way to address that is to fix the for-loop, IMO. -- Greg

Greg Ewing dixit (2011-06-13, 10:30):
I'm -1 on any proposal that somehow tries to make the default-argument hack more acceptable.
My propositions don't make that hack less acceptable -- proposing an alternative.
Do you mean that each iteration should create separate local scope? Then: j = 0 my_lambdas = [] for i in range(10): print(j) # would raise UnboundLocalError j = i my_lambdas.append(lambda: i) Or that the loop variable should be treated specially? Then: i_lambdas, j_lambdas = [], [] for i in range(10): j = i i_lambdas.append(lambda: i) j_lambdas.append(lambda: j) print(i_lambdas[2]()) # would print 2 print(j_lambdas[2]()) # would print 9 Cheers. *j

Jan Kaliszewski wrote:
My propositions don't make that hack less acceptable -- proposing an alternative.
You seem to be proposing yet another feature whose main purpose is to patch over a mismatch between existing features. That's not the path to elegant language design.
Do you mean that each iteration should create separate local scope?
No...
Or that the loop variable should be treated specially?
Yes, but in a way that you're probably not expecting. :-) My proposal is that, if the loop variable is referenced by an inner function (and is therefore in a cell), a new cell is created on each iteration instead of replacing the contents of the existing cell. This would mean that: * If the loop variable is *not* referenced by an inner function (the vast majority of cases), there would be no change from current semantics and no impact on performance. * In any case, the loop variable can still be referenced after the loop has finished with the expected results. One objection that's been raised is that, as described, it's somewhat CPython-specific, and it's uncertain how other Pythons would get on trying to implement it.
Yes, that's true. An extension to the idea would be to provide a way of specifying cell-replacement behaviour for any assignment, maybe something like j = new i Then your example would print 2 both times, and the values of both i and j after the loop would be 9. One slightly curly aspect would be that if you *changed* the value of i or j after the loop, the change would be seen by the *last* lambdas created, and not any of the others. :-) But I find it hard to imagine anyone doing this -- if you're capturing variables in a loop, you don't normally expect to have access to the loop variable at all after the loop finishes. -- Greg

On Mon, Jun 13, 2011 at 8:30 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yikes, now *there's* a radical proposal. -lots on any idea that would make: def f(): i = 0 def g1(): return i i = 1 def g2(): return i return [g1, g2] differ in external behaviour from: def f(): result = [] for i in range(2): def g(): return i result.append(g) return result or: def f(): return [lambda: i for i in range(2)] or: def _inner(): for i in range(2): def g(): return i yield g def f(): return list(_inner()) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 6/12/2011 6:30 PM, Greg Ewing wrote:
Or use closures, which were partly designed to replace default arg use. This case is quite different from the multiple capture in for-loop case. The OP is simply trying to localize names for speed instead of using module constants, which would otherwise do quite fine and are routinely used in the stdlib. -- Terry Jan Reedy

Terry Reedy wrote:
Default args are specifically used in at least one use-case where closures give the wrong result.
The usual solution is to *not* use a closure:
That's just one use-case. Jan gave two others. Optimizations might be common in the stdlib, but it's a hack, and an ugly one. Function parameters should be kept for actual arguments, not for optimizing name look-ups. -- Steven

On Mon, Jun 13, 2011 at 5:11 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Function parameters should be kept for actual arguments, not for optimizing name look-ups.
Still, the post-** shared state (Jan's option 2) is likely the most obvious way to get early binding for *any* purpose without polluting the externally visible parameter list. Several questions that are unclear in the general case of definition-time code are resolved in obvious ways by that approach: Q. When is the code executed? A. At definition time, just like default argument values Q. Where are the results of the calculation stored? A. On the function object, just like default argument values Q. How does the compiler know to generate local variable lookups for those attributes? A. The names are specified in the function header, just like public parameters Q. What is the advantage over custom classes with __call__ methods? A. Aside from the obvious speed disadvantage, moving from a function with state that is preserved between calls to a stateful class that happens to be callable is a surprisingly large mental shift that may not fit well with the conceptual structure of a piece of code. While *technically* they're the same thing (just expressed in different ways), in reality the difference in relative emphasis of algorithm vs shared state can make one mode of expression far more natural than the other in a given context. class DoAndRemember(): def __init__(self): self.mem = collections.Counter() def __call__(self, val, verbose=False): result = do_something(val) self.mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(self.mem[val], val)) do_and_remember = DoAndRemember() Custom classes also suffer grievously when it comes to supporting introspection (e.g. try help() or inspect.getargspec() on the above) and lack natural support for other features of functions (such as easy decorator compatibility, descriptor protocol support, standard annotations, appropriate __name__ assignment). Q. What is the advantage over using an additional level of closure? A. This is actually the most viable alternative, since the conceptual model is quite a close match and it doesn't break introspection the way a custom class does. The problems with this approach are largely syntactic: def _make_do_and_remember(): mem=collections.Counter() def do_and_remember(val, verbose=False): result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(mem[val], val)) return do_and_remember do_and_remember = _make_do_and_remember() 1. The function signature is buried inside "_make_do_and_remember" (the class approach and even PEP 3150 have the same problem) 2. The name of the function in the current namespace and its __name__ attribute have been decoupled, require explicit repetition to keep them the same 3. This is basically an unreadable mess I'd actually be far happier with the default argument hack equivalent: def do_and_remember(val, verbose=False, *, _mem=collections.Counter()): result = do_something(val) _mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(_mem[val], val)) All a "persistent state" proposal would do is create an alternative to the default argument hack that doesn't suffer from the same problems: def do_and_remember(val, verbose=False, **, mem=collections.Counter()): result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(_mem[val], val)) It seems like the path of least resistance to me - the prevalence of the default argument hack means there's an existing, widespread practice that solves real programming issues, but is flawed in some ways (specifically, messing with the function's signature). Allowing declarations of shared state after the keyword-only arguments seems like a fairly obvious answer. The one potential trap is the classic one with immutable nonlocal variables that haven't been declared as such (this trap also applies to any existing use of the default argument hack): reassignment will *not* modify the shared state, only the name binding in the current invocation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Jun 13, 2011 at 6:05 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
As yet another shade for this particular bikeshed, this one just occurred to me: def do_and_remember(val, verbose=False): @def mem=collections.Counter() result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(_mem[val], val)) The @def ("at def") statement is just a new flavour of the same proposal that has been made many times before: a way to indicate that a simple assignment statement should be executed once at function definition time rather than repeatedly on every call to the function. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 13 June 2011 12:57, Nick Coghlan <ncoghlan@gmail.com> wrote:
Or to link this to PEP 3150: given: mem = collections.Counter() def do_and_remember(val, verbose=False): result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(_mem[val], val)) (Or the other way around) -- Arnaud

Nick Coghlan dixit (2011-06-13, 21:57):
If using '@' character, I'd rather prefer: @in(mem=collections.Counter()) def do_and_remember(val, verbose=False): result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(_mem[val], val)) @in (or @with, or @within, or @withlocal, or...) could be a language syntax construct, not a real decorator, though using -- already well settled -- decorator-like syntax. Important advantage of this variant is IMHO that then it is obvious for everybody that the binding(s) is (are) being done *early*. Regards. *j

Jan Kaliszewski dixit (2011-06-14, 00:30):
On second thought: no. I mean: no -- for a separate syntax construct with limited usage possibilities (see: cases mentioned by Steven); yes -- for language improvements that would make possible one of the solutions: 1. A real decorator: a) quasi-argument-locals-based (names could be used to read injected value and later could be rebound, like arguments); or b) another-level-closure-based (names could not be used to read injected values if rebound later: it's *either* a free variable *or* a local variable). or 2. `after-** hidden pseudo-arguments' (see previous posts...). Now I don't know which of them I'd prefer... And probably any of them would need some core-language modifications... (at least the '2' and '1a' variants) Regards. *j

On Mon, Jun 13, 2011 at 9:12 PM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
Forgive me if im wrong but i believe that this is possible without any language changes using pure python. this is my attempt at it:
this uses a closure to hold the values of the injected values and hides them all test pass exactly as if the values were defined with makeLocals were globals within the function but all act as if they are locals outside of it. this means that if we define this this
and run
if we try this
print(aList) we get a NameError
and if we try this
all problems with global values still apply with this however. for example just as throws a UnboundLocalError so does
so what do you think? --Alex

Alex Light dixit (2011-06-14, 13:44):
Changing global state on each call seems to be both concurrency-and-recurrency-unsafe and inefficient. Though that 1a (closure-based) variant should be possible using techniques like that: http://mail.python.org/pipermail/python-ideas/2008-October/002227.html Retards. *j

On Tue, Jun 14, 2011 at 5:27 PM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
well some of your safety concerns can be allayed, i hope, by replacing this snipet:
with this one:
with _modifyGlobals(func.__globals__, localArgs): ret = func(*args, **kwargs)
with _modifyGlobals defined as:
as for performance you are correct that it is more efficient to just use global variables. the dictionary updates add about 1 x 10**-4 (or, if the check for collisions with KWargs is removed, 5 x 10**-5) seconds to the run time of a function, at least on this computer. so not terribly significant just be sure to use sparingly also with the link you mentioned i could not seem to get it to work. Whenever i tried to use any built-in functions it would start throwing NameErrors. also that is only useful if you want to inject all global variables into the function. --Alex

Nick Coghlan wrote:
I wouldn't call adding even more complexity to function signatures "obvious", although I grant that it depends on whether you're Dutch :) Another disadvantage is that it uses a symbol instead of a word. Too many symbols, and your code looks like Perl (or APL). It's hard to google for ** to find out what it means. It's harder to talk about a symbol than a word. (In written text you can just write ** but in speech you have to use circumlocutions or made-up names like double-splat.) [...]
The problem with injecting locals in the parameter list is that it can only happen at write-time. That's useful, but there's a major opportunity being missed: to be able to inject at runtime. You could add test mocks, optimized functions, logging, turn global variables into local constants, and probably things I've never thought of. Here's one use-case to give a flavour of what I have in mind: if you're writing Unix-like scripts, one piece of useful functionality is "verbose mode". Here's one way of doing so: def do_work(args, verbose=False): if verbose: pr = print else: pr = lambda *args: None pr("doing spam") spam() pr("doing ham") ham() # and so on if __name__ == '__main__': verbose = '--verbose' in sys.argv do_work(my_arguments, verbose) But why does do_work take a verbose flag? That isn't part of the API for the do_work function itself, which might be usefully called by other bits of code. The verbose argument is only there to satisfy the needs of the user interface. Using a ** hidden argument would solve that problem, but you then have to specify the value of verbose at write-time, defeating the purpose. Here's an injection solution. First, the body of the function needs a generic hook, with a global do-nothing default: def hook(*args): pass def do_work(args): hook("doing spam") spam() hook("doing ham") ham() # and so on if __name__ == '__main__': if '--verbose' in sys.argv: wrap = inject(hook=print) else: wrap = lambda func: func # do nothing # or `inject(hook=hook)` to micro-optimize wrap(do_work)(my_arguments) If you want to add logging, its easy: just add an elif clause with wrap = inject(hook=logger). Because you aren't monkey-patching the hook function (or, heaven help us, monkey-patching builtins.print!) you don't need to fear side-effects. No globals are patched, hence no mysterious action-at-a-distance bugs. And because the injected function is a copy of the original, other parts of the code that use do_work are unaffected. But for this to work, you have to be able to inject at run-time, not just at write-time. -- Steven

On Mon, Jun 13, 2011 at 10:33 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Function parameters should be kept for actual arguments, not for optimizing name look-ups.
Even the bind-it-now behavior isn't always for optimization; it can also be used as a way of forcing stability in case the global name gets rebound. That is often an anti-pattern in practice, but ... not always.
I would say the most obvious place is in a decorator, using the function object (or a copy) as the namespace. Doing this properly would require some variant of PEP 3130, which was rejected largely for insufficient use.
Using the function object as a namespace (largely) gets around that, because you can use a with statement to change the settings temporarily.
[A verbose mode -- full example below, but the new spelling here at the top] Just replace:
with: def do_work(args): __function__.hook("doing spam") spam() __function__.hook("doing ham") ham() If you want to change the bindings, just rebind do_work.hook to the correct function. If you are doing this as part of a test, do so within a with statement that sets it back at the end. (The reason this requires a variant of 3130 is that the name do_work may itself be rebound, so do_work.hook isn't a reliable pointer.) -jJ [only quotes below here]

Jim Jewett wrote:
Acknowledged. But whatever the purpose, my comment still stands: function arguments should be used for arguments, not for their side-effect of injecting a local variable into the function namespace.
You mean something like this? with make_logging_len() as len: x = some_function_that_calls_len() That's fine for some purposes, but you're still modifying global state. If some_function_that_calls_len() calls spam(), and spam() also contains a call to len, you've unexpectedly changed the behaviour of spam. If that's the behaviour that you want, fine, but it probably isn't. There are all sorts of opportunities for breaking things when patching globals, which makes it somewhat of an anti-pattern. Better to make the patched version a local.
Ah, that's why it doesn't work for me! :) Even if it did work, you're still messing with global state. If two functions are using do_work, and one wants a print hook, and the other wants a logging hook (or whatever), only one can be satisfied. Also this trick can't work for optimizations. A call to do_work.hook requires a global lookup followed by a second lookup in the function object namespace, which is not as fast as using a local. -- Steven

On Mon, Jun 13, 2011 at 5:33 PM, Steven D'Aprano <steve@pearwood.info> wrote:
It's quite promising idea. Currenlty there are notion of cell for closures. What if globals would also use a cell? So that cell cound be either bound to a value or to a name in globals or builtin dictionary. With this in mind it could be possible to either change binding from name to value or vice versa, our to make a copy of the function with another cells. I think this adheres to Python philosophy of having anything modifyable. It will add at most two words of memory for each cell (name and global dict), and probably will not make interpreter slower. Also will probably allow to remove __globals__ attribute from functions in the long term. Then it even be possible to make some modules faster by either from __future__ import fast_bindings or it could be done by some external library like: __super_freezer_allow__ = True ... import sys, super_freezer super_freezer.apply(sys.modules) Probably about 80% modules do not need to rebind globals, so they can run faster. And if you need to monkeypatch them, just either not freeze globals in this module or change the bindings in all its functions. Thoughts? -- Paul

On 6/13/2011 10:33 AM, Steven D'Aprano wrote:
Given the expense of function calls, I would write the above as hook = None def do(args): if hook: hook("doing spam") ... if __name__ == '__main__': if '--verbose' in sys.argv: wrap = inject(hook=print) I do not see the point of all this complication. If you are not trying to optimize the function (and adding such hooks is obviously not), hook = print works just fine (in 3.x ;-). -- Terry Jan Reedy

Terry Reedy wrote: [...]
You're modifying a global variable. Now any other function that calls do_work() for its own purposes suddenly finds it mysteriously printing. A classic action-at-a-distance bug. For a simple stand-alone script, there's no problem, but once you have more complexity in your app, or a library, things become very different. My apologies, I've been doing a lot of reading about the pros and cons (mostly cons *wink*) of monkey-patching in the Ruby world, the open/closed principle, and various forms of bugs caused by the use of globals. I assumed that the problems would be blindingly obvious. I suppose they were only obvious to me because I'd just immersed myself in them for the last day or so! -- Steven

Steven D'Aprano wrote:
It's still rather non-obvious what's going on, though. Copious commenting would be needed to make this style of coding understandable. Also, it doesn't seem to generalise. What if the function in question calls other functions, which call other functions, which themselves need a verbose option? It seems you would need to explicitly wrap all the sub-function calls to pass the hook on to them. And what if there is more than one option to be hooked? You'd rapidly end up with a nightmarish mess. Here's another way to approach the problem: class HookableWorker(object): def hook(self, arg): pass def do_work(self): self.hook("Starting work") ... self.hook("Stopping work") def be_verbose(arg): print arg def main(): worker = HookableWorker() if "--verbose" in sys.argv: worker.hook = be_verbose worker.do_work() Now you can expand the HookableWorker class by adding more methods that all share the same hook, still without anything being global. -- Greg

Greg Ewing wrote:
I don't think so. The injection happens right at the top of the function. True, you need to know what "inject" does, but that's no different from any other function. Provided you know that "inject" adds a local binding to the function namespace, instead of using a global, it's easy to understand what this does: x = 42 @inject(x=23) def spam(): print(x) Not terribly mysterious. The only tricky thing is that some programmers aren't comfortable with the idea that functions are first class objects, and so: @inject(len=my_len) def spam(arg): return len(arg)+1 will discombobulate them. ("What do you mean, len isn't the built-in len?") But then again, they're likely to be equally put off by global patches too: len=my_len def spam(arg): return len(arg)+1 Doesn't stop us using that technique when appropriate.
That's a feature, not a bug! Patches are *local* to the function, not global. If you want to change global state, you can already do it, by monkey-patching the module. We don't need a new magic inject function to do that. This is not meant to be used for making wholesale changes to multiple functions at once, but for localized changes to one function at a time. A scalpel, not a chainsaw.
Absolutely. And that will still be a viable approach for many things. But... * You can only patch things that are already written as a class. If you want to add a test mock or logging to a function, this strategy doesn't help you because there's nothing to subclass. * There's a performance and (arguably) readability cost to using callable classes instead of functions. * Nor does it clean up the func(arg, len=len) hack. -- Steven

On Tue, Jun 14, 2011 at 12:33 AM, Steven D'Aprano <steve@pearwood.info> wrote:
This is getting deep into major structural changes to the way name lookups work, though. Pre-seeding locals with values that are calculated at run-time is a much simpler concept. A more explicit way to do the same thing might work along the following lines: 1. Add a writeable f_initlocals dict attribute to function objects (None by default) 2. When a function is called, if f_initlocals is not None, use it to initialise the locals() namespace 3. Add a new "local" statement to tell the compiler to treat names as local. Using this statement will create an f_initlocals dict mapping those names to None. 4. Add a new decorator to functools that works like the following: def initlocals(**kwargs): def inner(f): new_names = kwargs.keys() - f.f_initlocals.keys() if new_names: raise ValueError("{} are not local variables of {!r}".format(new_names, f)) f.f_initlocals.update(kwargs) return f return inner @functools.initlocals(mem=collections.Counter()) def do_and_remember(val, verbose=False): local mem result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(mem[val], val)) You could still inject changes at runtime with that concept, but would need to be careful with thread-safety issues if you only wanted the change to apply to some invocations and not others. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Jun 14, 2011 at 12:33 AM, Steven D'Aprano <steve@pearwood.info> wrote:
As with *, ** and @, you don't search for them directly, you search for "def" (although redirects from the multiplication and power docs to the def statement docs may not be the worst idea ever). If we hadn't already added keyword-only arguments in Python 3, I'd consider this significantly more obscure. Having the function signature progress from "positional-or-keyword arguments" to "keyword-only arguments" to "implicit arguments", on the other hand, seems a lot cleaner than the status quo without being significantly more complicated. After all, there's no new symbols involved - merely a modification to allow a bare "**" to delimit the start of the implicit arguments when arbitrary keyword arguments are not accepted. Who knows, maybe explicitly teaching that behaviour would make people less likely to fall into the default argument trap. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 6/13/2011 3:11 AM, Steven D'Aprano wrote:
Terry Reedy wrote:
I meant an explicit user-defined closure with a separate cell for each function ...
not this implicit one where each function uses the *same* cell referring to the same int object.
The fundamental problem with this code for funcs is that "lambda x: x+i" is a *constant* equivalent to "def _(x): return x+i". Executing either 10 times creates 10 duplicate functions. The hypnotic effect of 'lambda' is that some do not immediately see the equivalence.
The explicit closure solution intended to replace "lambda x,i=i:x+i" is
def makef(j): return lambda x: x+j
We now have difference cells containing different ints. To get different functions from multiple compilations of one body we need either different defaults for pseudo-parameters or different closure cells. The rationale for adding the latter was partly to be an alternative to the former. Once closure cells were made writable with 'nonlocal', they gained additional uses, or rather, replaced the awkward hack of using mutable 1-element lists as closure contents, with the one elements being the true desired content. -- Terry Jan Reedy

On Sat, Jun 11, 2011 at 11:30 PM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
This particular alternative to the default argument hack has come up before, as has the "hidden parameters after '**'" approach. (I thought there was a PEP on this, but I can't find anything other than the reference in the description of Option 4 in PEP 3103 - however, there is a thread on the topic that starts as part of the PEP 3103 discussion at http://mail.python.org/pipermail/python-dev/2006-June/066603.html) Institutional-memory'ly yours, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

It seems to me this discussion is mixing some different issues. (1) having secret parameters that don't show in help(). (2) injecting a value in some way I think these should be thought about separately. *For secret parameters:* help() could have a convention that it doesn't display variables with that start with _ or something like that, but that might have compatibility issues. Alternatively, there could be some new syntax or a decorator like
*@help.hide*("x,y") def foo(a, b, x=[], y={}): pass
help(foo) Help on function foo in module __main__:
f(a, b) *For injecting values:* There are several different cases. I'll use the arbitrary keyword *special* in the snippets below (but note that the keyword means something slightly different in each case): def foo1(): x = *special *[] ... x.append(t) Sets x every time the function is called to the same static list. What's special here is that I have one list created that gets reused, not a new list every time. This is how default arguments work and is useful for an accumulating list. def foo2(): *special *y = 0 ... y += t Initializes y once to 0 (at some time before the next line of code is reached). This is how static works in C++. This is what I want if my accumulating variable is a counter since numbers are immutable. This case easily handles the first case. If you never rebind x, you don't need to do anything special, otherwise something like this: def foo1a(): *special *_x = [] x = _x ... # might rebind x x.append(t) It's a bit clumsy to use the first case to handle the second case: def foo2a(): y = *special *[0] ... y[0] += t In addition, there are other use cases being discussed. This creates a new scope for i every time through the loop: def foo3(): result = [] for *special *i in range(10): def z(): return i result.append(z) And this injects a mock to replace a library function: def foo4(): return random.random() w = *special*(random.random=lambda: 0.1) foo4() Just because we might use similar hacks to do these now, doesn't mean that they are necessarily the same and I think the discussion has been going in several different directions simultaneously. I think all these cases have merits but I don't know which are more important. The last case seems to be handled reasonably well by various mock libraries using with, so I'm not particularly worried about it. I would like support for case 1 or 2. I don't like the idea of using a different function argument hack instead of the current one. --- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com

Feels like we're just repeating October 2008: http://mail.python.org/pipermail/python-ideas/2008-October/thread.html Are there any considerations we missed that time around?

Carl M. Johnson dixit (2011-06-13, 20:06):
Feels like we're just repeating October 2008:
http://mail.python.org/pipermail/python-ideas/2008-October/thread.html
Not exactly. That discussion was only about closure-based solutions and cases. Please note, that e.g. after-**-idea is a bit different. Cheers. *j

Bruce Leban dixit (2011-06-13, 18:45):
No, the idea is that after-**-constans are not only hidden-in-help- -arguments but that they also cannot be specified/overriden in a function call. So their usage would not be a hack that causes risk of incidental override by caller or that makes function signatures obfuscated (they would have to be defined separately, after **|**kwargs, in the righmost signature part). def compute(num1, num2, **, MAX_CACHE_LEN=100, cache=dict()): try: return cache[(num1, num2)] except KeyError: if len(cache) >= MAX_CACHE_LEN: cache.popitem() cache[(num1, num2)] = result = _compute(num1, num2) return result help(compute) # -> "... compute(num1, num2)" compute(1, 2) # OK compute(1, 2, MAX_CACHE_LEN=3) # would raise TypeError compute(1, 2, cache={}) # would raise TypeError ---- Open question: It's obvious that such a repetition must be prohibited (SyntaxError, at compile time): def sth(my_var, **, my_var): "do something" But in case of: def sth(*args, **kwargs, my_var='foo'): "do something" -- should 'my_var' in kwargs be allowed? (it's a runtime question) There is no real conflict here, so at the first sight I'd say: yes. Regards. *j

Jan Kaliszewski dixit (2011-06-14, 12:17):
Of course, I ment: def sth(my_var, **, my_var='foo'): "do something"
On second thought: no, such repetitions also should *not* be allowed. If a programmer, by mistake, would try to specify the argument value in a call, an explicit TypeError should be raised -- otherwise it'd become a trap (especially for beginners and absent-minded programmers). Regards. *j

On 2011-06-15 02:28, Jan Kaliszewski wrote:
I disagree. One of the main selling points of this feature for me is that adding a few "hidden parameters" to a function does not change the signature of the function. If you raise a TypeError when the name of a hidden parameter is in kwargs this is a change in signature. - Jacob

Jacob Holm wrote:
On 2011-06-15 02:28, Jan Kaliszewski wrote:
This is another reason why function parameters should not be used for something that is not a function parameter! +1 on the ability to inject locals into a function namespace. -1 on having the syntax for that masquerade as function arguments. -- Steven

Steven D'Aprano dixit (2011-06-15, 21:35):
OK, so the decorator or decorator-like syntax (using 'inject', 'within', 'owns' or other decorator name...) seems to be the most promising alternative. If so, next question is: which variant? 1. Decorator function with closure-like injecting (possibly could be implemented using closures): @functools.owns(cache=dict(), MAX_CACHE_LEN=100) def calculate(a, b): result = cache[(a, b)] if result is not None: return result ... # 'cache' identifier cannot be rebound to another object # because it was already used above in the function body # to refer to the injected object functools.owns() would be a real decorator function -- to apply either with @-syntax or dynamically, e.g.: decorated = [functools.owns(func) for func in functions] One question is whether it is technically possible to avoid introducing a new keyword (e.g. staticlocal) explicitly marking injected locals. Using such a keyword would be redundant from user point of view and non-DRY: @functools.owns(cache=dict(), MAX_CACHE_LEN=100) def calculate(a, b): staticlocal cache, MAX_CACHE_LEN # <- redundant and non-DRY :-( result = cache[(a, b)] if result is not None: return result ... 2. Decorator function with argument-like injecting. @functools.owns(cache=dict(), MAX_CACHE_LEN=100) def calculate(a, b): result = cache[(a, b)] if result is not None: return result ... # 'cache' identifier *can* be rebound to another object # than the injected object -- in the same way arguments can functools.owns() would be a real decorator function -- to apply either with @-syntax or dynamically, e.g.: decorated = [functools.owns(func) for func in functions] To implement such variant -- a new function constructor argument(s) and/or function/function code attribute(s) (read-only or writable?) most probably would have to be introduced... 3. Decorator-like language syntax construct: @in(cache=dict(), MAX_CACHE_LEN=100) # or 'owns' or 'inject' or... def calculate(a, b): result = cache[(a, b)] if result is not None: return result ... # 'cache' identifier *can* be rebound to another object # than the injected object -- in the same way arguments can It would not be a real decorator function -- so it would be applicable only using this syntax, and not dynamically, not after function creation. Which do you prefer? (or any other?) Regards. *j

Jan Kaliszewski wrote:
"owns"? I don't see how that testing for ownership describes what the function does. Likewise for "within", which sounds like it should be a synonym for the "in" operator: "if value within range" sort of thing.
Making locals unrebindable is a change of semantics that is far beyond anything I've been discussed here. This will be a big enough change without overloading it with changes that will be even more controversial! (I actually do like the idea of having unrebindable names, but that should be kept as a separate issue and not grafted on to this proposal.)
There shouldn't even be a question about that. Decorator syntax is sugar for func = decorator(func). Introducing magic syntax that is recognised by the compiler but otherwise is not usable as a function is completely unacceptable. If func is a pre-existing function: def func(a, b, c): pass then: new_func = functools.inject(x=1, y=2)(func) should be the same as: def new_func(a, b, c): # inject locals into the body of the function x = 1 y = 2 # followed by the body of the original pass except that new_func.__name__ may still reflect the old name "func". * If the original function previously referenced global or nonlocal x and y, the new function must now treat them as local; * Bindings to x and y should occur once, at function definition time, similar to the way default arguments occur once; * The original function (before the decorator applies) must be untouched rather than modified in place. This implies to me that inject must copy the original function and make modifications to the code object. This sounds to me that a proof-of-concept implementation would be doable using a byte-code hack. -- Steven

Steven D'Aprano dixit (2011-06-16, 10:46):
That variant (#1) would be simply shortcut for a closure application -- nothing really new. def factory(n): """Today's Python example.""" closuring = 'foo' def func(m): s = min(n, m) * closuring # here you also cannot rebind 'closuring' because is has # been referenced above
Introducing magic syntax that is recognised by the compiler but otherwise is not usable as a function is completely unacceptable.
Because of?... And it would not be more 'magic' than any other language syntax construct -- def, class, decorating with their @, *, ** arguments etc. The fact that such a new syntax would be similar to something already known and well settled (decorator function application syntax) would be rather an andantage than a drawback.
That's what I propose as variant #2. But that would need byte code hacking -- or some core language ('magic') modifications. Cheers. *j

On Thu, Jun 16, 2011 at 12:58 PM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
I agree with D'Aprano, both option one and two would require modifying the code object, bytecode hacks, or core changes to the language, because, although the result is the same as your factory example the function is already compiled when it is given to the decorator. my best guess is that to implement this we would need to slightly redefine the way that python looks up variables by making it look in a special 'injected' dictionay after looking through locals and before globals. --Alex

Alex Light dixit (2011-06-16, 13:21):
But one of the default-argument-hack reasons is to optimize variable access by avoid dictionary loopup (locals are not dictionary-based). I'd rather stick to the implementation sketch described by Nick (changing the syntax a bit; as I said, imho @keyword... should be placed before function def). Cheers. *j

Jan Kaliszewski wrote:
The error occurs BEFORE the rebinding attempt. You get UnboundLocalError when you attempt to execute min(n, m), not when rebinding the closure variable. This is a side-effect of the compiler's rule "if you see an assignment to a variable, make it a local", that is all. You can rebind closuring if you tell the compiler that it isn't a local variable:
In that regard, closure variables are no different from globals. You wouldn't say that global are unrebindable because of this:
The situation is very similar.
But the problem is that it is deceptively similar: it only *seems* similar, while the differences are profound. super() is the only magic function I know of in Python, and that change was controversial, hard to implement, and fragile. super() is special cased by the compiler and works in ways that no other function can do. Hence it is magic. I can't imagine that Guido will agree to a second example, at least not without a blindingly obvious benefit. You can't reason about super()'s behaviour like any other function. Things which should work if super() were non-magical break, such as aliasing: my_super = super # Just another name for the same function. class MyList(list): def __init__(self, *args): my_super().__init__(*args) self.attr = None
And wrapping: _saved_super = super def super(*args, **kwargs): print(args, kwargs) return _saved_super(*args, **kwargs) class MyList(list): def __init__(self, *args): super().__init__(*args) self.attr = None
Only the exact incantation of built-in super() inside a method of a class works. As I said: magic. (Although you can supply all the arguments for super manually, which is tricky to get right but non-magic.) You are proposing that inject should also be magic: only the exact incantation @inject(...) directly above a function will work. We won't be able to wrap inject in another function, or alias it, or use it without the @ syntax. inject() isn't really a decorator, although it superficially looks like one. It's actually a compiler directive. If you want to propose #pragma for Python, do so, but don't call it a decorator! Most importantly, we won't be able to apply it to functions that already exist: list_of_functions = [spam, ham, cheese] # defined elsewhere decorator = inject(a=1) decorated = [decorator(f) for f in list_of_functions] will fail. I consider this completely unacceptable. -- Steven

On Thu, Jun 16, 2011 at 9:15 AM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
There has to be *something* that tells the compiler to generate different bytecode (either a local lookup, a closure lookup or something new). This lands in the same category as "nonlocal" and "global" (and no, a "magic" decorator that the compiler recognises is not a reasonable alternative). That's why I quite liked the @def idea - it would just define a sequence of simple statements that the compiler will run at function *definition* time that is then used to preseed the local namespace at function *call* time (remember, it is intended to be a mnemonic for "at definition time" and the use of '@' also reflects the fact that this code would run just before function decorators are executed). Just like a class body, the @def code itself would be thrown away and only the resulting preseeded locals information would be retained. To allow rebinding to work correctly, this shared state could be implemented via cell variables rather than ordinary locals. Possible implementation sketch: Compile time: - @def statements are compiled in the context of the containing scope and stored on a new ASDL sequence attribute in the Function AST - symtable analysis notes explicitly which names are bound in the @def statements and this information is stored on the code object - code generation produces a cell lookup for any names bound in @def statements (even if they are also assigned as ordinary locals) - Raises a SyntaxError if there is a conflict between parameter names and names bound in @def statements Definition time: - @def statements are executed as a suite in the context of the containing scope but using a *copy* of the locals (so the containing scope is not modified) - names bound in the @def statements (as noted on the code object) are linked up to the appropriate cells on the function object Execution time: - Nothing special. The code is executed and references the cell variables precisely as if they came from a closure. An API could be provided in functools to provide a clean way to view (and perhaps modify) the contents of the cells from outside the function. And yes, I'm aware this blurs the line even further between functions and classes, but the core difference between "a specific algorithm with some persistent state" and "persistent state with optional associated algorithms" remains intact. And, to repeat the example of how it would look in practice: def do_and_remember(val, verbose=False): @def mem=collections.Counter() # Algorithm that calculates result given val mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(_mem[val], val)) return result Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan dixit (2011-06-16, 13:41):
It is not less 'magic' than what I proposed as variant #3. And, in fact, it is almost the same -- the only important difference is the place: imho placing it *before* definition better emphasizes that the binding is an *early* one. @inject(mem=collections.Counter(), MAX_MEM=1000) def do_and_remember(val, verbose=False): or even (to stress that it is a language syntax construct: @inject mem=collections.Counter(), MAX_MEM=1000 def do_and_remember(val, verbose=False): or: @inject collections.Counter() as mem, 1000 as MAX_MEM def do_and_remember(val, verbose=False): or something similar... Also, such placement is imho more appropriate for @-that-starts-a-line- -syntax (because it would *resemble* decorating syntax anyway -- and what's wrong with that?) + we avoid misleading clusters such as: def do_something(func): @def mem=colletions.Counter # <- looks a bit like another @wraps(func) # decorator for wrapper_func() @my_decorator(mem) def wrapper_func(): ... Regards. *j

On Fri, Jun 17, 2011 at 3:15 AM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
While that would require the from__future__ dance to make "inject" a new keyword, I like it much better than the looks-like-a-decorator-but-isn't syntax. The '@def' keyword would also technically work with that positioning, but @inject provides a better mnemonic for what is going on when the assignments are positioned outside the function.
The advantage of putting the '@def' lines *inside* the function is that it makes it clearer which namespace they're affecting. Examples like the above are readily addressed via style rules that say "don't do that, it's hard to read - leave a blank line between the @def code and the subsequent function decorators" Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
What benefit is there in making inject a keyword? Even super isn't a keyword. As far as I'm concerned, inject need only be a function in the functools module, not even a built-in, let alone a keyword. Here's a quick and dirty version that comes close to the spirit of inject, as I see it. Thanks to Alex Light's earlier version. # Credit to Alex Light. from contextlib import contextmanager from functools import wraps def inject(**localArgs): def decorator(func): glbs = func.__globals__ @wraps(func) def inner(*args, **kwargs): with _modifyGlobals(glbs, localArgs): ret = func(*args, **kwargs) return ret return inner return decorator @contextmanager def _modifyGlobals(glbls, additions): frmglbls = glbls.copy() try: glbls.update(additions) yield finally: glbls.clear() glbls.update(frmglbls) And demonstrating it in use:
And as a decorator:
Unfortunately, this proof-of-concept inject function doesn't actually inject into locals, hence the "import builtins" work-around. But it demonstrates the intent, and the API. -- Steven

On Thu, Jun 16, 2011 at 11:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:
If inject is a decorator then it would be used to inject into the locals of any function at runtime. This is in contrast to Nick's proposal where it's strictly tied to definition time and injection is internal to to the subject of the injection, the function body. In the latter case, the resulting code object from the function body could incorporate the injection right there. To accomplish the same thing with a decorator, the function definition would have to know about any possible decoration before the code object is compiled (good luck), or the decorator would replace/modify the compiled code. Seems like that's not far off from what Jan was proposing. The alternative is to have the injection handled externally to the compiled code object, like a co_static on the code object of a __static__ on the function object. Then the execution of the function code object would pull that in. -eric

On Fri, Jun 17, 2011 at 3:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:
What benefit is there in making inject a keyword? Even super isn't a keyword.
Given the magic effects they have on the compiler, super and __class__ probably *should* be keywords. The only reason they aren't is that their effect (automatically defining __class__ as a local when inside a function in a class scope) is relatively harmless in the event that super has actually been rebound to refer to something other than the builtin: class C: def f(self): print(locals()) def g(self): __class__ print(locals()) def h(self): super print(locals())
Sorry, I meant to point out why this was a bad idea when Alex first posted it. The __globals__ reference on a function object refers to the globals of the module where the function is defined. Modify the contents of that dictionary and you modify the contents of that module. So this "injection" approach not only affects the function being decorated, but every other function in the module. Thread safety is completely non-existent and cannot be handled locally within the decorated function. The reason something like @inject or @def is needed as a language construct is because the object of the exercise is to define a new kind of scope (call it "shared locals" for lack of a better name) and we need the compiler's help to do it properly. Currently, the closest equivalent to a shared locals scope is the default argument namespace, which is why people use it that way: the names are assigned values at function definition time, and they are automatically copied into the frame locals whenever the function is invoked. A shared namespace can also be created explicitly by using a closure or a class, but both of those suffer from serious verbosity (and hence readability) problems when the design intent you are aiming to express is a single algorithm with some persistent state. As noted in Jan's original message, using the default argument namespace has its own flaws (rebinding of immutable targets not working properly, cluttering the function signature on introspection, risk of inadvertent replacement in the call), but if it didn't address a genuine design need, it wouldn't be so popular. Hence the current discussion, which reminds me a lot of the PEP 308 (ternary expressions) discussion. Developers have proven they want this functionality by coming up with a hack that does it, but the hack is inherently flawed. Telling them "don't do that" is never going to work, so the best way to eliminate usage of the hack is to provide a way to do it *right*. (Relating back to PEP 308: how often do you see the and/or hack in modern Python code written by anyone that learned the language post Python 2.4?) The runtime *semantics* of my implementation sketch (an additional set of cells stored on the function object that are known to the compiler and accessed via closure ) are almost certainly the right way to go: it's a solution that cleanly handles rebinding of immutable targets and avoids cluttering the externall visible function signature with additional garbage. The only question is how to tell the compiler about it, and there are three main options for that: 1. Embedded in the function header, modelled on the handling of keyword-only arguments: def example(arg, **, cache=set(), invocations=0): """Record and return arguments seen and count the number of times the function has been invoked""" invocations += 1 cache.add(arg) return arg Pros: no bikeshedding about the keyword for the new syntax, namespace for execution is clearly the same as that for default arguments (i.e. the containing namespace) Cons: look like part of the argument namespace (when they really aren't), no mnemonic to assist new users in remembering what they're for, no open questions 2. Inside the function as a new statement type (bikeshed colour options: @def, @shared, shared) def example(arg, **, cache=set(), invocations=0): """Record and return arguments seen and count the number of times the function has been invoked""" @def cache=set(), invocations=0 invocations += 1 cache.add(arg) return arg Pros: implementation detail of shared state is hidden inside the function where it belongs, keyword choice can provide a good mnemonic for functionality Cons: needs new style rules on appropriate placements of @def/shared statements (similar to nonlocal and global), use of containing namespace for execution may be surprising Open Questions: whether to allow only one line with a tuple of assignments or multiple lines, whether to allow simple assignments only or any simple non-flow control statement 3. After the decorators and before the function definition (bikeshed colour options: @def, @inject, @shared) @def cache=set(), invocations=0 def example(arg) """Record and return arguments seen and count the number of times the function has been invoked""" invocations += 1 cache.add(arg) return arg Pros: keyword choice can provide a good mnemonic for functionality, namespace for execution is clearly the same as that for decorator expressions (i.e. the containing namespace) Cons: puts private implementation details ahead of the public signature information, looks too much like an ordinary decorator Open Questions: whether to allow only one line with a tuple of assignments or multiple lines I already have too much on my to-do list to champion a PEP for this, but I'd be happy to help someone else with the mechanics of writing one and getting it published on python.org (hint, hint Jan!). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
On Fri, Jun 17, 2011 at 3:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I believe that you may have missed that the _modifyGlobals context manager makes a copy of globals before modifying it. But even if you are correct, and the implementation as given is broken, I had written: [quote] Unfortunately, this proof-of-concept inject function DOESN'T ACTUALLY INJECT INTO LOCALS [emphasis added], hence the "import builtins" work-around. But it demonstrates the intent, and the API. [end quote] There was no intention for this to be the working implementation, just to demonstrate the API. As I described earlier, "close to the spirit of inject". I agree with much of the rest of your post (snipped for brevity), with a few additional points below:
The only question is how to tell the compiler about it, and there are three main options for that:
Four actually.
Additional con: you can only inject such locals at the time you write the function. Cannot take an existing function and make a runtime modification of it. This is, essentially, a compiler directive masquerading as function parameters.
2. Inside the function as a new statement type (bikeshed colour options: @def, @shared, shared)
This proposal shouldn't be just about shared state. That is just one use-case out of a number, and not all use-cases should be hidden.
Additional cons: looks too much like a decorator, particularly if the function contains an inner function; also looks too much like a function definition. Can only be performed when the function is written, and cannot be applied to existing functions. This too is a compiler directive, this time masquerading as a decorator.
Whether the @def line must appear at the start of the function (before or after the docstring), or like globals, can it appear anywhere in the function?
3. After the decorators and before the function definition (bikeshed colour options: @def, @inject, @shared)
This too is a compiler directive masquerading as a decorator. It too suffers from much the same cons as putting @def inside the function body. You have missed a fourth option, which I have been championing: make inject an ordinary function, available from the functools module. The *implementation* of inject almost certainly will require support from the compiler, but that doesn't mean the interface should! Pros: - "Inject" is the obvious name, because that's what it does: inject the given keyword arguments into a (copy of a) function as locals. - Not a compiler directive, but an ordinary function that operates at runtime like any other function. - Hence it works like ordinary decorators. - Doesn't require a new keyword or new syntax. - Can be applied to any Python function at any time, not just when the function is written. Cons: - The implementation will require unsupported bytecode hacks, or compiler support, but so will any other solution. This is only a negative when compared to the alternative "do nothing". - Some people may disagree that "inject" is the obvious name. There may still be room for bikeshedding. Open questions: - Should injected locals go directly into the locals, as if executed in the body of the function, or into a new "shared/injected locals" namespace as suggested by Nick? -- Steven

On Fri, Jun 17, 2011 at 10:12 PM, Steven D'Aprano <steve@pearwood.info> wrote:
No, I didn't miss it, I left it out on purpose because I think messing with the runtime name lookup semantics is a terrible idea. You and others seem fond of it, but namespace semantics are the heart and soul of why functions are so much faster than module level code and we shouldn't be touching that logic with a 10 foot pole. Adding a new cell-based shared namespace that uses the same runtime lookup semantics as closures to replace *existing* uses of the default argument hack? Sure, that's a reasonable proposal (it may still get rejected due to devils in the details, but it has at least as much going for it as PEP 308 did). Messing with normal locals from outside a function, or providing an officially sanctioned way to convert global references to some other kind of reference *after* the function has already been defined? Hell no, that's a solution looking for a problem and the concept of eliminating the default argument hack shouldn't be burdened with that kind of overreaching. The secret to the speed of functions lies in the fact that the compiler knows all the names at compile time so it can generate appropriate load/store operations for the different scopes (array lookup for locals, cell dereference for closure variables, global-or-builtin lookup for everything else). This benefits not just CPython, but all Python implementations: inside a function, they're allowed to assume that the *only* code changing the state of the locals is the function code itself. Cell dereferencing allows for the fact that closure variables might change (but are still reasonably close to locals in speed, since the *cells* are referenced from an array), and global and builtin lookup is the slowest of all (since it involves actually looking up identifiers in namespace dictionaries). Even a JIT compiler like PyPy can be more aggressive about optimising local and cell access than it can be about the officially shifting sands that are the global and builtin namespaces. This is why the nonlocal and global directives exist: to tell the compiler to change how it treats certain names. Arguments (including the associated default values) are given additional special treatment due to their placement in the function header. If we want to create a new namespace that is given special treatment by the compiler, those are the two options that are even remotely viable: placement in the function header (after the ** entry) or flagged via a new compiler directive (and the precedent of "nonlocal" and "global" suggests that directive should occur inside the function body rather than anywhere else). "@def" is primarily a proposal to avoid having to do the from __future__ dance in defining a new keyword, so I'll modify it to the more explicit "atdef" to avoid confusion with decorators). A new compiler directive is my own preference (due to the major semantic differences between how shared variables will be handled and how default arguments are handled), and I now believe it makes sense to use nonlocal, global and default arguments as the model for how that would work: atdef VAR=EXPR [, VAR=EXPR]* As with nonlocal and global, definition time statements could technically appear anywhere in the function body (with their full effect), but style guidelines would recommend placing them at the beginning of the function, just after the docstring. Parentheses around the var list would not be permitted - use multiple shared statements instead (parentheses would, however, naturally permit the expressions themselves to span multiple lines). Such a statement would readily cover the speed enhancement, early-binding and shared state use cases for the default argument hack (indeed, the compiler could conceivably detect if a shared value was never rebound and simply load the cell contents into each frame as a local variable in that case, avoiding even the cell dereference overhead relative to the speed hack). The 'atdef' phrasing slightly emphasises the early-binding use case, but still seems reasonable for the speed enhancement and shared state use cases. In contrast, a keyword like 'shared' which emphasised the shared state use case, would feel far more out of place when used for speed enhancement or early binding (as well as being far more likely to conflict with existing variables names). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Jun 17, 2011 at 9:26 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
atdef VAR=EXPR [, VAR=EXPR]* Such a statement would readily cover the speed enhancement,
early-binding and shared state use cases for the default argument hack
Doing things this way however removes one of the chief benefits of the inject statement, the ability to create multiple versions of the same function with different sets of shared data. You earlier gave an example similar to this as an example of what the syntax under your proposal would look like. def example(arg): """Record and return arguments seen""" atdef cache=set() #do something cache.add(arg) return arg under proposal 4 (inject as full blown function, usable like any decorator) it would look like this instead @inject(cache = set()) def example(arg): """Record and return arguments seen""" #do something cache.add(arg) return arg but what if, later, you decided you needed 10 different 'example' functions each with its own cache as well as the ability to make more. under your proposal we would have to make extensive changes. turning the code into def make_example(): def example(arg): """Record and return arguments seen""" #do something cache.add(arg) return arg return example example_list = [make_example() for i in range(10)] this code is rather difficult to read and it is difficult at first glance to figure out what is actually happening under proposal 4 the result would be much simpler. we could just remove the @inject and put that on in the list comprehension, like so. def _example_func(arg): """Record and return arguments seen""" #do something cache.add(arg) return arg example_list = [inject(cache=set())( _example_func) for i in range(10)] make_example = lambda: inject(cache=set())( _example_func) this is IMO far easier to read and understand. Furthermore it gives the added benefit in that you can chose to run 'example' so that it uses a true global variable, instead of an injected one. --Alex

On Fri, Jun 17, 2011 at 9:22 AM, Alex Light <scialexlight@gmail.com> wrote:
The only way I could see runtime injection work is if you limited the injection to names already tied to the locals, in the same way parameters are. This would require one of the 3 solutions that Nick outlined. Let's assume the injection values were stored on the function object, like closures and defaults are, perhaps in an attribute named __atdef__. Then runtime injection could be used to replace __atdef__, like you can with the defaults, if it were not read-only. However, If __atdef__ were read-only, like __closure__, the runtime injection would have to do some trickery, like you have to do if you are going to mess with the closures [1]. This is a hack, in my mind, since being read-only indicates to me that the expectation is you shouldn't touch it! With that said, in that case a runtime injection function would have to generate a new function. The new function would have to have the desired __atdef__, and match any other read-only attribute. Then the injection function would copy into the new function object all the remaining attributes of the old function, including the code object. Here's an example of what I mean: def inject(f, *args): if len(args) != len(f.__atdef__): raise TypeError("__atdef__ mismatch") func = FunctionType(f.__code__, f.__globals__, f.__name__, f.__defaults__, f.__closure__, tuple(args)) # copy in the remaining attributes, like __doc__ return func You can already do this with closures and defaults. If the elements of __atdef__ are cells, like with __closure__ then you would have to throw in a little more logic. If you wanted to do kwargs, you would have to introspect the names corresponding to __atdef__. I don't know if this would have a performance impact on the new function, in case __defaults__ or __closure__ are more than just attributes on the function object. Like I said, this is a hack around the read-only attribute, but it shows that with the solutions Nick outlined, you can still have an injection function (could be used as a decorator, I suppose) . The only catch is that the names for __atdef__ would be tied into the function body at definition time, which I think is a good thing. Finally, regardless of if __atdef__ were read-only or not, I think a runtime injection function should return a new function and leave the old one untouched. That seems to meet the use-case that you presented. -eric [1] Here's an example: from types import FunctionType INJECTEDKEY = "injected_{}" OUTERLINE = " outer_{0} = injected_{0}" INNERLINE = " inner_{0} = outer_{0}" SOURCE= ("def not_important():", " def also_not_important():", " return also_not_important") def inject_closure(f, *args): injected = {} source = list(SOURCE) for i in range(len(args)): source.insert(1, OUTERLINE.format(i)) source.insert(-1, INNERLINE.format(i)) injected[INJECTEDKEY.format(i)] = args[i] exec("\n".join(source), injected, injected) closure = injected["not_important"]().__closure__ func = FunctionType(f.__code__, f.__globals__, f.__name__, f.__defaults__, closure) func.__annotations__ = f.__annotations__ func.__doc__ = f.__doc__ func.__kwdefaults__ = f.__kwdefaults__ func.__module__ = f.__module__ return func

On 6/17/2011 9:26 AM, Nick Coghlan wrote:
I do not really want a new namespace and for the purpose of the OP, named local constants (for speed or freezing the meaning of an expression or both), we do not need one. There is already a fourth 'namespace' for constants, a tuple f.__code__.co_consts, whose 'names' are indexes, just as with the locals array. Given def f(a, **, b=1001, len = len): return 2001 # one possible spelling def f(a): # alternate constant b = 1001, len = len return 2001 the compiler should put 1001 and len into co.consts and convert 'b' and 'len' into the corresponding indexes, just like it does with 'a', and use the LOAD_CONST bytecode just as with literal constants like 2001 in the body. Constant names would not go into .co_names and not increment .co_argcount. This would make named constants as fast and def-time frozen as default args without the disadvantages of being included in the signature and over-writable on calls. -- Terry Jan Reedy

On Fri, Jun 17, 2011 at 4:58 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 6/17/2011 9:26 AM, Nick Coghlan wrote:
I really like this idea. The only concerns I can see are losing the name for use in debugging or embedded functions, and I assume that those can be dealt with. -jJ

On 6/19/2011 3:28 PM, Jim Jewett wrote:
I consider that a secondary detail. If the names are recorded, in a method that distinguishes them from parameter names, then they can be introspected, just like other local names. They could be included in locals(), though I hardly ever use that and am not familiar with the details of its runtime construction. The main problem of my idea as originally conceived is that it only really works everywhere for constant expressions limited to literals and builtin names (as were my examples). While that would be useful and eliminate one category of default arg misuse, it probably is not enough for new syntax. For general expressions, it is only guaranteed to work as intended for top-level def statements in interactive mode, which are compiled immediately before execution. Otherwise, there is a gap between creation of the code-object and first execution of the def statement. So name-resolution may be too early or even impossible (as it is for .pyc files). Or some new mechanism would be needed to patch the code object. This gets to the point that compilation time and hence code-object creation time seems more an implementation detail than part of the language def. CPython creates code objects just once for each function body, and reuses them for each def or lambda invocation, but this may be an optimization rather than language requirement. -- Terry Jan Reedy

On Mon, Jun 20, 2011 at 7:10 AM, Terry Reedy <tjreedy@udel.edu> wrote:
Oops, I should have finished reading the thread before replying :)
The compilation time/definition time/execution time split is part of the language definition, as is the immutability of code objects (although, interestingly enough, the compile time distinction is mentioned in the language reference, but not well defined - it is mostly implicit in the definition of the compile() builtin). An implementation doesn't *have* to reuse the code objects when multiple function objects share the same definition, but there's no real reason not to. I created an issue pointing out that these semantics should really be clarified in the execution model section of the language reference: http://bugs.python.org/issue12374 Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
Isn't changing name lookup semantics at runtime precisely what JIT compilers do? But it doesn't really matter, because that's not what I'm proposing. I'm not suggesting that the lookup semantics should be changed when the function is called. I'm saying that a new function should be created, based on the original function, with the desired semantics. In principle, this could be as simple as: - make a copy of the function object - in the copy, add cells for any injected variable - and modify the copied code object to change the appropriate LOAD_GLOBAL opcodes to LOAD_DEREF (and similarly for rebindings). although I dare say that in practice there'll be a certain amount of book-keeping required to make it work reliably. -- Steven

On 2011-06-18 06:03, Steven D'Aprano wrote:
I think that depends on what you count as part of the runtime name lookup semantics.
If you want the injected values to also affect functions that are defined within the decorated function it gets a lot more complicated. But yes, in theory it could work. One thing that would make it a *lot* easier to write such an "inject" function would be if we could replace the way globals are looked up to use cells as well. I am thinking of a different "kind" of cell that wouldn't hold its value itself but get it from the module globals the way LOAD_GLOBAL does today. This cell would be in the __closures__ of the function and could be replaced using a decorator like Steven proposed. A consequence of this would be that you could optionally allow "nonlocal" to bind global names when there are no suitable nonlocal names to bind to (e.g. in a top-level function). It has always slightly bothered me that you couldn't do that, because it makes it harder to move code between levels. As written, this would probably slow down access to globals a little bit. However I have another idea (basically a more backward-compatible variation of PEP 280) that would let us use cells or cell-like objects for almost all accesses, at the cost of changing <module>.__dict__ to be a dict *subclass*. Best regards - Jacob

On Sat, Jun 18, 2011 at 1:04 AM, Jacob Holm <jh@improva.dk> wrote:
I will say that I was surprised to discover that `nonlocal` can be used only for outer function locals and not for globals, since the name "nonlocal" seems very general, as if it encompassed all things not local. If it could be changed, I think it would be a little more intuitive. In addition, as you mention, it's slightly better for refactoring.

On 6/18/2011 5:22 PM, Carl M. Johnson wrote:
We were aware of that when it was selected, after much discussion. Closures were originally read-only partly because it was uncertain how to spell 'write'.
If it could be changed, I think it would be a little more intuitive.
It cannot, I hope for obvious reasons. I should hope that the docs make clear enought that names in a function are *partitioned* into module, closure, and local and that written names are local by default or one of the other 2 if declared. Actually, I think improving the nonlocal doc is part of some issue. -- Terry Jan Reedy

On Jun 18, 2011, at 10:22 PM, Carl M. Johnson wrote:
We should put a stop the notion that any time someone says, "I was surprised" that there needs to be a change to the language. If surprise happens because someone skipped reading the docs and made an incorrect guess about how a keyword behaves (i.e. using a new feature without reading about what it actually does), then "I was surprised" means very little. No matter what was implemented for "nonlocal", someone was going to be surprised that it didn't match their intuition. If nonlocal meant, "first match in the chain of enclosing scopes", then would you expect "nonlocal int" to write into the builtin scope? If nonlocal included globals, would it be a surprise that "global x" and "nonlocal x" would do exactly the same thing, but only if x already existed in the global scope? Whatever the answers, the important point is that it is hard to eliminate surprise when surprise is based on someone's guess about how a keyword is implemented. One the Python Weekly URL's quotes of the week last month was: "When did we suddenly come to expect that people could program in a language without actually learning it?" ISTM, it would be much better if language change proposals came in the form of: "change X makes the following code better and is worth the breaking of code Y and making person Z relearn what the feature does." That would get to essentials while taking the unpersuasive "I was surprised" off the table. my-two-cents-ly, Raymond

On Sun, Jun 19, 2011 at 3:41 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Indeed. If anyone was curious as to why I've been recently trying to steer the discussion back specifically towards Jan's original idea of replacing current (ab)uses of the default argument hack (and nothing more), this is pretty much it. We *know* those use cases exist because there is code in the wild that uses them (including in the standard library). Saying "don't do that" isn't an adequate response as, for anyone that knows about the hack and how it works, the alternatives are far harder to read (in addition to being far more verbose and error prone to write in the first place). Extrapolating to additional functionality that is difficult or impossible in current Python code is sheer speculation that is likely to weigh down any proposal with unnecessary cruft that gets it rejected. What is needed now is one or more volunteers that are willing and able to: 1. Write a PEP that: - distils the core discussion in this thread down into a specific proposal that can be brought up on python-dev - explains *why* the default argument hack is undesirable in general (unintuitive, hard to look up, harmful to introspection, invites errors when calling affected functions) - clearly articulates the uses of the default argument hack that the proposal aims to eliminate (early binding, shared locals, performance) - include real world example of such uses from the standard library (and potentially other code bases) - optionally, also describe the potential "function factories" that could be developed based on the semantics I proposed (see Eric Snow's post for a sketch of how such factories might work once given a template function to work with) 2. Create a reference implementation targeting Python 3.3 (with step 1 being significantly more important at this stage - while the implementation can't be called *easy*, it should be a reasonably straightforward combination of the existing code that handles nonlocal statements and that which handles the calculation and storage of default arguments). I'm not going to do it myself (I already have a couple of open PEPs that I need to make the time to follow up on), but I'm more than happy to provide pointers and advice to someone else that steps up to do so. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 6/19/2011 2:56 AM, Nick Coghlan wrote:
Along that line, I posted the following concrete suggestion a couple of days ago, but have seen no response: ''' for the purpose of the OP, named local constants (for speed or freezing the meaning of an expression or both), we do not need [a new namespace]. There is already a fourth 'namespace' for constants, a tuple f.__code__.co_consts, whose 'names' are indexes, just as with the locals array. Given def f(a, **, b=1001, len = len): return 2001 # one possible spelling def f(a): # alternate constant b = 1001, len = len return 2001 the compiler should put 1001 and len into co.consts and convert 'b' and 'len' into the corresponding indexes, just like it does with 'a', and use the LOAD_CONST bytecode just as with literal constants like 2001 in the body. Constant names would not go into .co_names and not increment .co_argcount. This would make named constants as fast and def-time frozen as default args without the disadvantages of being included in the signature and over-writable on calls. ''' Did this not actually go through?
The above is a specific proposal (with two possible syntax spellings) and an outline of a specific implementation.
By 'shared locals' do you mean nonlocals? That is not quite what Jan was requesting.
I do not see what nonlocals has to do or needs to have to do with *local* constants. My proposal is that to store named constants, we reuse the current code to recognize and calculate named defaulted locals with the current code to store and retrieve anonymous constants, -- Terry Jan Reedy

On Mon, Jun 20, 2011 at 4:26 AM, Terry Reedy <tjreedy@udel.edu> wrote:
Did this not actually go through?
It did, I just forgot to reply. Sorry about that. co_consts is not a namespace - it's just a cache where the compiler stashes values (usually literals) that can be calculated at compile time (note, NOT definition time - it happens earlier than that, potentially even prior to the current application invocation if the module was cached in a .pyc file). Using it for mutable state is not possible, and thus would completely miss the point of some of the significant uses of the default argument hack. It is also quite possible for a single code object to be shared amongst multiple function definitions (e.g. when a function declaration is inside a loop). Accordingly, any new definition time state needs to go on the function object, for all the same reasons that current definition time state (i.e. annotations and default parameter values) is stored there.
I do not see what nonlocals has to do or needs to have to do with *local* constants. My proposal is that to store named constants, we reuse the current code to recognize and calculate named defaulted locals with the current code to store and retrieve anonymous constants,
The nonlocals handling code is relevant because creating state that is shared between function invocations is what happens with closures, and the nonlocal statement serves to tell the compiler that names that would otherwise be considered local should instead be considered closure references. We want the new statement to do something similar: the nominated names will exist in the new definition time namespace rather than being looked up in any of the existing locations (locals, outer scopes, globals/builtins). And, as noted above, the const calculation code isn't useful because it happens at the wrong time (compilation time instead of definition time) and because code objects may be shared amongst multiple function definitions. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan dixit (2011-06-17, 17:02): [...snip...]
Nick Coghlan dixit (2011-06-19, 16:56): [...snip...]
I could do it with pleasure, at least step #1 -- i.e. writing a PEP (step #2 seems to be quite non-trivial :), though for sure would be very instructive...). *But* -- only provided that it would be known and accepted that *I could not act urgently nor quickly* at all (I got a new job recently and even don't know yet how much spare time per week would I be able to save up during incoming months). Would that be OK? Cheers. *j

On Mon, Jun 20, 2011 at 9:34 PM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
That's fine, there's still quite a lot of time before the first alpha of 3.3. The main goal is to get something captured in the form of a PEP so even if nothing happens for a while, the discussion doesn't have to restart from scratch when the topic comes up again. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 17 Jun, 2011, at 7:37, Steven D'Aprano wrote:
FYI <http://code.activestate.com/recipes/277940-decorator-for-bindingconstants-at...> implements a "bind all globals" variant of the injection using byte-code hacks. Changing that to only bind specific globals should be easy enough ;-) Is the inject functionality needed by other Python implementations, and in particular by PyPy? I use the keyword arguments hack mostly for two reasons: slightly higher speed in tight loops and a workaround for accessing globals in __del__ methods. Both are primairily CPython hacks, AFAIK the PyPy jit is smart enough to optimize globals access close to the speed of local variable acces. A stdlib function that implements the activestate recipe would be good enough if the functionality is only needed for CPython (with a fallback to an function that doesn't change the function body for the other Python implementations) Ronald

On Thu, Jun 16, 2011 at 10:39 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Even still, at first glance that looks like a decorator. Also, going multiline makes it worse: @inject (mem=collections.Counter(), MAX_MEM=1000) def do_and_remember(val, verbose=False): ...
In my mind, putting it inside the function is similar to docstrings being inside, especially if the simple statements are evaluated at definition time. Also, I kind of like the idea of combing @ with the keyword since it distinguishes the context. What about when you have several of these and they get long? Could you use parentheses? With the def keyword and parentheses it looks different enough from a function that I don't mind it, but maybe the keyword could be different (I do like that it reuses a keyword): def f(a,b): """Do something... """ @def (x=[name for name in names if name != None], y=something_else) print(a, b) print([y(name) for name in x]) (And a given statement would fit in nicely instead of those parentheses. ;) -eric

On Fri, Jun 17, 2011 at 12:39 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Fri, Jun 17, 2011 at 3:15 AM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
or even (to stress that it is a language syntax construct:
@inject mem=collections.Counter(), MAX_MEM=1000 def do_and_remember(val, verbose=False):
This is reminding me of the once (or final or static) discussion a few years ago, but I can't seem to find that PEP. -jJ

On Sat, Jun 11, 2011 at 9:30 AM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
For these cases I use a class based solution. It's simple and easy to test. class DoAndRemember: def __init__(self): self._mem = collections.Counter() def __call__(self, val, verbose=False): result = do_something(val) self.mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(mem[val], val)) do_and_remember = DoAndRemember() -- David blog: http://www.traceback.org twitter: http://twitter.com/dstanek www: http://dstanek.com

On 6/11/2011 9:30 AM, Jan Kaliszewski wrote:
One problem with trying to 'fix' this is that there can be defaulted args which are not intended to be overwritten by users but which are intended to be replaced in recursive calls.
The body should all be runtime. Deftime expression should be in the header.
I thought of this while reading 'the problem'. It is at least plausible to me.
The decorator would have to modify the code object as well as the function objects, probably in ways not currently allowed. -- Terry Jan Reedy

Terry Reedy wrote:
I think any solution to this would have to be backward compatible. A big NO to anything which changes the behaviour of existing code.
The body should all be runtime. Deftime expression should be in the header.
That's not even the case now. The global and nonlocal keywords are in the body, and they apply at compile-time. I don't like the name inject as shown, but I like the idea of injecting locals into a function from the outside. (Or rather, into a *copy* of the function.) This suggests generalising the idea: take any function, and make a copy of it with the specified names/values defined as locals. The obvious API is a decorator (presumably living in functools). Assume we can write such a decorator, and postpone discussion of any implementation for now. Firstly, this provides a way of setting locals at function definition time without polluting the parameter list and exposing local variables to the caller. Function arguments should be used for arguments, not internal implementation details. @inject(mem=collections.Counter()) def do_and_remember(val, verbose=False): # like do_and_remember(val, verbose=False, mem=...) But more importantly, it has wider applications, like testing, introspection, or adding logging to functions: def my_function(alist): return random.choice(alist) + 1 You might not be able to modify my_function, it may be part of a library you don't control. As written, if you want to test it, you need to monkey-patch the random module, which is a dangerous anti-pattern. Better to do this: class randomchoice_mock: def choice(self, arg): return 0 mock = randomchoice_mock() test_func = inject(random=mock)(my_function) Because test_func is a copy of my_function, you can be sure that you won't break anything. Adding logging is just as easy. This strikes me as the best solution: the decorator is at the head of the function, so it looks like a declaration, and it has its effect at function definition time. But as Terry points out, such a decorator might not be currently possible without language support, or at least messy byte-code hacking. -- Steven

On 11 Jun 2011, at 14:30, Jan Kaliszewski wrote:
That's hard to do as (assuming the function is defined at the global scope), mem will be compiled as a global, meaning that you will have to modify the bytecode. Oh but this makes me think about something I wrote a while ago (see below). 4. Use closures. def factory(mem): def do_and_remember(val, verbose=False) result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(mem[val], val)) .... return do_and_remember do_and_remember = factory(mem=collections.Counter()) Added bonus: you can create many instances of do_and_remember. ---------- Related to this, here's a "localize" decorator that I wrote some time ago for fun (I think it was from a discussion on this list). It was for python 2.x (could easily be modified for 3.x I think, it's a matter of adapting the attribute names of the function object). It "freezes" all non local variables in the function. It's a hack! It may be possible to adapt it. def new_closure(vals): args = ','.join('x%i' % i for i in range(len(vals))) f = eval("lambda %s:lambda:(%s)" % (args, args)) return f(*vals).func_closure def localize(f): f_globals = dict((n, f.func_globals[n]) for n in f.func_code.co_names) f_closure = ( f.func_closure and new_closure([c.cell_contents for c in f.func_closure]) ) return type(f)(f.func_code, f_globals, f.func_name, f.func_defaults, f_closure) # Examples of how localize works: x, y = 1, 2 @localize def f(): return x + y def test(): acc = [] for i in range(10): @localize def pr(): print i acc.append(pr) return acc def lambdatest(): return [localize(lambda: i) for i in range(10)] # These examples will behave as follows:
-- Arnaud

Terry Reedy dixit (2011-06-11, 16:09):
I think this is another case... Although I can imagine that such 'private' arguments could be specified when calling -- after **{...}/bare **, e.g.: fun(1, b=3, **{'c':3}, my_secret_hidden_arg='xyz') fun(1, b=3, **, my_secret_hidden_arg='xyz') Though at the first sight I don't like this (`after-** args in calls') idea so much (contrary to `after-** args in definitions' idea). [...]
Arnaud Delobelle dixit (2011-06-11, 21:47):
Here mem is a keyword argument, not a variable. Though I understand that making it local/closure would need some code/closures hacking... Unless built in to the interpreter.
Yes, but this method makes code longer and more complex. And simple is better :) Consider my multi-factory example: def make_my_callbacks(callback_params): my_callbacks = [] for params in callback_params: def fun1(*args, **kwargs, params=params): "...do something with args and params..." def fun2(*args, **kwargs, params=params): "...do something with args and params..." def fun3(*args, **kwargs, fun1=fun1, fun2=fun2): """...do something with args and with functions fun1, fun2, for example pass them as callbacks to other functions..." my_callbacks.append((fun1, fun2, fun3)) return my_callbacks ...compared to: def make_fun1(params): def fun1(*args, **kwargs): "...do something with args and params..." return fun1 def make_fun2(params): def fun2(*args, **kwargs): "...do something with args and params..." return fun2 def make_fun3(fun1, fun2): def fun3(*args, **kwargs): """...do something with args and with functions fun1, fun2, for example pass them as callbacks to other functions..." return fun3 def make_my_callbacks(callback_params): my_callbacks = [] for params in callback_params: fun1 = make_fun1(params) fun2 = make_fun2(params) fun3 = make_fun3(fun1, fun2) my_callbacks.append((fun1, fun2, fun3)) return my_callbacks Though, maybe it'a a matter of individual taste...
Nice :) (and, as far as I understand, it could be used to implement the decorator I ment). Best regards. *j

I'm -1 on any proposal that somehow tries to make the default-argument hack more acceptable. The main reason people still feel the need to use it is that the for-loop is broken, insofar as it doesn't create a new binding for each iteration. The right way to address that is to fix the for-loop, IMO. -- Greg

Greg Ewing dixit (2011-06-13, 10:30):
I'm -1 on any proposal that somehow tries to make the default-argument hack more acceptable.
My propositions don't make that hack less acceptable -- proposing an alternative.
Do you mean that each iteration should create separate local scope? Then: j = 0 my_lambdas = [] for i in range(10): print(j) # would raise UnboundLocalError j = i my_lambdas.append(lambda: i) Or that the loop variable should be treated specially? Then: i_lambdas, j_lambdas = [], [] for i in range(10): j = i i_lambdas.append(lambda: i) j_lambdas.append(lambda: j) print(i_lambdas[2]()) # would print 2 print(j_lambdas[2]()) # would print 9 Cheers. *j

Jan Kaliszewski wrote:
My propositions don't make that hack less acceptable -- proposing an alternative.
You seem to be proposing yet another feature whose main purpose is to patch over a mismatch between existing features. That's not the path to elegant language design.
Do you mean that each iteration should create separate local scope?
No...
Or that the loop variable should be treated specially?
Yes, but in a way that you're probably not expecting. :-) My proposal is that, if the loop variable is referenced by an inner function (and is therefore in a cell), a new cell is created on each iteration instead of replacing the contents of the existing cell. This would mean that: * If the loop variable is *not* referenced by an inner function (the vast majority of cases), there would be no change from current semantics and no impact on performance. * In any case, the loop variable can still be referenced after the loop has finished with the expected results. One objection that's been raised is that, as described, it's somewhat CPython-specific, and it's uncertain how other Pythons would get on trying to implement it.
Yes, that's true. An extension to the idea would be to provide a way of specifying cell-replacement behaviour for any assignment, maybe something like j = new i Then your example would print 2 both times, and the values of both i and j after the loop would be 9. One slightly curly aspect would be that if you *changed* the value of i or j after the loop, the change would be seen by the *last* lambdas created, and not any of the others. :-) But I find it hard to imagine anyone doing this -- if you're capturing variables in a loop, you don't normally expect to have access to the loop variable at all after the loop finishes. -- Greg

On Mon, Jun 13, 2011 at 8:30 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yikes, now *there's* a radical proposal. -lots on any idea that would make: def f(): i = 0 def g1(): return i i = 1 def g2(): return i return [g1, g2] differ in external behaviour from: def f(): result = [] for i in range(2): def g(): return i result.append(g) return result or: def f(): return [lambda: i for i in range(2)] or: def _inner(): for i in range(2): def g(): return i yield g def f(): return list(_inner()) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 6/12/2011 6:30 PM, Greg Ewing wrote:
Or use closures, which were partly designed to replace default arg use. This case is quite different from the multiple capture in for-loop case. The OP is simply trying to localize names for speed instead of using module constants, which would otherwise do quite fine and are routinely used in the stdlib. -- Terry Jan Reedy

Terry Reedy wrote:
Default args are specifically used in at least one use-case where closures give the wrong result.
The usual solution is to *not* use a closure:
That's just one use-case. Jan gave two others. Optimizations might be common in the stdlib, but it's a hack, and an ugly one. Function parameters should be kept for actual arguments, not for optimizing name look-ups. -- Steven

On Mon, Jun 13, 2011 at 5:11 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Function parameters should be kept for actual arguments, not for optimizing name look-ups.
Still, the post-** shared state (Jan's option 2) is likely the most obvious way to get early binding for *any* purpose without polluting the externally visible parameter list. Several questions that are unclear in the general case of definition-time code are resolved in obvious ways by that approach: Q. When is the code executed? A. At definition time, just like default argument values Q. Where are the results of the calculation stored? A. On the function object, just like default argument values Q. How does the compiler know to generate local variable lookups for those attributes? A. The names are specified in the function header, just like public parameters Q. What is the advantage over custom classes with __call__ methods? A. Aside from the obvious speed disadvantage, moving from a function with state that is preserved between calls to a stateful class that happens to be callable is a surprisingly large mental shift that may not fit well with the conceptual structure of a piece of code. While *technically* they're the same thing (just expressed in different ways), in reality the difference in relative emphasis of algorithm vs shared state can make one mode of expression far more natural than the other in a given context. class DoAndRemember(): def __init__(self): self.mem = collections.Counter() def __call__(self, val, verbose=False): result = do_something(val) self.mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(self.mem[val], val)) do_and_remember = DoAndRemember() Custom classes also suffer grievously when it comes to supporting introspection (e.g. try help() or inspect.getargspec() on the above) and lack natural support for other features of functions (such as easy decorator compatibility, descriptor protocol support, standard annotations, appropriate __name__ assignment). Q. What is the advantage over using an additional level of closure? A. This is actually the most viable alternative, since the conceptual model is quite a close match and it doesn't break introspection the way a custom class does. The problems with this approach are largely syntactic: def _make_do_and_remember(): mem=collections.Counter() def do_and_remember(val, verbose=False): result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(mem[val], val)) return do_and_remember do_and_remember = _make_do_and_remember() 1. The function signature is buried inside "_make_do_and_remember" (the class approach and even PEP 3150 have the same problem) 2. The name of the function in the current namespace and its __name__ attribute have been decoupled, require explicit repetition to keep them the same 3. This is basically an unreadable mess I'd actually be far happier with the default argument hack equivalent: def do_and_remember(val, verbose=False, *, _mem=collections.Counter()): result = do_something(val) _mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(_mem[val], val)) All a "persistent state" proposal would do is create an alternative to the default argument hack that doesn't suffer from the same problems: def do_and_remember(val, verbose=False, **, mem=collections.Counter()): result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(_mem[val], val)) It seems like the path of least resistance to me - the prevalence of the default argument hack means there's an existing, widespread practice that solves real programming issues, but is flawed in some ways (specifically, messing with the function's signature). Allowing declarations of shared state after the keyword-only arguments seems like a fairly obvious answer. The one potential trap is the classic one with immutable nonlocal variables that haven't been declared as such (this trap also applies to any existing use of the default argument hack): reassignment will *not* modify the shared state, only the name binding in the current invocation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Jun 13, 2011 at 6:05 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
As yet another shade for this particular bikeshed, this one just occurred to me: def do_and_remember(val, verbose=False): @def mem=collections.Counter() result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(_mem[val], val)) The @def ("at def") statement is just a new flavour of the same proposal that has been made many times before: a way to indicate that a simple assignment statement should be executed once at function definition time rather than repeatedly on every call to the function. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 13 June 2011 12:57, Nick Coghlan <ncoghlan@gmail.com> wrote:
Or to link this to PEP 3150: given: mem = collections.Counter() def do_and_remember(val, verbose=False): result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(_mem[val], val)) (Or the other way around) -- Arnaud

Nick Coghlan dixit (2011-06-13, 21:57):
If using '@' character, I'd rather prefer: @in(mem=collections.Counter()) def do_and_remember(val, verbose=False): result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(_mem[val], val)) @in (or @with, or @within, or @withlocal, or...) could be a language syntax construct, not a real decorator, though using -- already well settled -- decorator-like syntax. Important advantage of this variant is IMHO that then it is obvious for everybody that the binding(s) is (are) being done *early*. Regards. *j

Jan Kaliszewski dixit (2011-06-14, 00:30):
On second thought: no. I mean: no -- for a separate syntax construct with limited usage possibilities (see: cases mentioned by Steven); yes -- for language improvements that would make possible one of the solutions: 1. A real decorator: a) quasi-argument-locals-based (names could be used to read injected value and later could be rebound, like arguments); or b) another-level-closure-based (names could not be used to read injected values if rebound later: it's *either* a free variable *or* a local variable). or 2. `after-** hidden pseudo-arguments' (see previous posts...). Now I don't know which of them I'd prefer... And probably any of them would need some core-language modifications... (at least the '2' and '1a' variants) Regards. *j

On Mon, Jun 13, 2011 at 9:12 PM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
Forgive me if im wrong but i believe that this is possible without any language changes using pure python. this is my attempt at it:
this uses a closure to hold the values of the injected values and hides them all test pass exactly as if the values were defined with makeLocals were globals within the function but all act as if they are locals outside of it. this means that if we define this this
and run
if we try this
print(aList) we get a NameError
and if we try this
all problems with global values still apply with this however. for example just as throws a UnboundLocalError so does
so what do you think? --Alex

Alex Light dixit (2011-06-14, 13:44):
Changing global state on each call seems to be both concurrency-and-recurrency-unsafe and inefficient. Though that 1a (closure-based) variant should be possible using techniques like that: http://mail.python.org/pipermail/python-ideas/2008-October/002227.html Retards. *j

On Tue, Jun 14, 2011 at 5:27 PM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
well some of your safety concerns can be allayed, i hope, by replacing this snipet:
with this one:
with _modifyGlobals(func.__globals__, localArgs): ret = func(*args, **kwargs)
with _modifyGlobals defined as:
as for performance you are correct that it is more efficient to just use global variables. the dictionary updates add about 1 x 10**-4 (or, if the check for collisions with KWargs is removed, 5 x 10**-5) seconds to the run time of a function, at least on this computer. so not terribly significant just be sure to use sparingly also with the link you mentioned i could not seem to get it to work. Whenever i tried to use any built-in functions it would start throwing NameErrors. also that is only useful if you want to inject all global variables into the function. --Alex

Nick Coghlan wrote:
I wouldn't call adding even more complexity to function signatures "obvious", although I grant that it depends on whether you're Dutch :) Another disadvantage is that it uses a symbol instead of a word. Too many symbols, and your code looks like Perl (or APL). It's hard to google for ** to find out what it means. It's harder to talk about a symbol than a word. (In written text you can just write ** but in speech you have to use circumlocutions or made-up names like double-splat.) [...]
The problem with injecting locals in the parameter list is that it can only happen at write-time. That's useful, but there's a major opportunity being missed: to be able to inject at runtime. You could add test mocks, optimized functions, logging, turn global variables into local constants, and probably things I've never thought of. Here's one use-case to give a flavour of what I have in mind: if you're writing Unix-like scripts, one piece of useful functionality is "verbose mode". Here's one way of doing so: def do_work(args, verbose=False): if verbose: pr = print else: pr = lambda *args: None pr("doing spam") spam() pr("doing ham") ham() # and so on if __name__ == '__main__': verbose = '--verbose' in sys.argv do_work(my_arguments, verbose) But why does do_work take a verbose flag? That isn't part of the API for the do_work function itself, which might be usefully called by other bits of code. The verbose argument is only there to satisfy the needs of the user interface. Using a ** hidden argument would solve that problem, but you then have to specify the value of verbose at write-time, defeating the purpose. Here's an injection solution. First, the body of the function needs a generic hook, with a global do-nothing default: def hook(*args): pass def do_work(args): hook("doing spam") spam() hook("doing ham") ham() # and so on if __name__ == '__main__': if '--verbose' in sys.argv: wrap = inject(hook=print) else: wrap = lambda func: func # do nothing # or `inject(hook=hook)` to micro-optimize wrap(do_work)(my_arguments) If you want to add logging, its easy: just add an elif clause with wrap = inject(hook=logger). Because you aren't monkey-patching the hook function (or, heaven help us, monkey-patching builtins.print!) you don't need to fear side-effects. No globals are patched, hence no mysterious action-at-a-distance bugs. And because the injected function is a copy of the original, other parts of the code that use do_work are unaffected. But for this to work, you have to be able to inject at run-time, not just at write-time. -- Steven

On Mon, Jun 13, 2011 at 10:33 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Function parameters should be kept for actual arguments, not for optimizing name look-ups.
Even the bind-it-now behavior isn't always for optimization; it can also be used as a way of forcing stability in case the global name gets rebound. That is often an anti-pattern in practice, but ... not always.
I would say the most obvious place is in a decorator, using the function object (or a copy) as the namespace. Doing this properly would require some variant of PEP 3130, which was rejected largely for insufficient use.
Using the function object as a namespace (largely) gets around that, because you can use a with statement to change the settings temporarily.
[A verbose mode -- full example below, but the new spelling here at the top] Just replace:
with: def do_work(args): __function__.hook("doing spam") spam() __function__.hook("doing ham") ham() If you want to change the bindings, just rebind do_work.hook to the correct function. If you are doing this as part of a test, do so within a with statement that sets it back at the end. (The reason this requires a variant of 3130 is that the name do_work may itself be rebound, so do_work.hook isn't a reliable pointer.) -jJ [only quotes below here]

Jim Jewett wrote:
Acknowledged. But whatever the purpose, my comment still stands: function arguments should be used for arguments, not for their side-effect of injecting a local variable into the function namespace.
You mean something like this? with make_logging_len() as len: x = some_function_that_calls_len() That's fine for some purposes, but you're still modifying global state. If some_function_that_calls_len() calls spam(), and spam() also contains a call to len, you've unexpectedly changed the behaviour of spam. If that's the behaviour that you want, fine, but it probably isn't. There are all sorts of opportunities for breaking things when patching globals, which makes it somewhat of an anti-pattern. Better to make the patched version a local.
Ah, that's why it doesn't work for me! :) Even if it did work, you're still messing with global state. If two functions are using do_work, and one wants a print hook, and the other wants a logging hook (or whatever), only one can be satisfied. Also this trick can't work for optimizations. A call to do_work.hook requires a global lookup followed by a second lookup in the function object namespace, which is not as fast as using a local. -- Steven

On Mon, Jun 13, 2011 at 5:33 PM, Steven D'Aprano <steve@pearwood.info> wrote:
It's quite promising idea. Currenlty there are notion of cell for closures. What if globals would also use a cell? So that cell cound be either bound to a value or to a name in globals or builtin dictionary. With this in mind it could be possible to either change binding from name to value or vice versa, our to make a copy of the function with another cells. I think this adheres to Python philosophy of having anything modifyable. It will add at most two words of memory for each cell (name and global dict), and probably will not make interpreter slower. Also will probably allow to remove __globals__ attribute from functions in the long term. Then it even be possible to make some modules faster by either from __future__ import fast_bindings or it could be done by some external library like: __super_freezer_allow__ = True ... import sys, super_freezer super_freezer.apply(sys.modules) Probably about 80% modules do not need to rebind globals, so they can run faster. And if you need to monkeypatch them, just either not freeze globals in this module or change the bindings in all its functions. Thoughts? -- Paul

On 6/13/2011 10:33 AM, Steven D'Aprano wrote:
Given the expense of function calls, I would write the above as hook = None def do(args): if hook: hook("doing spam") ... if __name__ == '__main__': if '--verbose' in sys.argv: wrap = inject(hook=print) I do not see the point of all this complication. If you are not trying to optimize the function (and adding such hooks is obviously not), hook = print works just fine (in 3.x ;-). -- Terry Jan Reedy

Terry Reedy wrote: [...]
You're modifying a global variable. Now any other function that calls do_work() for its own purposes suddenly finds it mysteriously printing. A classic action-at-a-distance bug. For a simple stand-alone script, there's no problem, but once you have more complexity in your app, or a library, things become very different. My apologies, I've been doing a lot of reading about the pros and cons (mostly cons *wink*) of monkey-patching in the Ruby world, the open/closed principle, and various forms of bugs caused by the use of globals. I assumed that the problems would be blindingly obvious. I suppose they were only obvious to me because I'd just immersed myself in them for the last day or so! -- Steven

Steven D'Aprano wrote:
It's still rather non-obvious what's going on, though. Copious commenting would be needed to make this style of coding understandable. Also, it doesn't seem to generalise. What if the function in question calls other functions, which call other functions, which themselves need a verbose option? It seems you would need to explicitly wrap all the sub-function calls to pass the hook on to them. And what if there is more than one option to be hooked? You'd rapidly end up with a nightmarish mess. Here's another way to approach the problem: class HookableWorker(object): def hook(self, arg): pass def do_work(self): self.hook("Starting work") ... self.hook("Stopping work") def be_verbose(arg): print arg def main(): worker = HookableWorker() if "--verbose" in sys.argv: worker.hook = be_verbose worker.do_work() Now you can expand the HookableWorker class by adding more methods that all share the same hook, still without anything being global. -- Greg

Greg Ewing wrote:
I don't think so. The injection happens right at the top of the function. True, you need to know what "inject" does, but that's no different from any other function. Provided you know that "inject" adds a local binding to the function namespace, instead of using a global, it's easy to understand what this does: x = 42 @inject(x=23) def spam(): print(x) Not terribly mysterious. The only tricky thing is that some programmers aren't comfortable with the idea that functions are first class objects, and so: @inject(len=my_len) def spam(arg): return len(arg)+1 will discombobulate them. ("What do you mean, len isn't the built-in len?") But then again, they're likely to be equally put off by global patches too: len=my_len def spam(arg): return len(arg)+1 Doesn't stop us using that technique when appropriate.
That's a feature, not a bug! Patches are *local* to the function, not global. If you want to change global state, you can already do it, by monkey-patching the module. We don't need a new magic inject function to do that. This is not meant to be used for making wholesale changes to multiple functions at once, but for localized changes to one function at a time. A scalpel, not a chainsaw.
Absolutely. And that will still be a viable approach for many things. But... * You can only patch things that are already written as a class. If you want to add a test mock or logging to a function, this strategy doesn't help you because there's nothing to subclass. * There's a performance and (arguably) readability cost to using callable classes instead of functions. * Nor does it clean up the func(arg, len=len) hack. -- Steven

On Tue, Jun 14, 2011 at 12:33 AM, Steven D'Aprano <steve@pearwood.info> wrote:
This is getting deep into major structural changes to the way name lookups work, though. Pre-seeding locals with values that are calculated at run-time is a much simpler concept. A more explicit way to do the same thing might work along the following lines: 1. Add a writeable f_initlocals dict attribute to function objects (None by default) 2. When a function is called, if f_initlocals is not None, use it to initialise the locals() namespace 3. Add a new "local" statement to tell the compiler to treat names as local. Using this statement will create an f_initlocals dict mapping those names to None. 4. Add a new decorator to functools that works like the following: def initlocals(**kwargs): def inner(f): new_names = kwargs.keys() - f.f_initlocals.keys() if new_names: raise ValueError("{} are not local variables of {!r}".format(new_names, f)) f.f_initlocals.update(kwargs) return f return inner @functools.initlocals(mem=collections.Counter()) def do_and_remember(val, verbose=False): local mem result = do_something(val) mem[val] += 1 if verbose: print('Done {} times for {!r}'.format(mem[val], val)) You could still inject changes at runtime with that concept, but would need to be careful with thread-safety issues if you only wanted the change to apply to some invocations and not others. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Jun 14, 2011 at 12:33 AM, Steven D'Aprano <steve@pearwood.info> wrote:
As with *, ** and @, you don't search for them directly, you search for "def" (although redirects from the multiplication and power docs to the def statement docs may not be the worst idea ever). If we hadn't already added keyword-only arguments in Python 3, I'd consider this significantly more obscure. Having the function signature progress from "positional-or-keyword arguments" to "keyword-only arguments" to "implicit arguments", on the other hand, seems a lot cleaner than the status quo without being significantly more complicated. After all, there's no new symbols involved - merely a modification to allow a bare "**" to delimit the start of the implicit arguments when arbitrary keyword arguments are not accepted. Who knows, maybe explicitly teaching that behaviour would make people less likely to fall into the default argument trap. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 6/13/2011 3:11 AM, Steven D'Aprano wrote:
Terry Reedy wrote:
I meant an explicit user-defined closure with a separate cell for each function ...
not this implicit one where each function uses the *same* cell referring to the same int object.
The fundamental problem with this code for funcs is that "lambda x: x+i" is a *constant* equivalent to "def _(x): return x+i". Executing either 10 times creates 10 duplicate functions. The hypnotic effect of 'lambda' is that some do not immediately see the equivalence.
The explicit closure solution intended to replace "lambda x,i=i:x+i" is
def makef(j): return lambda x: x+j
We now have difference cells containing different ints. To get different functions from multiple compilations of one body we need either different defaults for pseudo-parameters or different closure cells. The rationale for adding the latter was partly to be an alternative to the former. Once closure cells were made writable with 'nonlocal', they gained additional uses, or rather, replaced the awkward hack of using mutable 1-element lists as closure contents, with the one elements being the true desired content. -- Terry Jan Reedy

On Sat, Jun 11, 2011 at 11:30 PM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
This particular alternative to the default argument hack has come up before, as has the "hidden parameters after '**'" approach. (I thought there was a PEP on this, but I can't find anything other than the reference in the description of Option 4 in PEP 3103 - however, there is a thread on the topic that starts as part of the PEP 3103 discussion at http://mail.python.org/pipermail/python-dev/2006-June/066603.html) Institutional-memory'ly yours, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

It seems to me this discussion is mixing some different issues. (1) having secret parameters that don't show in help(). (2) injecting a value in some way I think these should be thought about separately. *For secret parameters:* help() could have a convention that it doesn't display variables with that start with _ or something like that, but that might have compatibility issues. Alternatively, there could be some new syntax or a decorator like
*@help.hide*("x,y") def foo(a, b, x=[], y={}): pass
help(foo) Help on function foo in module __main__:
f(a, b) *For injecting values:* There are several different cases. I'll use the arbitrary keyword *special* in the snippets below (but note that the keyword means something slightly different in each case): def foo1(): x = *special *[] ... x.append(t) Sets x every time the function is called to the same static list. What's special here is that I have one list created that gets reused, not a new list every time. This is how default arguments work and is useful for an accumulating list. def foo2(): *special *y = 0 ... y += t Initializes y once to 0 (at some time before the next line of code is reached). This is how static works in C++. This is what I want if my accumulating variable is a counter since numbers are immutable. This case easily handles the first case. If you never rebind x, you don't need to do anything special, otherwise something like this: def foo1a(): *special *_x = [] x = _x ... # might rebind x x.append(t) It's a bit clumsy to use the first case to handle the second case: def foo2a(): y = *special *[0] ... y[0] += t In addition, there are other use cases being discussed. This creates a new scope for i every time through the loop: def foo3(): result = [] for *special *i in range(10): def z(): return i result.append(z) And this injects a mock to replace a library function: def foo4(): return random.random() w = *special*(random.random=lambda: 0.1) foo4() Just because we might use similar hacks to do these now, doesn't mean that they are necessarily the same and I think the discussion has been going in several different directions simultaneously. I think all these cases have merits but I don't know which are more important. The last case seems to be handled reasonably well by various mock libraries using with, so I'm not particularly worried about it. I would like support for case 1 or 2. I don't like the idea of using a different function argument hack instead of the current one. --- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com

Feels like we're just repeating October 2008: http://mail.python.org/pipermail/python-ideas/2008-October/thread.html Are there any considerations we missed that time around?

Carl M. Johnson dixit (2011-06-13, 20:06):
Feels like we're just repeating October 2008:
http://mail.python.org/pipermail/python-ideas/2008-October/thread.html
Not exactly. That discussion was only about closure-based solutions and cases. Please note, that e.g. after-**-idea is a bit different. Cheers. *j

Bruce Leban dixit (2011-06-13, 18:45):
No, the idea is that after-**-constans are not only hidden-in-help- -arguments but that they also cannot be specified/overriden in a function call. So their usage would not be a hack that causes risk of incidental override by caller or that makes function signatures obfuscated (they would have to be defined separately, after **|**kwargs, in the righmost signature part). def compute(num1, num2, **, MAX_CACHE_LEN=100, cache=dict()): try: return cache[(num1, num2)] except KeyError: if len(cache) >= MAX_CACHE_LEN: cache.popitem() cache[(num1, num2)] = result = _compute(num1, num2) return result help(compute) # -> "... compute(num1, num2)" compute(1, 2) # OK compute(1, 2, MAX_CACHE_LEN=3) # would raise TypeError compute(1, 2, cache={}) # would raise TypeError ---- Open question: It's obvious that such a repetition must be prohibited (SyntaxError, at compile time): def sth(my_var, **, my_var): "do something" But in case of: def sth(*args, **kwargs, my_var='foo'): "do something" -- should 'my_var' in kwargs be allowed? (it's a runtime question) There is no real conflict here, so at the first sight I'd say: yes. Regards. *j

Jan Kaliszewski dixit (2011-06-14, 12:17):
Of course, I ment: def sth(my_var, **, my_var='foo'): "do something"
On second thought: no, such repetitions also should *not* be allowed. If a programmer, by mistake, would try to specify the argument value in a call, an explicit TypeError should be raised -- otherwise it'd become a trap (especially for beginners and absent-minded programmers). Regards. *j

On 2011-06-15 02:28, Jan Kaliszewski wrote:
I disagree. One of the main selling points of this feature for me is that adding a few "hidden parameters" to a function does not change the signature of the function. If you raise a TypeError when the name of a hidden parameter is in kwargs this is a change in signature. - Jacob

Jacob Holm wrote:
On 2011-06-15 02:28, Jan Kaliszewski wrote:
This is another reason why function parameters should not be used for something that is not a function parameter! +1 on the ability to inject locals into a function namespace. -1 on having the syntax for that masquerade as function arguments. -- Steven

Steven D'Aprano dixit (2011-06-15, 21:35):
OK, so the decorator or decorator-like syntax (using 'inject', 'within', 'owns' or other decorator name...) seems to be the most promising alternative. If so, next question is: which variant? 1. Decorator function with closure-like injecting (possibly could be implemented using closures): @functools.owns(cache=dict(), MAX_CACHE_LEN=100) def calculate(a, b): result = cache[(a, b)] if result is not None: return result ... # 'cache' identifier cannot be rebound to another object # because it was already used above in the function body # to refer to the injected object functools.owns() would be a real decorator function -- to apply either with @-syntax or dynamically, e.g.: decorated = [functools.owns(func) for func in functions] One question is whether it is technically possible to avoid introducing a new keyword (e.g. staticlocal) explicitly marking injected locals. Using such a keyword would be redundant from user point of view and non-DRY: @functools.owns(cache=dict(), MAX_CACHE_LEN=100) def calculate(a, b): staticlocal cache, MAX_CACHE_LEN # <- redundant and non-DRY :-( result = cache[(a, b)] if result is not None: return result ... 2. Decorator function with argument-like injecting. @functools.owns(cache=dict(), MAX_CACHE_LEN=100) def calculate(a, b): result = cache[(a, b)] if result is not None: return result ... # 'cache' identifier *can* be rebound to another object # than the injected object -- in the same way arguments can functools.owns() would be a real decorator function -- to apply either with @-syntax or dynamically, e.g.: decorated = [functools.owns(func) for func in functions] To implement such variant -- a new function constructor argument(s) and/or function/function code attribute(s) (read-only or writable?) most probably would have to be introduced... 3. Decorator-like language syntax construct: @in(cache=dict(), MAX_CACHE_LEN=100) # or 'owns' or 'inject' or... def calculate(a, b): result = cache[(a, b)] if result is not None: return result ... # 'cache' identifier *can* be rebound to another object # than the injected object -- in the same way arguments can It would not be a real decorator function -- so it would be applicable only using this syntax, and not dynamically, not after function creation. Which do you prefer? (or any other?) Regards. *j

Jan Kaliszewski wrote:
"owns"? I don't see how that testing for ownership describes what the function does. Likewise for "within", which sounds like it should be a synonym for the "in" operator: "if value within range" sort of thing.
Making locals unrebindable is a change of semantics that is far beyond anything I've been discussed here. This will be a big enough change without overloading it with changes that will be even more controversial! (I actually do like the idea of having unrebindable names, but that should be kept as a separate issue and not grafted on to this proposal.)
There shouldn't even be a question about that. Decorator syntax is sugar for func = decorator(func). Introducing magic syntax that is recognised by the compiler but otherwise is not usable as a function is completely unacceptable. If func is a pre-existing function: def func(a, b, c): pass then: new_func = functools.inject(x=1, y=2)(func) should be the same as: def new_func(a, b, c): # inject locals into the body of the function x = 1 y = 2 # followed by the body of the original pass except that new_func.__name__ may still reflect the old name "func". * If the original function previously referenced global or nonlocal x and y, the new function must now treat them as local; * Bindings to x and y should occur once, at function definition time, similar to the way default arguments occur once; * The original function (before the decorator applies) must be untouched rather than modified in place. This implies to me that inject must copy the original function and make modifications to the code object. This sounds to me that a proof-of-concept implementation would be doable using a byte-code hack. -- Steven

Steven D'Aprano dixit (2011-06-16, 10:46):
That variant (#1) would be simply shortcut for a closure application -- nothing really new. def factory(n): """Today's Python example.""" closuring = 'foo' def func(m): s = min(n, m) * closuring # here you also cannot rebind 'closuring' because is has # been referenced above
Introducing magic syntax that is recognised by the compiler but otherwise is not usable as a function is completely unacceptable.
Because of?... And it would not be more 'magic' than any other language syntax construct -- def, class, decorating with their @, *, ** arguments etc. The fact that such a new syntax would be similar to something already known and well settled (decorator function application syntax) would be rather an andantage than a drawback.
That's what I propose as variant #2. But that would need byte code hacking -- or some core language ('magic') modifications. Cheers. *j

On Thu, Jun 16, 2011 at 12:58 PM, Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
I agree with D'Aprano, both option one and two would require modifying the code object, bytecode hacks, or core changes to the language, because, although the result is the same as your factory example the function is already compiled when it is given to the decorator. my best guess is that to implement this we would need to slightly redefine the way that python looks up variables by making it look in a special 'injected' dictionay after looking through locals and before globals. --Alex

Alex Light dixit (2011-06-16, 13:21):
But one of the default-argument-hack reasons is to optimize variable access by avoid dictionary loopup (locals are not dictionary-based). I'd rather stick to the implementation sketch described by Nick (changing the syntax a bit; as I said, imho @keyword... should be placed before function def). Cheers. *j

Jan Kaliszewski wrote:
The error occurs BEFORE the rebinding attempt. You get UnboundLocalError when you attempt to execute min(n, m), not when rebinding the closure variable. This is a side-effect of the compiler's rule "if you see an assignment to a variable, make it a local", that is all. You can rebind closuring if you tell the compiler that it isn't a local variable:
In that regard, closure variables are no different from globals. You wouldn't say that global are unrebindable because of this:
The situation is very similar.
But the problem is that it is deceptively similar: it only *seems* similar, while the differences are profound. super() is the only magic function I know of in Python, and that change was controversial, hard to implement, and fragile. super() is special cased by the compiler and works in ways that no other function can do. Hence it is magic. I can't imagine that Guido will agree to a second example, at least not without a blindingly obvious benefit. You can't reason about super()'s behaviour like any other function. Things which should work if super() were non-magical break, such as aliasing: my_super = super # Just another name for the same function. class MyList(list): def __init__(self, *args): my_super().__init__(*args) self.attr = None
And wrapping: _saved_super = super def super(*args, **kwargs): print(args, kwargs) return _saved_super(*args, **kwargs) class MyList(list): def __init__(self, *args): super().__init__(*args) self.attr = None
Only the exact incantation of built-in super() inside a method of a class works. As I said: magic. (Although you can supply all the arguments for super manually, which is tricky to get right but non-magic.) You are proposing that inject should also be magic: only the exact incantation @inject(...) directly above a function will work. We won't be able to wrap inject in another function, or alias it, or use it without the @ syntax. inject() isn't really a decorator, although it superficially looks like one. It's actually a compiler directive. If you want to propose #pragma for Python, do so, but don't call it a decorator! Most importantly, we won't be able to apply it to functions that already exist: list_of_functions = [spam, ham, cheese] # defined elsewhere decorator = inject(a=1) decorated = [decorator(f) for f in list_of_functions] will fail. I consider this completely unacceptable. -- Steven
participants (17)
-
Alex Light
-
Arnaud Delobelle
-
Bruce Leban
-
Carl M. Johnson
-
David Stanek
-
Eric Snow
-
Ethan Furman
-
Greg Ewing
-
Jacob Holm
-
Jan Kaliszewski
-
Jim Jewett
-
Nick Coghlan
-
Paul Colomiets
-
Raymond Hettinger
-
Ronald Oussoren
-
Steven D'Aprano
-
Terry Reedy