Explicit variable capture list
Hi

C++ has a nice feature of an explicit variable capture list for lambdas:

```cpp
int a = 1, b = 2, c = 3;
auto fun = [a, b, c](int x, int y){ return a + b + c + x + y; };
```

This allows easy construction of closures. In Python, to achieve that, you need to say:

```python
def make_closure(a, b, c):
    def fun(x, y):
        return a + b + c + x + y
    return fun

a = 1
b = 2
c = 3
fun = make_closure(a, b, c)
```

My proposal: create a special variable qualifier (like global and nonlocal) to automatically capture variables:

```python
a = 1
b = 2
c = 3
def fun(x, y):
    capture a, b, c
    return a + b + c + x + y
```

This will have the effect that the symbols a, b and c in the body of the function have the values they had at the moment of function creation. The variables a, b, c must be defined at the time of function creation; if they are not, an error is thrown. The 'capture' qualifier may be combined with the keywords global and nonlocal to change lookup behaviour.

To make it more useful, we also need some syntax for inline lambdas, i.e.:

```python
a = 1
b = 2
c = 3
fun = lambda[a, b, c] x, y: a + b + c + x + y
```

Thanks,
haael
On Tue, Jan 19, 2016 at 6:10 AM, <haael@interia.pl> wrote:
Hi
C++ has a nice feature of an explicit variable capture list for lambdas:
int a = 1, b = 2, c = 3; auto fun = [a, b, c](int x, int y){ return a + b + c + x + y; };
This allows easy construction of closures. In Python, to achieve that, you need to say:
This is worded very confusingly. Python has easy construction of closures with implicit variable capture. The difference has to do with "value semantics" in C++, which Python doesn't have. If you were using int* variables in your C++ example, you'd have the same semantics as Python does with its int references.
```python
def make_closure(a, b, c):
    def fun(x, y):
        return a + b + c + x + y
    return fun

a = 1
b = 2
c = 3
fun = make_closure(a, b, c)
```
The usual workaround is actually:

```python
a = 1
b = 1
c = 1
def fun(x, y, a=a, b=b, c=c):
    return a + b + c + x + y
```

-- Devin
Sorry, forget the first part entirely, I was still confused when I wrote it. Definitely the semantics of values are very different, but they don't matter for this. I think the rough equivalent of the capture-by-copy C++ lambda is the function definition I provided with default values.

-- Devin

On Tue, Jan 19, 2016 at 6:39 AM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
On Tue, Jan 19, 2016 at 6:10 AM, <haael@interia.pl> wrote:
Hi
C++ has a nice feature of an explicit variable capture list for lambdas:
int a = 1, b = 2, c = 3; auto fun = [a, b, c](int x, int y){ return a + b + c + x + y; };
This allows easy construction of closures. In Python, to achieve that, you need to say:
This is worded very confusingly. Python has easy construction of closures with implicit variable capture.
The difference has to do with "value semantics" in C++, which Python doesn't have. If you were using int* variables in your C++ example, you'd have the same semantics as Python does with its int references.
```python
def make_closure(a, b, c):
    def fun(x, y):
        return a + b + c + x + y
    return fun

a = 1
b = 2
c = 3
fun = make_closure(a, b, c)
```
The usual workaround is actually:
```python
a = 1
b = 1
c = 1
def fun(x, y, a=a, b=b, c=c):
    return a + b + c + x + y
```
-- Devin
On 1/19/2016 9:10 AM, haael@interia.pl wrote:
Hi
C++ has a nice feature of an explicit variable capture list for lambdas:
int a = 1, b = 2, c = 3; auto fun = [a, b, c](int x, int y){ return a + b + c + x + y; };
This allows easy construction of closures. In Python, to achieve that, you need to say:
```python
def make_closure(a, b, c):
    def fun(x, y):
        return a + b + c + x + y
    return fun

a = 1
b = 2
c = 3
fun = make_closure(a, b, c)
```
The purpose of writing a make_closure function is so it can be called more than once, to make more than one closure:

```python
f123 = make_closure(1, 2, 3)
f456 = make_closure(4, 5, 6)
```
My proposal: create a special variable qualifier (like global and nonlocal) to automatically capture variables
```python
a = 1
b = 2
c = 3
def fun(x, y):
    capture a, b, c
    return a + b + c + x + y
```
This will have the effect that the symbols a, b and c in the body of the function have the values they had at the moment of function creation. The variables a, b, c must be defined at the time of function creation. If they are not, an error is thrown. The 'capture' qualifier may be combined with the keywords global and nonlocal to change lookup behaviour.
This only allows one version of fun, not multiple, so it is not equivalent at all. As Devin stated, it is equivalent to using parameters with default argument values.

-- Terry Jan Reedy
On Jan 19, 2016, at 06:10, haael@interia.pl wrote:
Hi
C++ has a nice feature of an explicit variable capture list for lambdas:
int a = 1, b = 2, c = 3; auto fun = [a, b, c](int x, int y){ return a + b + c + x + y; };
This allows easy construction of closures. In Python, to achieve that, you need to say:
```python
def make_closure(a, b, c):
    def fun(x, y):
        return a + b + c + x + y
    return fun

a = 1
b = 2
c = 3
fun = make_closure(a, b, c)
```
My proposal: create a special variable qualifier (like global and nonlocal) to automatically capture variables
```python
a = 1
b = 2
c = 3
def fun(x, y):
    capture a, b, c
    return a + b + c + x + y
```
This will have the effect that the symbols a, b and c in the body of the function have the values they had at the moment of function creation. The variables a, b, c must be defined at the time of function creation. If they are not, an error is thrown. The 'capture' qualifier may be combined with the keywords global and nonlocal to change lookup behaviour.
What you're suggesting is the exact opposite of what you say you're suggesting. Capturing a, b, and c in a closure is what Python already does. What you're trying to do is _not_ capture them and _not_ create a closure. So calling the statement "capture" is very misleading, and saying it "allows easy construction of closures" even more so.

In C++ terms, this:

```python
def fun(x, y):
    return a + b + c + x + y
```

means:

```cpp
auto fun = [&](int x, int y) { return a + b + c + x + y; };
```

It obviously doesn't mean this, as you imply:

```cpp
auto fun = [](int x, int y) { return a + b + c + x + y; };
```

... because that just gives you a compile-time error saying that local variables a, b, and c aren't defined, which is not what Python does.

If you're looking for a way to copy references to the values, instead of capturing the variables, you write this:

```python
def fun(x, y, a=a, b=b, c=c):
    return a + b + c + x + y
```

And if you want to actually copy the values themselves, you have to do that explicitly (which has no visible effect for ints, of course, but think about lists or dicts here):

```python
def fun(x, y, a=copy.copy(a), b=copy.copy(b), c=copy.copy(c)):
    return a + b + c + x + y
```

... because Python, unlike C++, never automatically copies values. (Again, think about lists or dicts. If passing them to a function or storing them in a variable made an automatic copy, as in C++, you'd be wasting lots of time and space copying them all over the place. That's why you have to explicitly create vector<int>& variables, or shared_ptr<vector<int>>, or pass around iterators instead of the container itself--because you almost never actually want to waste time and space making a copy if you're not mutating, and you almost always want the changes to be effective if you are mutating.)
I think it's reasonable to divert this discussion to "value capture". Not sure if that's the usual terminology, but the idea should be that a reference to the value is captured, rather than (as Python normally does with closures) a reference to the variable (implemented as something called a "cell").

(However let's please not consider whether the value should be copied or deep-copied. Just capture the object reference at the point the capture is executed.)

The best syntax for such capture remains to be seen. ("Capture" seems to universally make people think of "variable capture", which is the opposite of what we want here.)

On Tue, Jan 19, 2016 at 8:22 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Jan 19, 2016, at 06:10, haael@interia.pl wrote:
Hi
C++ has a nice feature of an explicit variable capture list for lambdas:
int a = 1, b = 2, c = 3; auto fun = [a, b, c](int x, int y){ return a + b + c + x + y; };
This allows easy construction of closures. In Python, to achieve that, you need to say:
```python
def make_closure(a, b, c):
    def fun(x, y):
        return a + b + c + x + y
    return fun

a = 1
b = 2
c = 3
fun = make_closure(a, b, c)
```
My proposal: create a special variable qualifier (like global and
nonlocal) to automatically capture variables
```python
a = 1
b = 2
c = 3
def fun(x, y):
    capture a, b, c
    return a + b + c + x + y
```
This will have the effect that the symbols a, b and c in the body of the function have the values they had at the moment of function creation. The variables a, b, c must be defined at the time of function creation. If they are not, an error is thrown. The 'capture' qualifier may be combined with the keywords global and nonlocal to change lookup behaviour.
What you're suggesting is the exact opposite of what you say you're suggesting. Capturing a, b, and c in a closure is what Python already does. What you're trying to do is _not_ capture them and _not_ create a closure. So calling the statement "capture" is very misleading, and saying it "allows easy construction of closures" even more so.
In C++ terms, this:
```python
def fun(x, y):
    return a + b + c + x + y
```
means:
auto fun = [&](int x, int y) { return a + b + c + x + y; };
It obviously doesn't mean this, as you imply:
auto fun = [](int x, int y) { return a + b + c + x + y; };
... because that just gives you a compile-time error saying that local variables a, b, and c aren't defined, which is not what Python does.
If you're looking for a way to copy references to the values, instead of capturing the variables, you write this:
```python
def fun(x, y, a=a, b=b, c=c):
    return a + b + c + x + y
```
And if you want to actually copy the values themselves, you have to do that explicitly (which has no visible effect for ints, of course, but think about lists or dicts here):
```python
def fun(x, y, a=copy.copy(a), b=copy.copy(b), c=copy.copy(c)):
    return a + b + c + x + y
```
... because Python, unlike C++, never automatically copies values. (Again, think about lists or dicts. If passing them to a function or storing them in a variable made an automatic copy, as in C++, you'd be wasting lots of time and space copying them all over the place. That's why you have to explicitly create vector<int>& variables, or shared_ptr<vector<int>>, or pass around iterators instead of the container itself--because you almost never actually want to waste time and space making a copy if you're not mutating, and you almost always want the changes to be effective if you are mutating.)
-- --Guido van Rossum (python.org/~guido)
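To make the distinction concrete, here is a minimal demonstration in today's Python of variable capture versus value capture (the names are illustrative):

```python
def demo():
    x = 1
    f = lambda: x        # variable capture: f closes over x's cell
    g = lambda x=x: x    # value capture, via the default-argument trick
    x = 2
    return f(), g()

demo()   # (2, 1): f sees the rebinding, g kept the definition-time value
```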
On 19.01.16 18:47, Guido van Rossum wrote:
I think it's reasonable to divert this discussion to "value capture". Not sure if that's the usual terminology, but the idea should be that a reference to the value is captured, rather than (as Python normally does with closures) a reference to the variable (implemented as something called a "cell").
(However let's please not consider whether the value should be copied or deep-copied. Just capture the object reference at the point the capture is executed.)
The best syntax for such capture remains to be seen. ("Capture" seems to universally make people think of "variable capture" which is the opposite of what we want here.)
A number of variants of more powerful syntax were proposed in [1]. In a neighbouring topic Scott Sanderson pointed to the asconstants decorator in codetransformer [2], which patches the code object by substituting references to the variable with a reference to the constant. Ryan Gonzalez provided another implementation of a similar decorator [3]. Maybe this feature doesn't need new syntax, but just a new decorator in the stdlib.

[1] http://comments.gmane.org/gmane.comp.python.ideas/37047
[2] http://permalink.gmane.org/gmane.comp.python.ideas/36958
[3] http://permalink.gmane.org/gmane.comp.python.ideas/37058
On Tue, Jan 19, 2016 at 12:24 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
A number of variants of more powerful syntax were proposed in [1]. In a neighbouring topic Scott Sanderson pointed to the asconstants decorator in codetransformer [2], which patches the code object by substituting references to the variable with a reference to the constant. Ryan Gonzalez provided another implementation of a similar decorator [3].
Maybe this feature doesn't need new syntax, but just a new decorator in the stdlib.
Hmm... Using a decorator would mean that you'd probably have to add quotes around the names of the variables whose values you want to capture, and it'd require hacking the bytecode. That would mean that it'd only work for CPython, and it'd not be a real part of the language. This feels like it wants to be a language-level feature, like nonlocal.
[1] http://comments.gmane.org/gmane.comp.python.ideas/37047 [2] http://permalink.gmane.org/gmane.comp.python.ideas/36958 [3] http://permalink.gmane.org/gmane.comp.python.ideas/37058
-- --Guido van Rossum (python.org/~guido)
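For illustration, a rough sketch of a pure-Python decorator in this spirit that avoids bytecode hacking: it rebuilds the function with a copied globals mapping, so it only affects global-name lookups, not closure cells. The name `capture_values` and its keyword-argument convention are illustrative, not the codetransformer API:

```python
import functools
from types import FunctionType

def capture_values(**values):
    # Rebuild the function with a frozen copy of its globals dict in
    # which the given names are bound to their decoration-time values.
    def decorator(func):
        frozen = dict(func.__globals__)
        frozen.update(values)
        new_func = FunctionType(func.__code__, frozen, func.__name__,
                                func.__defaults__, func.__closure__)
        return functools.update_wrapper(new_func, func)
    return decorator

a = 1

@capture_values(a=a)
def f(x):
    return a + x

a = 100
print(f(1))   # 2 -- f still sees the captured value of a
```

Note that the keyword-argument spelling sidesteps the quoting issue, at the cost of repeating each name at the decoration site.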
On Wed, Jan 20, 2016 at 3:47 AM, Guido van Rossum <guido@python.org> wrote:
I think it's reasonable to divert this discussion to "value capture". Not sure if that's the usual terminology, but the idea should be that a reference to the value is captured, rather than (as Python normally does with closures) a reference to the variable (implemented as something called a "cell").
+1. This would permit deprecation of the "def blah(...., len=len):" optimization - all you need to do is set a value capture on the name "len". ChrisA
On Wed, Jan 20, 2016 at 10:29:48AM +1100, Chris Angelico wrote:
On Wed, Jan 20, 2016 at 3:47 AM, Guido van Rossum <guido@python.org> wrote:
I think it's reasonable to divert this discussion to "value capture". Not sure if that's the usual terminology, but the idea should be that a reference to the value is captured, rather than (as Python normally does with closures) a reference to the variable (implemented as something called a "cell").
+1. This would permit deprecation of the "def blah(...., len=len):" optimization - all you need to do is set a value capture on the name "len".
Some might argue that the default argument trick is already the One Obvious Way to capture a value in a function. I don't think deprecation is the right word here; you can't deprecate "len=len" style code because it's just a special case of the more general name=expr function default argument syntax. I suppose a linter might complain if the expression on the right hand side is precisely the same as the name on the left, but _len=len would trivially work around that.

-- Steve
On Wed, Jan 20, 2016 at 11:14 AM, Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Jan 20, 2016 at 10:29:48AM +1100, Chris Angelico wrote:
On Wed, Jan 20, 2016 at 3:47 AM, Guido van Rossum <guido@python.org> wrote:
I think it's reasonable to divert this discussion to "value capture". Not sure if that's the usual terminology, but the idea should be that a reference to the value is captured, rather than (as Python normally does with closures) a reference to the variable (implemented as something called a "cell").
+1. This would permit deprecation of the "def blah(...., len=len):" optimization - all you need to do is set a value capture on the name "len".
Some might argue that the default argument trick is already the One Obvious Way to capture a value in a function.
I disagree. There is nothing obvious about this, outside of the fact that it's already used in so many places. It's not even obvious after looking at the code.
I don't think deprecation is the right word here, you can't deprecate "len=len" style code because it's just a special case of the more general name=expr function default argument syntax. I suppose a linter might complain if the expression on the right hand side is precisely the same as the name on the left, but _len=len would trivially work around that.
The deprecation isn't of named arguments with defaults, but of the use of that for no reason other than optimization. IMO function arguments should always exist primarily so a caller can override them. In contrast, random.randrange has a parameter _int which is not mentioned in the docs, and which should never be provided. Why should it even be exposed? It exists solely as an optimization.

Big one for the bike-shedding: Is this "capture as local" (the same semantics as the default arg - if you rebind it, it changes for the current invocation only), or "capture as static" (the same semantics as a closure if you use the 'nonlocal' directive - if you rebind it, it stays changed), or "capture as constant" (what people are usually going to be doing anyway)?

ChrisA
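For reference, the first two options can already be spelled in today's Python, which may help the bike-shedding (the third is just never rebinding the captured name):

```python
# "Capture as local" (default-arg semantics): rebinding is per-call.
def bump_local(n=0):
    n += 1
    return n

bump_local(), bump_local()   # (1, 1) -- n resets on every call

# "Capture as static" (closure plus nonlocal): rebinding persists.
def make_bumper():
    n = 0
    def bump():
        nonlocal n
        n += 1
        return n
    return bump

bump_static = make_bumper()
bump_static(), bump_static()   # (1, 2) -- n survives between calls
```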
On 20 January 2016 at 10:38, Chris Angelico <rosuav@gmail.com> wrote:
Big one for the bike-shedding: Is this "capture as local" (the same semantics as the default arg - if you rebind it, it changes for the current invocation only), or "capture as static" (the same semantics as a closure if you use the 'nonlocal' directive - if you rebind it, it stays changed), or "capture as constant" (what people are usually going to be doing anyway)?
The "shared value" approach can already be achieved by binding a mutable object rather than an immutable one, and there's no runtime speed difference between looking up a local and looking up a constant, so I think it makes sense to just stick with "default argument semantics, but without altering the function signature" One possible name for such a directive would be "sharedlocal": it's in most respects a local variable, but the given definition time initialisation value is shared across all invocations to the function. With that spelling: def f(*, len=len): ... Would become: def f(): sharedlocal len=len ... And you'd also be able to do things like: def f(): sharedlocal cache={} Alternatively, if we just wanted to support early binding of pre-existing names, then "bindlocal" could work: def f(): bindlocal len ... Either approach could be used to handle early binding of loop iteration variables: for i in range(10): def f(): sharedlocal i=i ... for i in range(10): def f(): bindlocal i ... I'd be -1 on bindlocal (I think dynamic optimisers like PyPy or Numba, or static ones like Victor's FAT Python project are better answers there), but "sharedlocal" is more interesting, since it means you can avoid creating a closure if all you need is to persist a bit of state between invocations of a function. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
But 'shared' and 'local' are both the wrong words to use here. Also probably this should syntactically be tied to the function header so the time of evaluation is clear(er). On Tue, Jan 19, 2016 at 10:37 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 20 January 2016 at 10:38, Chris Angelico <rosuav@gmail.com> wrote:
Big one for the bike-shedding: Is this "capture as local" (the same semantics as the default arg - if you rebind it, it changes for the current invocation only), or "capture as static" (the same semantics as a closure if you use the 'nonlocal' directive - if you rebind it, it stays changed), or "capture as constant" (what people are usually going to be doing anyway)?
The "shared value" approach can already be achieved by binding a mutable object rather than an immutable one, and there's no runtime speed difference between looking up a local and looking up a constant, so I think it makes sense to just stick with "default argument semantics, but without altering the function signature"
One possible name for such a directive would be "sharedlocal": it's in most respects a local variable, but the given definition time initialisation value is shared across all invocations to the function.
With that spelling:
def f(*, len=len): ...
Would become:
def f(): sharedlocal len=len ...
And you'd also be able to do things like:
def f(): sharedlocal cache={}
Alternatively, if we just wanted to support early binding of pre-existing names, then "bindlocal" could work:
def f(): bindlocal len ...
Either approach could be used to handle early binding of loop iteration variables:
for i in range(10): def f(): sharedlocal i=i ...
for i in range(10): def f(): bindlocal i ...
I'd be -1 on bindlocal (I think dynamic optimisers like PyPy or Numba, or static ones like Victor's FAT Python project are better answers there), but "sharedlocal" is more interesting, since it means you can avoid creating a closure if all you need is to persist a bit of state between invocations of a function.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido van Rossum (python.org/~guido)
On 1/20/2016 11:48 AM, Guido van Rossum wrote:
But 'shared' and 'local' are both the wrong words to use here. Also probably this should syntactically be tied to the function header so the time of evaluation is clear(er).
Use ';' in the parameter list, followed by name=expr pairs. The question is whether names after it are initialized local variables, subject to rebinding at runtime, or named constants, with the names replaced by the values at definition time. In the former case, a type hint could be included. In the latter case, which is much better for optimization, the fixed object would already be typed.

```python
def f(int a, int b=1; int c=2) => int
```

-- Terry Jan Reedy
On Wednesday, January 20, 2016 10:59 AM, Terry Reedy <tjreedy@udel.edu> wrote:
Use ';' in the parameter list, followed by name=expr pairs.
This is the best option anyone's suggested (short of just not doing anything and telling people to keep using the default-value trick on the rare occasions where this is necessary). However, I'd suggest one minor change: for the common case of `j=j, len=len`, allow people to just write the name once. The compiler can definitely handle this:

```python
def spam(eggs; _i=i, j, len):
```
The question is whether names after it are initialized local variables, subject to rebinding at runtime, or named constants, with the names replaced by the values at definition time.
They almost certainly should be variables, just like parameters, with the values stored in `__defaults__`. Otherwise, this code:

```python
powers = [lambda x; i: x**i for i in range(5)]
```

... produces functions with their own separate code objects, instead of functions that share a single code object. And this isn't some weird use case; "defining functions in a loop that capture the loop iterator by value" is the paradigm case for this new feature. (It's even covered in the official FAQ.) The performance cost of those separate code objects (and the cache misses caused when you try to use them in a loop) has almost no compensating performance gain (`LOAD_CONST` isn't faster than `LOAD_FAST`, and the initial copy from `__defaults__` at call time is about 1/10th the cost of either). And it's more complicated to implement (especially from where we are today), and less flexible for reflective code that munges functions.
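For what it's worth, the sharing already happens with today's `i=i` spelling, which is easy to check:

```python
powers = [lambda x, i=i: x**i for i in range(5)]

# One code object shared by all five functions; only the defaults differ.
assert all(f.__code__ is powers[0].__code__ for f in powers)
print([f.__defaults__ for f in powers])   # [(0,), (1,), (2,), (3,), (4,)]
print([f(2) for f in powers])             # [1, 2, 4, 8, 16]
```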
In the former case, a type hint could be included. In the latter case, which is much better for optimization, the fixed object would already be typed.
def f(int a, int b=1; int c=2) => int
You've got the syntax wrong. But, more importantly, besides the latter case (const vs. default) actually being worse for optimization, it isn't any better for type inference. In this function:

```python
def f(a: int, b: int=1; c=2) -> int:
```

or even this one:

```python
def f():
    for i in range(5):
        def local(x: int; i) -> int:
            return x**i
        yield local
```

... the type checker can infer the type of `i`: it's initialized with an int literal (first version) or the value of a variable that's been inferred as an int; therefore, it's an int. So it can emit a warning if you assign anything but another int to it.

The only problem with your solution is that we now have three different variations that are all spelled very differently:

```python
def spam(i; j):          # captured by value
def spam(i): nonlocal j  # captured by variable
def spam(i):             # captured by variable if no assignment, else shadowed by a local
```

Is that acceptable?
On Wed, Jan 20, 2016 at 01:58:46PM -0500, Terry Reedy wrote:
On 1/20/2016 11:48 AM, Guido van Rossum wrote:
But 'shared' and 'local' are both the wrong words to use here. Also probably this should syntactically be tied to the function header so the time of evaluation is clear(er).
Use ';' in the parameter list, followed by name=expr pairs. The question is whether names after it are initialized local variables, subject to rebinding at runtime, or named constants, with the names replaced by the values at definition time. In the former case, a type hint could be included. In the latter case, which is much better for optimization, the fixed object would already be typed.
def f(int a, int b=1; int c=2) => int
I almost like that. The problem is that the difference between ; and , is visually indistinct and easy to mess up. I've occasionally typed ; in a parameter list and got a nice SyntaxError telling me I've messed up, but with your suggestion the function will just silently do the wrong thing.

I suggest a second "parameter list":

```python
def func(a:int, b:int=1)(c:int)->int:
    ...
```

is morally equivalent to:

```python
def func(a:int, b:int=1, c:int=c)->int:
    ...
```

except that c is not a parameter of the function and cannot be passed as an argument:

```python
func(a=0, b=2)       # okay
func(a=0, b=2, c=1)  # TypeError
```

We still lack a good term for what the (c) thingy should be called. I'm not really happy with either of "static" or "capture", but for lack of anything better I'll go with capture for the moment. So a full function declaration looks like:

    def NAME ( PARAMETERS ) ( CAPTURES ) -> RETURN-HINT :

(Bike-shedders: do you prefer () [] or {} for the list of captures?)

CAPTURES is a comma-delimited list of local variable names, with optional type hint and optional bindings. Here are some examples:

```python
# Capture the values of x and y from the enclosing scope.
# Both x and y must exist at func definition time.
def func(arg)(x, y):
    # inside the body of func, x and y are locals

# Same as above, except with type-hinting.
# If x or y in the enclosing scope are not floats,
# a checker should report a type error.
def func(arg)(x:float, y:float):
    # inside the body of func, x and y are locals

# Capture the values of x and y from the enclosing scope,
# binding to names x and z.
# Both x and y must exist at func definition time.
def func(arg)(x, z=y):
    # inside the body of func, x and z are locals,
    # while y would follow the usual scoping rules

# Capture a copy of the value of dict d from the enclosing scope.
# d must exist at func definition time.
def func(arg)(d:dict=d.copy()):
    # inside the body of func, d is a local
```

If a capture consists of a name alone (or a name plus annotation), it declares a local variable of that name, and binds to it the captured value of the same name in the enclosing scope. E.g.:

```python
x = 999
def func()(x):  # like x=x
    x += 1
    return (x, globals()['x'])

assert func() == (1000, 999)
x = 0
assert func() == (1000, 0)
```

If a capture consists of a name = expression, the expression is evaluated at function definition time, and the result captured.

```python
y = 999
def func()(x=y+1):
    return x

assert func() == 1000
del y
assert func() == 1000
```

Can we make this work with lambda? I think we can. The current lambda syntax is:

    lambda params: expression

e.g.

    lambda x, y=y: x+y

Could we keep that (for backwards compatibility) but allow parens to optionally surround the parameter list? If so, then we can allow an optional second set of parens after the first, allowing captures:

    lambda (x)(y): x+y

The difference between `lambda x,y=y: ...` and `lambda (x)(y): ...` is that the first takes two arguments, mandatory x and optional y (which defaults to the value of y from the enclosing scope), while the second only takes one argument, x.

-- Steve
On 21Jan2016 11:52, Steven D'Aprano <steve@pearwood.info> wrote:
So a full function declaration looks like:
def NAME ( PARAMETERS ) ( CAPTURES ) -> RETURN-HINT :
(Bike-shedders: do you prefer () [] or {} for the list of captures?)
Just to this: I prefer () - this is very much like a special parameter list. [] and {} suggest list and dict to me. Cheers, Cameron Simpson <cs@zip.com.au>
Nick,

On 2016-01-20 1:37 AM, Nick Coghlan wrote:

On 20 January 2016 at 10:38, Chris Angelico <rosuav@gmail.com> wrote:

Big one for the bike-shedding: Is this "capture as local" (the same semantics as the default arg - if you rebind it, it changes for the current invocation only), or "capture as static" (the same semantics as a closure if you use the 'nonlocal' directive - if you rebind it, it stays changed), or "capture as constant" (what people are usually going to be doing anyway)?

The "shared value" approach can already be achieved by binding a mutable object rather than an immutable one, and there's no runtime speed difference between looking up a local and looking up a constant, so I think it makes sense to just stick with "default argument semantics, but without altering the function signature"

One possible name for such a directive would be "sharedlocal": it's in most respects a local variable, but the given definition-time initialisation value is shared across all invocations of the function.

With that spelling:

```python
def f(*, len=len): ...
```

Would become:

```python
def f():
    sharedlocal len=len
```
FWIW I strongly believe that this feature (at least the "len=len"-like optimizations) should be implemented as an optimization in the interpreter. We already have "nonlocal" and "global". Having a third modifier (such as sharedlocal, static, etc) will only introduce confusion and make Python less comprehensible. Yury
On Wednesday, January 20, 2016 11:05 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
FWIW I strongly believe that this feature (at least the "len=len"-like optimizations) should be implemented as an optimization in the interpreter.
The problem is that there are two reasonable interpretations for free variables--variable capture or value capture--and Python can only do one or the other automatically. Python does variable capture, because that's what you usually want.[*] But when you _do_ want value capture, you need some way to signal it.

In some cases, the only reason you want value capture is as an optimization, and maybe the optimizer can handle that for you. But sometimes there's a semantic reason you want it--such as the well known case (covered in the official Python Programming FAQ [1]) where you're trying to capture the separate values of an iteration variable in a bunch of separate functions defined in the loop. And we need some way to spell that.

Of course we already have a way to spell that, the `a=a` default value trick. And I personally think that's good enough. But if the community disagrees, and we come up with a new syntax, I don't see why we should stop people from also using that new syntax for the optimization case when they know they want it.[**]

[*] Note that in C++, which people keep referring to, the Core Guidelines suggest using variable capture by default. And their main exception--use value capture when you need to keep something around beyond the lifetime of its original scope, because otherwise you'd get a dangling reference to a destroyed object--doesn't apply to Python.

[**] I don't think people are abusing the default-value trick for optimization--I generally only see `len=len` in low-level library code that may end up getting used inside an inner loop--so I doubt they'd abuse any new syntax for the same thing.

[1] https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-l...
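In miniature, that FAQ case, and the default-value spelling of the fix:

```python
# All the closures share the one loop variable and see its final value.
fns = [lambda x: x**i for i in range(5)]
print([f(2) for f in fns])        # [16, 16, 16, 16, 16]

# The default-value trick captures each value as the loop goes by.
fns = [lambda x, i=i: x**i for i in range(5)]
print([f(2) for f in fns])        # [1, 2, 4, 8, 16]
```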
We already have "nonlocal" and "global". Having a third modifier (such as sharedlocal, static, etc) will only introduce confusion and make Python less comprehensible.
I agree with that. Also, none of the names people are proposing make much sense. "static" looks like a function-level static in C and its descendants, but does something completely different. "capture" means the exact opposite of what it says, and "sharedlocal" sounds like it's going to be "more shared" than the default for free variables when it's actually less shared.
On 21 January 2016 at 06:42, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Wednesday, January 20, 2016 11:05 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
FWIW I strongly believe that this feature (at least the "len=len"-like optimizations) should be implemented as an optimization in the interpreter.
The problem is that there are two reasonable interpretations for free variables--variable capture or value capture--and Python can only do one or the other automatically.
Can we please use the longstanding early binding and late binding terminology for these two variants, rather than introducing new phrases that just confuse the matter... Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Wed, Jan 20, 2016 at 2:05 PM Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2016-01-20 1:37 AM, Nick Coghlan wrote:
On 20 January 2016 at 10:38, Chris Angelico <rosuav@gmail.com> wrote:

With that spelling:

```python
def f(*, len=len): ...
```

Would become:

```python
def f():
    sharedlocal len=len
```
FWIW I strongly believe that this feature (at least the "len=len"-like optimizations) should be implemented as an optimization in the interpreter.
We already have "nonlocal" and "global". Having a third modifier (such as sharedlocal, static, etc) will only introduce confusion and make Python less comprehensible.
If the purpose is to improve speed, it certainly feels like an interpreter optimization. The other thread about adding ``ma_version`` to dicts might be useful for quickening the global variable lookup.

If the purpose is to store the current global value, it might be reasonable to add a language feature to make that more explicit. Beginners often mistakenly think that default values are evaluated and assigned at call-time instead of def-time. However, adding a new, more explicit language feature wouldn't eliminate the current confusion. Instead we'd have two ways to do it.
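The classic example of that def-time confusion:

```python
def append_to(item, bucket=[]):   # [] is evaluated once, at def time
    bucket.append(item)
    return bucket

print(append_to(1))   # [1]
print(append_to(2))   # [1, 2] -- beginners often expect [2]
```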
On Tue, Jan 19, 2016 at 08:47:28AM -0800, Guido van Rossum wrote:
I think it's reasonable to divert this discussion to "value capture". Not sure if that's the usual terminology, but the idea should be that a reference to the value is captured, rather than (as Python normally does with closures) a reference to the variable (implemented as something called a "cell").
If I understand you correctly, that's precisely what a function default argument does: capture the current value of the default value expression at the time the function is called. This has the side-effect of exposing that as an argument, which may be undesirable. partial() can be used to work around that.
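A sketch of the partial() spelling (names illustrative); the captured values are bound positionally, so they no longer show up as overridable arguments:

```python
from functools import partial

def _fun(a, b, c, x, y):
    return a + b + c + x + y

a, b, c = 1, 2, 3
fun = partial(_fun, a, b, c)   # a, b, c are captured by value here

print(fun(10, 20))   # 36
a = 99
print(fun(10, 20))   # still 36; rebinding a afterwards has no effect
```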
(However let's please not consider whether the value should be copied or deep-copied. Just capture the object reference at the point the capture is executed.)
The best syntax for such capture remains to be seen. ("Capture" seems to universally make people think of "variable capture" which is the opposite of what we want here.)
If I recall correctly, there was a recent(?) proposal for a "static" keyword with similar semantics:

```python
def func(a):
    static b = expression
    ...
```

would guarantee that expression was evaluated exactly once. If that evaluation occurred when func was defined, rather than when it was first called, that might be the semantics you are looking for:

```python
def func(a):
    static b = b  # captures the value of b from the enclosing scope
```

Scoping rules might be tricky to get right. Perhaps rather than a declaration, "static" might be better treated as a block:

```python
def func(a):
    static:
        # Function initialisation section. Occurs once, when the
        # def statement runs.
        b = b  # b on the left is local, b on the right is non-local
               # (just like in a parameter list)
    # Normal function body goes here.
```

But neither of these approaches would be good for lambdas. I'm okay with that -- lambda is a lightweight syntax, for lightweight needs. If your needs are great (doc strings, annotations, multiple statements) don't use lambda.

-- Steve
On Tue, Jan 19, 2016 at 4:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Jan 19, 2016 at 08:47:28AM -0800, Guido van Rossum wrote:
I think it's reasonable to divert this discussion to "value capture". Not sure if that's the usual terminology, but the idea should be that a reference to the value is captured, rather than (as Python normally does with closures) a reference to the variable (implemented as something called a "cell").
If I understand you correctly, that's precisely what a function default argument does: capture the current value of the default value expression at the time the function is called.
I think you misspoke here (I don't think you actually believe what you said :-). Function defaults capture the current value at the time the function is *defined*.
This has the side-effect of exposing that as an argument, which may be underdesirable.
Indeed. It's also non-obvious to people who haven't seen it before.
partial() can be used to work around that.
Hardly. Adding a partial() call usually makes code *less* obvious.
The best syntax for such capture remains to be seen. ("Capture" seems to universally make people think of "variable capture" which is the opposite of what we want here.)
If I recall correctly, there was a recent(?) proposal for a "static" keyword with similar semantics:
```python
def func(a):
    static b = expression
    ...
```
would guarantee that expression was evaluated exactly once.
Once per what? In the lifetime of the universe? Per CPython process start? Per call? J/K, I think I know what you meant -- once per function definition (same as default values).
If that evaluation occurred when func was defined, rather than when it was first called,
(FWIW, "when it was first called" would be a recipe for disaster and irreproducible results.)
that might be the semantics you are looking for:
```python
def func(a):
    static b = b  # captures the value of b from the enclosing scope
```
Yeah, I think the OP proposed 'capture b' with these semantics.
Scoping rules might be tricky to get right. Perhaps rather than a declaration, "static" might be better treated as a block:
Why? This does smell like a directive similar to global and nonlocal.
```python
def func(a):
    static:
        # Function initialisation section. Occurs once, when the
        # def statement runs.
        b = b  # b on the left is local, b on the right is non-local
               # (just like in a parameter list)
```
Hm, this repetition of the name in parameter lists is actually a strike against it, and the flexibility it adds (of allowing arbitrary expressions to be captured) doesn't seem to be needed much in reality -- the examples for the argument default pattern invariably use 'foo=foo, bar=bar'.
# Normal function body goes here.
But neither of these approaches would be good for lambdas. I'm okay with that -- lambda is a lightweight syntax, for lightweight needs. If your needs are great (doc strings, annotations, multiple statements) don't use lambda.
Yeah, the connection with lambdas in C++ is unfortunate. In C++, IIRC, the term lambda is used to refer to any function nested inside another, and that's the only place where closures exist. -- --Guido van Rossum (python.org/~guido)
On Tue, Jan 19, 2016 at 05:01:42PM -0800, Guido van Rossum wrote:
On Tue, Jan 19, 2016 at 4:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Jan 19, 2016 at 08:47:28AM -0800, Guido van Rossum wrote:
I think it's reasonable to divert this discussion to "value capture". [...] If I understand you correctly, that's precisely what a function default argument does: capture the current value of the default value expression at the time the function is called.
I think you misspoke here (I don't think you actually believe what you said :-).
Function defaults capture the current value at the time the function is *defined*.
Oops! You got me. Yes, I meant defined, not called. [...]
The best syntax for such capture remains to be seen. ("Capture" seems to universally make people think of "variable capture" which is the opposite of what we want here.)
If I recall correctly, there was a recent(?) proposal for a "static" keyword with similar semantics:
```python
def func(a):
    static b = expression
    ...
```
would guarantee that expression was evaluated exactly once.
Once per what? In the lifetime of the universe? Per CPython process start? Per call?
J/K, I think I know what you meant -- once per function definition (same as default values).
That's what I mean. Although, I am curious as to how we might implement the once per lifetime of the universe requirement :-)
If that evaluation occurred when func was defined, rather than when it was first called,
(FWIW, "when it was first called" would be a recipe for disaster and irreproducible results.)
It probably would be a bug magnet. Good thing I'm not asking for that behaviour then :-) [...]
Scoping rules might be tricky to get right. Perhaps rather than a declaration, "static" might be better treated as a block:
Why? This does smell like a directive similar to global and nonlocal.
I'm just tossing the "static block" idea out for discussion, but if you want a justification here are two differences between capture/static and global/nonlocal which suggest they aren't that similar and so we shouldn't feel obliged to use the same syntax. (1) global and nonlocal operate on *names*, not values. E.g. after "global x", x refers to a name in the global scope, not the local scope. But "capture"/"static" doesn't affect the name, or the scope that x belongs to. x is still a local, it just gets pre-initialised to the value of x in the enclosing scope. That makes it more of a binding operation or assignment than a declaration. (2) If we limit this to only capturing the same name, then we can only write (say) "static x", and that does look like a declaration. But maybe we want to allow the local name to differ from the global name: static x = y or even arbitrary expressions on the right: static x = x + 1 Now that starts to look more like it should be in a block of code, especially if you have a lot of them: static x = x + 1 static len = len static data = open("data.txt").read() versus: static: x = x + 1 len = len data = open("data.txt").read() I acknowledge that this goes beyond what the OP asked for, and I think that YAGNI is a reasonable response to the static block idea. I'm not going to champion it any further unless there's a bunch of interest from others. (I'm saving my energy for Eiffel-like require/ensure blocks *wink*). -- Steve
On Wed, Jan 20, 2016 at 4:10 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I'm just tossing the "static block" idea out for discussion, but if you want a justification here are two differences between capture/static and global/nonlocal which suggest they aren't that similar and so we shouldn't feel obliged to use the same syntax.
(1) global and nonlocal operate on *names*, not values. E.g. after "global x", x refers to a name in the global scope, not the local scope.
But "capture"/"static" doesn't affect the name, or the scope that x belongs to. x is still a local, it just gets pre-initialised to the value of x in the enclosing scope. That makes it more of a binding operation or assignment than a declaration.
(2) If we limit this to only capturing the same name, then we can only write (say) "static x", and that does look like a declaration. But maybe we want to allow the local name to differ from the global name:
static x = y
or even arbitrary expressions on the right:
static x = x + 1
Now that starts to look more like it should be in a block of code, especially if you have a lot of them:
```python
static x = x + 1
static len = len
static data = open("data.txt").read()
```
versus:
```python
static:
    x = x + 1
    len = len
    data = open("data.txt").read()
```
I acknowledge that this goes beyond what the OP asked for, and I think that YAGNI is a reasonable response to the static block idea. I'm not going to champion it any further unless there's a bunch of interest from others.
Yeah, your arguments why it's different from global/nonlocal are reasonable, but the question remains whether we really need all that functionality. IIRC C++ lambdas only allow capturing a variable's value, not an expression's. So we should ask ourselves first: if we *only* had some directive that captures some variables' values, essentially like the len=len argument trick but without affecting the signature (e.g. just "static x, y, z"), how much of the current pain would be addressed, and how much would remain?
(I'm saving my energy for Eiffel-like require/ensure blocks *wink*).
Now you're making me curious. -- --Guido van Rossum (python.org/~guido)
My idea for handling this kind of thing is:

```python
for new x in things:
    funcs.append(lambda: dosomethingwith(x))
```

The 'new' modifier can be applied to any assignment target, and conceptually has the effect of creating a new binding instead of changing an existing binding. There is a very simple way to implement this in CPython: create a new cell each time instead of replacing the contents of an existing cell.

-- Greg
On Jan 20, 2016, at 20:59, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
My idea for handling this kind of thing is:
```python
for new x in things:
    funcs.append(lambda: dosomethingwith(x))
```
The 'new' modifier can be applied to any assignment target, and conceptually has the effect of creating a new binding instead of changing an existing binding.
C# almost did this (but only in foreach statements, not all bindings), but in the end they decided that it was simpler to just make foreach _always_ create a new binding each time through the loop, instead of requiring new syntax. I think most of the rationale is in one of Eric Lippert's blog posts with a name like "loop closures considered harmful" (I can't remember the exact title, and searching while typing sucks on a phone), but I can summarize here.

C# had the exact same problem, for the exact same reasons. And, since they don't have the default-value trick, the solution required defining a new local copy in the same scope as the function definition (which means, if you're defining the function in expression context, you have to wrap it in another lambda and call it). After years of closing bugs with "no, C# closures are not broken, what you're complaining about is the exact definition of a closure", they decided they had to do something about it. Every option they considered had some unacceptable feature, but in the end they decided that leaving it as-is was also unacceptable. So, borrowing a bit of practicality-beats-purity from some other language, they decided that a breaking semantic change, and making foreach and C-style for less consistent, and violating one of their fundamental design principles (left is always at least as outside as right) was the best choice.

Python doesn't have the left-outside principle to break (see comprehensions), doesn't have a C-style for to be consistent with, and has probably less rather than more performance impact (we know whether a loop variable is captured, and can skip it for non-cellvars). But it probably has more backward compatibility issues (nobody writes new code expecting it to work for C# 3 as well as C# 5, but people are still writing code that has to work with Python 2.7). So, unless we can be sure that nobody intentionally writes code with a free variable that captures a loop variable, the C# solution isn't available.

Which means your solution is probably the next best thing. And, while I don't see any compelling need for it anywhere other than loop variables, there's also no compelling reason to ban it elsewhere, so why not keep assignment targets consistent.
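In the meantime, the effect of 'new' can be approximated by introducing a fresh scope per iteration; a sketch reusing Greg's example, where `dosomethingwith` is the placeholder from his post:

```python
def dosomethingwith(x):   # placeholder from the example above
    return x * 2

funcs = []
for x in range(3):
    # The inner lambda closes over the outer lambda's parameter,
    # which is a fresh binding on every iteration.
    funcs.append((lambda x: lambda: dosomethingwith(x))(x))

print([f() for f in funcs])   # [0, 2, 4] -- each f kept its own x
```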
On Wed, Jan 20, 2016 at 05:04:21PM -0800, Guido van Rossum wrote:
On Wed, Jan 20, 2016 at 4:10 PM, Steven D'Aprano <steve@pearwood.info> wrote: [...]
(I'm saving my energy for Eiffel-like require/ensure blocks *wink*).
Now you're making me curious.
Okay, just to satisfy your curiosity, and not as a concrete proposal at this time, here is a sketch of the sort of thing Eiffel uses for Design By Contract.

Each function or method has an (optional, but recommended) pre-condition and post-condition. Using a hybrid Eiffel/Python syntax, here is a toy example:

```python
class Lunch:
    def __init__(self, arg):
        self.spam(arg)

    def spam(self, n:int=5):
        """Set the lunch meat to n servings of spam."""
        require:
            # Assert the pre-conditions of the method.
            assert n >= 1
        ensure:
            # Assert the post-conditions of the method.
            assert self.meat.startswith('Spam')
            if ' ' in self.meat:
                assert ' spam' in self.meat
        # main body of the method, as usual
        serves = ['spam']*n
        serves[0] = serves[0].title()
        self.meat = ' '.join(serves)
```

The require block runs before the body of the method, and the ensure block runs after the body, but before the method returns to the caller. If either fails its assertions, the method fails and raises an exception.

Benefits:

- The pre- and post-conditions make up (part of) the method's contract, which is part of the executable documentation of the method. Documentation tools can extract the require and ensure sections and present them as part of the API docs.
- The compiler can turn the contract checking on or off as needed, with the require/ensure sections handled independently.
- Testing pre- and post-conditions is logically separate from the method's implementation. This allows the implementation to vary while keeping the contract the same.
- But at the same time, the contract is right there with the method, not separated in some potentially distant part of the code base.

I'm not going to go into detail about Design By Contract; if anyone wants to learn more you can start here:

https://www.eiffel.com/values/design-by-contract/introduction/
https://docs.eiffel.com/book/method/et-design-contract-tm-assertions-and-exc...

I've just discovered there's an older PEP for something similar:

https://www.python.org/dev/peps/pep-0316/

but that uses docstrings for the contracts. I don't like that.

-- Steve
On Jan 25, 2016, at 15:34, Steven D'Aprano <steve@pearwood.info> wrote:
Okay, just to satisfy your curiosity, and not as a concrete proposal at this time, here is a sketch of the sort of thing Eiffel uses for Design By Contract.
I think it's worth explaining why this has to be an actual language feature, not something you just do by writing functions named "requires" and "ensures". Many of the benefits you cited would work just fine with a PyPI-library solution, but there are some problems that are much harder to solve:

* You usually want ensures to be able to access the return value and exception state, and maybe even any locals, and requires to be able to access the parameters.
* Faking ensure usually means finally or with (which means indenting your entire function) or a wrapper function (which precludes many simple designs).
* Many contract assertions are slow (or even dangerous, when not upheld) to calculate, so just no-opping out the checker functions doesn't help.
* Class invariants should be automatically verified as ensures on all public methods except __del__ and (if it raises) __init__.
* Subclasses that override a method need to automatically inherit the base class's pre- and post-conditions (as well as possibly adding some of their own), even if they don't call the super method.
* Some contract assertions can be tested at compile time. (Eiffel doesn't have much experimentation here; C# does, and there are rumors about Swift with clang-static-analyzer.)

Some of these things can be shoehorned in with frame hacks and metaclasses and so on, but it's not fun. There's a lot of history of people trying to fake it in other languages and then giving up and saying "just use comments until we can build it into language version n+1". (See D 1.0, Core C++ Standard for C++14/17, C# 4, Swift 2...) There have been a few attempts for Python, but most of them seem to have run into similar problems, after a lot of messing around with metaclasses and so on.
On Mon, Jan 25, 2016 at 6:43 PM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Jan 25, 2016, at 15:34, Steven D'Aprano <steve@pearwood.info> wrote:
Okay, just to satisfy your curiosity, and not as a concrete proposal at this time, here is a sketch of the sort of thing Eiffel uses for Design By Contract.
I think it's worth explaining why this has to be an actual language feature, not something you just do by writing functions named "requires" and "ensures". Many of the benefits you cited would work just fine with a PyPI-library solution, but there are some problems that are much harder to solve:
Some of these things can be shoehorned in with frame hacks and metaclasses and so on, but it's not fun. ... There have been a few attempts for Python, but most of them seem to have run into similar problems, after a lot of messing around with metaclasses and so on.
As you were writing this, I was sketching out an implementation using a callable FunctionWithContract context manager as a decorator. As you say, the trouble seems to be elegantly capturing the function output and passing that to an ensure or __exit__ method. The requires side isn't so bad. Still, I'm somewhat hopeful that someone more skilled than I might be able to write an elegant ``Contract`` type using current Python syntax.
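For what it's worth, a minimal sketch of the decorator shape; the requires/ensures calling convention here is an assumption, and the hard problems listed above (invariants, inheritance, compile-time checks) are deliberately ignored:

```python
import functools

def contract(requires=None, ensures=None):
    # requires() sees the arguments; ensures() also sees the result.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if requires is not None:
                assert requires(*args, **kwargs), "precondition failed"
            result = func(*args, **kwargs)
            if ensures is not None:
                assert ensures(result, *args, **kwargs), "postcondition failed"
            return result
        return wrapper
    return decorator

@contract(requires=lambda n: n >= 1,
          ensures=lambda result, n: result.startswith('spam'))
def servings(n):
    return ' '.join(['spam'] * n)

servings(3)   # 'spam spam spam'
servings(0)   # AssertionError: precondition failed
```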
Just curious, Michael, what would you like the Python syntax version to look like if you *can* do whatever metaclass or stack hackery that's needed? I'm a little confused when you mention a decorator and a context manager in the same sentence, since those would seem like different approaches. E.g.:

```python
@Contract(pre=my_pre, post=my_post)
def my_fun(...):
    ...
```

Versus:

```python
with contract(pre=my_pre, post=my_post):
    def my_fun(...):
        ...
```

Versus:

```python
def my_fun(...):
    with contract(pre=my_pre, post=my_post):
        <suite>
```

I'm sure lots of other variations are possible too (if any can be made fully to work).

On Mon, Jan 25, 2016 at 5:01 PM, Michael Selik <mike@selik.org> wrote:
On Mon, Jan 25, 2016 at 6:43 PM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
On Jan 25, 2016, at 15:34, Steven D'Aprano <steve@pearwood.info> wrote:
Okay, just to satisfy your curiosity, and not as a concrete proposal at this time, here is a sketch of the sort of thing Eiffel uses for Design By Contract.
I think it's worth explaining why this has to be an actual language feature, not something you just do by writing functions named "requires" and "ensures". Many of the benefits you cited would work just fine with a PyPI-library solution, but there are some problems that are much harder to solve:
Some of these things can be shoehorned in with frame hacks and metaclasses and so on, but it's not fun. ... There have been a few attempts for Python, but most of them seem to have run into similar problems, after a lot of messing around with metaclasses and so on.
As you were writing this, I was sketching out an implementation using a callable FunctionWithContract context manager as a decorator. As you say, the trouble seems to be elegantly capturing the function output and passing that to an ensure or __exit__ method. The requires side isn't so bad.
Still, I'm somewhat hopeful that someone more skilled than I might be able to write an elegant ``Contract`` type using current Python syntax.
On Mon, Jan 25, 2016 at 05:24:29PM -0800, David Mertz wrote:
Just curious, Michael, what would you like the Python syntax version to look like if you *can* do whatever metaclass or stack hackery that's needed? I'm a little confused when you mention a decorator and a context manager in the same sentence since those would seem like different approaches. E.g.:
I'm not Michael, but since I started this discussion, I'll give an answer. I haven't got any working code, but I think something like this would be acceptable as a proof-of-concept. I'd use a class as a fake namespace, with either a decorator or metaclass:

    class myfunction(metaclass=DBC):
        def myfunction(args):
            # function implementation
            ...
        def requires():
            ...
        def ensures():
            ...

The duplication of the name is a bit ugly, and it looks a bit funny for the decorator/metaclass to take a class as input and return a function, but we don't really have anything else that makes a good namespace. There's functions themselves, of course, but it's hard to get at the internals. The point is to avoid having to pre-define the pre- and post-condition functions. We don't write this:

    def __init__(self):
        ...
    def method(self, arg):
        ...
    class MyClass(init=__init__, method=method)

and nor should we have to do the same for require/ensure. -- Steve
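For illustration, a minimal sketch of what such a DBC helper could look like, written as a class decorator rather than a metaclass for brevity (all details here are assumptions, not working code from the thread). It assumes the body shares the class's name, as in the sketch above, and that requires/ensures are optional zero-argument functions:

    import functools

    def DBC(cls):
        # Pull the pieces out of the class-as-namespace.
        body = getattr(cls, cls.__name__)
        requires = getattr(cls, 'requires', lambda: None)
        ensures = getattr(cls, 'ensures', lambda: None)

        @functools.wraps(body)
        def wrapper(*args, **kwargs):
            requires()                        # pre-conditions
            result = body(*args, **kwargs)    # the real implementation
            ensures()                         # post-conditions
            return result

        return wrapper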
On 26 January 2016 at 14:26, Steven D'Aprano <steve@pearwood.info> wrote:
class myfunction(metaclass=DBC):
    def myfunction(args):
        # function implementation
        ...
    def requires():
        ...
    def ensures():
        ...
The duplication of the name is a bit ugly, and it looks a bit funny for the decorator/metaclass to take a class as input and return a function, but we don't really have anything else that makes a good namespace
Well, classes can be callable already, so how about

    @DBC
    class myfunction:
        def __call__(self, args):
            ...
        @precondition
        def requires(self):
            ...
        @postcondition
        def ensures(self, result):
            ...

The DBC class decorator does something like

    def DBC(cls):
        def wrapper(*args, **kw):
            fn = cls()
            fn.args = args
            fn.kw = kw
            for pre in fn.__preconditions__:
                pre()
            result = fn(*args, **kw)
            for post in fn.__postconditions__:
                post(result)
            return result
        return wrapper

Pre and post conditions can access the args via self.args and self.kw. The method decorators would let you have multiple pre- and post-conditions. Or you could use "magic" names and omit the decorators. Paul
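For illustration, one way the precondition/postcondition markers that this sketch assumes could work (hypothetical helper names; like the original sketch, it glosses over details such as binding self):

    def precondition(fn):
        # Mark the method; a fuller DBC() would gather marked methods
        # into __preconditions__ / __postconditions__ on the class.
        fn._is_precondition = True
        return fn

    def postcondition(fn):
        fn._is_postcondition = True
        return fn

    def collect_conditions(cls):
        # Hypothetical helper the DBC decorator above might call.
        cls.__preconditions__ = [v for v in vars(cls).values()
                                 if getattr(v, '_is_precondition', False)]
        cls.__postconditions__ = [v for v in vars(cls).values()
                                  if getattr(v, '_is_postcondition', False)]
        return cls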
On Wed, Jan 27, 2016 at 2:06 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Well, classes can be callable already, so how about
@DBC
class myfunction:
    def __call__(self, args):
        ...
    @precondition
    def requires(self):
        ...
    @postcondition
    def ensures(self, result):
        ...
The DBC class decorator does something like
def DBC(cls):
    def wrapper(*args, **kw):
        fn = cls()
        fn.args = args
        fn.kw = kw
        for pre in fn.__preconditions__:
            pre()
        result = fn(*args, **kw)
        for post in fn.__postconditions__:
            post(result)
        return result
    return wrapper
Pre and post conditions can access the args via self.args and self.kw. The method decorators would let you have multiple pre- and post-conditions. Or you could use "magic" names and omit the decorators.
I'd rather use magic names - something like:

    @DBC
    class myfunction:
        def body(self, args):
            ...
        def requires(self):
            ...
        def ensures(self, result):
            ...

and then the DBC decorator can create a __call__ method. This still has one nasty problem though: the requires and ensures functions can't see function arguments. You could get around this by duplicating the argument list onto the other two, but who wants to do that? ChrisA
On 26 January 2016 at 15:24, Chris Angelico <rosuav@gmail.com> wrote:
This still has one nasty problem though: the requires and ensures functions can't see function arguments.
See my code - you can put the args onto the instance as attributes for requires/ensures to inspect. Paul
On Wed, Jan 27, 2016 at 2:42 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 26 January 2016 at 15:24, Chris Angelico <rosuav@gmail.com> wrote:
This still has one nasty problem though: the requires and ensures functions can't see function arguments.
See my code - you can put the args onto the instance as attributes for requires/ensures to inspect.
Except that there can be only one of those at any given time, so you run into issues with recursion or threads/async/etc; plus, it's still not properly clean - you have to check either args or kwargs, depending on whether the argument was passed positionally or by keyword. I don't see that as a solution. (Maybe what we need is a "keyword-to-positional" functools feature - anything in **kwargs that can be interpreted positionally gets removed and added to *args. Or the other way - keywordify everything.) ChrisA
On 01/26/2016 04:51 PM, Chris Angelico wrote:
On Wed, Jan 27, 2016 at 2:42 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 26 January 2016 at 15:24, Chris Angelico <rosuav@gmail.com> wrote:
This still has one nasty problem though: the requires and ensures functions can't see function arguments.
See my code - you can put the args onto the instance as attributes for requires/ensures to inspect.
Except that there can be only one of those at any given time, so you run into issues with recursion or threads/async/etc; plus, it's still not properly clean - you have to check either args or kwargs, depending on whether the argument was passed positionally or by keyword. I don't see that as a solution.
(Maybe what we need is a "keyword-to-positional" functools feature - anything in **kwargs that can be interpreted positionally gets removed and added to *args. Or the other way - keywordify everything.)
Well, it's not in functools.

    import inspect

    def keyword_to_positional(func, args, kwargs):
        sig = inspect.signature(func).bind(*args, **kwargs)
        sig.apply_defaults()
        return sig.args, sig.kwargs

    def keywordify_everything(func, args, kwargs):
        sig = inspect.signature(func).bind(*args, **kwargs)
        sig.apply_defaults()
        return sig.arguments
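For illustration, a usage sketch of the two helpers above (the sample function f is made up; the exact mapping type returned for .arguments varies by Python version):

    def f(a, b, c=3):
        return a + b + c

    print(keyword_to_positional(f, (1,), {'b': 2}))
    # -> ((1, 2, 3), {}): everything that can be positional now is

    print(keywordify_everything(f, (1,), {'b': 2}))
    # -> roughly {'a': 1, 'b': 2, 'c': 3} (an ordered mapping)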
There are probably a dozen DBC packages on PyPI, and dozens more that never even got that far. If this is doable without language changes, surely it'll be done on PyPI first, get traction there, and only then be considered for inclusion in the stdlib (so that it can be used to contractify parts of the stdlib), right? But, since this is fun: On Jan 26, 2016, at 07:24, Chris Angelico <rosuav@gmail.com> wrote:
On Wed, Jan 27, 2016 at 2:06 AM, Paul Moore <p.f.moore@gmail.com> wrote: Well, classes can be callable already, so how about
@DBC
class myfunction:
    def __call__(self, args):
        ...
    @precondition
    def requires(self):
        ...
    @postcondition
    def ensures(self, result):
        ...
The DBC class decorator does something like
def DBC(cls):
    def wrapper(*args, **kw):
        fn = cls()
        fn.args = args
        fn.kw = kw
        for pre in fn.__preconditions__:
            pre()
        result = fn(*args, **kw)
        for post in fn.__postconditions__:
            post(result)
        return result
    return wrapper
Pre and post conditions can access the args via self.args and self.kw. The method decorators would let you have multiple pre- and post-conditions. Or you could use "magic" names and omit the decorators.
I'd rather use magic names - something like:
@DBC
class myfunction:
    def body(self, args):
        ...
    def requires(self):
        ...
    def ensures(self, result):
        ...
and then the DBC decorator can create a __call__ method. This still has one nasty problem though: the requires and ensures functions can't see function arguments. You could get around this by duplicating the argument list onto the other two, but who wants to do that?
You could do this pretty easily with a macro that returns (the AST for) something like this:

    def myfunction([func.body.params]):
        [func.requires.body]
        try:
            return_to_raise([func.body.body])
        except Return as r:
            result, exc = r.args[0], None
            [func.ensures.body]
            return result
        except Exception as exc:
            [func.ensures.body]
            raise

(I deliberately didn't write this in MacroPy style, but obviously if you really wanted to implement this, that's how you'd do it.) There are still a few things missing here. For example, many postconditions are specified in terms of the pre- and post- values of mutable parameters, with self as a very important special case. And fitting class invariant testing into this scheme should be extra fun. But I think it's all doable.
On Mon, Jan 25, 2016 at 7:24 PM David Mertz <mertz@gnosis.cx> wrote:
Just curious, Michael, what would you like the Python syntax version to look like if you *can* do whatever metaclass or stack hackery that's needed? I'm a little confused when you mention a decorator and a context manager in the same sentence since those would seem like different approaches.
Now that you mention it, that does seem weird. Initially the pattern of trying to factor out a setup/cleanup feels like a context manager. But we also need to capture the function arguments and return value. So that feels like a decorator. I started by implementing an abstract base class Contract that sets up the require/ensure behavior. One inherits and overrides to implement a particular contract. The overridden require/ensure functions would receive the arguments/result of a decorated function.

    class StayPositive(Contract):
        def require(self, *args, **kwargs):
            assert sum(args + tuple(kwargs.values())) > 0
        def ensure(self, result, *args, **kwargs):
            # ensure receives not only the result, but also the same argument objs
            assert result > 0

    @StayPositive
    def foo(i, am, happy):
        return i + am + happy

One thing I like here is that the require/ensure doesn't clutter the function definition with awkward decorator parameters. The contract terms are completely separate. This does put the burden on wisely naming the contract subclass. The combination of decorator and context manager was unnecessary. The internals of my Contract base class included an awkward ``with self:``. If I were to refactor, I'd separate out a context manager helper from the decorator object. Seeing some of the stubs folks have written makes me think this ends with exec-ing a template a la namedtuple.
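For illustration, a minimal sketch of what such a Contract base class might look like (an assumption, not Michael's actual code, and it ignores the ``with self:`` internals he mentions):

    import functools

    class Contract:
        """Subclass and override require/ensure to state a contract."""

        def __init__(self, func):
            self.func = func
            functools.update_wrapper(self, func)

        def require(self, *args, **kwargs):
            pass  # overridden by subclasses: check the arguments

        def ensure(self, result, *args, **kwargs):
            pass  # overridden by subclasses: check the result

        def __call__(self, *args, **kwargs):
            self.require(*args, **kwargs)
            result = self.func(*args, **kwargs)
            self.ensure(result, *args, **kwargs)
            return result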
On Tue, Jan 26, 2016 at 10:34:55AM +1100, Steven D'Aprano wrote:
On Wed, Jan 20, 2016 at 05:04:21PM -0800, Guido van Rossum wrote:
On Wed, Jan 20, 2016 at 4:10 PM, Steven D'Aprano <steve@pearwood.info> wrote: [...]
(I'm saving my energy for Eiffel-like require/ensure blocks *wink*).
Now you're making me curious.
Okay, just to satisfy your curiosity, and not as a concrete proposal at this time, here is a sketch of the sort of thing Eiffel uses for Design By Contract.
Each function or method has an (optional, but recommended) pre-condition and post-condition. Using a hybrid Eiffel/Python syntax, here is a toy example:
class Lunch:
    def __init__(self, arg):
        self.spam(arg)

    def spam(self, n: int = 5):
        """Set the lunch meat to n servings of spam."""
        require:
            # Assert the pre-conditions of the method.
            assert n >= 1
        ensure:
            # Assert the post-conditions of the method.
            assert self.meat.startswith('Spam')
            if ' ' in self.meat:
                assert ' spam' in self.meat
        # main body of the method, as usual
        serves = ['spam'] * n
        serves[0] = serves[0].title()
        self.meat = ' '.join(serves)
The require block runs before the body of the method, and the ensure block runs after the body, but before the method returns to the caller. If either fail their assertions, the method fails and raises an exception.
Benefits:
- The pre- and post-conditions make up (part of) the method's contract, which is part of the executable documentation of the method. Documentation tools can extract the ensure and require sections and present them as part of the API docs.
- The compiler can turn the contract checking on or off as needed, with the ensure/require sections handled independently.
- Testing pre- and post-conditions is logically separate from the method's implementation. This allows the implementation to vary while keeping the contract the same.
- But at the same time, the contract is right there with the method, not separated in some potentially distant part of the code base.
One thing I immediately thought of was using decorators.

    def requires(*conditions):
        def decorator(func):
            # TODO: Do some hackery such that the signature of wrapper
            # matches the signature of `func`.
            def wrapper(*args, **kwargs):
                for condition in conditions:
                    assert eval(condition, {}, locals())
                return func(*args, **kwargs)
            return wrapper
        return decorator

    def ensure(*conditions):
        def decorator(func):
            def wrapper(*args, **kwargs):
                try:
                    return func(*args, **kwargs)
                finally:
                    for condition in conditions:
                        assert eval(condition, {}, locals())
            return wrapper
        return decorator

Maybe do some checking for the optimization-level flag, and replace the decorator function with `return func` instead of another wrapper? The `ensure` part isn't quite to my liking yet, but I think that the `ensure` should have no need to access internal variables of the function, but only the externally visible state. (This somewhat mimics what I'm trying to fiddle around with in my own time: writing a decorator that does run-time checking of argument and return types of functions.)
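For illustration, a usage sketch of the decorators above. Because the sketch evaluates conditions against the wrapper's locals, conditions have to be phrased in terms of args/kwargs rather than the function's own parameter names -- part of what the author says isn't to his liking yet:

    @requires('not args or args[0] >= 1')
    def spam(n=5):
        return ' '.join(['spam'] * n)

    spam(3)   # passes the pre-condition, returns 'spam spam spam'
    spam(0)   # AssertionError raised by the requires wrapper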
On 01/25/2016 03:34 PM, Steven D'Aprano wrote:
On Wed, Jan 20, 2016 at 05:04:21PM -0800, Guido van Rossum wrote:
On Wed, Jan 20, 2016 at 4:10 PM, Steven D'Aprano wrote: [...]
(I'm saving my energy for Eiffel-like require/ensure blocks *wink*).
Now you're making me curious.
Okay, just to satisfy your curiosity, and not as a concrete proposal at this time, here is a sketch of the sort of thing Eiffel uses for Design By Contract.
Each function or method has an (optional, but recommended) pre-condition and post-condition. Using a hybrid Eiffel/Python syntax, here is a toy example:
class Lunch:
    def __init__(self, arg):
        self.spam(arg)

    def spam(self, n: int = 5):
        """Set the lunch meat to n servings of spam."""
        require:
            # Assert the pre-conditions of the method.
            assert n >= 1
        ensure:
            # Assert the post-conditions of the method.
            assert self.meat.startswith('Spam')
            if ' ' in self.meat:
                assert ' spam' in self.meat
        # main body of the method, as usual
        serves = ['spam'] * n
        serves[0] = serves[0].title()
        self.meat = ' '.join(serves)
I like that syntax. Currently, something not too ugly would be to use descriptors -- something like:

    from dbc import require, ensure

    class Frobnigate(object):
        @require
        def spammer(self, desc):
            desc.assertInRange(desc.arg1, 0, 99)

        @spammer
        def _spammer(self, arg1, arg2):
            return arg1 // arg2 + arg1

        @spammer.ensure
        def spammer(self, desc, res):
            if desc.arg2 % 2 == 1:
                desc.assertEqual(res % 2, 1)
            else:
                desc.assertEqual(res % 2, 0)

        @ensure
        def egger(self, desc, res):
            desc.assertIsType(res, str)

        @egger
        def _egger(self, egg_type):
            'scrambled, poached, boiled, etc'
            return egg_type

Where 'desc' in the above code is 'self' for the descriptor so saved arguments could be accessed, etc. I put a leading underscore on the body so it could be kept separate and more easily subclassed without losing the DBC portions. If 'require' is not needed, one can use 'ensure'; both create the DBC object which would take care of calling any/all requires, then the function, then any/all ensures, and also grabbing and saving the function signature and actual parameters. -- ~Ethan~
On 01/26/2016 08:55 AM, Ethan Furman wrote:
Currently, something not too ugly would be to use descriptors -- something like:
from dbc import require, ensure

class Frobnigate(object):
    @require
    def spammer(self, desc):
        desc.assertInRange(desc.arg1, 0, 99)

    @spammer
    def _spammer(self, arg1, arg2):
        return arg1 // arg2 + arg1

    @spammer.ensure
    def spammer(self, desc, res):
        if desc.arg2 % 2 == 1:
            desc.assertEqual(res % 2, 1)
        else:
            desc.assertEqual(res % 2, 0)

    @ensure
    def egger(self, desc, res):
        desc.assertIsType(res, str)

    @egger
    def _egger(self, egg_type):
        'scrambled, poached, boiled, etc'
        return egg_type
Where 'desc' in the above code is 'self' for the descriptor so saved arguments could be accessed, etc.
I put a leading underscore on the body so it could be kept separate and more easily subclassed without losing the DBC portions.
If 'require' is not needed, one can use 'ensure'; both create the DBC object which would take care of calling any/all requires, then the function, then any/all ensures, and also grabbing and saving the function signature and actual parameters.
The descriptor itself might look like:

    # untested
    class require:
        def __init__(desc, func=None):
            desc.require = []
            desc.ensure = []
            desc.name = None
            desc.func = None

        def __call__(desc, func):
            # if desc.func is not None, func is the actual function,
            # otherwise it's a requires function
            if desc.func is None:
                desc.require.append(func)
                return desc
            else:
                desc.func_name = name = func.__name__
                if name.startswith('_'):
                    name = name[1:]
                desc.name = name
                return func

        def __get__(desc, self, cls):
            function = getattr(self, desc.func_name)
            def caller(self, *args, **kwds):
                for require in desc.require:
                    require(self, desc, *args, **kwds)
                res = function(self, *args, **kwds)
                for ensure in desc.ensure:
                    ensure(self, desc, res, *args, **kwds)
                return res
            return caller

        def ensure(desc, func):
            desc.ensure.append(func)
            return desc

        def require(desc, func):
            desc.require.append(func)
            return desc

I decided to pass args and kwds rather than save them to the descriptor instance, hoping threading would be easier that way. The 'ensure' class would be very similar. This style does require the programmer to have both names: 'spammer' and '_spammer' -- it would be a bit cleaner to have a metaclass with a custom __getattribute__, but a lot more work and possible metaclass conflicts when combining with other interesting metaclasses. -- ~Ethan~
Hi, Sorry but I'm lost in this long thread. Do you want to extend the Python language to declare constants in a function? Maybe I'm completely off-topic, sorry. 2016-01-21 1:10 GMT+01:00 Steven D'Aprano <steve@pearwood.info>:
(2) If we limit this to only capturing the same name, then we can only write (say) "static x", and that does look like a declaration. But maybe we want to allow the local name to differ from the global name:
static x = y
3 months ago, Serhiy Storchaka proposed a "const var = expr" syntax: https://mail.python.org/pipermail/python-ideas/2015-October/037028.html With a shortcut "const len" which is like "const len = len". In the meanwhile, I implemented an optimization in my FAT Python project: "Copy builtins to constant". It's quite simple: replace the "LOAD_GLOBAL builtin" instruction with a "LOAD_CONST builtin" instruction and "patch" the co_consts constants of a code object at runtime:

    def hello():
        print("hello world")

is replaced with:

    def hello():
        "LOAD_GLOBAL print"("hello world")
    hello.__code__ = fat.replace_consts(hello.__code__,
                                        {'LOAD_GLOBAL print': print})

Where fat.replace_consts() is a helper to create a new code object replacing constants with the specified mapping: http://fatoptimizer.readthedocs.org/en/latest/fat.html#replace_consts Replacing print(...) with "LOAD_GLOBAL print"(...) is done in the fatoptimizer (an AST optimizer): http://fatoptimizer.readthedocs.org/en/latest/optimizations.html#copy-builti... We have to inject the builtin function at runtime. It cannot be done when the code object is created by "def ..." because a code object can only contain objects serializable by marshal (to be able to compile a .py file to a .pyc file).
I acknowledge that this goes beyond what the OP asked for, and I think that YAGNI is a reasonable response to the static block idea. I'm not going to champion it any further unless there's a bunch of interest from others. (I'm saving my energy for Eiffel-like require/ensure blocks *wink*).
The difference between "def hello(print=print): ..." and Serhiy's const idea (or my optimization) is that "def hello(print=print): ..." changes the signature of the function which can be a serious issue in an API. Note: The other optimization "local_print = print" in the function is only useful for loops (when the builtin is loaded multiple times) and it still loads the builtin once per function call, whereas my optimization uses a constant and so no lookup is required anymore. Then guards are used to disable the optimization if builtins are modified. See the PEP 510 for an explanation on that part. Victor
On 21.01.2016 09:48, Victor Stinner wrote:
The difference between "def hello(print=print): ..." and Serhiy's const idea (or my optimization) is that "def hello(print=print): ..." changes the signature of the function which can be a serious issue in an API.
Note: The other optimization "local_print = print" in the function is only useful for loops (when the builtin is loaded multiple times) and it still loads the builtin once per function call, whereas my optimization uses a constant and so no lookup is required anymore.
Then guards are used to disable the optimization if builtins are modified. See the PEP 510 for an explanation on that part.
I ran performance tests on these optimization tricks (and others) in 2014. See this talk: http://www.egenix.com/library/presentations/PyCon-UK-2014-When-performance-m... (slides 33ff.) The keyword trick doesn't really pay off in terms of added performance vs. danger of introducing weird bugs. Still, it would be great to have a way to say "please look this symbol up at compile time and stick the result in a local variable" (which is basically what the keyword trick does), only in a form that's easier to detect when reading the code and doesn't change the function signature. A decorator could help with this (by transforming the byte code and localizing the symbols), e.g.

    @localize(len)
    def f(seq):
        z = 0
        for x in seq:
            if x:
                z += len(x)
        return z

but the more we move language features to decorators, the less readable the code will get by having long tails of decorators on many functions (we don't really want our functions to resemble snakes, do we ? :-)). So perhaps it is indeed time for a new keyword to localize symbols in a function or module, say:

    # module scope localization, applies to all code objects in
    # this module:
    localize len

    def f(seq):
        ...

or:

    def f(seq):
        # Localize len in this function, since we need it in
        # tight loops
        localize len
        ...

All that said, I don't really believe that this is a high priority feature request. The gained performance win is not all that great and only becomes relevant when used in tight loops. -- Marc-Andre Lemburg
2016-01-21 10:39 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
I ran performance tests on these optimization tricks (and others) in 2014. See this talk:
http://www.egenix.com/library/presentations/PyCon-UK-2014-When-performance-m... (slides 33ff.)
Ah nice, thanks for the slides.
The keyword trick doesn't really pay off in terms of added performance vs. danger of introducing weird bugs.
I ran a quick microbenchmark to measure the cost of LOAD_GLOBAL to load a global: call func("abc") with

    mylen = len
    def func(obj):
        return mylen(obj)

Result:

    117 ns: original bytecode (LOAD_GLOBAL)
    109 ns: LOAD_CONST
    116 ns: LOAD_CONST with guard

LOAD_CONST avoids 1 dict lookup (globals) and reduces the runtime by 8 ns: 7% faster. But the guard has a cost of 7 ns: we only win 1 nanosecond. Not really interesting here. LOAD_CONST means that the LOAD_GLOBAL instruction has been replaced with a LOAD_CONST instruction. The guard checks that the frame globals and globals()['mylen'] didn't change. I ran a second microbenchmark on func("abc") to measure the cost of LOAD_GLOBAL to load a builtin: call func("abc") with

    def func(obj):
        return len(obj)

Result:

    124 ns: original bytecode (LOAD_GLOBAL)
    107 ns: LOAD_CONST
    116 ns: LOAD_CONST with guard on builtins + globals

LOAD_CONST avoids 2 dict lookups (globals, builtins) and reduces the runtime by 17 ns: 14% faster. But the guard has a cost of 9 ns: we win 8 nanoseconds, 6% faster. Here the guard is more complex: it checks that the frame builtins, the frame globals, builtins.__dict__['len'] and globals()['len'] didn't change. If you avoid guards, it's always faster, but it changes the Python semantics. The speedup on such a very small example is low. It's more interesting when the global or builtin variable is used in a loop: the speedup is multiplied by the number of loop iterations.
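For illustration, a rough sketch of how the unpatched case of such a microbenchmark can be reproduced with timeit (absolute numbers depend on the machine, and this doesn't reproduce the bytecode-patched variants):

    import timeit

    setup = """
    mylen = len
    def func(obj):
        return mylen(obj)
    """

    # Take the minimum of several runs to reduce noise.
    print(min(timeit.repeat('func("abc")', setup=setup, number=10**6)))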
A decorator could help with this (by transforming the byte code and localizing the symbols), e.g.
@localize(len)
def f(seq):
    z = 0
    for x in seq:
        if x:
            z += len(x)
    return z
FYI https://pypi.python.org/pypi/codetransformer has such a decorator: @asconstants(len=len).
All that said, I don't really believe that this is a high priority feature request. The gained performance win is not all that great and only becomes relevant when used in tight loops.
Yeah, in the Python stdlib, the hack is only used for loops. Victor
On 21.01.2016 14:19, Victor Stinner wrote:
2016-01-21 10:39 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
I ran performance tests on these optimization tricks (and others) in 2014. See this talk:
http://www.egenix.com/library/presentations/PyCon-UK-2014-When-performance-m... (slides 33ff.)
Ah nice, thanks for the slides.
Forgot to mention the benchmarks I used: https://github.com/egenix/when-performance-matters
The keyword trick doesn't really pay off in terms of added performance vs. danger of introducing weird bugs.
I ran a quick microbenchmark to measure the cost of LOAD_GLOBAL to load a global: call func("abc") with
mylen = len
def func(obj):
    return mylen(obj)
Result:
117 ns: original bytecode (LOAD_GLOBAL)
109 ns: LOAD_CONST
116 ns: LOAD_CONST with guard
LOAD_CONST avoids 1 dict lookup (globals) and reduces the runtime by 8 ns: 7% faster. But the guard has a cost of 7 ns: we only win 1 nanosecond. Not really interesting here.
LOAD_CONST means that the LOAD_GLOBAL instruction has been replaced with a LOAD_CONST instruction. The guard checks if the frame globals and globals()['mylen'] didn't change.
I ran a second microbenchmark on func("abc") to measure the cost of LOAD_GLOBAL to load a builtin: call func("abc") with

    def func(obj):
        return len(obj)
Result:
124 ns: original bytecode (LOAD_GLOBAL)
107 ns: LOAD_CONST
116 ns: LOAD_CONST with guard on builtins + globals

LOAD_CONST avoids 2 dict lookups (globals, builtins) and reduces the runtime by 17 ns: 14% faster. But the guard has a cost of 9 ns: we win 8 nanoseconds, 6% faster.

Here the guard is more complex: it checks that the frame builtins, the frame globals, builtins.__dict__['len'] and globals()['len'] didn't change.
If you avoid guards, it's always faster, but it changes the Python semantics.
The speedup on such a very small example is low. It's more interesting when the global or builtin variable is used in a loop: the speedup is multiplied by the number of loop iterations.
Sure, but for those, you'd probably simply use the in-function localization:

    def f(seq):
        z = 0
        local_len = len
        for x in seq:
            if x:
                z += local_len(x)
        return z

This results in a LOAD_FAST inside the loop and is probably the better way to speed things up.
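For illustration, a quick way to verify this claim, runnable as-is:

    import dis

    def f(seq):
        z = 0
        local_len = len
        for x in seq:
            if x:
                z += local_len(x)
        return z

    dis.dis(f)
    # The loop body shows LOAD_FAST local_len rather than LOAD_GLOBAL len,
    # so the dict lookups happen once per call instead of once per item.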
A decorator could help with this (by transforming the byte code and localizing the symbols), e.g.
@localize(len)
def f(seq):
    z = 0
    for x in seq:
        if x:
            z += len(x)
    return z
FYI https://pypi.python.org/pypi/codetransformer has such decorator: @asconstants(len=len).
Interesting :-)
All that said, I don't really believe that this is a high priority feature request. The gained performance win is not all that great and only becomes relevant when used in tight loops.
Yeah, in the Python stdlib, the hack is only used for loops.
Right. The only advantage I'd see in having a keyword to "configure" the behavior is that you could easily apply the change to a whole module/function without having to add explicit localizations everywhere. -- Marc-Andre Lemburg
On Jan 21, 2016, at 00:48, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
Sorry but I'm lost in this long thread.
I think the whole issue of const optimization is taking this discussion way off track, so let me try to summarize the actual issue. What the thread is ultimately looking for is a solution to the "closures capturing loop variables" problem. This problem has been in the official programming FAQ[1] for decades, as "Why do lambdas defined in a loop with different values all return the same result?"

    powers = [lambda x: x**i for i in range(10)]

This gives you ten functions that all return x**9, which is probably not what you wanted. The reason this is a problem is that Python uses "late binding", which in this context means that each of those functions is a closure that captures the variable i in a way that looks up the value of i at call time. All ten functions capture the same variable, and when you later call them, that variable's value is 9. Almost every language with real closures and for-each loops has the same problem, but someone who's coming to Python as a first language, or coming from a language like C that doesn't have those features, is almost guaranteed to be confused by this when he first sees it. (Presumably, that's why it's in the FAQ.)

The OP proposed that we should add some syntax, borrowed from C++, to function definitions that specifies that some things get captured by value. You could instead describe this as early binding the specified names, or as not capturing at all, but however you describe it, the idea is pretty simple. The obvious way to implement it is to copy the values into the function object at function-creation time, then copy them into locals at call time--exactly like default parameter values. (Not too surprising, because default parameter values are the idiomatic workaround today.) A few alternatives to the parameter-like syntax borrowed from C++ were proposed, including "def power(x; i):" (notice the semicolon) and "def power(x)(i):". A few people also proposed a new declaration statement similar to "global" and "nonlocal"--which opens the question of what to call it; suggested names included "shared", "sharedlocal", and "capture".

People also suggested an optimization: store them like constants, instead of like default values, so they don't need to be copied into locals. (This is similar to the global->const optimizations being discussed in the FAT threads, but here it's optimizing the equivalent of default parameter values, not globals. Which means it's much less important of an optimization, since defaults are only fetched once per call, after which they're looked up the same as locals, which are just as fast as consts. It _could_ potentially feed into further FAT-type optimizations, but that's getting pretty speculative.) The obvious downside here is that constants are stored in the code object, so instead of 10 (small) function objects all sharing the same (big) code object, you'd have 10 function objects with 10 separate (big) code objects.

Another alternative, which I don't think anyone seriously considered, is to flag the specified freevars so that, at function creation time, we copy the cell and bind that copy, instead of binding the original cell. (This alternative can't really be called "early binding" or "capture by value", but it has the same net effect.) Finally, Terry suggested a completely different solution to the problem: don't change closures; change for loops. Make them create a new variable each time through the loop, instead of reusing the same variable.
When the variable isn't captured, this would make no difference, but when it is, closures from different iterations would capture different variables (and therefore different cells). For backward-compatibility reasons, this might have to be optional, which means new syntax; he proposed "for new i in range(10):". I don't know of any languages that use the C++-style solution that don't have lvalues to worry about. It's actually necessary for other reasons in C++ (capturing a variable doesn't extend its lifetime, so you need to be able to explicitly copy things or you end up with dangling references), but those reasons don't apply to Python (or C#, Swift, JavaScript, etc.). Still, it is a well-known solution to the problem. Terry's solution, on the other hand, is used by Swift (from the start, even though it _does_ have lvalues), C# (since 5.0), and Ruby (since 1.9), among other languages. C#, in particular, decided to add it as a breaking change to a mature language, rather than adding new syntax, because Eric Lippert believed that almost any code that's relying on the old behavior is probably a bug rather than intentional. [1]: https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-l...
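For illustration, here are the FAQ problem and the default-value workaround mentioned above, side by side:

    powers = [lambda x: x**i for i in range(10)]
    print(powers[2](2))    # 512 == 2**9: every closure sees the final i

    # The idiomatic workaround: capture the value as a default at definition time.
    powers = [lambda x, i=i: x**i for i in range(10)]
    print(powers[2](2))    # 4 == 2**2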
On Jan 22, 2016, at 11:50 PM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Jan 21, 2016, at 00:48, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
Sorry but I'm lost in this long thread.
I think the whole issue of const optimization is taking this discussion way off track, so let me try to summarize the actual issue.
What the thread is ultimately looking for is a solution to the "closures capturing loop variables" problem. This problem has been in the official programming FAQ[1] for decades, as "Why do lambdas defined in a loop with different values all return the same result"?
powers = [lambda x: x**i for i in range(10)]
This gives you ten functions that all return x**9, which is probably not what you wanted.
The original request could have also been solved with ``functools.partial``. Sure, this is a toy solution, but the problem as originally shared was a toy problem.
    from functools import partial
    a = 1
    f = partial(lambda a, x: a + x, a)
    f(10)    # -> 11
    a = 2
    f(10)    # -> 11: partial captured the value a had when f was created
Seems to me quite similar to the original suggestion from haael: """ a = 1 b = 2 c = 3 fun = lambda[a, b, c] x, y: a + b + c + x + y """
On Sat, Jan 23, 2016 at 3:50 PM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Finally, Terry suggested a completely different solution to the problem: don't change closures; change for loops. Make them create a new variable each time through the loop, instead of reusing the same variable. When the variable isn't captured, this would make no difference, but when it is, closures from different iterations would capture different variables (and therefore different cells). For backward-compatibility reasons, this might have to be optional, which means new syntax; he proposed "for new i in range(10):".
Not just for backward compatibility. Python's scoping and assignment rules are currently very straight-forward: assignment creates a local name unless told otherwise by a global/nonlocal declaration, and *all* name binding follows the same rules as assignment. Off the top of my head, I can think of two special cases, neither of which is truly a change to the binding semantics: "except X as Y:" triggers an unbinding at the end of the block, and comprehensions have a hidden function boundary that means their iteration variables are more local than you might think. Making for loops behave differently by default would be a stark break from that tidiness. It seems odd to change this on the loop, though. Is there any reason to use "for new i in range(10):" if you're not making a series of nested functions? Seems most logical to make this a special way of creating functions, not of looping. ChrisA
On Jan 22, 2016, at 21:06, Chris Angelico <rosuav@gmail.com> wrote:
On Sat, Jan 23, 2016 at 3:50 PM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Finally, Terry suggested a completely different solution to the problem: don't change closures; change for loops. Make them create a new variable each time through the loop, instead of reusing the same variable. When the variable isn't captured, this would make no difference, but when it is, closures from different iterations would capture different variables (and therefore different cells). For backward-compatibility reasons, this might have to be optional, which means new syntax; he proposed "for new i in range(10):".
Not just for backward compatibility. Python's scoping and assignment rules are currently very straight-forward: assignment creates a local name unless told otherwise by a global/nonlocal declaration, and *all* name binding follows the same rules as assignment. Off the top of my head, I can think of two special cases, neither of which is truly a change to the binding semantics: "except X as Y:" triggers an unbinding at the end of the block, and comprehensions have a hidden function boundary that means their iteration variables are more local than you might think. Making for loops behave differently by default would be a stark break from that tidiness.
As a side note, notice that if you don't capture the variable, there is no observable difference (which means CPython would be well within its rights to optimize it by reusing the same variable unless it's a cellvar). Anyway, yes, it's still something that you have to learn--but the unexpected-on-first-encounter interaction between loop variables and closures is also something that everybody has to learn. And, even after you understand it, it still doesn't become obvious until you've been bitten by it enough times (and if you're going back and forth between Python and a language that's solved the problem, one way or the other, you may keep relearning it). So, theoretically, the status quo is certainly simpler, but in practice, I'm not sure it is.
It seems odd to change this on the loop, though. Is there any reason to use "for new i in range(10):" if you're not making a series of nested functions?
Rarely if ever. But is there any reason to "def spam(x; i):" or "def [i](x):" or whatever syntax people like if you're not overwriting i with a different and unwanted value? And is there any reason to reuse a variable you've bound in that way if a loop isn't forcing you to do so? This problem comes up all the time, in all kinds of languages, when loops and closures intersect. It almost never comes up with loops alone or closures alone.
Seems most logical to make this a special way of creating functions, not of looping.
There are also some good theoretical motivations for changing loops, but I'm really hoping someone else (maybe the Swift or C# dev team blogs) has already written it up, so I can just post a link and a short "... and here's why it also applies to Python" (complicated by the fact that one of the motivations _doesn't_ apply to Python...). Also, the idea of a closure "capturing by value" is pretty strange on the surface; you have to think through why that doesn't just mean "not capturing" in a language like Python. Nick Coghlan suggests calling it "capture at definition" vs. "capture at call", which helps, but it's still weird. Weirder than loops creating a new binding that has the same name as the old one in a let-less language? I don't know. They're both weird. And so is the existing behavior, despite the fact that it makes perfect sense once you work it through. Anyway, for now, I'll just repeat that Ruby, Swift, C#, etc. all solved this by changing for loops, while only C++, which already needed to change closures because of its lifetime rules, solved it by changing closures. On the other hand, JavaScript and Java both explicitly rejected any change to fix the problem, and Python has lived with it for a long time, so...
I said I'd write something up over the weekend if I couldn't find a good writeup from the Swift, C#, or Scala communities. I couldn't, so I did: https://stupidpythonideas.blogspot.com/2016/01/for-each-loops-should-define-... Apologies for the formatting (which I blame on blogspot--my markdown-to-html-with-workarounds-for-blogspot-sucking toolchain is still not perfect), and for being not entirely focused on Python (which is a consequence of Ruby and C# people being vaguely interested in it), and for being overly verbose (which is entirely my fault, as usual).
Terry Reedy wrote:
Finally, Terry suggested a completely different solution to the problem: don't change closures; change for loops.
I remember that proposal, but it was someone other than me.
If you're looking for the perpetrator of "for new i in ...", I confess it was me. -- Greg
On 23 January 2016 at 14:50, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
What the thread is ultimately looking for is a solution to the "closures capturing loop variables" problem. This problem has been in the official programming FAQ[1] for decades, as "Why do lambdas defined in a loop with different values all return the same result"?
powers = [lambda x: x**i for i in range(10)]
This gives you ten functions that all return x**9, which is probably not what you wanted.
The reason this is a problem is that Python uses "late binding", which in this context means that each of those functions is a closure that captures the variable i in a way that looks up the value of i at call time. All ten functions capture the same variable, and when you later call them, that variable's value is 9.
Thanks for that summary, Andrew. While I do make some further thoughts below, I'll also note explicitly that I think the status quo in this area is entirely acceptable, and we don't actually *need* to change anything. However, there have already been some new ways of looking at the question that haven't come up previously, so I think it's a worthwhile discussion, even though the most likely outcome is still "No change".
The OP proposed that we should add some syntax, borrowed from C++, to function definitions that specifies that some things get captured by value. You could instead describe this as early binding the specified names, or as not capturing at all, but however you describe it, the idea is pretty simple. The obvious way to implement it is to copy the values into the function object at function-creation time, then copy them into locals at call time--exactly like default parameter values. (Not too surprising, because default parameter values are the idiomatic workaround today.)
In an off-list discussion with Andrew, I noted that one reason the "capture by value" terminology was confusing me was because it made me think in terms of "pass by reference" and "pass by value" in C/C++, neither of which is actually relevant to the discussion at hand. However, he also pointed out that "early binding" vs "late binding" was also confusing, since the compile-time/definition-time/call-time distinction in Python is relatively unique, and in many other contexts "early binding" refers to things that happen at compile time.

As a result (and as Andrew already noted in another email), I'm currently thinking of the behaviour of nonlocal and global variables as "capture at call", while the values of default parameters are "capture at definition". (If "capture" sounds weird, "resolve at call" and "resolve at definition" also work.)

The subtlety of this distinction actually shows up in *two* entries in the programming FAQ. Andrew already mentioned the interaction of loops and closures, where capture-at-call surprises people: https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-l... However, there are also mutable default arguments, where it is capture-at-definition that is often surprising: https://docs.python.org/3/faq/programming.html#why-are-default-values-shared...

While nobody's proposing to change the latter, providing an explicit syntax for "capture at definition" may still have a beneficial side effect in making it easier to explain the way default arguments are evaluated and stored on the function object at function definition time rather than created anew each time the function runs. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
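For illustration, a minimal example contrasting the two behaviours Nick names:

    x = 1

    def late():
        return x        # "capture at call": x is resolved when late() runs

    def early(x=x):
        return x        # "capture at definition": the default was evaluated at def time

    x = 2
    print(late())    # 2
    print(early())   # 1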
Nick Coghlan wrote:
As a result (and as Andrew already noted in another email), I'm currently thinking of the behaviour of nonlocal and global variables as "capture at call",
That's not right either, because if a free variable gets reassigned between the time of the call and the time the variable is used within the function, the new value is seen. -- Greg
Nick Coghlan <ncoghlan@...> writes:
On 23 January 2016 at 14:50, Andrew Barnert via Python-ideas <python-ideas@...> wrote:
What the thread is ultimately looking for is a solution to the "closures capturing loop variables" problem. This problem has been in the official programming FAQ[1] for decades, as "Why do lambdas defined in a loop with different values all return the same result"?
powers = [lambda x: x**i for i in range(10)]
This gives you ten functions that all return x**9, which is probably not what you wanted.
The reason this is a problem is that Python uses "late binding", which in this context means that each of those functions is a closure that captures the variable i in a way that looks up the value of i at call time. All ten functions capture the same variable, and when you later call them, that variable's value is 9.
I've never liked the use of "late binding" in this context. The behavior is totally standard for closures that use mutable values. Here's OCaml, using refs (mutable reference cells) instead of the regular immutable values. BTW, no one would write OCaml like in the following example, it's just for clarity): let i = ref 0.0;; # val i : float ref = {contents = 0.} let rpow = ref [];; # val rpow : '_a list ref = {contents = []} while (!i < 10.0) do rpow := (fun x -> x**(!i)) :: !rpow; i := !i +. 1.0 done;; - : unit = () let powers = List.rev !rpow;; val powers : (float -> float) list = [<fun>; <fun>; <fun>; <fun>; <fun>; <fun>; <fun>; <fun>; <fun>; <fun>] List.map (fun f -> f 10.0) powers;; - : float list = [10000000000.; 10000000000.; 10000000000.; 10000000000.; 10000000000.; 10000000000.; 10000000000.; 10000000000.; 10000000000.; 10000000000.] # You see that "i" is a reference cell, i.e. it's compiled to a C struct and lookups are just a pointer dereference. Conceptually Python's dictionaries are really just the same as reference cells, except they hold more than one value. So, to me the entire question is more one of immutable vs. mutable rather than late vs. early binding. Stefan Krah
![](https://secure.gravatar.com/avatar/047f2332cde3730f1ed661eebb0c5686.jpg?s=120&d=mm&r=g)
On Sat, Jan 23, 2016 at 4:57 AM, Stefan Krah <skrah.temporarily@gmail.com> wrote:
I've never liked the use of "late binding" in this context. The behavior is totally standard for closures that use mutable values.
I wonder if the problem isn't that "binding" is a term imported from a different language philosophy, and the idea there is just fundamentally different from Python's philosophy about variables.

In Python, a variable is *conceptually* just a key in a dict (and often, like for globals, builtins and instance or class variables, that really is how it's implemented). The variable name is the key, and there are implicit (and often dynamic) rules for deciding which dict to use. For local variables this is a bit of a lie, but the language goes out of its way to make it appear true (e.g. the existence of locals()). This concept is also valid for nonlocals (either the implicit PY2 kind, or the explicit PY3 kind introduced by a nonlocal statement). The implementation through "cells" is nearly unobservable (try getting hold of a cell object through introspection without using ctypes!) and is just an optimization. Semantically (if we don't mind keeping other objects alive longer), nonlocals can be implemented by just holding on to the stack frame of the function call where they live, or, if locals hadn't been optimized, holding on to the dict containing that frame's locals would also work.

So, I don't really want to introduce "for new x in ..." because it suddenly introduces a completely different concept into the language, and it would be really hard to explain what it does to someone who has correctly grasped Python's concept of variables as keys in a dict. What dict holds x in "for new x ..."? It would have to be considered a new dict created just to hold x, but other variables assigned in the body of the for loop would still be in the dict holding all the other locals of the function. Bah.

-- 
--Guido van Rossum (python.org/~guido)
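(To illustrate the dict-lookup model with a minimal invented example: rebinding the key in the module's globals dict is all it takes to change what a function sees.)

x = 1

def f():
    return x  # resolved by (conceptually) looking up 'x' in a dict

print(f())          # 1
globals()['x'] = 2  # same effect as the statement x = 2
print(f())          # 2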
![](https://secure.gravatar.com/avatar/1e126970cb50fcf90ed4cb1a089ebd73.jpg?s=120&d=mm&r=g)
Guido van Rossum <guido@...> writes:
I've never liked the use of "late binding" in this context. The behavior is totally standard for closures that use mutable values.
I wonder if the problem isn't that "binding" is a term imported from a different language philosophy, and the idea there is just fundamentally different from Python's philosophy about variables.
I think my point is that even if "late binding" is the best term for Python's symbol resolution scheme, it may not be optimal to use it as an explanation for this particular closure behavior, since all languages with mutable closures behave in the same manner (and most of them would be classified as "early binding" languages). Stefan Krah
![](https://secure.gravatar.com/avatar/72ee673975357d43d79069ac1cd6abda.jpg?s=120&d=mm&r=g)
Guido van Rossum wrote:
So, I don't really want to introduce "for new x in ..." because it suddenly introduces a completely different concept into the language,
What dict hold x in "for new x ..."? It would have to be considered a new dict created just to hold x, but other variables assigned in the body of the for loop would still be in the dict holding all the other locals of the function.
We could say that the body of a "for new" loop is a nested scope in which all other referenced variables are implicitly declared "nonlocal". -- Greg
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
On 24 January 2016 at 07:16, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
So, I don't really want to introduce "for new x in ..." because it suddenly introduces a completely different concept into the language,
What dict hold x in "for new x ..."? It would have to be considered a new dict created just to hold x, but other variables assigned in the body of the for loop would still be in the dict holding all the other locals of the function.
We could say that the body of a "for new" loop is a nested scope in which all other referenced variables are implicitly declared "nonlocal".
This actually ties into an idea your suggestion prompted: it would likely suffice if we had a way to request "create a new scope per iteration" behaviour in for loops and comprehensions, with no implicit nonlocal behaviour at all. Consider Guido's spelled out list comprehension equivalent:

powers = []
for i in range(10):
    def f(x):
        return x**i
    powers.append(f)

There's no rebinding of values in the current scope there - only mutation of a list. Container comprehensions and generator expressions have the same characteristic - no name rebinding occurs in the loop body, so the default handling of rebinding of names other than the iteration variables doesn't matter. Accordingly, a statement like:

powers = []
for new i in range(10):
    def f(x):
        return x**i
    powers.append(f)

Could be semantically equivalent to:

powers = []
for i in range(10):
    def _for_loop_suite(i=i):
        def f(x):
            return x**i
        powers.append(f)
    _for_loop_suite()
    del _for_loop_suite

Capturing additional values on each iteration would be possible with a generator expression:

for new i, a, b, c in ((i, a, b, c) for i in range(10)):
    def f(x):
        return x**i, a, b, c

While nonlocal and global declarations would work the same way they do in any other nested function.

For a practical example of this, consider the ThreadPoolExecutor example from the concurrent.futures docs: https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor...

A scope-per-iteration construct makes it much easier to use a closure to define the operation submitted to the executor for each URL:

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_site = {}
    for new site_url in sites_to_load:
        def load_site():
            with urllib.request.urlopen(site_url, timeout=60) as conn:
                return conn.read()
        future_to_site[executor.submit(load_site)] = site_url
    # Report results as they become available
    for future in concurrent.futures.as_completed(future_to_site):
        site_url = future_to_site[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (site_url, exc))
        else:
            print('%r page is %d bytes' % (site_url, len(data)))

If you try to write that code that way today (i.e. without the "new" on the first for loop), you'll end up with a race condition between the main thread changing the value of "site_url" and the executor issuing the URL open request.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
![](https://secure.gravatar.com/avatar/047f2332cde3730f1ed661eebb0c5686.jpg?s=120&d=mm&r=g)
On Sat, Jan 23, 2016 at 7:22 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
[...] For a practical example of this, consider the ThreadPoolExecutor example from the concurrent.futures docs: https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor...
A scope-per-iteration construct makes it much easier to use a closure to define the operation submitted to the executor for each URL:
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_site = {}
    for new site_url in sites_to_load:
        def load_site():
            with urllib.request.urlopen(site_url, timeout=60) as conn:
                return conn.read()
        future_to_site[executor.submit(load_site)] = site_url
    # Report results as they become available
    for future in concurrent.futures.as_completed(future_to_site):
        site_url = future_to_site[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (site_url, exc))
        else:
            print('%r page is %d bytes' % (site_url, len(data)))
If you try to write that code that way today (i.e. without the "new" on the first for loop), you'll end up with a race condition between the main thread changing the value of "site_url" and the executor issuing the URL open request.
I wonder if kids today aren't too much in love with local function definitions. :-) There's a reason why executor.submit() takes a function *and arguments*. If you move the function out of the for loop and pass the url as a parameter to submit(), problem solved, and you waste fewer resources on function objects and cells to hold nonlocals. A generation ago most people would have naturally used such a solution (since most languages didn't support the alternative :-). -- --Guido van Rossum (python.org/~guido)
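(A sketch of what Guido suggests, reusing the hypothetical names from Nick's example; sites_to_load is assumed to be defined elsewhere.)

import concurrent.futures
import urllib.request

def load_site(site_url):
    # One shared function; the URL arrives as an argument, so no closure
    # over the loop variable is needed.
    with urllib.request.urlopen(site_url, timeout=60) as conn:
        return conn.read()

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # submit() stores the argument value at submission time.
    future_to_site = {executor.submit(load_site, site_url): site_url
                      for site_url in sites_to_load}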
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
On 24 January 2016 at 15:16, Guido van Rossum <guido@python.org> wrote:
I wonder if kids today aren't too much in love with local function definitions. :-) There's a reason why executor.submit() takes a function *and arguments*. If you move the function out of the for loop and pass the url as a parameter to submit(), problem solved, and you waste fewer resources on function objects and cells to hold nonlocals.
Aye, that's how the current example code in the docs handles it - there's an up front definition of the page loading function, and then the submission to the executor is with a dict comprehension. The only thing "wrong" with it is that when reading the code, the potentially single-use function is introduced first without any context, and it's only later that you get to see what it's for.
A generation ago most people would have naturally used such a solution (since most languages didn't support the alternative :-).
In programming we would have, but I don't think the same is true when writing work instructions for other people to follow - for those, we're more likely to use nested bullets to describe subtasks, and only pull them out to a separate document or section if we need to reference the same subtask from multiple places. While my view is admittedly only based on intuition rather than hard data, it seems to me that when folks are reaching for nested functions, it's that "subtask as a nested bulleted list" idiom they're aiming to express, and Python is otherwise so accommodating of English structural idioms that it's jarring when it doesn't work properly. (I also suspect that's why it's a question we keep returning to - as a *programming language*, making closures play more nicely with iteration variables doesn't add any real power to Python, but as *executable pseudo-code*, it makes it a little bit easier to express certain ideas in the same way we'd describe them to another person). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Sun, Jan 24, 2016, 04:55 Nick Coghlan <ncoghlan@gmail.com> wrote:
On 24 January 2016 at 15:16, Guido van Rossum <guido@python.org> wrote:
I wonder if kids today aren't too much in love with local function definitions. :-) There's a reason why executor.submit() takes a function *and arguments*. If you move the function out of the for loop and pass the url as a parameter to submit(), problem solved, and you waste fewer resources on function objects and cells to hold nonlocals.
Aye, that's how the current example code in the docs handles it - there's an up front definition of the page loading function, and then the submission to the executor is with a dict comprehension.
The only thing "wrong" with it is that when reading the code, the potentially single-use function is introduced first without any context, and it's only later that you get to see what it's for.
So the docs just need an added comment to help explain it. Want to file an issue for that?
A generation ago most people would have naturally used such a solution (since most languages didn't support the alternative :-).
In programming we would have, but I don't think the same is true when writing work instructions for other people to follow - for those, we're more likely to use nested bullets to describe subtasks, and only pull them out to a separate document or section if we need to reference the same subtask from multiple places.
While my view is admittedly only based on intuition rather than hard data, it seems to me that when folks are reaching for nested functions, it's that "subtask as a nested bulleted list" idiom they're aiming to express, and Python is otherwise so accommodating of English structural idioms that it's jarring when it doesn't work properly. (I also suspect that's why it's a question we keep returning to - as a *programming language*, making closures play more nicely with iteration variables doesn't add any real power to Python, but as *executable pseudo-code*, it makes it a little bit easier to express certain ideas in the same way we'd describe them to another person).
I personally like the semantics we currently have. I get why people bring this up, but I'm voting for the programming language side over the pseudo-code angle. -Brett
Cheers, Nick.
-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
On 25 January 2016 at 05:57, Brett Cannon <brett@python.org> wrote:
On Sun, Jan 24, 2016, 04:55 Nick Coghlan <ncoghlan@gmail.com> wrote:
On 24 January 2016 at 15:16, Guido van Rossum <guido@python.org> wrote:
I wonder if kids today aren't too much in love with local function definitions. :-) There's a reason why executor.submit() takes a function *and arguments*. If you move the function out of the for loop and pass the url as a parameter to submit(), problem solved, and you waste fewer resources on function objects and cells to hold nonlocals.
Aye, that's how the current example code in the docs handles it - there's an up front definition of the page loading function, and then the submission to the executor is with a dict comprehension.
The only thing "wrong" with it is that when reading the code, the potentially single-use function is introduced first without any context, and it's only later that you get to see what it's for.
So the doics just need an added comment to help explain it. Want to file an issue for that?
There's nothing to comment on given the Python semantics we have today - what's there is a sensible way to write that code, and the design FAQ covers why the inline closure approach wouldn't work. As noted, I suspect the only reason the topic keeps coming up is the niggling sense that the closure based approach "should" work, and the fact that it doesn't is a case where underlying technical details that we generally aim to let people gloss over make themselves apparent. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
![](https://secure.gravatar.com/avatar/d6b9415353e04ffa6de5a8f3aaea0553.jpg?s=120&d=mm&r=g)
On 1/24/2016 7:54 AM, Nick Coghlan wrote:
On 24 January 2016 at 15:16, Guido van Rossum <guido@python.org> wrote:
I wonder if kids today aren't too much in love with local function definitions. :-) There's a reason why executor.submit() takes a function *and arguments*. If you move the function out of the for loop
What I've concluded from this thread is that function definitions (with direct use of 'def' or 'lambda') do not fit well within loops, though I used them there myself.

When delayed function calls are needed, what belongs within loops is packaging of a pre-defined function with one or more arguments within a callable. Instance.method is an elegant syntax for doing so. functools.partial(func, args, ...) is a much clumsier generalized expression, which requires an import. Note that 'partial' returns a function for delayed execution even when a complete, not partial, set of arguments is passed.

A major attempted (and tempting) use for definitions within a loop is multiple callbacks for multiple gui widgets, where delayed execution is needed. The three answers to multiple 'why doesn't this work' questions on both python-list and Stackoverflow are multiple definitions with variant 'default args', a custom make_function function outside the loop called multiple times within the loop, and a direct function outside the loop called with partial within the loop. I am going to start using partial more.

Making partial a builtin would make it easier to use and more attractive. Even more attractive would be syntax that abbreviates delayed calls with pre-bound arguments in the way that inst.meth abbreviates a much more complicated expression roughly equivalent to "bind(inst.__getattr__('meth'), inst)".

A possibility would be to make {} a delayed and possibly partial call operator, in parallel to the current use of () as an immediate and total call operator. expr{arguments} would evaluate to a function, whether of type <function> or a special class similar to bound methods. The 'arguments' would be anything allowed within partial, which I believe is anything allowed in any function call. I chose {} because expr{...} is currently illegal, just as expr(arguments) is for anything other than a function call. On the other hand, expr[...] is currently legal, at least up to '[', as is expr<...> at least up to '<'.
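(The expr{...} syntax above is purely hypothetical; the closest existing spelling is functools.partial with a full argument list, roughly as in this invented sketch.)

from functools import partial

def fetch(url, timeout):  # an invented example function
    ...

# Hypothetical proposal:  thunk = fetch{'https://example.com', timeout=60}
# Today's equivalent, with every argument pre-bound but the call delayed:
thunk = partial(fetch, 'https://example.com', timeout=60)
thunk()  # the actual call happens here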
and pass the url as a parameter to submit(), problem solved, and you waste fewer resources on function objects and cells to hold nonlocals.
executor.submit appears to me to be a specialized version of partial, with all arguments required. With the proposal above, I think submit(func{all args}) would work.
Aye, that's how the current example code in the docs handles it - there's an up front definition of the page loading function, and then the submission to the executor is with a dict comprehension.
I presume you both are referring to the ThreadPoolExecutor Example. The load_url function, which I think should be 'get_page', has a comment that is wrong (it does not 'report the url') and no docstring. My suggestion:

# Define an example function for the executor.submit call below.
def get_page(url, timeout):
    "Return the page, as a string, retrieved from the url."
    with ...
The only thing "wrong" with it is that when reading the code, the potentially single-use function is introduced first without any context, and it's only later that you get to see what it's for.
A proper comment would fix this, I think. That aside, if the main code were packaged within def main, as in the following ProcessPoolExecutor Example, so as to delay the lookup of 'load_url' or 'get_page', then the two function definitions could be in *either* order.

The general convention in Pythonland seems to be to put main last (bottom up, define everything before use), but in a recent python-list thread, at least one person, and I think two, said they like to start with def main (top down style, which you seem to like). I just checked and PEP8 seems to be silent on the placement of 'def main'. So unless Guido says otherwise, I would not mind if you revised one of the examples to start with def main, just to show that that is a legitimate alternative. It is a feature of Python that one can do this without having to add, before the first appearance of a function name within a function, a dummy 'forward declaration' giving the function signature.
A generation ago most people would have naturally used such a solution (since most languages didn't support the alternative :-).
In programming we would have, but I don't think the same is true when writing work instructions for other people to follow - for those, we're more likely to use nested bullets to describe subtasks, and only pull them out to a separate document or section if we need to reference the same subtask from multiple places.
People can and do jump around while reading code for understanding. They can do this without markers as explicit as needed for machines. Current compilers and interpreters initially read code linearly, with only one character or token lookahead. For Python, a def header is needed for forward reference, to delay name resolution to call time, after the whole file has been read.
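(A minimal illustration of that call-time resolution: a def body may freely name a function that is only defined later in the file.)

def main():
    helper()  # 'helper' is looked up when main() runs, not when it is compiled

def helper():
    print("defined after main, with no forward declaration needed")

main()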
While my view is admittedly only based on intuition rather than hard data, it seems to me that when folks are reaching for nested functions, it's that "subtask as a nested bulleted list" idiom they're aiming to express, and Python is otherwise so accommodating of English structural idioms that it's jarring when it doesn't work properly. (I also suspect that's why it's a question we keep returning to - as a *programming language*, making closures play more nicely with iteration variables doesn't add any real power to Python, but as *executable pseudo-code*, it makes it a little bit easier to express certain ideas in the same way we'd describe them to another person).
I thought about some explicit examples and it is not necessarily clear how to translate bullet points to code. But in general, I do not believe that instructions to another person are meant to induce in the mind of a listener multiple functions that only differ in a default argument object. In other words, I do not see

for i in it:
    def f(i=i):
        pass

as corresponding to natural language. Hence my initial statement above.

-- 
Terry Jan Reedy
![](https://secure.gravatar.com/avatar/047f2332cde3730f1ed661eebb0c5686.jpg?s=120&d=mm&r=g)
On Sun, Jan 24, 2016 at 10:32 PM, Terry Reedy <tjreedy@udel.edu> wrote:
What I've concluded from this thread is that function definitions (with direct use 'def' or 'lambda') do not fit well within loops, though I used them there myself.
Right. When you can avoid them, you avoid extra work in an inner loop, which is often a good idea.
When delayed function calls are are needed, what belongs within loops is packaging of a pre-defined function with one or more arguments within a callable. Instance.method is an elegant syntax for doing so. functools.partial(func, args, ...) is a much clumsier generalized expression, which requires an import. Note that 'partial' returns a function for delayed execution even when a complete, not partial, set of arguments is passed.
Right. I've always hated partial() (which is why it's not a builtin) because usually a lambda is clearer (it's difficult to calculate in your head the signature of the thing it returns from the arguments passed), but this is one thing where partial() wins, since it captures values.
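(A sketch of the difference Guido is pointing at, with invented names: partial() freezes the current value of its arguments, while a plain lambda keeps looking the name up.)

from functools import partial

def ident(n):
    return n

lambdas, partials = [], []
for i in range(3):
    lambdas.append(lambda: ident(i))    # closes over the variable i
    partials.append(partial(ident, i))  # captures i's current value

print([f() for f in lambdas])   # [2, 2, 2]
print([f() for f in partials])  # [0, 1, 2]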
A major attempted (and tempting) use for definitions within a loop is multiple callbacks for multiple gui widgets, where delayed execution is needed. The three answers to multiple 'why doesn't this work' on both python-list and Stackoverflow are multiple definitions with variant 'default args', a custom make_function function outside the loop called multiple times within the loop, and a direct function outside the loop called with partial within the loop. I am going to start using partial more.
Yes, the make_function() approach is just a custom partial().
Making partial a builtin would make it easier to use and more attractive. Even more attractive would be syntax that abbreviates delayed calls with pre-bound arguments in the way that inst.meth abbreviates a much more complicated expression roughly equivalent to "bind(inst.__getattr__('meth'), inst)".
A recommended best practice / idiom is more useful, because it can be applied to all Python versions.
A possibility would be to make {} a delayed and possibly partial call operator, in parallel to the current use of () as a immediate and total call operator. expr{arguments} would evaluate to a function, whether of type <function> or a special class similar to bound methods. The 'arguments' would be anything allowed within partial, which I believe is anything allowed in any function call. I chose {} because expr{...} is currently illegal, just as expr(arguments) is for anything other than a function call. On the other hand, expr[...] is currently legal, at least up to '[', as is expr<...> at least up to '<'.
-1 on expr{...}.
and pass the url as a parameter to submit(), problem solved, and you waste fewer resources on function objects and cells to hold nonlocals.
executor.submit appears to me to be a specialized version of partial, with all arguments required. With the proposal above, I think submit(func{all args}) would work.
But not before 3.6.
Aye, that's how the current example code in the docs handles it - there's an up front definition of the page loading function, and then the submission to the executor is with a dict comprehension.
I presume you both are referring to ThreadPoolExecutor Example. The load_url function, which I think should be 'get_page' has a comment that is wrong (it does not 'report the url') and no docstring. My suggestion:
# Define an example function for the executor.submit call below.
def get_page(url, timeout):
    "Return the page, as a string, retrieved from the url."
    with ...
The only thing "wrong" with it is that when reading the code, the potentially single-use function is introduced first without any context, and it's only later that you get to see what it's for.
A proper comment would fix this I think. That aside, if the main code were packaged within def main, as in the following ProcessPoolExecutor Example, so as to delay the lookup of 'load_url' or 'get_page', then the two functions definitions could be in *either* order. The general convention in Pythonland seems to be to put main last (bottom up, define everything before use), but in a recent python-list thread, at least one person, and I think two, said they like to start with def main (top down style, which you seem to like).
I like both. :-) -- --Guido van Rossum (python.org/~guido)
![](https://secure.gravatar.com/avatar/d6b9415353e04ffa6de5a8f3aaea0553.jpg?s=120&d=mm&r=g)
On 1/25/2016 1:52 PM, Guido van Rossum wrote:
On Sun, Jan 24, 2016 at 10:32 PM, Terry Reedy <tjreedy@udel.edu> wrote:
What I've concluded from this thread is that function definitions (with direct use 'def' or 'lambda') do not fit well within loops, though I used them there myself.
Right. When you can avoid them, you avoid extra work in an inner loop, which is often a good idea.
When delayed function calls are are needed, what belongs within loops is packaging of a pre-defined function with one or more arguments within a callable. Instance.method is an elegant syntax for doing so. functools.partial(func, args, ...) is a much clumsier generalized expression, which requires an import. Note that 'partial' returns a function for delayed execution even when a complete, not partial, set of arguments is passed.
Right. I've always hated partial() (which is why it's not a builtin) because usually a lambda is clearer (it's difficult to calculate in your head the signature of the thing it returns from the arguments passed), but this is one thing where partial() wins, since it captures values.
I agree that the difficulty of immediately grokking the signature of partials that bind arbitrary parameters is a downside to their use. Fake parameters set to a constant necessarily go at the end of the parameter list. The actual signature is the list with those parameters chopped off and ignored. To eliminate the possibility of accidentally supplying a different value positionally, such parameters could (and I think should) be made keyword-only.

def f(a, b='default', *, int=int):
    pass

Bound methods necessarily bind the first parameter, often called 'self'. This again makes the actual signature easy to determine.
A major attempted (and tempting) use for definitions within a loop is multiple callbacks for multiple gui widgets, where delayed execution is needed. The three answers to multiple 'why doesn't this work' on both python-list and Stackoverflow are multiple definitions with variant 'default args', a custom make_function function outside the loop called multiple times within the loop, and a direct function outside the loop called with partial within the loop. I am going to start using partial more.
Since writing this, I realized that defining a custom class and using bound methods is a fourth option, which I also like. This binds the differentiating data to an instance, which is then bound to the function, rather than to the function directly. A toy example:

----
import tkinter as tk

root = tk.Tk()

class Card(tk.Button):
    hide = 'XXX'

    def __init__(self, txt):
        tk.Button.__init__(self, root, text=self.hide)
        # or
        # super().__init__(root, text=self.hide)
        self.txt = txt
        self.exposed = False

    def flip(self):
        self['text'] = self.hide if self.exposed else self.txt
        self.exposed = not self.exposed

for i, txt in enumerate(('one', 'two')):
    card = Card(txt)
    card['command'] = card.flip
    card.grid(row=0, column=i)

#root.mainloop()  # uncomment if run on command line without -i
----

The main problem with this is that some beginners are trying to write (or being told to write) tkinter guis before they learn about class statements. The super() form is easier to write, but its use is even more 'advanced'.

-- 
Terry Jan Reedy
![](https://secure.gravatar.com/avatar/89b67ecc87148f077c349f4fb6f705f6.jpg?s=120&d=mm&r=g)
On Feb 16, 2016, at 4:07 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Since writing this, I realized that defining a custom class and using bound methods is a fourth option, which I also like. This binds the differentiating data to an instance, which is then bound to the function, rather than to the function directly. A toy example:
Does this deserve a link to the closures == classes koan? I think so :-) https://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg03277.html Anywhere someone suggests a closure, a class can probably do the work. And vice-versa, though one will often be more elegant than the other for a particular circumstance.
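(A minimal sketch of that equivalence, with invented names: the closure's cell and the instance attribute hold the same state.)

def make_counter():          # closure version
    count = 0
    def bump():
        nonlocal count
        count += 1
        return count
    return bump

class Counter:               # class version doing the same work
    def __init__(self):
        self.count = 0
    def __call__(self):
        self.count += 1
        return self.count

bump = make_counter()
tick = Counter()
print(bump(), bump())  # 1 2
print(tick(), tick())  # 1 2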
![](https://secure.gravatar.com/avatar/7e41acaa8f6a0e0f5a7c645e93add55a.jpg?s=120&d=mm&r=g)
On Jan 24, 2016, at 22:32, Terry Reedy <tjreedy@udel.edu> wrote:
A possibility would be to make {} a delayed and possibly partial call operator, in parallel to the current use of () as a immediate and total call operator. expr{arguments} would evaluate to a function, whether of type <function> or a special class similar to bound methods. The 'arguments' would be anything allowed within partial, which I believe is anything allowed in any function call. I chose {} because expr{...} is currently illegal, just as expr(arguments) is for anything other than a function call. On the other hand, expr[...] is currently legal, at least up to '[', as is expr<...> at least up to '<'.
I like the idea of "easy" partials, but I don't like this syntax. Many languages (Scala, C++ with boost::lambda, etc.) use a syntax something like this:

hex = int(_, 16)
binopen = open(_, "rb", *_, **_)
setspam = setattr(spam, attr, _)

The equivalent functions are:

lambda x: int(x, 16)
lambda arg, *args, **kw: open(arg, "rb", *args, **kw)
lambda arg, *, _spam=spam, _attr=attr: setattr(_spam, _attr, arg)

You can extend this to allow reordering arguments, similarly to the way %-formatting handles reordering:

modexp = pow(_3, _1, _2)

Obviously '_' only works if that's not a valid identifier (or if you're implementing things with horrible template metaprogramming tricks and argument-dependent lookup rather than in the language), but some other symbol like ':', '%', or '$' might work. I won't get into the ways you can extend this to expressions other than calls, like 2*_ or just (2*).

The first problem with this syntax is that it doesn't give you a way to specify _all_ of the arguments and return a nullary partial. But you can always work around that with dummy params with default values. And it really doesn't come up that often in practice anyway in languages with this syntax, except in the special case that Python already handles with bound methods.

The other big problem is that it just doesn't look like Python, no matter how much you squint. But going only half-way there, via an extended functools.partial that's more like boost bind than boost lambda, isn't nearly as bad:

hex = partial(int, _, 16)
binopen = partial(open, _, "rb", *_, **_)
setspam = partial(setattr, spam, attr, _)

Only the last one can be built with partial today, and even that one seems a lot more comprehensible with the explicit ', _' showing that the resulting function takes one argument, and you can see exactly where that argument will go, than with the current implicit version.

At any rate, I'm not sure I like either of these, but I definitely like them both better than:

setspam = setattr{spam, attr}
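(None of that syntax exists in Python; as a thought experiment only, a placeholder-style partial can be sketched in a few lines of today's Python, with all names invented.)

class _PlaceholderType:
    """Marks an argument slot to be filled in at call time."""

_ = _PlaceholderType()

def placeholder_partial(func, *bound):
    def wrapper(*late):
        late_iter = iter(late)
        # Fill each placeholder slot from the call-time arguments, in order.
        args = [next(late_iter) if b is _ else b for b in bound]
        return func(*args, *late_iter)
    return wrapper

hexparse = placeholder_partial(int, _, 16)
print(hexparse("ff"))  # 255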
![](https://secure.gravatar.com/avatar/130fe9f08ce5d2b1716d32438a58c867.jpg?s=120&d=mm&r=g)
On 24.01.2016 06:16, Guido van Rossum wrote:
I wonder if kids today aren't too much in love with local function definitions. :-) There's a reason why executor.submit() takes a function *and arguments*. If you move the function out of the for loop and pass the url as a parameter to submit(), problem solved, and you waste fewer resources on function objects and cells to hold nonlocals. A generation ago most people would have naturally used such a solution (since most languages didn't support the alternative :-).
Well said. I remember JS being a hatchery of this kind of programming. My main concern always was "how can I test these inner functions?" Almost impossible, but a good excuse not to. So, it's unprofessional from my point of view, but things may change.

On-topic: I like the way Python allows me to bind early. It's simple, and that's the main argument for it and against introducing yet another syntax (like colons, braces, etc.), especially for solving such a side issue.

Best,
Sven
![](https://secure.gravatar.com/avatar/72ee673975357d43d79069ac1cd6abda.jpg?s=120&d=mm&r=g)
Nick Coghlan wrote:
Capturing additional values on each iteration would be possible with a generator expression:
for new i, a, b, c in ((i, a, b, c) for i in range(10)):
    def f(x):
        return x**i, a, b, c
I'm not sure I see the point of this. If you're needing to capture a, b and c from an outer scope, presumably it's because there's some outer loop that's changing them -- in which case you can just make *that* loop a "new" loop as well. BTW, should there be a "while new" loop too? -- Greg
![](https://secure.gravatar.com/avatar/7e41acaa8f6a0e0f5a7c645e93add55a.jpg?s=120&d=mm&r=g)
On Jan 23, 2016, at 19:22, Nick Coghlan <ncoghlan@gmail.com> wrote:
Accordingly, a statement like:
powers = []
for new i in range(10):
    def f(x):
        return x**i
    powers.append(f)
Could be semantically equivalent to:
powers = []
for i in range(10):
    def _for_loop_suite(i=i):
        def f(x):
            return x**i
        powers.append(f)
    _for_loop_suite()
    del _for_loop_suite
A simpler translation of the Swift/C#/etc. behavior might be:
powers = []
for i in range(10):
    def _for_loop_suite(i):
        def f(x):
            return x**i
        powers.append(f)
    _for_loop_suite(i)
    del _for_loop_suite
This is, after all, how comprehensions work, and how you mechanically translate let bindings from other languages to Python (I believe MacroPy even has a let macro that does exactly this); it's slightly simpler to understand under the hood; it's even slightly more efficient (not that it will ever matter).

Of course that raises an important point: when you're _not_ mechanically translating, you rarely translate a let this way; instead, you translate it by rewriting the code at a higher level. (And the fact that this translation _is_ idiomatic in JavaScript is exactly why JS code is ugly in the way that Guido and others decry in this thread.) Do we want the compiler doing something under the hood that we wouldn't want to write ourselves? (Again, people in JS, and other languages like C#, don't consider that a problem--both languages define async as effectively a macro that transforms your code into something you wouldn't want to look at, and those kinds of macros are almost the whole point of Lisp, but I think part of why people like Python is that the semantics of most sugar can be described in terms that are just as readable as the sugared version, except for being longer.)

That's why I think I prefer not-Terry's (sorry for the misattribution) version: if something is going to act differently from the usual semantics, maybe it's better to describe it honestly as a new rule you have to learn, than to describe it as a translation to code that has familiar semantics but is nowhere near idiomatic.
![](https://secure.gravatar.com/avatar/334b870d5b26878a79b2dc4cfcc500bc.jpg?s=120&d=mm&r=g)
Andrew Barnert via Python-ideas writes:
powers = [lambda x: x**i for i in range(10)]
This gives you ten functions that all return x**9, which is probably not what you wanted.
The reason this is a problem is that Python uses "late binding", which in this context means that each of those functions is a closure that captures the variable i in a way that looks up the value of i at call time. All ten functions capture the same variable, and when you later call them, that variable's value is 9.
But this explanation is going to confuse people who understand the concept of variable in Python to mean names that are bound and re-bound to objects. The comprehension's binding of i disappears before any element of powers can be called. So from their point of view, either that expression is an error, or powers[i] closes over a new binding of the name "i", specific to "the lambda's scope" (see below), to the current value of i in the comprehension.

Of course the same phenomenon is observable with other scopes. In particular global scope behaves this way, as importing this file

i = 0
def f(x):
    return x + i
i = 1

and calling f(0) will demonstrate. But changing the value of a global, used the way i is here, within a library module is a rather unusual thing to do; I doubt people will observe it.

Also, once again the semantics of lambda (specifically, that unlike def it doesn't create a scope) seem to be a source of confusion more than anything else. Maybe it's possible to exhibit the same issue with def, but the def equivalent to the above lambda

>>> def make_increment(i):
...     def _(x):
...         return x + i
...     return _
...
>>> funcs = [make_increment(j) for j in range(3)]
>>> [f(0) for f in funcs]
[0, 1, 2]

closes over i in the expected way. (Of course in practicality, it's way more verbose, and in purity, it's not truly equivalent since there's at least one extra nesting of scope involved.) While

>>> def make_increment():
...     def _(x):
...         return x + i
...     return _
...
>>> funcs = [make_increment() for i in range(3)]
>>> [f(0) for f in funcs]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 3, in _
NameError: name 'i' is not defined
>>> i = 6
>>> [f(0) for f in funcs]
[6, 6, 6]

doesn't make closures at all, but rather retains the global binding.
![](https://secure.gravatar.com/avatar/047f2332cde3730f1ed661eebb0c5686.jpg?s=120&d=mm&r=g)
On Sat, Jan 23, 2016 at 3:53 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Andrew Barnert via Python-ideas writes:
powers = [lambda x: x**i for i in range(10)]
This gives you ten functions that all return x**9, which is probably not what you wanted.
The reason this is a problem is that Python uses "late binding", which in this context means that each of those functions is a closure that captures the variable i in a way that looks up the value of i at call time. All ten functions capture the same variable, and when you later call them, that variable's value is 9.
Actually it doesn't look up the value at call time, but each time it's used. This technicality matters if in between uses you call something that has write access to the same variable (typically using nonlocal) and modifies it.
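(A minimal invented example of that technicality: a nonlocal write between two uses is visible within a single call.)

def make():
    i = 0
    def bump():
        nonlocal i
        i += 1
    def report():
        first = i   # first use
        bump()      # the variable is rebound mid-call
        second = i  # second use sees the new value
        return first, second
    return report

report = make()
print(report())  # (0, 1)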
But this explanation is going to confuse people who understand the concept of variable in Python to mean names that are bound and re-bound to objects. The comprehension's binding of i disappears before any element of powers can be called. So from their point of view, either that expression is an error, or powers[i] closes over a new binding of the name "i", specific to "the lambda's scope" (see below), to the current value of i in the comprehension.
But this seems to refer to a very specific definition of "binding" that doesn't have root in Python's semantic model. I suppose it may come from Lisp (which didn't influence Python quite as much as people think :-). So I think what you're saying here comes down that it will confuse people who misunderstand Python's variables. Given that the misunderstanding you're supposing here is pretty specific (it's not just due to people who've never thought much about variables) I'm not sure I care much.
Of course the same phenomenon is observable with other scopes. In particular global scope behaves this way, as importing this file
i = 0
def f(x):
    return x + i
i = 1
and calling f(0) will demonstrate. But changing the value of a global, used the way i is here, within a library module is a rather unusual thing to do; I doubt people will observe it.
I disagree again: in interactive mode most of what you do is global and you will see this quite often. And all scopes in Python behave the same way.
Also, once again the semantics of lambda (specifically, that unlike def it doesn't create a scope)
Uh, what? I can sort of guess what you are referring to here (namely, that no syntactic construct permissible in a lambda can assign to a local variable -- or any variable, for that matter) but it certainly has a scope (to hold the arguments, which are just variables, as one quickly learns from experimenting with the arguments to a function defined using def).
seem to be a source of confusion more than anything else. Maybe it's possible to exhibit the same issue with def, but the def equivalent to the above lambda
>>> def make_increment(i):
...     def _(x):
...         return x + i
...     return _
...
>>> funcs = [make_increment(j) for j in range(3)]
>>> [f(0) for f in funcs]
[0, 1, 2]
closes over i in the expected way. (Of course in practicality, it's way more verbose, and in purity, it's not truly equivalent since there's at least one extra nesting of scope involved.)
It's such a strawman that I'm surprised you bring it up. Who would even *think* of using that idiom as equivalent to the simple lambda? If I were to deconstruct the original statement, I would start by replacing the list comprehension with a plain old for loop. That would also not be truly equivalent because the comprehension introduces a scope while the for loop doesn't, but the difference only matters if it stomps on another variable -- the semantics relative to the lambda are exactly the same. In particular, this example exhibits the same phenomenon without using a comprehension:

powers = []
for i in range(10):
    powers.append(lambda x: x**i)

This in turn can be rewritten without changing the semantics related to scopes using a def that's equivalent (truly equivalent except for its __name__ attribute!):

powers = []
for i in range(10):
    def f(x):
        return x**i
    powers.append(f)

(Note that the leakage of f here is irrelevant to the problem.) This has the same problem, without being distracted by lambda or comprehensions, and we can now explore its semantics through experimentation. We could even unroll the for loop and get the same issue:

powers = []

i = 0
def f(x):
    return x**i
powers.append(f)

i = 1
def f(x):
    return x**i
powers.append(f)

# Etc.
While
>>> def make_increment():
...     def _(x):
...         return x + i
...     return _
...
>>> funcs = [make_increment() for i in range(3)]
>>> [f(0) for f in funcs]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 3, in _
NameError: name 'i' is not defined
>>> i = 6
>>> [f(0) for f in funcs]
[6, 6, 6]
doesn't make closures at all, but rather retains the global binding.
Totally different idiom again -- another strawman. -- --Guido van Rossum (python.org/~guido)
![](https://secure.gravatar.com/avatar/334b870d5b26878a79b2dc4cfcc500bc.jpg?s=120&d=mm&r=g)
Guido, Thank you for taking the trouble to address my rather confused post. Guido van Rossum writes:
If I were to deconstruct the original statement, I would start by replacing the list comprehension with a plain old for loop.
I did that. But that actually doesn't bother me because the loop index's identifier doesn't go out of scope. I now see why that's a red herring, but maybe documentation can be improved. Anyway, I wrote that post before seeing your explanation that things just aren't that difficult, they all follow from "variable reference as dictionary lookup". The clue I needed was the way to view a scope as an object, and then realize that all free variable references are the same, except for visibility of the relevant scope to the other code at the call site. For me it's now a documentation issue (I know why the comprehension of lambdas work as they do, and I also know how to get the "expected", more useful result). I'll go take a look at the language reference, and tutorial, and see if I think they can be improved.
![](https://secure.gravatar.com/avatar/047f2332cde3730f1ed661eebb0c5686.jpg?s=120&d=mm&r=g)
On Sat, Jan 23, 2016 at 10:27 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Guido,
Thank you for taking the trouble to address my rather confused post.
You're welcome. And thanks for taking it as constructive criticism.
Guido van Rossum writes:
If I were to deconstruct the original statement, I would start by replacing the list comprehension with a plain old for loop.
I did that. But that actually doesn't bother me because the loop index's identifier doesn't go out of scope. I now see why that's a red herring, but maybe documentation can be improved.
Anyway, I wrote that post before seeing your explanation that things just aren't that difficult, they all follow from "variable reference as dictionary lookup". The clue I needed was the way to view a scope as an object, and then realize that all free variable references are the same, except for visibility of the relevant scope to the other code at the call site.
For me it's now a documentation issue (I know why the comprehension of lambdas work as they do, and I also know how to get the "expected", more useful result). I'll go take a look at the language reference, and tutorial, and see if I think they can be improved.
I expect that the tutorial just needs some touch-up or an extra section on these issues. But the language reference... Well, it's a mess, it is often confusing and not all that exact. I should take a year off to rewrite it from scratch (what a book that would be!), but I don't have the kind of discipline to finish long writing projects. :-( -- --Guido van Rossum (python.org/~guido)
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Sat, Jan 23, 2016, 22:55 Guido van Rossum <guido@python.org> wrote:
On Sat, Jan 23, 2016 at 10:27 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Guido,
Thank you for taking the trouble to address my rather confused post.
You're welcome. And thanks for taking it as constructive criticism.
Guido van Rossum writes:
If I were to deconstruct the original statement, I would start by replacing the list comprehension with a plain old for loop.
I did that. But that actually doesn't bother me because the loop index's identifier doesn't go out of scope. I now see why that's a red herring, but maybe documentation can be improved.
Anyway, I wrote that post before seeing your explanation that things just aren't that difficult, they all follow from "variable reference as dictionary lookup". The clue I needed was the way to view a scope as an object, and then realize that all free variable references are the same, except for visibility of the relevant scope to the other code at the call site.
For me it's now a documentation issue (I know why the comprehension of lambdas work as they do, and I also know how to get the "expected", more useful result). I'll go take a look at the language reference, and tutorial, and see if I think they can be improved.
I expect that the tutorial just needs some touch-up or an extra section on these issues. But the language reference... Well, it's a mess, it is often confusing and not all that exact. I should take a year off to rewrite it from scratch (what a book that would be!), but I don't have the kind of discipline to finish long writing projects. :-(
Would doing something like the Ruby community where we write a spec using a BDD-style so it's more a set of tests than verbiage be easier? -Brett
-- 
--Guido van Rossum (python.org/~guido)
![](https://secure.gravatar.com/avatar/047f2332cde3730f1ed661eebb0c5686.jpg?s=120&d=mm&r=g)
On Sun, Jan 24, 2016 at 12:17 PM, Brett Cannon <brett@python.org> wrote:
Would doing something like the Ruby community where we write a spec using a BDD-style so it's more a set of tests than verbiage be easier?
I haven't seen that, but if it's anything like the typical way of writing unit tests in Ruby, please no.

-- 
--Guido van Rossum (python.org/~guido)
![](https://secure.gravatar.com/avatar/92136170d43d61a5eeb6ea8784294aa2.jpg?s=120&d=mm&r=g)
On Sat, Jan 23, 2016 at 8:54 AM, Guido van Rossum <guido@python.org> wrote:
Also, once again the semantics of lambda (specifically, that unlike
def it doesn't create a scope)
Uh, what? I can sort of guess what you are referring to here (namely, that no syntactic construct permissible in a lambda can assign to a local variable -- or any variable, for that matter).
That's not even quite true, you can assign to global variables in a lambda:
>>> myglobal = 1
>>> f = lambda: globals().__setitem__('myglobal', 2) or 42
>>> f()
42
>>> myglobal
2
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
![](https://secure.gravatar.com/avatar/5615a372d9866f203a22b2c437527bbb.jpg?s=120&d=mm&r=g)
Excellent summary, thank you, but I want to take exception to something you wrote. I fear that you have inadvertently derailed the thread into a considerably narrower focus than it should have. On Fri, Jan 22, 2016 at 08:50:52PM -0800, Andrew Barnert wrote:
What the thread is ultimately looking for is a solution to the "closures capturing loop variables" problem. This problem has been in the official programming FAQ[1] for decades, as "Why do lambdas defined in a loop with different values all return the same result"?
The issue is not loop variables, or rather, it's not *only* loop variables, and so any solution which focuses on fixing loop variables is only half a solution. If we look back at Haael's original post, his example captures *three* variables, not one, and there is no suggestion that they are necessarily loop variables. It's nice that since we have lambda and list comps we can occasionally write closures in a one-liner loop like so:
powers = [lambda x: x**i for i in range(10)]
This gives you ten functions that all return x**9, which is probably not what you wanted.
but in my opinion, that's really a toy example suitable only for demonstrating the nature of the issue and the difference between early and late binding. Outside of such toys, we often find ourselves closing over at least one variable which is derived from the loop variable, but not the loop variable itself:

# Still a toy, but perhaps a bit more of a realistic toy.
searchers = []
for provider in search_provider:
    key = API_KEYS[provider]
    url = SEARCH_URLS[provider]
    def lookup(*terms):
        terms = "/q=" + "+".join(escape(t) for t in terms)
        u = url + ("key=%s" % key) + terms
        return fetch(u) or []
    searchers.append(lookup)
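(For reference, today's default-argument idiom applied to this same example would read as below; the keyword-only key= and url= parameters are exactly the signature pollution objected to later in this post.)

searchers = []
for provider in search_provider:
    key = API_KEYS[provider]
    url = SEARCH_URLS[provider]
    # key and url are captured by value at definition time; callers
    # should never pass them themselves, which is the wart.
    def lookup(*terms, key=key, url=url):
        terms = "/q=" + "+".join(escape(t) for t in terms)
        u = url + ("key=%s" % key) + terms
        return fetch(u) or []
    searchers.append(lookup)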
The OP proposed that we should add some syntax, borrowed from C++, to function definitions that specifies that some things get captured by value. [...]
Regardless of the syntax chosen, this has a few things to recommend it: - It's completely explicit. If you want a value captured, you have to say so explicitly, otherwise you will get the normal variable lookup behaviour that Python uses now. - It's general. We can capture locals, nonlocals, globals or builtins, not just loop variables. - It allows us to avoid the "default argument" idiom, in cases where we really don't want the argument, we just want to capture the value. There are a lot of functions which have their parameter list polluted by extraneous arguments that should never be used by the caller simply because that's the only way to get early binding/value capturing.
Finally, Terry suggested a completely different solution to the problem: don't change closures; change for loops. Make them create a new variable each time through the loop, instead of reusing the same variable. When the variable isn't captured, this would make no difference, but when it is, closures from different iterations would capture different variables (and therefore different cells).
It was actually Greg, not Terry. I strongly dislike this suggestion (sorry Greg), and I am concerned that the thread seems to have been derailed into treating loop variables as special enough to break the rules. It does nothing to solve the general problem of capturing values. It doesn't work for my "searchers" example above, or even the toy example here:

funcs = []
for i in range(10):
    n = i**2
    funcs.append(lambda x: x + n)

This example can be easily re-written to close over the loop variable directly, that's not the point. The point is that we frequently need to capture more than just the loop variable. Coming up with a solution that only solves the issue for loop variables isn't enough, and it is a mistake to think that this is about "closures capturing loop variables". I won't speak for other languages, but in Python, where loops don't introduce a new scope, "closures capturing loop variables" shouldn't even be seen as a separate problem from the more general issue of capturing values early rather than late. It's just a common, easily stumbled across, manifestation of the same.
For backward-compatibility reasons, this might have to be optional, which means new syntax; he proposed "for new i in range(10):".
I would not like to see "new" become a keyword. I have a lot of code using new (and old) as a variable. -- Steve
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Tue, Jan 26, 2016 at 10:21 AM, Steven D'Aprano <steve@pearwood.info> wrote:
- It allows us to avoid the "default argument" idiom, in cases where we really don't want the argument, we just want to capture the value. There are a lot of functions which have their parameter list polluted by extraneous arguments that should never be used by the caller simply because that's the only way to get early binding/value capturing.
Can you actually name a few, please? I went digging earlier, and couldn't find any really good examples in the stdlib - they're mostly internal functions (underscore-prefixed) that shouldn't be being called from outside their own module anyway. Maybe this isn't as common an issue as I'd thought. ChrisA
![](https://secure.gravatar.com/avatar/5615a372d9866f203a22b2c437527bbb.jpg?s=120&d=mm&r=g)
On Tue, Jan 26, 2016 at 10:52:59AM +1100, Chris Angelico wrote:
On Tue, Jan 26, 2016 at 10:21 AM, Steven D'Aprano <steve@pearwood.info> wrote:
- It allows us to avoid the "default argument" idiom, in cases where we really don't want the argument, we just want to capture the value. There are a lot of functions which have their parameter list polluted by extraneous arguments that should never be used by the caller simply because that's the only way to get early binding/value capturing.
Can you actually name a few, please?
The random module is the first example that comes to mind. Up until 3.3, the last argument was spelled "int" with no underscore:

py> inspect.signature(random.randrange)
<Signature (start, stop=None, step=1, _int=<class 'int'>)>

random.shuffle also used to have an int=int argument, but it seems to be gone in 3.5.
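(The _int=int default is the classic early-binding idiom: the builtin is bound once at definition time, so each use inside the function is a fast local lookup instead of a global-then-builtin search. A minimal invented sketch:)

def digits(n, _int=int, _str=str):
    # _int and _str were bound when the def ran; inside the comprehension
    # they are local-variable lookups rather than builtin lookups.
    return [_int(ch) for ch in _str(n)]

print(digits(2016))  # [2, 0, 1, 6]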
I went digging earlier, and couldn't find any really good examples in the stdlib - they're mostly internal functions (underscore-prefixed) that shouldn't be being called from outside their own module anyway. Maybe this isn't as common an issue as I'd thought.
Obviously you can get away with more junk in a private function than a public function, but it's still unpleasant. Even if it only effects the maintainer of the library, not the users of it, a polluted signature is still polluted. -- Steve
On Jan 25, 2016, at 15:21, Steven D'Aprano <steve@pearwood.info> wrote:
Excellent summary, thank you, but I want to take exception to something you wrote. I fear that you have inadvertently derailed the thread into a considerably narrower focus than it should have.
On Fri, Jan 22, 2016 at 08:50:52PM -0800, Andrew Barnert wrote:
What the thread is ultimately looking for is a solution to the "closures capturing loop variables" problem. This problem has been in the official programming FAQ[1] for decades, as "Why do lambdas defined in a loop with different values all return the same result"?
The issue is not loop variables, or rather, it's not *only* loop variables, and so any solution which focuses on fixing loop variables is only half a solution.
I think it really _is_ only loop variables--or at least 95% loop variables.
... Outside of such toys, we often find ourselves closing over at least one variable which is derived from the loop variable, but not the loop variable itself:
But, depending on how you write that, either (a) it already works the way you'd naively expect, or (b) the only reason you'd expect it to work is if you don't understand Python scoping (that is, you think every block is a scope). That's different from the case with loop variables: even people who know Python scoping still regularly make the mistake with loop variables, swear at themselves, and write the default-value trick on the first debug pass. (Novices, of course, swear at themselves, try 28 random changes, then post their code on StackOverflow titled "Why Python closures does suck this way?")

It's the loop variable problem that's in the FAQ. And it does in fact come up all the time in some kinds of programs, like Tkinter code that wants to create callbacks for each of 10 buttons. And again, looking at other languages, it's the loop variable problem that's in their FAQs, and the new-variable-per-instance solution would work across most of them, and is actually used in some.

Again, I definitely acknowledge that Python's non-granular scopes make the issue much less clear-cut than in those languages where "key = API_KEYS[provider]" would actually work. That's why I said that if there's one mainstream language that _shouldn't_ use my solution, it's Python. And, ultimately, I'm still -0 about any change--the default-value solution has worked for decades, everyone who uses Python understands it, and there's no serious problem with it. But I think "capture by value" or "capture early" would, outside the loop-variable case, be more often an attractive nuisance for code you shouldn't be writing than a help for code you should.

If you think we _should_ solve the problem with "loop-body-local" variables, that would definitely be an argument for Nick's "define and call a function" implementation over the new-cell implementation, because his version does actually define a new scope, and can easily be written to make those variables actually loop-body-local. However, I think that, if we wanted that, it would be better to have a more general solution--maybe a "scope" statement that defines a new scope for its suite, or even a "let" statement that defines a new variable only until the end of the current suite. Or, of course, we could toss this on the large pile of "problems that would be solved by light-weight multi-line lambda" (and I think it doesn't add nearly enough weight to that pile to make the problem worth solving, either).
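[Editorial aside: the Tkinter case mentioned above is worth seeing concretely -- a sketch assuming ten buttons that should each report their own index:]

import tkinter as tk

root = tk.Tk()
for i in range(10):
    # Buggy version: command=lambda: print(i) would close over the shared
    # 'i', so every button would print 9. The default-value trick fixes it:
    tk.Button(root, text=str(i), command=lambda i=i: print(i)).pack()
root.mainloop()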
The OP proposed that we should add some syntax, borrowed from C++, to function definitions that specifies that some things get captured by value. [...]
Regardless of the syntax chosen, this has a few things to recommend it:
- It's completely explicit. If you want a value captured, you have to say so explicitly, otherwise you will get the normal variable lookup behaviour that Python uses now.
Surely "for new i" is just as explicit about the fact that the variable is "special" as "def f(x; i):" or "sharedlocal i"? The difference is only _where_ it's marked, not _whether_ it's marked.
- It's general. We can capture locals, nonlocals, globals or builtins, not just loop variables.
Sure, but it may be an overly-general solution to a very specific problem. If not, then great, but... Do you really have code that would be clearer if you could capture a global variable by value? (Of course there's code that does that as an optimization--but that's not to make the code clearer; it's to make the code slightly faster despite being less clear.)
- It allows us to avoid the "default argument" idiom, in cases where we really don't want the argument, we just want to capture the value. There are a lot of functions which have their parameter list polluted by extraneous arguments that should never be used by the caller simply because that's the only way to get early binding/value capturing.
It's not the _only_ way. When you really want a new scope, you can always define and call a local function. Or, usually better, refactor things so you're calling a global function, or using an object, or some other solution. The default-value idiom is just the most _concise_ way. Meanwhile, have you ever actually had a bug where someone passed an override for the i=i or len=len parameter? I suspect if people really were worried about that, they would use "*, _spam=spam", but they never do. (The only place I've seen anything like that is in generated code--e.g., a currying macro.)
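[Editorial aside: for readers unfamiliar with that spelling, a quick sketch of the keyword-only variant being alluded to; spam/_spam are placeholder names:]

spam = 42

def f(x, *, _spam=spam):
    # '*' makes _spam keyword-only, so a stray positional argument can't
    # override it; the underscore signals it isn't part of the public API.
    return x + _spam

f(1)  # 43
# f(1, 2) would raise TypeError: f() takes 1 positional argument but 2 were given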
For backward-compatibility reasons, this might have to be optional, which means new syntax; he proposed "for new i in range(10):".
I would not like to see "new" become a keyword. I have a lot of code using new (and old) as a variable.
I've even got some 2.5 code that runs in 3.3+ thanks to modernize, but still uses the "new" module. :) Of course it could become a context-sensitive keyword, like async. But yeah, that seems more like a last-resort idea than something to emulate wherever possible...
On Mon, Jan 25, 2016 at 5:22 PM Steven D'Aprano <steve@pearwood.info> wrote:
# Still a toy, but perhaps a bit more of a realistic toy.
searchers = []
for provider in search_provider:
    key = API_KEYS[provider]
    url = SEARCH_URLS[provider]
    def lookup(*terms):
        terms = "/q=" + "+".join(escape(t) for t in terms)
        u = url + ("key=%s" % key) + terms
        return fetch(u) or []
    searchers.append(lookup)
I'd define the basic function outside the loop.

def lookup(root_url, api_key, *terms):
    args = root_url, api_key, "+".join(escape(t) for t in terms)
    url = '%s?key=%s&q=%s' % args
    return fetch(url) or []

Then use ``functools.partial`` inside the loop to create the closure.

searchers = []
for provider in search_provider:
    key = API_KEYS[provider]
    url = SEARCH_URLS[provider]
    searchers.append(partial(lookup, url, key))

Or even more concisely, you could use a comprehension at that point.

searchers = [partial(lookup, SEARCH_URLS[p], API_KEYS[p])
             for p in search_provider]
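[Editorial aside: a quick self-contained check of the partial approach; the providers, keys and URLs here are made up, and escape/fetch are stubbed out:]

from functools import partial

API_KEYS = {"alpha": "k1", "beta": "k2"}
SEARCH_URLS = {"alpha": "https://alpha.example/search",
               "beta": "https://beta.example/search"}
escape = str             # stand-in for a real URL-escaping function
fetch = lambda url: url  # stub: a real version would perform the request

def lookup(root_url, api_key, *terms):
    args = root_url, api_key, "+".join(escape(t) for t in terms)
    url = '%s?key=%s&q=%s' % args
    return fetch(url) or []

searchers = [partial(lookup, SEARCH_URLS[p], API_KEYS[p]) for p in API_KEYS]
print(searchers[0]("python", "closures"))
# https://alpha.example/search?key=k1&q=python+closures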
Steven D'Aprano wrote:
Outside of such toys, we often find ourselves closing over at least one variable which is derived from the loop variable, but not the loop variable itself:
Terry's idea of a variant of the for-loop whose body is a nested scope (with everything that implies) would address that, because any name assigned within the body (and not declared nonlocal) would be part of the captured scope.
I would not like to see "new" become a keyword.
I'm open to alternatives. Would "foreach" be better keyword material? We could say

foreach i in things:
    ...

although the difference between "for" and "foreach" would be far from obvious. I'd like to do something with "let", which is familiar from other languages as a binding-creation construct, and it doesn't seem a likely choice for a variable name. Maybe if we had a general statement for introducing a new scope, independent of looping:

let:
    ...

Then loops other than for-loops could be treated like this:

i = 0
while i < n:
    let:
        x = things[i]
        funcs.append(lambda: process(x))
    i += 1

The for-loop is a special case, because it assigns a variable in a place where we can't capture it in a let-block. So we introduce a variant:

for let x in things:
    funcs.append(lambda: process(x))

Refinements:

1) Other special cases could be provided, but I don't think any others are strictly needed. For example, you might want:

with open(filename) as let f:
    process(f)

but that could be written as

with open(filename) as f:
    let:
        g = f
        process(g)

2) It may be desirable to allow assignments on the same line as "let", e.g.

with open(filename) as f:
    let g = f:
        process(g)

which seems marginally more readable. Also, the RHS of the assignment would be evaluated outside the scope being created, allowing

with open(filename) as f:
    let f = f:
        process(f)

although I'm not sure that's a style that should be encouraged. Code that apparently assigns something to itself always looks a bit wanky to me. :-( -- Greg
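[Editorial aside: today's Python can already emulate the proposed let-block by defining and immediately calling a function -- a sketch, with things, funcs and process given toy definitions:]

funcs = []
things = ["a", "b", "c"]
process = print

i = 0
n = len(things)
while i < n:
    def _let(x=things[i]):                # the function body plays the role of the let-block;
        funcs.append(lambda: process(x))  # each call gets a fresh, private x
    _let()
    i += 1

funcs[0]()  # prints "a", not "c"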
On Jan 26, 2016, at 15:59, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'd like to do something with "let", which is familiar from other languages as a binding-creation construct, and it doesn't seem a likely choice for a variable name.
Maybe if we had a general statement for introducing a new scope, independent of looping:
let: ...
A few years ago, I played with using an import hook to add let statements to Python (by AST-translating them to a function definition and call). It's a neat idea, but I couldn't find any actual uses that made my code more readable. Or, rather, I found a small handful, but every time it was actually far _more_ readable to just refactor the let body out into a separate (non-nested) function or method.

I don't know if this would be true more universally than for my code. But I think it's worth trying to come up with non-toy examples of where you'd actually use this.

Put another way: flat is better than nested. When you actually need a closure, you have to go nested--but most of the time, you don't. And if you go flat most of the time, the few cases where you go nested now signal that something is special (you actually need a closure). So, unless there really are common cases where you need a closure over some variables, but early binding/value capture/whatever for others, I think this may harm readability more than it helps.
The for-loop is a special case, because it assigns a variable in a place where we can't capture it in a let-block. So we introduce a variant:
for let x in things:
    funcs.append(lambda: process(x))
This reads weird to me. I think it's because I've been spending too much time in Swift, but I also think Swift may have gotten things right here, so that's not totally irrelevant. In Swift, almost anywhere you want to create a new binding--whether normal declaration statements, the equivalent of C99 "if (ch = getch())", or even pattern matching--you have to use the "let" keyword. But "for" statements are the one place you _don't_ use "let", because they _always_ create a new binding for the loop variable. As I've mentioned before, both C# and Ruby made breaking changes from the Python behavior to the Swift behavior, because they couldn't find any legitimate code that would be broken by that change. And there have been few if any complaints since. If we really are considering adding something like "for let", we should seriously consider whether anyone would ever have a good reason to use "for" instead of "for let". If not, just change "for" instead.
2) It may be desirable to allow assignments on the same line as "let", e.g.
with open(filename) as f:
    let g = f:
        process(g)
which seems marginally more readable.
It's also probably a lot more familiar to people who are used to let from functional languages. And I don't _think_ it's a misleading/false-cognate kind of familiarity, although I'm not positive about that.
On Wed, Jan 27, 2016 at 12:49:12PM -0800, Andrew Barnert via Python-ideas wrote:
The for-loop is a special case, because it assigns a variable in a place where we can't capture it in a let-block. So we introduce a variant:
for let x in things:
    funcs.append(lambda: process(x))
This reads weird to me. I think it's because I've been spending too much time in Swift, but I also think Swift may have gotten things right here, so that's not totally irrelevant.
It reads weird to me too, because "for let x in ..." is just weird. It's uncanny valley for English grammar: at first glance it looks like valid grammar, but it's not. [...]
As I've mentioned before, both C# and Ruby made breaking changes from the Python behavior to the Swift behavior, because they couldn't find any legitimate code that would be broken by that change.
I'm not sure if you intended this or not, but that sounds like "they found plenty of code that would break, but decided it wasn't legitimate so they didn't care". -- Steve
On 01/27/2016 05:06 PM, Steven D'Aprano wrote:
On Wed, Jan 27, 2016 at 12:49:12PM -0800, Andrew Barnert wrote:
This reads weird to me. I think it's because I've been spending too much time in Swift, but I also think Swift may have gotten things right here, so that's not totally irrelevant.
It reads weird to me too, because "for let x in ..." is just weird. It's uncanny valley for English grammar: at first glance it looks like valid grammar, but it's not.
As I've mentioned before, both C# and Ruby made breaking changes from the Python behavior to the Swift behavior, because they couldn't find any legitimate code that would be broken by that change.
I'm not sure if you intended this or not, but that sounds like "they found plenty of code that would break, but decided it wasn't legitimate so they didn't care".
Or, "they found code that would break, because it was already broken but nobody had noticed yet." -- ~Ethan~
On Jan 27, 2016, at 17:06, Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Jan 27, 2016 at 12:49:12PM -0800, Andrew Barnert via Python-ideas wrote:
The for-loop is a special case, because it assigns a variable in a place where we can't capture it in a let-block. So we introduce a variant:
for let x in things:
    funcs.append(lambda: process(x))
This reads weird to me. I think it's because I've been spending too much time in Swift, but I also think Swift may have gotten things right here, so that's not totally irrelevant.
It reads weird to me too, because "for let x in ..." is just weird. It's uncanny valley for English grammar: at first glance it looks like valid grammar, but it's not.
Ah, good point.
[...]
As I've mentioned before, both C# and Ruby made breaking changes from the Python behavior to the Swift behavior, because they couldn't find any legitimate code that would be broken by that change.
I'm not sure if you intended this or not, but that sounds like "they found plenty of code that would break, but decided it wasn't legitimate so they didn't care".
:) What I meant is they found a small number of examples of code that would be affected, but all of them were clearly bugs, and therefore not legitimate. Obviously that can be a judgment call, but usually it's a pretty easy one. Like the function that creates N callbacks that all use the last name, instead of creating one callback for each name, preceded by this comment:

# Don't call this function! Ruby sucks but when I complain they tell me I'm too dumb to fix it so just don't use it!!!!

Whether the 1.9 change fixed that function or re-broke it differently scarcely matters; clearly no one was depending on the old behavior.

Maybe Python is different, and we would find code that really _does_ need 10 separate functions that all compute x**9 or that all disable the last button or... well, probably something more useful than that, which I can't guess in advance. I certainly wouldn't suggest just changing Python based on the results of a search of Ruby code! But I would definitely suggest doing a similar search of Python code before giving people two similar but different statements to hang themselves with.
On Wed Jan 27 15:49:12 EST 2016, Andrew Barnert wrote:
both C# and Ruby made breaking changes from the Python behavior to the Swift behavior, because they couldn't find any legitimate code that would be broken by that change. And there have been few if any complaints since. If we really are considering adding something like "for let", we should seriously consider whether anyone would ever have a good reason to use "for" instead of "for let". If not, just change "for" instead.
The first few times I saw this, I figured Python had a stronger (and longer) backwards compatibility guarantee. But now that I consider the actual breakage, I'm not so sure...

>>> for i in range(10):
...     print(i)
...     i = i + 3
...     print(i)

i is explicitly changed, but it doesn't affect the flow control -- it gets reset to the next sequence item as if nothing had happened. It would break things to hide the final value of i after the loop is over, but that isn't needed. I think the only way it even *could* matter is if the loop variable is captured in a closure each time through the loop. What would it look like for the current behavior to be intentional?

>>> for cache in (4, 5, 6, {}):
...     def f():
...         cache['haha!'] = "I know only the last will really get used!"
...     funcs.append(f)

-jJ

--
If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ
On 01/28/2016 11:51 AM, Jim J. Jewett wrote:
I think the only way it even *could* matter is if the loop variable is captured in a closure each time through the loop. What would it look like for the current behavior to be intentional?
>>> for cache in (4, 5, 6, {}):
...     def f():
...         cache['haha!'] = "I know only the last will really get used!"
...     funcs.append(f)
I think that falls into the "not legitimate" category. ;) -- ~Ethan~
On Thursday, January 28, 2016 11:51 AM, Jim J. Jewett <jimjjewett@gmail.com> wrote:
On Wed Jan 27 15:49:12 EST 2016, Andrew Barnert wrote:
both C# and Ruby made breaking changes from the Python behavior
...
The first few times I saw this, I figured Python had a stronger (and longer) backwards compatibility guarantee.
Ruby, sure, but C#, I don't think so. Most of the worst warts in C# 6.0 are there for backward compatibility.[1]
But now that I consider the actual breakage, I'm not so sure...
>>> for i in range(10):
...     print(i)
...     i = i + 3
...     print(i)
i is explicitly changed, but it doesn't affect the flow control -- it gets reset to the next sequence item as if nothing had happened.
Yeah, that confusion is actually a separate issue. Explaining it in text is a bit difficult, but let's translate to the equivalent while loop:

_it = iter(range(10))
try:
    while True:
        i = next(_it)
        print(i)
        i = i + 3
        print(i)
except StopIteration:
    pass

Now it should be obvious why you aren't affecting the control flow. And it should also be obvious why the "for let" change wouldn't make any difference here.

Could Python solve that confusion? Sure. Swift, Scala, and lots of other languages make the loop variable constant/read-only/immutable/whatever, so that "i = i + 3" either fails to compile, or raises at runtime with a "ConstError". The idea is that "i = i + 3" is more often a confusing bug than intentional--and, when it is intentional, the workaround is trivial (just write "j = i + 3" and use j). But in dynamic languages, const tends to be more annoying than useful, so the smart ones (like Python) don't bother with it.

[1] For example: Non-generic Task makes type inference for generic Task<T> much harder, and isn't used except by accident, but they added it anyway, in C# 5 in 2012, because it was needed for consistency with the non-generic collections, which have been deprecated since C# 2 in 2005 but can't be removed because some code might break.
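[Editorial aside: a quick interactive confirmation of the point, mirroring Jim's example with a smaller range:]

for i in range(3):
    print(i)      # 0, 1, 2 -- next(_it) rebinds i at the top of each pass
    i = i + 3
    print(i)      # 3, 4, 5 -- the rebinding never feeds back into the iteration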
Steven D'Aprano <steve@pearwood.info> writes:
But neither of these approaches would be good for lambdas. I'm okay with that -- lambda is a lightweight syntax, for lightweight needs. If your needs are great (doc strings, annotations, multiple statements) don't use lambda.
Yeah, but the fact that it's specifically part of C++'s lambda syntax suggests that it is a very common thing to need with a lambda, doesn't it? What about... lambda a, = b: [stuff with captured value b] ?
On Tue, Jan 19, 2016 at 8:44 PM, Random832 <random832@fastmail.com> wrote:
Steven D'Aprano <steve@pearwood.info> writes:
But neither of these approaches would be good for lambdas. I'm okay with that -- lambda is a lightweight syntax, for lightweight needs. If your needs are great (doc strings, annotations, multiple statements) don't use lambda.
Yeah, but the fact that it's specifically part of C++'s lambda syntax suggests that it is a very common thing to need with a lambda, doesn't it?
No, that's because in C++ "lambdas" are the only things with closures.
What about... lambda a, = b: [stuff with captured value b] ?
Noooooo! -- --Guido van Rossum (python.org/~guido)
On Tue, Jan 19, 2016 at 03:10:35PM +0100, haael@interia.pl wrote:
Hi
C++ has a nice feature of explicit variable capture list for lambdas:
int a = 1, b = 2, c = 3;
auto fun = [a, b, c](int x, int y){ return a + b + c + x + y; };
For the benefit of those who don't speak C++, could you explain what that does? Are C++ name binding semantics the same as Python's? Specifically, inside fun, does "a" refer to the global a? If you rebind global a, what happens to fun?

fun(0, 0)  # returns 6
a = 0
fun(0, 0)  # returns 5 or 6?
This allows easy construction of closures. In Python to achieve that, you need to say:
def make_closure(a, b, c):
    def fun(x, y):
        return a + b + c + x + y
    return fun

a = 1
b = 2
c = 3
fun = make_closure(a, b, c)
I cannot tell whether the C++ semantics above are the same as the Python semantics here. Andrew's response to you suggests that it is not.
My proposal: create a special variable qualifier (like global and nonlocal) to automatically capture variables
"Variables" is an ambiguous term. I don't want to get into a debate about "Python doesn't have variables", but it's not clear what you mean here. Python certainly has names, and values, and when you talk about "variables" do you mean the name or the value or both?
a = 1
b = 2
c = 3

def fun(x, y):
    capture a, b, c
    return a + b + c + x + y
This will have an effect that symbols a, b and c in the body of the function have values as they had at the moment of function creation. The variables a, b, c must be defined at the time of function creation. If they are not, an error is thrown.
If I have understood you correctly, we can already do that in Python, and don't even need a closure:

a, b, c = 1, 2, 3
fun = lambda x, y, a=a, b=b, c=c: a + b + c + x + y

will capture the current *value* of GLOBAL a, b and c, store them as default values, and use them as the LOCAL a, b and c. You may consider it a strength or a weakness that they are exposed as regular function parameters:

fun(x, y)           # intended call signature
fun(x, y, a, b, c)  # actual call signature

but if you really care about hiding the extra parameters, a second approach will work:

from functools import partial
a, b, c = 1, 2, 3
fun = partial(lambda a, b, c, x, y: a + b + c + x + y, a, b, c)

If a, b, c are mutable objects, you can make a copy of the value:

fun = partial(lambda a, b, c, x, y: a + b + c + x + y, a, b, copy.copy(c))

for example. Does your proposal behave any differently from these examples? -- Steve
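[Editorial aside: to make the early-binding behaviour concrete, using the same names:]

a, b, c = 1, 2, 3
fun = lambda x, y, a=a, b=b, c=c: a + b + c + x + y
a = 100           # rebinding the global afterwards...
print(fun(0, 0))  # ...still prints 6: the defaults were evaluated when 'lambda' ran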
On January 19, 2016 5:51:15 PM CST, Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Jan 19, 2016 at 03:10:35PM +0100, haael@interia.pl wrote:
Hi
C++ has a nice feature of explicit variable capture list for lambdas:
int a = 1, b = 2, c = 3;
auto fun = [a, b, c](int x, int y){ return a + b + c + x + y; };
For the benefit of those who don't speak C++, could you explain what that does? Are C++ name binding semantics the same as Python's?
No.
Specifically, inside fun, does "a" refer to the global a? If you rebind global a, what happens to fun?
fun(0, 0) # returns 6 a = 0 fun(0, 0) # returns 5 or 6?
The given C++ lambda syntax copies the captured variables, so it would return 5. This would return 6:

auto fun = [&a, &b, &c](int x, int y){ return a + b + c + x + y; };
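[Editorial aside: a rough Python rendering of that copy-versus-reference distinction, reduced to a single variable:]

a = 1
by_copy = lambda x, y, a=a: a + x + y  # like [a]: the value of a is taken now
by_ref = lambda x, y: a + x + y        # like [&a]: a is looked up at call time
a = 0
print(by_copy(0, 0))  # 1
print(by_ref(0, 0))   # 0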
participants (28)

- Andrew Barnert
- Brett Cannon
- Cameron Simpson
- Chris Angelico
- David Mertz
- Devin Jeanpierre
- Erik
- Ethan Furman
- Greg Ewing
- Guido van Rossum
- haael@interia.pl
- Jim J. Jewett
- M.-A. Lemburg
- Michael Selik
- Nick Coghlan
- Paul Moore
- Petr Viktorin
- Random832
- Ryan Gonzalez
- Serhiy Storchaka
- Sjoerd Job Postmus
- Stefan Krah
- Stephen J. Turnbull
- Steven D'Aprano
- Sven R. Kunze
- Terry Reedy
- Victor Stinner
- Yury Selivanov