Well, I (obviously) like the idea, but your pep misses some important points (and some not so important). The important one is about the scoping of variables in default expressions. The pep says nothing about them. If I read the pep correctly, any variables in default expressions are handled the same as variables in the body of a function. This means the compiler decides if they should be local, lexical or global. If there is an assignment to a variable, the compiler makes it a local, else it finds the right enclosing scope (lexical or global). In the current python, this works fine:
    >>> a = 123
    >>> def foo(b=a):
    ...     a = 2
    ...     print a, b
    ...
    >>> foo()
    2 123
    >>> a = 42
    >>> foo()
    2 123
In the pep, the 'a' in the default expression would be handled just like any other 'a' in the function body, which means it will become a _local_ variable. Calling the function would then result in an UnboundLocalError. Just like this in current python:
    >>> a = 123
    >>> def foo(b=None):
    ...     b = a if b == None else b
    ...     a = 2
    ...     print a
    ...
    >>> foo()
    Traceback (most recent call last):
      File "<stdin>", line 2, in foo
    UnboundLocalError: local variable 'a' referenced before assignment
The following is a proto-PEP based on the discussion in the thread "fixing mutable default argument values". Comments would be greatly appreciated. - Chris Rebert
Title: Fixing Non-constant Default Arguments
Abstract
This PEP proposes new semantics for default arguments to remove boilerplate code associated with non-constant default argument values, allowing them to be expressed more clearly and succinctly.
Motivation
Currently, to write functions using non-constant default arguments, one must use the idiom:
    def foo(non_const=None):
        if non_const is None:
            non_const = some_expr
        #rest of function
or equivalent code. Naive programmers desiring mutable default arguments often make the mistake of writing the following:
    def foo(mutable=some_expr_producing_mutable):
        #rest of function
However, this does not work as intended, as 'some_expr_producing_mutable' is evaluated only *once* at definition-time, rather than once per call at call-time. This results in all calls to 'foo' using the same default value, which can result in unintended consequences. This necessitates the previously mentioned idiom. This unintuitive behavior is such a frequent stumbling block for newbies that it is present in at least 3 lists of Python's problems [0] [1] [2].
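The pitfall can be demonstrated in a few lines of current Python (the `append_to` function below is a made-up illustration, not from the discussion):

```python
# The default list is created once, at definition time, so every
# call that omits 'seq' shares the same list object.
def append_to(item, seq=[]):
    seq.append(item)
    return seq

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] -- the same list again, not a fresh one
```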
Also, I just found out that Python's own documentation refers to this with an "Important warning: The default value is evaluated only once. This makes a difference when the default is a mutable object such as a list, dictionary, or instances of most classes. ..." (http://docs.python.org/tut/node6.html#SECTION006710000000000000000). This indicates, IMO, that the Python doc writers also don't consider the current situation optimal.
There are currently few, if any, known good uses of the current behavior of mutable default arguments. The most common one is to preserve function state between calls. However, as one of the lists [2] comments, this purpose is much better served by decorators, classes, or (though less preferred) global variables. Therefore, since the current semantics aren't useful for non-constant default values and an idiom is necessary to work around this deficiency, why not change the semantics so that people can write what they mean more directly, without the annoying boilerplate?
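As a sketch of that point: state that is sometimes smuggled through a mutable default can instead be kept explicitly on a class instance (the `CallCounter` class below is a hypothetical example, not from the lists cited):

```python
# Preserving state between calls with a class instead of a
# mutable default argument: the state lives on the instance.
class CallCounter(object):
    def __init__(self):
        self.count = 0

    def __call__(self):
        self.count += 1
        return self.count

counter = CallCounter()
print(counter())  # 1
print(counter())  # 2
```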
Rationale
Originally, it was proposed that all default argument values be deep-copied from the original (evaluated at definition-time) at each invocation of the function where the default value was required. However, this doesn't take into account default values that are not literals, e.g. function calls, subscripts, attribute accesses. Thus, the new idea was to re-evaluate the default arguments at each call where they were needed. There was some concern over the possible performance hit this could cause, and whether there should be new syntax so that code could use the existing semantics for performance reasons. Some of the proposed syntaxes were:
def foo(bar=<baz>): #code
def foo(bar=new baz): #code
def foo(bar=fresh baz): #code
def foo(bar=separate baz): #code
def foo(bar=another baz): #code
def foo(bar=unique baz): #code
where the new keyword (or angle brackets) would indicate that the parameter's default argument should use the new semantics. Other parameters would continue to use the old semantics. It was generally agreed that the angle-bracket syntax was particularly ugly, leading to the proposal of the other syntaxes. However, having 2 different sets of semantics could be confusing and leaving in the old semantics just for performance might be premature optimization. Refactorings to deal with the possible performance hit are discussed below.
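For comparison, the proposed re-evaluation semantics can already be approximated in current Python by passing the default as a zero-argument callable and invoking it per call (a sketch only; the `fresh` decorator and `_missing` sentinel are made-up names, and real code would need to handle multiple parameters):

```python
# Approximate call-time default evaluation in today's Python:
# pass a thunk (zero-argument callable) and call it when needed.
_missing = object()  # unique sentinel standing in for "no argument given"

def fresh(default_thunk):
    def decorator(func):
        def wrapper(arg=_missing):
            if arg is _missing:
                arg = default_thunk()  # re-evaluated on every call
            return func(arg)
        return wrapper
    return decorator

@fresh(list)
def append_42(seq):
    seq.append(42)
    return seq

print(append_42())  # [42]
print(append_42())  # [42] -- a fresh list each call
```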
Specification
The current semantics for default arguments are replaced by the following semantics:

- Whenever a function is called, and the caller does not provide a value for a parameter with a default expression, the parameter's default expression shall be evaluated in the function's scope. The resulting value shall be assigned to a local variable in the function's scope with the same name as the parameter.
Include something saying that any variables in a default expression shall be lexical variables, with their scope being the first outer scope that defines a variable with the same name (they should just use the same rules as other lexical/closure variables), and that if the function body defines a local variable with the same name as a variable in a default expression, those variables shall be handled as two separate variables.
- The default argument expressions shall be evaluated before the body of the function.
- The evaluation of default argument expressions shall proceed in the same order as that of the parameter list in the function's definition.

Given these semantics, it makes more sense to refer to default argument expressions rather than default argument values, as the expression is re-evaluated at each call, rather than just once at definition-time. Therefore, we shall do so hereafter.
Demonstrative examples of new semantics:

    #default argument expressions can refer to
    #variables in the enclosing scope...
    CONST = "hi"
    def foo(a=CONST):
        print a

    >>> foo()
    hi
    >>> CONST = "bye"
    >>> foo()
    bye
    #...or even other arguments
    def ncopies(container, n=len(container)):
        return [container for i in range(n)]

    >>> ncopies([1, 2], 5)
    [[1, 2], [1, 2], [1, 2], [1, 2], [1, 2]]
    >>> ncopies([1, 2, 3])
    [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
    >>> #ncopies grabbed n from [1, 2, 3]'s length (3)
I'm not sure if this can be combined elegantly with what I said about variables being lexical variables. The first argument to ncopies, 'container', is clearly a local variable to ncopies. The 'container' in the second arg's default expression should, if my comments above are accepted, be a lexical variable referring to the 'container' in the global scope. The best way to combine the two features seems to be to let 'container' be a local var if any of the preceding args is named 'container', and let it be a lexically scoped variable otherwise. However, I'm not convinced this complexity is worth it; perhaps the vars in default expressions should just always be lexical vars.
    #default argument expressions are arbitrary expressions
    def my_sum(lst):
        cur_sum = lst[0]
        for i in lst[1:]:
            cur_sum += i
        return cur_sum

    def bar(b=my_sum((["b"] * (2 * 3))[:4])):
        print b

    >>> bar()
    bbbb
    #default argument expressions are re-evaluated at every call...
    from random import randint
    def baz(c=randint(1, 3)):
        print c

    >>> baz()
    2
    >>> baz()
    3
    #...but only when they're required
    def silly():
        print "spam"
        return 42

    def qux(d=silly()):
        pass

    >>> qux()
    spam
    >>> qux(17)
    >>> qux(d=17)
    >>> qux(*[17])
    >>> qux(**{'d': 17})
    >>> #no output: silly() is never called when d's value is specified in the call
    #Rule 3
    count = 0
    def next():
        global count
        count += 1
        return count - 1

    def frobnicate(g=next(), h=next(), i=next()):
        print g, h, i

    >>> frobnicate()
    0 1 2
    >>> #g, h, and i's default argument expressions are evaluated in the same order as the parameter definition
Backwards Compatibility
This change in semantics breaks all code which uses mutable default argument values. Such code can be refactored from:
Wow, let's not scare everyone away just yet. This should read: "This change in semantics breaks code which uses mutable default argument expressions and depends on those expressions being evaluated only once, or code that assigns new incompatible values in a parent scope to variables used in default expressions"
    def foo(bar=mutable):
        #code
If 'mutable' is just a single variable, this isn't gonna break, unless the global scope decides to do something like this:

    def foo(bar=mutable):
        #code
    mutable = incompatible_mutable
    # ...
    foo()
to
    def stateify(state):
        def _wrap(func):
            def _wrapper(*args, **kwds):
                kwds['bar'] = state
                return func(*args, **kwds)
            return _wrapper
        return _wrap

    @stateify(mutable)
    def foo(bar):
        #code
or
    state = mutable
    def foo(bar=state):
        #code
or
    class Baz(object):
        def __init__(self):
            self.state = mutable

        def foo(self, bar=self.state):
            #code
Minor point: the stateify decorator looks a bit scary to me as it uses three levels of nested functions. (That's inherent to decorators, but still.) Suggest you present the class and global var solutions first, and the decorator last, just to prevent people from stopping reading the pep and voting '-1' right when they hit the decorator solution.
The changes in this PEP are backwards-compatible with all code whose default argument values are immutable
...or don't depend on being evaluated only once, and don't modify in a parent scope the variables in default expressions in an incompatible way, (hmm, the 'or' and 'and' may need some disambiguation parentheses...)
including code using the idiom mentioned in the 'Motivation' section. However, such values will now be recomputed for each call for which they are required. This may cause performance degradation. If such recomputation is significantly expensive, the same refactorings mentioned above can be used.
In relation to Python 3.0, this PEP's proposal is compatible with those of PEP 3102 [3] and PEP 3107 [4]. Also, this PEP does not depend on the acceptance of either of those PEPs.
Reference Implementation
All code of the form:
    def foo(bar=some_expr, baz=other_expr):
        #body
Should act as if it had read (in pseudo-Python):
    def foo(bar=_undefined, baz=_undefined):
        if bar is _undefined:
            bar = some_expr
        if baz is _undefined:
            baz = other_expr
        #body
and, if there are any variables occurring both in the function body and in some_expr or other_expr, rename those in the function body to something that doesn't name-clash.
where _undefined is the value given to a parameter when the caller didn't specify a value for it. This is not intended to be a literal translation, but rather a demonstration as to how Python's internal argument-handling machinery should be changed.
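The translation can be exercised by hand in today's Python; a runnable sketch, where `_undefined` plays the sentinel role from the pseudo-code above and the concrete expressions merely stand in for some_expr and other_expr:

```python
_undefined = object()  # unique sentinel; never equal to any real value

def foo(bar=_undefined, baz=_undefined):
    if bar is _undefined:
        bar = []        # stands in for some_expr, evaluated per call
    if baz is _undefined:
        baz = len(bar)  # stands in for other_expr; may refer to bar
    return bar, baz

print(foo())        # ([], 0)
print(foo([1, 2]))  # ([1, 2], 2)
print(foo(baz=9))   # ([], 9)
```

Note that `is` comparison against a unique sentinel object is what makes this safe even when the caller legitimately passes None as a value.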
References
[0] 10 Python pitfalls http://zephyrfalcon.org/labs/python_pitfalls.html
[1] Python Gotchas http://www.ferg.org/projects/python_gotchas.html#contents_item_6
[2] When Pythons Attack http://www.onlamp.com/pub/a/python/2004/02/05/learn_python.html?page=2
[3] Keyword-Only Arguments http://www.python.org/dev/peps/pep-3102/
[4] Function Annotations http://www.python.org/dev/peps/pep-3107/
I'm not quite sure if your decision not to include the default-expr-vars-become-lexical part was intentional or not. If it was intentional, can you tell me why you'd want that? Else you can incorporate my comments in the pep.

Another minor point: I'm personally not too fond of too many 'shall's close together. It makes me think of lots of bureaucracy and design-by-committee. I think several peps don't have a shall in them, but maybe it is the right language for this one. What's the right language to use in peps?

Well, that's about it I think.

- Jan