
(my response is a bit late, I needed some time to come up with a good answer to your objections) On Tue, 30 Jan 2007 16:48:54 +0100, Greg Falcon <veloso@verylowsodium.com> wrote:
On 1/30/07, Jan Kanis <jan.kanis@phil.uu.nl> wrote:
On the other hand, are there really any good reasons to choose the current semantics of evaluation at definition time?
While I sympathize with the programmer that falls for this common Python gotcha, and would not have minded if Python's semantics were different from the start (though the current behavior is cleaner and more consistent), making such a radical change to such a core part of the language semantics now is a very bad idea for many reasons.
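The gotcha being referred to is the familiar mutable-default trap: the default expression is evaluated once, when the def statement executes, not on each call. A minimal sketch (the function name is mine, for illustration):

```python
def append_to(item, target=[]):
    # the default list is created once, when 'def' executes,
    # so every call that omits 'target' shares the same list
    target.append(item)
    return target

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] -- the same list object is reused
```

Under the proposed call-time semantics, each call that omits `target` would instead get a fresh empty list.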
It would be a py 3.0 change. Other important stuff is going to change as well. This part of python is IMO not so central that it can't change at all. Especially since the overwhelming majority of all uses of default args have immutable values, so their behaviour isn't going to change anyway (judging by the usage in the std lib). Things like list comprehensions and generators were a much greater change to python, drastically changing the way an idiomatic python program is written. They were added in 2.x because they could be implemented backward-compatibly. With python 3.0, backward compatibility isn't so important anymore. The whole reason for python 3.0's existence is to fix things that can't be fixed backward-compatibly.
What I've heard basically boils down to two arguments: - "let's not change anything", i.e. resist change because it is change, which I don't think is a very pythonic argument.
The argument here is not "let's not change anything because it's change," but rather "let's not break large amounts of existing code without a very good reason." As has been stated here by others, making obsolete a common two-line idiom is not a compelling enough reason to do so.
py3k is going to break large amounts of code anyway, and this pep certainly won't be responsible for most of it. And there's going to be an automatic py2 -> py3 refactoring tool, which can catch any possible breakage from this pep as well.
Helping out beginning Python programmers, while well-intentioned, doesn't feel like enough of a motivation either. Notice that the main challenge for the novice programmer is not to learn how default arguments work -- novices can learn to recognize and write the idiom easily enough -- but rather to learn how variables and objects work in general. [snip] At some point in his Python career, a novice is going to have to understand why b "changed" but d didn't. Fixing the default argument "wart" doesn't remove the necessity to understand the nature of mutable objects and variable bindings in Python; it just postpones the problem. This is a fact worth keeping in mind when deciding whether the sweeping change in semantics is worth the costs.
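A minimal sketch of the kind of example snipped above (the names b and d are reused to match the text; the actual snipped example may have differed):

```python
a = [1]
b = a            # b and a are two names for the same mutable list
a.append(2)      # mutation is visible through both names
# b "changed": it now sees [1, 2]

c = 1
d = c            # d is bound to the same immutable int object
c = c + 1        # this rebinds c to a new object; d is untouched
# d didn't change: still 1
```

Understanding this distinction between mutating an object and rebinding a name is the real hurdle Greg describes, and it exists with or without the default-argument behaviour.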
The change was never intended to spare newbies from learning about python's object model. There are other ways to teach that. But keeping a 'wart' because newbies will learn from it seems like really bad reasoning, language-design wise.
- Arguments based on the assumption that people actually do make lots of use of the fact that default arguments are shared between function invocations, many of which will result in (much) more code if it has to be transformed to using one of the alternative idioms. If this is true, it is a valid argument. I guess there's still some stdlib grepping to do to decide this.
Though it's been decried here as unPythonic, I can't be the only person who uses the idiom def foo(..., cache={}): for making a cache when the function in question does not rise to the level of deserving to be a class object instead. I don't apologize for finding it less ugly than using a global variable.
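A sketch of the `cache={}` idiom being defended here (memoized Fibonacci is my choice of illustration, not from the original mail):

```python
def fib(n, cache={}):
    # 'cache' is evaluated once at def time, so the same dict
    # persists across calls -- exactly the behaviour this idiom
    # exploits, and exactly what the proposed change would break
    if n < 2:
        return n
    if n not in cache:
        cache[n] = fib(n - 1) + fib(n - 2)
    return cache[n]
```

The cache survives between calls without a global variable or a class, which is the appeal of the idiom.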
How often do you use this compared to the x=None idiom? This caching idiom is really the only idiom that's going to break. There are many ways around it; I wouldn't mind an @cache(var={}) decorator somewhere (perhaps in the stdlib). These kinds of things seem to be exactly what decorators are good at.
I know I'm not the only user of the idiom because I didn't invent it -- I learned it from the Python community. And the fact that people have already found usages of the current default argument behavior in the standard library is an argument against the "unPythonic" claim.
I'm reminded of GvR's post on what happened when he made strings non-iterable in a local build (iterable strings being another "wart" that people thought needed fixing): http://mail.python.org/pipermail/python-3000/2006-April/000824.html
In that thread, Guido is at first in favour of making strings non-iterable, one of the arguments being that it sometimes bites people who expect e.g. a list of strings and get a string. He decides not to make the change because there appear to be a number of valid use cases that are hard to change, and the number of people actually getting bitten is quite small. (To support that last part, note for example that none of the 'python problems' pages listed in the pep talk about string iteration, while all talk about default arguments, some with dire warnings and quite a bit of text.)

In the end, the numbers are going to be important. There seems to be only a single use case in favour of definition-time semantics for default values (caching), which isn't very hard to do in a different way. Though seasoned python programmers don't get bitten by default args all the time, they have to work around them all the time using =None. If it turns out that people are actually using caching and other idioms that require definition-time semantics all the time, and the =None idiom is used only very rarely, I'd be all in favour of rejecting this pep.
So, are there any _other_ arguments in favour of the current semantics??
Yes. First, consistency.
[factoring out the first argument into another email. It's taking me some effort to get my head around the early/late binding part of the generator expressions pep, and the way you find an argument in that. As far as I understand it currently, either you or I do not understand that part of the pep correctly. I'll try to get this mail out somewhere tomorrow]
Second, a tool can't fix all usages of the old idiom. When things break, they can break in subtle or confusing ways. Consider my module "greeter":
== begin greeter.py ==
import sys

def say_hi(out = sys.stdout):
    print >> out, "Hi!"

del sys  # don't want to leak greeter.sys to the outside world
== end greeter.py ==
Nothing I've done here is strange or unidiomatic, and yet your proposed change breaks it, and it's unclear how an automated tool should fix it.
Sure this can be fixed by a tool:

    import sys

    @caching(out = sys.stdout)
    def say_hi(out):
        print >> out, "Hi!"

    del sys

where the 'caching' wrapper checks whether an argument for 'out' is provided, and provides it itself otherwise. The caching(out = sys.stdout) is actually a function _call_, so its sys.stdout gets evaluated immediately. A possible implementation of caching:

    def caching(**cachevars):
        def inner(func):
            def wrapper(**argdict):
                for var in cachevars:
                    if var not in argdict:
                        argdict[var] = cachevars[var]
                return func(**argdict)
            return wrapper
        return inner

Defining a decorator unfortunately requires three levels of nested functions, but apart from that the thing is pretty straightforward, and it only needs to be defined once to use on every occurrence of the caching idiom. It doesn't currently handle positional arguments, but that can be added.
What's worse about the breakage is that it doesn't break when greeter is imported,
That's true of any function with a bug in it. Do you want to abandon functions altogether?
or even when greeter.say_hi is called with an argument.
Currently, for people using the x=None idiom, the "if x is None: <calculate default value>" check is a branch in the code. That's why you need to test _all_ possible branches in your unit tests. Analogously, you need to test all combinations of arguments if you want to catch as many bugs as possible.
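The branch-coverage point can be sketched like this (`greet` is a made-up example):

```python
def greet(name=None):
    if name is None:           # this branch runs only when the default is used
        name = "world"
    return "Hi, %s!" % name

# a thorough test suite must exercise both calling conventions:
assert greet() == "Hi, world!"        # default branch
assert greet("Greg") == "Hi, Greg!"   # explicit-argument branch
```

With the =None idiom the default computation is ordinary code inside the function, so any testing discipline that covers all branches already covers it.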
It might take a while before getting a very surprising error "global name 'sys' is not defined".
However, your greeter module actually has a slight bug. What if I do this:

    import sys, greeter
    sys.stdout = my_output_proxy()
    greeter.say_hi()

Now say_hi() still uses the old sys.stdout, which is most likely not what you want. If greeter were implemented like this:

    import sys as _sys

    def say_hi(out = _sys.stdout):
        print >> out, "Hi!"

then under the proposed semantics it would all by itself do a late binding of _sys.stdout, so when I change sys.stdout somewhere else, say_hi uses the new stdout. Deleting sys in order not to 'leak' it to any other module is really not useful. Everybody knows that python does not actually enforce encapsulation, nor does it provide any kind of security barrier between modules. So if some other module wants to get at sys, it can get there anyway; and if you want to indicate that sys isn't exported and greeter's sys shouldn't be messed around with, the renaming import above does that just fine.
Third, the old idiom is less surprising.
    def foo(x=None):
        if x is None:
            x = <some_expr>
<some_expr> may take arbitrarily long to complete. It may have side effects. It may throw an exception. It is evaluated inside the function call, but only evaluated when the default value is used (or the function is passed None).
There is nothing surprising about any of that. Now:
def foo(x=<some_expr>): pass
Everything I said before applies. The expression can take a long time, have side effects, throw an exception. It is conditionally evaluated inside the function call.
Only now, all of that is terribly confusing and surprising (IMO).
Read the "what's new in python 3.0" (assuming the pep gets incorporated, of course). Exception tracebacks and profiler stats will point you at the right line, and you will figure it out. As you said above, all of this is already true under the current =None idiom, so there are no totally new ways in which a program can break. If you know the ways current python can break (take too long, unwanted side effects, exceptions), you will figure it out in the new version. Anyway, many python newbies consider it confusing and surprising that an empty-list default value doesn't stay empty, and all other pythoneers have to work around it a lot of the time. It will be a pretty unique python programmer whose program breaks in the ways mentioned above because the default expression is evaluated at call time, and wouldn't have broken under python's current behaviour, and who isn't able to figure out what happened in a reasonable amount of time. So even if your argument holds, it will still be a net win to accept the pep.
Greg F
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
http://mail.python.org/mailman/listinfo/python-ideas
- Jan